

State of Open: The UK in 2023

Phase Two “Show us the Money”

Part 2: “AI Openness”


Jennifer explores the concept of open AI and its definitions, emphasising alignment with the FAIR principles and ethical practices. The Turing’s Tools, Practices and Systems (TPS) research programme aims to clarify the meaning of “open AI”. Examples from AI collaboratives like BigScience and EleutherAI demonstrate openness beyond major corporations, challenging the notion that only a few companies can produce top-tier models. The benefits of open AI include faster innovation, more equitable data practices, open standards that build public trust, and a more inclusive definition of “better” AI. Open AI encourages diverse participation, challenging traditional norms and fostering sustainable, beneficial AI models for a broader audience.

Thought Leadership: Building Better AI in the Open

Jennifer Ding, Senior Researcher, Turing Institute

Current approaches to openness in AI build upon years of development within the Open Source, open science, and open data movements by public and private institutions. Open principles have enabled alternative pathways to emerge for how AI is produced41 compared to methods used in closed, for-profit AI labs. These approaches expand who can be part of the process and shape the values embedded in the technology. But how do we define the “open” in “open AI”?

Does open refer to ungated access to the artefacts of machine learning, such as the data, trained weights, and model outputs? Or to the process of production – the code – and governance over decisions made by humans and machines? The latest AI advancements have obscured our understanding of how the technology functions and how it impacts society, including what kinds of outcomes openness can catalyse.

As more projects and organisations begin to use the term “open AI” or “Open Source AI”, the Turing’s Tools, Practices and Systems (TPS)43 research programme is convening our community to produce clearer definitions so that we know what to expect from AI systems badged as ‘open’.

As part of the Open Source Initiative’s call for defining Open Source AI,44 we are contributing a definition of openness that aligns with our approach to AI – safe, ethical, responsible, and inclusive practices that follow the FAIR principles (Findable, Accessible, Interoperable, and Reusable) – which we call “open AI”.

Through AI collaboratives such as BigScience and EleutherAI, we have examples of openness beyond the sphere of major technology corporations in the US, which dominate AI narratives. While there are valid concerns about potential harms45 that unrestricted openness can lead to, we’d like to highlight four beneficial outcomes that open AI can enable.

Open AI leads to faster innovation and better performance

Open AI models like Stable Diffusion and Falcon have been shown to match or beat the performance of closed counterparts like DALL-E 2 and GPT-3, challenging the long-held belief that state-of-the-art models can only be produced by a handful of private technology companies in even fewer countries. In a leaked memo, a member of Google staff noted46 how open AI developers, distributed around the world, are proving that smaller, fine-tuned models can outperform larger ones – or, in their words, are “eating our lunch”.

We see open AI upending the misconception that “bigger is better” – that building state-of-the-art models requires Internet-size datasets, deeper architectures, and ever longer training times. This phenomenon, embodied by the flurry of development sparked by the leaking of Meta’s LLaMA47 but also characteristic of past open releases of models like YOLO and BERT, demonstrates the pace and scale of innovation in the open AI community.

Open AI enables more equitable and empowering data practices

In 2021, Hugging Face co-organised the year-long BigScience Workshop48, a collaboration of over 1,000 volunteer researchers from around the world to create the BLOOM49 (short for BigScience Large Open-science Open-access Multilingual) Large Language Model (LLM). With its publicly documented progress and its open invitation to take part in the model licensing and data governance process, the BigScience team has distributed the power to shape, criticise, and run an LLM to communities outside big tech. BigScience has also incorporated local needs through regional working groups that extend model localisation beyond mere inclusion of a language to context-specific decision-making and evaluation. This transparent, collaborative approach to data governance may also be a way for AI collaboratives to avoid costly lawsuits, as seen in the cases of Stable Diffusion, ChatGPT, and Copilot.

Open Standards for AI build public trust

The confusion and hype surrounding AI releases may drive attention and usage for certain models, but they do not build public trust in or understanding of the technology. Regulation like the EU AI Act50 and initiatives like the Turing’s AI Standards Hub51 can counter this effect by contributing clear frameworks and accessible information about what characteristics of AI systems should be expected in different situations, particularly in safety-critical domains and public-facing decisions.

In recognition of the need for social scaffolding to enable widespread AI adoption, groups like Responsible AI Licenses (RAIL) have been working on new forms of AI licences that balance openness with ethical concerns, driven by motivations similar to those of the ethical source community52. The OpenRAIL licence53 has been adapted and adopted for open AI models like Stable Diffusion, BLOOM, and StarCoder to address some of the challenges of licensing AI artefacts: maintaining the principles of openness for public transparency and access while addressing concerns about use for harmful purposes. OpenRAIL allows for the licensing of specific artefacts (e.g. OpenRAIL-DS focuses on data and source code) and includes a use-based restrictions clause specifying prohibited uses (e.g. illegal activity, generating misinformation). The RAIL community has worked with a range of organisations to evolve the licence into a form that many different kinds of actors can trust and use, addressing concerns around flexibility, legibility, and liability.

Open AI enables more expansive and inclusive definitions of “better” AI 

AI ethics experts have questioned54 over the years whether models that are biased against certain groups, or that require enormous expenditures of energy to train, among other social harms, should be considered “good enough” to deploy or make available for general use.

Beyond the performance metrics the AI world uses to compare and rank models55, open AI introduces new ways of deciding what “better” AI looks like and who is able to have a say. When more people – users, developers, and impacted and historically marginalised groups – can get involved in defining what characteristics make a model worthy of public attention and use, we will have better definitions and versions of AI in the world.

By challenging the status quo and expanding participation to invite artists, civil servants, philosophers, students, and more people into the conversation about and production of AI, open AI leads to more sustainable, more beneficial, and better performing models for the benefit of more people. Through open AI, we see a practice of openness that goes beyond expanding access, embodied through the work of projects like The Turing Way57.

We see openness for faster innovation, for broader participation, and as a response to power imbalances, so that more people can be part of the creation process and hence positively benefit from our AI futures.
