

State of Open: The UK in 2023

Phase Two “Show us the Money”

Part 2: “AI Openness”

 

The nexus of open and AI is rapidly evolving, marked by undefined parameters and ongoing efforts by organisations like the Open Knowledge Foundation to provide clarity. Contrary to past skepticism, the feasibility of open AI is increasing due to reduced training costs and innovative techniques. The scope extends beyond open source software, encompassing open culture, digital privacy, and antitrust considerations. While open AI presents opportunities, including diverse investment and institutional innovation, regulatory uncertainties loom. The potential for swift ML regulation, varying across regions, poses challenges, though the transparency inherent in open approaches may offer regulatory advantages. Overall, the intersection of open and AI promises a dynamic landscape with substantial opportunities, provided regulatory hurdles can be navigated.

Thought Leadership: AI and Open: where are we now?

Luis Villa, Founder Tidelift and Open(ish), Machine Learning News

I’ve been trying to closely follow the nexus of open and AI for about six months now, through my newsletter at openml.fyi.28 This is simultaneously the most exciting, and the most complex, thing to hit ‘open’ in two decades — so there has been a lot to digest! In mid-June I put together a lengthy summary of the past six months of news29, to help people analyse where open AI is and where it could go. Since that is 4,000+ words, I’m excited to pull out a few of the most important parts for this report.

Open AI is not yet fully defined

No major open organisation or organised movement has formally defined “open AI”. However, that could change soon: the UK-based Open Knowledge Foundation is working on a definition, and the Open Source Initiative is also working on a definition31. Nevertheless, the term is being used (and abused). Many independent developers are working on AI “in the open”—by using open tools, open collaboration, and open data.

Open AI is plausible

Just a few years ago, I would have said that meaningfully open AI would never happen, because the high cost of training any large model would make distributed development impossible. That has significantly changed.

All block quotes below are from my openml.fyi “state of open” summary.

[T]he cost of training is dropping (from tens of millions of dollars to hundreds of thousands, even for large models)33 and new techniques like Low-Rank Adaptation (LoRA) and QLoRA are making some types of training possible on desktop machines. Continued interest in lower-resource training from academics, hobbyists, and non-FAANG companies will likely accelerate creation of tools and techniques that enable distributed model development.

Open AI is broad

Critically, this moment is not just about Open Source Software. Open AI also implicates open culture, like Wikipedia, and many values that have traditionally paralleled, but not been formally associated with, Open Source Software, like digital privacy and antitrust law.

This has both:

  1. Upsides: We know that public, high quality data sets can be created by volunteer communities, whether hosted by non-profits like Wikimedia and Archive or by for-profits like Reddit, Flickr, GitHub, and Stack Overflow. This gives volunteer, collaborative communities a standing that they lack in many other areas of tech policy; and
  2. Downsides: Open data and open creative communities are totally unprepared for the trust and safety burden that has been thrust upon them by their use in training. Compare the complexity of Wikipedia’s trust and safety efforts with the near-total absence of such efforts at LAION or C4. Similarly, many proprietary ML models are moderated by large, expensive, traumatised teams in places like Kenya.34 ‘Open’ has no equivalent, or alternative, approach at this time, which may contribute to bias issues.35

Open AI is going to be regulated

The biggest unknown for open AI is the nature of regulation. Because the work of understanding and influencing policy is specialised and expensive, open communities are at a significant disadvantage in any regulatory endeavour, as was recently demonstrated in the UK by Signal’s threat to leave the country in response to the proposed “Online Safety Bill.”

There are other challenges as well: ML regulation and judicial decisions will not be consistent from country to country. This is bad for open, which benefits from globally-sized communities. If communities need one open model for the US, another for Europe, another for China, etc., then many of the collaborative benefits of open will be lost.

ML regulation will move surprisingly quickly, with the EU already having voted on in-depth proposals on AI liability.

Perhaps the biggest upside is transparency, which could give open a regulatory advantage: many ML regulatory proposals place a strong emphasis on transparency, particularly around training data and techniques. Open-native models and approaches are much better positioned to meet transparency requirements than closed models.

Open AI has a tonne of opportunities

The good news is that, unless regulation completely quashes it, Open AI has a tonne of opportunities at all levels. I expect we’ll see vast amounts of investment in open AI, led by the model of London-based Stability.ai.37 We’ll also see innovation in the institutional space, with the distinct possibility of a European non-profit aggregating many data sources or models, to increase accountability—and reduce developer liability.

So interesting times are ahead!
