Public AI, Built On Open Source, Is The Way Forward In The EU
from the open-source-is-the-way dept
A quarter of a century ago, I wrote a book called “Rebel Code”. It was the first – and is still the only – detailed history of the origins and rise of free software and open source, based on interviews with the gifted and generous hackers who took part. Back then, it was clear that open source represented a powerful alternative to the traditional proprietary approach to software development and distribution. But few could have predicted how completely open source would come to dominate computing. Alongside its role in running every aspect of the Internet, and powering most mobile phones in the form of Android, it has been embraced by startups for its unbeatable combination of power, reliability and low cost. It’s also a natural fit for cloud computing because of its ability to scale. It is no coincidence that for the last ten years, pretty much 100% of the world’s top 500 supercomputers have run an operating system based on the open source Linux.
More recently, many leading AI systems have been released as open source. That raises the important question of what exactly “open source” means in the context of generative AI software, which involves much more than just code. The Open Source Initiative, which drew up the original definition of open source, has extended this work with its Open Source AI Definition. It is noteworthy that the EU has explicitly recognized the special role of open source in the field of AI. In the EU’s recent Artificial Intelligence Act, open source AI systems are exempt from the potentially onerous obligation to draw up a range of documentation that is generally required.
That could provide a major incentive for AI developers in the EU to take the open source route. European academic researchers working in this area are probably already doing that, not least for reasons of cost. Paul Keller points out in a blog post that another piece of EU legislation, the 2019 Copyright in the Digital Single Market Directive (CDSM), offers a further reason for research institutions to release their work as open source:
Article 3 of the CDSM Directive enables these institutions to text and data-mine all “works or other subject matter to which they have lawful access” for scientific research purposes. Text and data mining is understood to cover “any automated analytical technique aimed at analysing text and data in digital form in order to generate information, which includes but is not limited to patterns, trends and correlations,” which clearly covers the development of AI models (see here or, more recently, here).
Keller’s post goes through the details of how that feeds into AI research, but the end result is the following:
as long as the model is made available in line with the public-interest research missions of the organisations undertaking the training (for example, by releasing the model, including its weights, under an open-source licence) and is not commercialised by these organisations, this also does not affect the status of the reproductions and extractions made during the training process.
This means that Article 3 does cover the full model-development pathway (from data acquisition to model publication under an open source license) that most non-commercial Public AI model developers pursue.
As that indicates, the use of open source licensing is critical to this application of Article 3 of EU copyright legislation for the purpose of AI research.
What’s noteworthy here is how two different pieces of EU legislation, passed some years apart, work together to create a special category of open source AI systems that avoid most of the legal problems of training AI systems on copyright materials, as well as the bureaucratic overhead imposed by the EU AI Act on commercial systems. Keller calls these “public AI”, which he defines as:
AI systems that are built by organizations acting in the public interest and that focus on creating public value rather than extracting as much value from the information commons as possible.
Public AI systems are important for at least two reasons. First, their mission is to serve the public interest, rather than focusing on profit maximization. That’s obviously crucial at a time when today’s AI giants are intent on making as much money as possible, presumably in the hope that they can do so before the AI bubble bursts.
Secondly, public AI systems provide a way for the EU to compete with both US and Chinese AI companies – by not competing with them. It is naive to think that Europe can ever match the levels of venture capital investment that big-name US AI startups currently enjoy, or that the EU is prepared and able to support local industries for as long and as deeply as the Chinese government evidently plans to do for its home-grown AI firms. But public AI systems, which are fully open source, and which take advantage of the EU right of research institutions to carry out text and data mining, offer a uniquely European take on generative AI that might even make such systems acceptable to those who worry about how they are built, and how they are used.
Follow me @glynmoody on Mastodon and on Bluesky. Originally published to the Walled Culture blog.
Filed Under: ai, cdsm, copyright, eu, generative ai, llms, open source, open source ai, public ai, text and data mining


Comments on “Public AI, Built On Open Source, Is The Way Forward In The EU”
And yet, as happens with open source, particularly under the sort of licenses which are endlessly evangelized, and just as in the very examples provided, commercial entities use it, modify it to their own ends, and lock it up in their products and services with endless rent seeking and surveillance. On top of “AI” being a fucking slow motion train wreck already.
Re:
The whole point of open source is that you don’t get to tell other people how they can use the software. That is what freedom means. Complaining about companies having business models built upon it seems like the left-wing logical equivalent of ‘but terrorists can use it!’ applied to encryption.
Re: Re:
The previous comment was complaining about commercial entities “locking it up”, which is the type of thing that some, but not all, licenses prohibit. Such licenses can meet both the Open Source Definition and Free Software Definition.
Re:
The free stuff is still available. If the general public keep preferring and supporting the locked-up versions, I suppose someone needs to make a good case why it’s a terrible idea. The makers of “internet-of-things” products seem to be trying really hard to prove that.
And it being open-source won’t really do much to change that. Sure, anyone will be able to disable the “don’t tell people how to build bombs” feature. But in terms of result comprehensibility, debuggability, and “hallucinations”, it may as well be proprietary. It’s not likely to be much less resource-intensive, either.
The future is not AI.
AI is not inevitable.
The Excuse
I love mentioning to those running Servers, Why Not Linux? And they say it’s TOO HARD, Windows is Easy (and Crap).
Linux based machines are EASY to Backup the system, Burn a DVD or Flash drive to Boot an OS, then Tell it to COPY the WHOLE HD configurations you made, and it’s almost done.
Easy Backups, Easy recovery. NO hiding the files, unless you want to.
Linux has so many versions/types that you can make it look like Apple or Windows or whatever. Load it and run.. OR Dive into it and play with the configurations. DO keep backups of the Original. And putting Data and Files and Games on ANOTHER drive isn’t hard. Windows (has gotten better) loves losing the locations.
OH! I forgot, CURRENT Windows versions from 10-11 Started adding LINUX configurations… ONE of those is being able to use a HD/SSD larger than 2 Terabytes. You can do 8T. WHICH IS REALLY FUN, as the Windows Install may not understand it (I ain’t done this yet). I prefer to Multi Section LARGE drives, But Windows, IF YOU DON’T SET IT UP, Won’t see that 8T drive.
There is no acceptable use of generative AI, there is no “but this time it’s different”.
Re:
Meh. Generative AI can have positive uses. You’re just not using your imagination.
It shouldn’t replace artists and writers. But it can be used to generate fake posts for your fake social media account so that the Trump administration will think you love America when they review your social media accounts during your immigration process. Doing that manually would be really time-consuming.
Generate fake selfies to get around oppressive, privacy-violating ID verification systems.
Fill ExTwitter with more AI slop so more advertisers and users leave the platform.
Write bullshit work emails so your asshole boss thinks you really care about his latest project while you’re working on developing your side gig so you can quit one day.
Use AI to socially mask for you so you don’t have to do the emotional labor of interacting directly with exhausting narcissists.
Re: Re:
Wow, bullshit generators sure are solving a lot of the problems caused by bullshit generators!
Re: Re: Re:
To be fair, humans are the original bullshit generators. We’re just outsourcing the effort and increasing the cost and chaos.
Re: Re: Re:2
This is a profoundly anti-human perspective and you should be embarrassed to have said it in public.
Re: Re: Re:3
So I was going to say, “It’s anti-human to be honest about the nature of human beings? Have you met human beings before?”
But then I realized that you have to be facetious here, and you’re spewing bullshit to ironically prove my point. So in that regard, bravo!
Re: Re:
And which of those is worth destroying the environment and the economy over?
Re: Re: Re:
Quote me where I said we should be doing that. The topic was about the possible uses of LLMs, not their costs. The costs don’t justify the use, but it’s a juggernaut that our random comments on a website are not going to derail.