Public AI, Built On Open Source, Is The Way Forward In The EU

from the open-source-is-the-way dept

A quarter of a century ago, I wrote a book called “Rebel Code”. It was the first – and is still the only – detailed history of the origins and rise of free software and open source, based on interviews with the gifted and generous hackers who took part. Back then, it was clear that open source represented a powerful alternative to the traditional proprietary approach to software development and distribution. But few could have predicted how completely open source would come to dominate computing. Alongside its role in running every aspect of the Internet, and powering most mobile phones in the form of Android, it has been embraced by startups for its unbeatable combination of power, reliability and low cost. It’s also a natural fit for cloud computing because of its ability to scale. It is no coincidence that for the last ten years, pretty much 100% of the world’s top 500 supercomputers have all run an operating system based on the open source Linux.

More recently, many leading AI systems have been released as open source. That raises the important question of what exactly “open source” means in the context of generative AI software, which involves much more than just code. The Open Source Initiative, which drew up the original definition of open source, has extended this work with its Open Source AI Definition. It is noteworthy that the EU has explicitly recognized the special role of open source in the field of AI. In the EU’s recent Artificial Intelligence Act, open source AI systems are exempt from the potentially onerous obligation to draw up a range of documentation that is generally required.

That could provide a major incentive for AI developers in the EU to take the open source route. European academic researchers working in this area are probably already doing that, not least for reasons of cost. Paul Keller points out in a blog post that another piece of EU legislation, the 2019 Copyright in the Digital Single Market Directive (CDSM), offers a further reason for research institutions to release their work as open source:

Article 3 of the CDSM Directive enables these institutions to text and data-mine all “works or other subject matter to which they have lawful access” for scientific research purposes. Text and data mining is understood to cover “any automated analytical technique aimed at analysing text and data in digital form in order to generate information, which includes but is not limited to patterns, trends and correlations,” which clearly covers the development of AI models (see here or, more recently, here).

Keller’s post goes through the details of how that feeds into AI research, but the end result is the following:

as long as the model is made available in line with the public-interest research missions of the organisations undertaking the training (for example, by releasing the model, including its weights, under an open-source licence) and is not commercialised by these organisations, this also does not affect the status of the reproductions and extractions made during the training process.

This means that Article 3 does cover the full model-development pathway (from data acquisition to model publication under an open source license) that most non-commercial Public AI model developers pursue.

As that indicates, the use of open source licensing is critical to this application of Article 3 of EU copyright legislation for the purpose of AI research.

What’s noteworthy here is how two different pieces of EU legislation, passed some years apart, work together to create a special category of open source AI systems that avoid most of the legal problems of training AI systems on copyright materials, as well as the bureaucratic overhead imposed by the EU AI Act on commercial systems. Keller calls these “public AI”, which he defines as:

AI systems that are built by organizations acting in the public interest and that focus on creating public value rather than extracting as much value from the information commons as possible.

Public AI systems are important for at least two reasons. First, their mission is to serve the public interest, rather than focusing on profit maximization. That’s obviously crucial at a time when today’s AI giants are intent on making as much money as possible, presumably in the hope that they can do so before the AI bubble bursts.

Secondly, public AI systems provide a way for the EU to compete with both US and Chinese AI companies – by not competing with them. It is naive to think that Europe can ever match the levels of venture capital investment that big-name US AI startups currently enjoy, or that the EU is prepared and able to support local industries for as long and as deeply as the Chinese government evidently plans to do for its home-grown AI firms. But public AI systems, which are fully open source, and which take advantage of the EU right of research institutions to carry out text and data mining, offer a uniquely European take on generative AI that might even make such systems acceptable to those who worry about how they are built, and how they are used.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to the Walled Culture blog.


Comments on “Public AI, Built On Open Source, Is The Way Forward In The EU”

14 Comments
Anonymous Coward says:

And yet, as happens with open source, particularly under the sort of licenses which are endlessly evangelized, and just like in the very examples provided, commercial entities use it, modify it to their own ends, and lock it up in their products and services with endless rent seeking and surveillance. On top of “AI” being a fucking slow motion train wreck already.

Anonymous Coward says:

Re:

The whole point of open source is that you don’t get to tell other people how they can use their software. That is what freedom means. Complaining about companies having business models built upon it seems like the left-wing logical equivalent of ‘but terrorists can use it!’ applied to encryption.

Anonymous Coward says:

Re: Re:

The whole point of open source is that you don’t get to tell other people how they can use their software. […] Complaining about companies having business models built upon it

The previous comment was complaining about commercial entities “locking it up”, which is the type of thing that some, but not all, licenses prohibit. Such licenses can meet both the Open Source Definition and Free Software Definition.

Anonymous Coward says:

Re:

commercial entities use it, modify it to their own ends, and lock it up in their products and services with endless rent seeking and surveillance.

The free stuff is still available. If the general public keep preferring and supporting the locked-up versions, I suppose someone needs to make a good case why it’s a terrible idea. The makers of “internet-of-things” products seem to be trying really hard to prove that.

On top of “AI” being a fucking slow motion train wreck already.

And it being open-source won’t really do much to change that. Sure, anyone will be able to disable the “don’t tell people how to build bombs” feature. But in terms of result comprehensibility, debuggability, and “hallucinations”, it may as well be proprietary. It’s not likely to be much less resource-intensive, either.

ECA (profile) says:

The Excuse

I love mentioning to those running Servers, Why Not Linux? And they say it's TOO HARD, Windows is Easy(and Crap).
Linux based machines are EASY to Backup the system, Burn a DVD or Flash drive to Boot an OS, then Tell it to COPY the WHOLE HD configurations you made, and it's almost done.
Easy Backups, Easy recovery. NO hiding the files, unless you want to.
Linux has so many versions/types that you can make it look like Apple or windows or whatever. Load it and run.. OR Dive into it and play with the configurations. DO keep backups to Original. And putting Data and Files and Games on ANOTHER drive isn't hard. Windows(has gotten better) loves losing the locations.
OH! I forgot, CURRENT windows versions from 10-11, Started adding LINUX configurations… ONE of those is being able to use a HD/SSD larger than 2Terabytes. You can do 8T. WHICH IS REALLY FUN, as Windows Install may not understand it(I ain't done this yet). I prefer to Multi Section LARGE drives, But windows IF YOU DON'T SET IT UP, Won't see that 8T drive.

MrWilson (profile) says:

Re:

Meh. Generative AI can have positive uses. You’re just not using your imagination.

It shouldn’t replace artists and writers. But it can be used to generate fake posts for your fake social media account so that the Trump administration will think you love America when they review your social media accounts during your immigration process. Doing that manually would be really time-consuming.

Generate fake selfies to get around oppressive, privacy-violating ID verification systems.

Fill ExTwitter with more AI slop so more advertisers and users leave the platform.

Write bullshit work emails so your asshole boss thinks you really care about his latest project while you’re working on developing your side gig so you can quit one day.

Use AI to socially mask for you so you don’t have to do the emotional labor of interacting directly with exhausting narcissists.
