Can Agentic AI Coding Tools Finally End Copyright For Software While Re-Inventing Open Source?
from the reinventing-software dept
Most of the discussions about the impact of the latest generative AI systems on copyright have centered on text, images and video. That’s no surprise, since writers, artists and film-makers feel very strongly about their creations, and members of the public can relate easily to the issues that AI raises for this kind of creativity. But there’s another creative domain that has been massively affected by genAI: software engineering. More and more professional coders are using generative AI to write major elements of their projects for them. Some top engineers even claim that they have stopped coding completely, and now act more as managers directing the AI’s generation of code, because the available tools have become so powerful. This applies in the world of open source software too. But a recent incident shows that it raises some interesting copyright issues there that are likely to affect the entire software world.
It concerns a project called chardet, “a universal character encoding detector for Python. It analyzes byte strings and returns the detected encoding, confidence score, and language.” A long and detailed post on Ars Technica explains what has happened recently:
The [chardet] repository was originally written by coder Mark Pilgrim in 2006 and released under an LGPL license that placed strict limits on how it could be reused and redistributed.
Dan Blanchard took over maintenance of the repository in 2012 but waded into some controversy with the release of version 7.0 of chardet last week. Blanchard described that overhaul as “a ground-up, MIT-licensed rewrite” of the entire library built with the help of Claude Code to be “much faster and more accurate” than what came before.
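For readers unfamiliar with what a character encoding detector does, here is a minimal, illustrative sketch in Python. To be clear, this is not chardet’s actual algorithm (which relies on statistical models of real-world text); it merely mimics the shape of the result the article describes, with a detected encoding, a confidence score, and a language field:

```python
# Illustrative sketch only -- NOT chardet's real algorithm.
# It guesses an encoding from byte-order marks and trial decoding,
# returning the same {encoding, confidence, language} shape chardet does.
import codecs

def detect_encoding(data: bytes) -> dict:
    """Guess the encoding of a byte string (toy heuristic)."""
    # A byte-order mark at the start is a near-certain signal.
    boms = [
        (codecs.BOM_UTF8, "UTF-8-SIG"),
        (codecs.BOM_UTF16_LE, "UTF-16LE"),
        (codecs.BOM_UTF16_BE, "UTF-16BE"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return {"encoding": name, "confidence": 1.0, "language": None}
    # Pure ASCII decodes unambiguously.
    try:
        data.decode("ascii")
        return {"encoding": "ascii", "confidence": 1.0, "language": None}
    except UnicodeDecodeError:
        pass
    # Valid UTF-8 with non-ASCII bytes is a strong (not certain) guess.
    try:
        data.decode("utf-8")
        return {"encoding": "utf-8", "confidence": 0.9, "language": None}
    except UnicodeDecodeError:
        # Latin-1 can decode any byte sequence, so it is a weak fallback.
        return {"encoding": "latin-1", "confidence": 0.3, "language": None}

print(detect_encoding("naïve".encode("utf-8")))
```

A real detector like chardet goes much further, scoring byte-frequency statistics against per-language models, which is why it can also report a probable language.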
Licensing lies at the heart of open source. When Richard Stallman invented the concept of free software, he did so using a new kind of software license, the GPL. This allows anyone to use and modify software released under the GPL, provided that when they distribute modified versions they release their own code under the same license. As the above description makes clear, chardet was originally released under the LGPL – one of the GPL variants – but version 7.0 is licensed under the much more permissive MIT license. According to Ars Technica:
Blanchard says he was able to accomplish this “AI clean room” process by first specifying an architecture in a design document and writing out some requirements to Claude Code. After that, Blanchard “started in an empty repository with no access to the old source tree and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code.”
That is, generative AI would appear to allow open source licenses like the GPL to be circumvented by rewriting the code without copying anything directly from the original. That’s possible because AI is now so good at coding that the results can be better than the original, as Blanchard proved with version 7.0 of chardet. And because it is new code, it can be released under any license. In fact, it is quite possible that code produced by genAI is not covered by copyright at all, for the same reason that artistic output created solely by AI can’t be copyrighted. If the license can be changed or simply cancelled in this way, then there is no way to force people to release their own variants only under the GPL, as Stallman intended. Similarly, the incentive for people to contribute their own improvements to the main version is diminished.
The ramifications extend even further. This kind of “AI clean room” implementation could be used to make new versions of any proprietary software. That’s been possible for decades – Stallman’s 1983 GNU project is itself a clean-room version of Unix – but it generally requires many skilled coders working for long periods. The arrival of highly capable genAI coding tools has brought down the cost by many orders of magnitude, which means it is relatively inexpensive and quick to produce new versions of any software.
In effect, generative AI coding systems make copyright irrelevant for software, both open source and proprietary. That’s because what is important about computer code is not the details of how it is written, but what it does. AI systems can be guided to create drop-in replacements for other software that are functionally identical, but with completely different code underneath.
Companies that license their proprietary software will probably still be able to do so by offering support packages plus the promise that they take legal responsibility for their code in a way that AI-generated alternatives don’t: businesses would pay for a promise of reliability plus the ability to sue someone when things go wrong. But for the open source world these are not relevant. As a result, the latest progress in AI coding seems a serious threat to the underlying development model that has worked well for the last 40 years, and which underpins most software in use today. But a wise post by Salvatore “antirez” Sanfilippo sees opportunities too:
AI can unlock a lot of good things in the field of open source software. Many passionate individuals write open source because they hate their day job, and want to make something they love, or they write open source because they want to be part of something bigger than economic interests. A lot of open source software is either written in the free time, or with severe constraints on the amount of people that are allocated for the project, or – even worse – with limiting conditions imposed by the companies paying for the developments. Now that code is every day less important than ideas, open source can be strongly accelerated by AI. The four hours allocated over the weekend will bring 10x the fruits, in the right hands (AI coding is not for everybody, as good coding and design is not for everybody).
Perhaps a new kind of open source will emerge – Open Source 2.0 – one in which people do not contribute their software patches to a project, as they do today, but instead send their prompts that produce better versions. People might start working directly on the prompts, collaborating on ways to fine tune them. It’s open source hacking but functioning at a level above the code itself.
One possibility is that such an approach could go some way to solving the so-called “Nebraska problem”: the fact that key parts of modern digital infrastructure are underpinned by “a project some random person in Nebraska has been thanklessly maintaining since 2003”. That person may not receive many more thanks than they have in the past, but with AI assistants constantly checking, rewriting and improving the code, at least their selfless dedication to the project becomes a little less onerous, and thus a little less likely to lead to programmer burnout.
Follow me @glynmoody on Mastodon and on Bluesky. Originally published to Walled Culture.
Filed Under: chardet, copyright, licensing, open source, relicensing


Comments on “Can Agentic AI Coding Tools Finally End Copyright For Software While Re-Inventing Open Source?”
pretty sure clean room rewrites require the tools to not have the entire codebase included as a baseline
so how has he certified that claude’s training data didn’t include the open source code prior to the ‘clean room’ process?
bearing in mind that there’s an ongoing lawsuit involving chatbots fundamentally identical to claude regurgitating whole articles when prompted
Re:
Clean room is a sufficient, not necessary, condition for the new work to not be a derivative of an existing work. One can, for example, write a book using another book as a reference. That doesn’t make the new book a derivative of the other book.
Re:
What about computer languages?
Google used the Apache Harmony clean-room implementation of Sun’s (now Oracle’s) Java class libraries to build the Android operating system. Oracle sued Google for using the language’s API declarations (i.e. the API, not the implementation code).
The Supreme Court found that it was fair use, thus overturning the Federal Circuit’s decision. However, fair use presupposes that the material is copyrighted in the first place.
At the moment a software specification (what the software code does) can be covered by copyright. Even worse, Apple v. Samsung found that swiping across a mobile screen to unlock it was patented.
So, writing a software specification for AI to write code for a mobile phone that employs screen gestures could violate both copyrights and patents.
Why do we need to code anything? Can’t the AI just do whatever a program would do?
Re: The chat is an external tool
If you need an offline, secure, free or reliable application you would not use this third-party pay-per-use subject-to-change online-only chatbot directly in your application.
First rule of headlines: If they ask a question, the answer is “No”.
No
I’ve long ago learned that when a technology news headline is of the form “Can X do Y”, the answer is “No.”
https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headlines
Speaking of “Open”-source AI…
https://arstechnica.com/ai/2026/03/entire-claude-code-cli-source-code-leaks-thanks-to-exposed-map-file/
Re:
And they already DMCAed their cult followers
And once again, the question is not asked, ‘Do open source projects actually want forced reinvention by AI?’ The users keep saying no, the devs keep saying no for the most part, but the likes of Anthropic are determined to forcibly ‘contribute’ no matter what and AI boosters in management roles keep on cramming it into anything, spending untold riches gambling the likes of Mozilla’s future on AI features nobody requested.
I don’t believe for a second AI will lead to a future without copyright, that ladder is going to be pulled up sooner rather than later, now they have gotten what they want from creators. I do believe that the people promoting it want a future without consent, where their will and financial clout means they have the final say.
Re:
This is not my experience at all. Especially among devs.
Re: Re:
Have you ever thought to interview developers? Have you thought to interview ones that aren’t just producing the software equivalent of cheap widgets?
Because here is my experience as an actual software engineer doing actual difficult work.
It’s good at doing super simple tasks, or being find and replace v2.
It’s good at writing one off scripts.
It is bad at anything more. Imagine a crew of the cheapest labor you can find and have them build a house. They forget to use nails, or they use 50 when you only need two, they duct tape over their mistakes, they use tools the wrong way, they use the wrong materials in areas.
The code it writes is a hot mess of glued together pieces unless guided in hand like a toddler that will run into the road if you let go.
Github has dropped its reliability down to 90% instead of 99.9%. Microslop just had to pull its latest patch and windows is getting more and more broken.
It’s not an if but a when that more and more hacks and outages are going to happen as a result of AI slop code.
Re: Re:
Insert snarky comment about the state and online behaviour of Bluesky Devs here.
You're missing something fundamental and important
“Software specification” is not a solved problem in computing, despite decades of research. The problem of turning a description of software into software remains very difficult — as we see constantly with any specification of sufficient size and complexity. Now…formal specification methodologies exist, and they can be used to generate provably correct code. But few people have the training to use these, and they have limits, and in one sense, this just shifts the problem.
So even if we suppose that a perfect code generator exists, whether human or AI, we do not know what to say to it to cause the code we want to be produced — except in very limited cases, such as the one that you cite. So yes, we could probably replace /bin/cat with AI-generated code today. But could we create sendmail or postfix? No, probably not, because — as we’ve discovered — there are baseline problems (omissions, conflicts, ambiguities) with the protocol specification.
To put this another way: the code that we could replace this way is not code we need to replace, because it already exists and for the most part, it’s mature and stable. The code that we might want to write doesn’t exist or isn’t mature and stable. And part of the reason that it’s in that state is that we don’t have a truly viable specification for it.
And by the way, one of the dirty little secrets of programming, as a profession, is that almost nobody is any good at writing specifications — and they’re not very interested. Why? Programming is fun, specification is drudgery.
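The point above about underdetermined specifications can be made concrete with a toy example. Both functions below satisfy a plausible written requirement (“return the records sorted by score”), yet they are observably different programs, because the spec says nothing about how ties are ordered. All names here are hypothetical, purely for illustration:

```python
# Two implementations that both satisfy a loose spec ("output is sorted
# by score") yet disagree on tie-breaking -- an ambiguity the spec never
# resolved. Illustrative only; names are made up.

def impl_stable(records):
    # Python's sort is stable: equal scores keep their original order.
    return sorted(records, key=lambda r: r[1])

def impl_reversed_ties(records):
    # Also "sorted by score", but equal scores come out in reverse order.
    return sorted(records[::-1], key=lambda r: r[1])

def meets_spec(inp, out):
    """Everything the written spec demanded: same records, scores
    non-decreasing. It says nothing about tie order."""
    return sorted(out) == sorted(inp) and all(
        a[1] <= b[1] for a, b in zip(out, out[1:]))

data = [("alice", 2), ("bob", 1), ("carol", 2)]
a, b = impl_stable(data), impl_reversed_ties(data)
assert meets_spec(data, a) and meets_spec(data, b)  # both "correct"
assert a != b  # yet the two programs behave differently
```

Any code generator, human or machine, has to resolve such gaps somehow, and nothing in the spec tells you whether it resolved them the way you needed.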
I find this pretty naive. First, proprietary software is still far harder to replicate using LLMs than open source. Even just reading the code, understanding its structure and using that to prompt an LLM is super helpful (and ideas are not copyrightable, so this wouldn’t violate any license).
It’s also naive to believe FOSS authors only work for the love of the craft – I think most of them still hope to gain, whether materially, or in prestige and social capital. But even those that do just do it for the love of the game probably aren’t very happy to have their passion project then appropriated, enclosed and exploited for profit. Like, that I am not trying to make any money off it is probably a good indication that I don’t want anyone else to do so either, right?
And even if we only consider selfish material interests: given that, once you publish your code, you can get outcompeted (both on price and just drowned out in the attention economy) immediately by thousands of copycats, why would any company publish their code? We want commercial interests to publish their code: it makes for better, safer software and protects users’ interests.
In a world of LLMs, all the incentives just align to not publish your code. I wouldn’t proclaim the death of open source, but LLMs can only harm the movement.
The funny thing about “AI clean room” process is that it isn’t. Claude has been trained on code with various licenses and all derivative code that is then produced by Claude likely inherits licenses that might have clauses like GPLv3 §5:
Slapping your own license on code produced by an AI tool looks to me like breaking the original licenses of any code that was used to train the models, even if you explicitly avoid GPL code. And in regards to avoiding GPL code, you can’t be certain that AI tools follow such instructions in their entirety.
Re:
“Clean-room design” was never really required by law. It’s more that the people doing it decided it’d be hard for someone to win a lawsuit against them for copying code, when they can convincingly say they never even saw the code in binary or source form. Nevertheless, learning from existing code, including by directly reverse-engineering and re-implementing it, is considered a legal right in much of the world, as long as it’s not copying.
While a lot of people agree with your view that it “looks like” breaking licenses, it’s far from obvious that the laws will support that. Sure, if it spits out an exact copy of something non-trivial and copyrighted, that’s not gonna go well (and people have seen LLMs do that). Most cases will probably be much more ambiguous. Also remember that there was a significant period in U.S. copyright law when code was not considered copyrightable at all, because it was primarily functional rather than creative. If software companies were to replace people with computers in significant numbers, it’d tend to support that view.
Re: Re:
I would argue that this case is unambiguous for the simple reason that the instructions given to Claude were to specifically avoid GPLv3 source material, which implicitly acknowledges that they knew they would otherwise be breaking software licenses.
Re: Re: Re:
…had they used such material. Are you suggesting that a specific instruction to avoid copying something would be proof of a plan to illegally copy it? That would be bizarre.
Re: Re: Re:2
Only you are talking about copying here. Let me quote the relevant part that I base my argument on:
Ie, Blanchard acknowledges through his instructions to Claude that licenses can carry over.
Re: Re: Re:3
You’re talking about copyright licenses. What else would be relevant?
Re: Re: Re:4
What the license says and how it may carry over depending on what source material you used. Creating derivative works or copying OSS code isn’t a copyright infringement, changing the license while trying to avoid the rules set out in the original license is.
To reiterate the point, telling Claude to avoid using GPLv3 code to create new code is Blanchard explicitly acknowledging that licenses carry over – and then he slaps a MIT-license on the new code while ignoring any other licenses on source code the training material have ingested.
If copyright and licenses weren’t a problem, why tell Claude to avoid GPLv3 sources in the first place?
Re: Re: Re:5
If thing B does not involve copying from thing A, the license of thing A is irrelevant. There’s no “change”; it’s one person slapping a license on the thing they own the copyright to.
Copyright and licensing is a major potential problem. The person is telling Claude to avoid that problem. I don’t see this as much different than FedEx instructing their drivers not to speed (which would certainly not be proof of criminal intent).
Re:
“The funny thing about “AI clean room” process is that it isn’t.”
First: you’re absolutely right. All of these models have been trained on an enormous amount of code: good, bad, old, new, working, broken, current, obsolete, etc.
Second: one of the things that the AI fanboys have not considered — because they believe their own hype, and because they’re greedy clueless ignorant newbies who don’t even begin to understand the practice of programming — is what else is in that corpus of code.
Every intelligence agency, every organized crime operation, every terrorist group, everyone out there knows that any code published on the Internet will be scraped and blindly incorporated into these models without any human review. These companies are too greedy, too cheap, too lazy, too stupid to do that.
Which means that any code with backdoors, deliberate security holes, etc. will be pulled in alongside completely benign code.
I trust everyone can work out the implications of that: they’re fairly obvious.
Re: Re:
“That’s a great point. You’re absolutely right! That code does have a backdoor! Sorry about that. Try this benign version.”
Re: Re:
Your trust is misplaced; not everyone can work that out, or wants to.
Also, relevant to this: Thinking—Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning and the Rise of Cognitive Surrender (The Wharton School Research Paper)
How this works based on training data is a little iffy. Simply telling it not to use it is probably not sufficient.
I’m not sure that’s entirely a good thing for open source. Although if the bar is low enough, maybe it doesn’t matter?
Unfortunately, the current model seems to be sending slop. It killed Curl’s bug bounty program, and it’s clogging a lot of projects’ open pull request systems. :/
There’s a real problem when you have asymmetry, where it takes much more effort for the human on the other side to verify, than to generate.
This seems like a huge leap in logic to me. The difference between programming and coding has previously been pointed out in Techdirt comments, and this article correctly uses the term “coding”.
So, sure, if someone’s already done the hard part of writing a precise specification, maybe a computer can do the comparatively easy part of coding it. Historically, it’s also been popular to farm the coding out to a team of grad students, an outsourcing shop in Eastern Europe, or whatever. People who’ve done this tend to find out that the result is only about as good as the specification, which it turns out is often lacking.
A very good coding team will provide valuable feedback about inconsistencies, ambiguities, and potential improvements for the specs, and it can lead to an excellent result. Less-good teams often manage to provide results that technically meet the requirements, while not really being what the client needed (but neglected to correctly ask for, cf. “The Monkey’s Paw”). Much of it deserves to be called “slop”.
As for copyright, well, the open-source world has been moving toward permissive licenses for a while, and the software world in general has been moving toward open-source. Maybe these systems could accelerate the transition somewhat. But people have been saying stuff like “why can’t these open-source developers just write me a Photoshop?” for 30 years now, and if that’s the best design document they can come up with, they should not expect amazing results.
Those who can write great design documents are basically programmers or “software architects” already; and probably professional ones, because it’s rare that anyone is very good at this right out of school.
Can I rewrite your entire website and rename techspoop?
The answer is NOOOO. Any large enough company just steals people’s work anyway, because copyright only exists if you can sue them and win. What won’t change with AI is large companies suing you into the dirt regardless of the law.
So why is it you and others want to steal the work of the working class so much? Because the writers of open source projects that use these licenses are just that.
If the maintainer then modifies a single line of code, does the AI-generated software suddenly become copyrightable? Or 10 lines, or 1000? At what point does it become copyrightable?
If the answer is “never, unless it’s 100% human-written”, then most closed-source software has now lost the possibility for copyright, and the software companies may be one code-leak away from bankruptcy.
Re:
That line might be, but not the rest. And only if it embodies some human creativity. There’s no point at which the lines not created or touched by humans will become copyrightable (under current law).
Still, it’s ironic that Microsoft is one of the companies eroding copyright, given the Gates “Letter to Hobbyists” accusing them of “stealing”, and Microsoft’s traditional hard-line stance with the Business Software Alliance and all (e.g. sending goons to “audit” licensing).
Re:
There exist legal tests for such scenarios:
You can put a copyright on a work that’s a derivative of an uncopyrightable work, but only for the new contributions.
Re: Re:
The ‘sweat of the brow’, skill and ideas count for nothing in copyright. Directories and menus are not copyright protected. Only the flowery language and the pretty pictures in a cookbook are protected.
The only thing that counts is copying the creative expression. (although, judges seem to ignore the ideas/expression dichotomy when it suits them).
The court would have to decide how much the creative expression in the new work differed from the creative expression in the original. If you think you would have a problem deciding this, judges tend to be ten times more hopeless at software copyright law than you would be.
In Oracle V. Google, Judge William Alsup demonstrated a terrific understanding of both APIs, human-readable coding and its translation into machine code, and the key aspects of copyright law.
Most judges seem to think that software that looks and feels similar to another piece of software is a copyright violation. It’s a random outcome, weighted to the earlier work.
First AI will have to start being competent at code.
https://www.theregister.com/2026/03/17/ai_businesses_faking_it_reckoning_coming_codestrap/
I’m a hardcore free software person but I’d trade in the GPL for the end of software copyright. I’m not convinced though that this will actually happen. AI makes reverse engineering easier but it’s easier still to strip protections from software you have the source code to, so free software will lose copyleft but proprietary crap will stay copyright.
Of course, all the permissive open source stuff will be no more vulnerable to exploitation than it ever was, and it probably constitutes the bulk of “FOSS”. Why people want corporations to exploit them is beyond me, but AI will just accelerate what was already standard there.
AI boosters are so annoying.