Glyn Moody's Techdirt Profile

Posted on Techdirt - 11 June 2026 @ 11:05am

Why Google’s New AI-Saturated Search Page Will Be A Disaster

Google didn’t invent full-text search of the Internet – that honor belongs to early pioneers such as WebCrawler, Lycos and AltaVista. But for the last 25 years or so, Google has been synonymous with online searching, providing the quickest and most effective way to find things online (although its results may be getting worse). More recently, it has been adding more features based on generative AI to its search engine, first with its AI Overviews in 2024, and then a year later with its AI Mode in Search. Now it has announced the latest stage in that evolution with what it calls “A new era for AI Search”:

It’s more intuitive than ever, dynamically expanding to give you space to describe exactly what you need. Designed to anticipate your intent, it also helps you formulate your question with AI-powered suggestions that go beyond autocomplete. And you can search across modalities, using text, images, files, videos or Chrome tabs as inputs.

This new incarnation effectively turns search into a chatbot:

You can easily ask a follow-up question right from an AI Overview, and flow into a conversational back and forth with AI Mode. Your context stays with you, and as you explore more deeply, the links and supporting articles get even more relevant. This seamless experience is live today across desktop and mobile, worldwide.

As the screenshot of the new interface above shows, the traditional search result links that are currently placed under the AI Overview have now been confined to a small panel on the right-hand side of the screen, which shows a cut-down version of today’s list. Users are encouraged to ask follow-up questions from the AI search chatbot, rather than exploring the links themselves.

What this is likely to mean in practice is that even fewer people will follow links to sites, something that was already happening last year; instead, they will engage with Google’s chatbot to gather information indirectly. This is terrible news for access to knowledge because it frames the Google AI search engine as the fount of all knowledge – one that will do all the hard work of finding information and combining it into an easily digested answer that can be interrogated further. It can do that because it has already ingested billions of Web pages and other information sources as part of the Large Language Model (LLM) training process. But search engine users will no longer know what some of those sources are unless they painstakingly click on the links in the new panel.

Most people will not bother, because the AI-generated results will be good enough – or at least will appear to be good enough. Unless visitors to the site take the trouble to follow the links to the sources, they won’t really know how reliable those results are. For example, it is possible that the sources are wrong, or misleading; moreover, Google’s LLM may itself introduce new errors and distortions. There is also the question of how Google will insert ads into this AI-generated information, and to what extent advertisers will be able to buy preferential treatment in results.

This new mediated approach is clearly terrible news for Wikipedia – an issue already discussed on Walled Culture earlier this year – and for creators. Google will use the information found in their works, but will not actively encourage people to visit the originals. For many people, summaries will be good enough, and they will never discover the greater riches of the sites and creations that Google’s LLM is based on. Worse still, the original creators such as Wikipedia may not even be mentioned in answers that involve aggregating information from a large number of sources.

Similarly, the new Google search is the publishing industry’s worst nightmare. Not only is Google drawing on material they have published, but it is pushing links to those sources into the background. It seems inevitable that the Web traffic to publishers will fall yet further, making already struggling business models based on advertising even more precarious. That will have knock-on consequences for the funding of many sites – particularly newspapers and magazines – and for the commissioning of work from journalists and other creative professionals. Users won’t even need to visit Google Search much in order to keep up-to-date with topics of interest thanks to Google Search’s new agentic capabilities that will do the work for them in advance:

With information agents, you can stay updated on whatever matters most to you. Your agent will intelligently look across everything on the web, like blogs, news sites and social posts, plus our freshest data, such as real-time info on finance, shopping and sports, to monitor for changes related to your specific question.

In this case, not only will people not visit sites, but the latter will be constantly bombarded by various AI bots seeking information on behalf of users – increasing site running costs, and making sites less usable by humans. Another key announcement from Google will lead to a further flood of agentic activities that will pose new challenges to businesses:

We’re also expanding agentic booking capabilities in Search to a wide range of new tasks, including local experiences and services. Just share your specific criteria — like finding a private karaoke room for six on a Friday night that serves food late — and Search brings together the latest pricing and availability with direct links to finish booking through the provider of your choice. And for select categories like home repair, beauty or pet care, you can ask Google to call businesses on your behalf.

What emerges from Google’s latest announcements is less of a search engine, and more of an immersive virtual environment that is designed to keep people engaging with Google’s services, asking them for information, advice and even delegating actions to them. There is no doubt that many users will find these new features attractive, not least because they can use “conversational voice features” in Gmail, Docs and elsewhere. These are the digital assistants that have been promised for many years, able to understand spoken commands, provide information verbally, and carry out complex operations on behalf of users without the need for any complex training. For many people, that will be a boon, and they will doubtless migrate from the traditional search page, which will still be the default – at least for now – to the latest AI-infused version.

But these impressive technical features come at a high price, even leaving aside issues such as the environmental impact of the huge server farms they require. With the latest incarnation of its search engine, Google is making the World Wide Web as we have known it for over 30 years invisible, and therefore increasingly irrelevant to most people, who will be happy to let Google become their universal user interface to everything. And yet Google still depends on the Internet to supply all the information it is analyzing and repackaging. It risks killing the very thing that sustains it.

There’s another, more subtle issue. The new Google search features make finding information and carrying out actions very easy in many ways. Leaving aside the problem that this will require people to trust what is in effect a huge black box, where the internal workings cannot be examined, with all the loss of control this implies, there is another danger. People who use Google’s powerful new AI search services to offload many of their day-to-day actions may gradually lose the ability to understand the world and to act within it without that constant help. Such a dependence may be great for Google and its advertisers, but it surely cannot be a good thing for the future of society.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to Walled Culture.

Posted on Techdirt - 22 May 2026 @ 01:14pm

France’s Terrible Copyright Law, Hadopi, Is Not Quite Dead

One of the best demonstrations that an obsession with protecting copyright’s intellectual monopoly drives politicians insane is the French law known as Hadopi, an acronym for ‘Haute Autorité pour la diffusion des oeuvres et la protection des droits sur internet’ (High Authority for the Dissemination of Works and the Protection of Rights on the Internet). The Hadopi mechanism has been trying – and failing – to police copyright’s intellectual monopoly in France for 15 years now, and it is one of the main villains in the Walled Culture book (free digital versions available).

Here’s how Hadopi’s “graduated response” approach worked when a revised version came into operation in 2010. Alleged infringers were warned twice; if another allegation was made within a year of the second warning, the subscriber’s Internet connection could be suspended. A fine of €1,500 could also be imposed. The first notices were sent out in September 2010; by December of that year, copyright companies were issuing between 25,000 and 50,000 infringement allegations per day. At the end of July 2013, Hadopi had issued 2 million first notices and 200,000 second notices. There were 710 investigations to ascertain whether those who had been accused three times should be referred to the prosecutors.

That gives an idea of the scale of the investigations into people’s everyday use of the Internet in France, and of the databases of personal data that were created. And yet the first and only disconnection order, issued in June 2013, turned out to be unenforceable, because the disconnection only applied to Web access – other services like email, private messaging, the telephone line or TV services had to be preserved somehow – and was later dropped.

By 2020, Hadopi had been in existence in various forms for a decade. Working from Hadopi’s annual report for that year, the French magazine Next INpact calculated that in total the agency had imposed €87,000 in fines. The cost of running Hadopi was picked up entirely by French taxpayers and came to €82 million. In other words, a system that had failed to discourage people from downloading unauthorized copies of copyright material had also cost nearly a thousand times more to run than it generated in fines.
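The "nearly a thousand times" comparison can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above (€87,000 in fines, €82 million in running costs); the function and variable names are hypothetical and chosen purely for illustration.

```python
def cost_to_fines_ratio(running_cost_eur, fines_eur):
    """Return how many times larger the running cost is than the fines imposed."""
    if fines_eur <= 0:
        raise ValueError("fines_eur must be positive")
    return running_cost_eur / fines_eur


# Figures as quoted in the text (Next INpact, working from Hadopi's 2020 annual report).
FINES_EUR = 87_000
RUNNING_COST_EUR = 82_000_000

ratio = cost_to_fines_ratio(RUNNING_COST_EUR, FINES_EUR)

# Share of the running cost that the fines would have covered.
recovered_share = FINES_EUR / RUNNING_COST_EUR

print(f"Running cost was roughly {ratio:.0f} times the fines imposed.")
print(f"Fines covered about {recovered_share:.2%} of the running cost.")
```

The ratio comes out at a little over 940, which is consistent with the "nearly a thousand times" description.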

As Walled Culture reported at the time, in 2023 the French digital rights organization La Quadrature du Net brought a challenge to the Hadopi system, still running in theory, on the grounds that it was incompatible with the two EU laws defining Europe’s data protection regime, the General Data Protection Regulation and the ePrivacy Directive. Shockingly, in 2024 the Court of Justice of the European Union (CJEU), the EU’s top court, ruled that “the general and indiscriminate retention of [Internet Protocol] addresses does not necessarily constitute a serious interference with fundamental rights”. La Quadrature du Net did not give up. Alongside the case at the CJEU, it was also taking legal action in France:

In 2019, we asked the Conseil d’État to overturn Hadopi’s central decree, which authorises the storage of personal data needed for the graduated response system (IP addresses, civil identity and downloaded material). The case was referred to the Constitutional Council and in 2020 we had our first partial victory: the Constitutional Council restricted Hadopi’s broad access to personal data (the law at the time provided that it could access “all documents”). However, despite to our initial assessment, this did not necessarily mark the end of the Hadopi.

The defeat handed down by the CJEU in 2024 offered a glimmer of hope:

The outcome was disappointing, as we lost on the principle: the CJEU agreed to weaken its case law. It accepted that access to metadata might, in certain cases, not be subjected to prior independent review. However, it required numerous conditions to this possibility, relating to both the retention of such data and the requirements for prior independent review.

Those two issues – retention of metadata and the requirement for prior independent review – have now been acknowledged as problematic by the Conseil d’État in a new ruling:

the Conseil d’État finally agreed with us on these two points. Firstly, it found that the retention of metadata is not carried out in a manner that safeguards civil liberties. The CJEU required “watertight separation” of IP addresses and civil identity data (which can be understood as two distinct databases, or files, that can only be technically correlated after a formal request for access by Arcom). The Conseil d’État notes that “no legal provision imposes such retention, under these conditions, on electronic communications operators”.

Secondly, it also notes that access to this data is not subject to independent review. It fully endorses the conclusions already made by the CJEU, that Arcom [the body that took over Hadopi’s role] cannot be both judge and jury: it cannot request access and then review the legality of that access itself, even though it is an independent authority. However, like the CJEU, the Conseil d’État considers that this lack of review is only an issue from the third access to the data onwards, the stage at which a registered letter is sent.

As La Quadrature du Net notes, in practical terms, this latest ruling means that Hadopi is “stalled”:

The Arcom can no longer take you to court, as the requirements set by the CJEU are not satisfied. And it can only send you an email if it has first ensured that your internet service provider has stored your metadata with a “watertight separation”. It has now been downgraded to the function of a giant spam machine.

Hadopi is not quite dead yet: the French government could try to solve the two problems pointed out by the CJEU and confirmed by the Conseil d’État, by setting up yet more independent bodies to handle these specific aspects of Hadopi. That would involve throwing even more taxpayers’ money at an approach that has not only failed completely, but which is fundamentally misguided. Clearly, trying to keep the moribund Hadopi alive in this way would be an irrational and wasteful thing for the French government to contemplate; but given this is the world of copyright, it might well try to do it anyway.

Originally posted to Walled Culture.

Posted on Techdirt - 15 May 2026 @ 03:24pm

Why The US Can’t Adopt Ukraine’s Innovative Approach To Unmanned Warfare Systems

It is widely accepted that drones have changed the conduct of modern war dramatically. The war in Ukraine, in particular, is driving the rapid evolution of drone technology. Evidence of how far things have come was provided recently by the following claim from Ukraine, reported here on The Next Web (TNW):

In April, Ukrainian President Volodymyr Zelensky announced that his forces had, for the first time in the history of warfare, seized an enemy position using only unmanned systems. No infantry. No human soldiers entering the contested ground. Drones and ground robots identified the target, suppressed defensive fire, and captured the position without a single Ukrainian casualty. The claim has not been independently verified in detail, and Ukraine’s military has declined to provide specifics.

The TNW article goes on to give some details about the company that apparently played a major role in that unmanned assault:

a Ukrainian-British defence technology startup called UFORCE, has conducted more than 150,000 combat missions since Russia’s full-scale invasion in 2022, achieved unicorn status with a valuation exceeding one billion dollars, and is now scaling production from a discreet London headquarters designed, the company says, to protect it from Russian sabotage. The age of unmanned warfare is no longer a conference-circuit prediction. It is a line item on a defence contractor’s balance sheet.

Politico interviewed the Ukrainian commander in charge of the Third Assault Brigade’s ground robotic systems unit, the one which carried out the attack. Mykola Zinkevych provided some interesting indications of what robotic systems were already doing today, and what Ukraine’s future plans were for unmanned warfare systems. For example, Zinkevych said:

Delivery of important cargo, evacuation of the wounded, conducting surveillance in open areas, destruction of enemy fortifications, sabotage operations behind enemy lines, laying minefields — all this is now performed by ground robotic systems

In the short term:

Infantrymen can and should be taken out of direct fire. Our goal for 2026 is to replace up to 30 percent of personnel in the most difficult areas of the front with technology

In a post on Facebook (in Ukrainian), Zinkevych gave details of the ambitious longer-term goals (via Google Translate), which will involve the wider deployment of unmanned ground vehicles (UGV):

In March alone, 9,000+ missions were completed by the military. Our goal is for 100% of front-line logistics to be performed by robotic systems.

In the first half of 2026, due to increased demand, we will contract 25,000 UGVs, which will be gradually delivered to the front. This is twice as much as in the entire year 2025.

A new paper from the Carnegie Endowment for International Peace, written by the former defense minister of Ukraine, Andriy Zagorodnyuk, explores what he calls “The New Revolution in Military Affairs”, which is being brought about by “rapid innovation and adaptation, introducing new types of unmanned systems, countermeasures, and operating methods at unprecedented speed.” A key element of this is “affordable precise mass”: the highly effective deployment of cheap, long-range drones on a massive scale. He calls this transformation:

a structural shift in warfare in which new technologies drive the development of novel operational concepts and doctrines, fundamentally altering how military power is generated and employed, and forcing enduring changes in military organizations. These trends include the emergence of affordable precise mass, the fragmentation of the air domain, the growing difficulty of maneuver, the centrality of networked warfare, and the elevation of rapid adaptation as a core military capability. This transformation is still in its early stages, but countries that fail to recognize and adapt to it risk preparing for a form of war that has lost its decisiveness.

One important aspect of this shift touches on an area that will be familiar to Techdirt readers. As noted in the quotation above, Zagorodnyuk underlines the importance of rapid adaptation for this new kind of warfare:

The decisive advantage lies with those who can shorten the loop between combat experience, technical adaptation, and redeployment. As a result, ultra-fast adaptation becomes a paramount requirement for survival—and directly shapes force organization.

In Ukraine, this has led to drone operators being deeply involved in the technology’s evolution:

Units maintain their own repair facilities, component stocks, and small-scale production capabilities. Some operate informal research-and-development cells. Successful adaptations spread laterally through personal networks, messaging platforms, and volunteer communities rather than through centralized bureaucratic channels.

But Zagorodnyuk points out a key reason why the important lessons emerging from the wars in Ukraine and Iran are unlikely to be learned in many Western countries, including the US:

legal, contractual, and technical restrictions often prevent units from modifying or repairing their own equipment. In the United States, for example, defense contractors frequently retain control over maintenance data, software, and diagnostics, limiting what military personnel can do independently. The debate around the “right to repair” reflects this tension. While intended to protect intellectual property and safety standards, such restrictions can slow adaptation cycles and reduce operational flexibility—precisely the opposite of what high-intensity, technology-driven warfare now demands.

In other words, today’s obsession with protecting intellectual monopolies above all else could one day prove a major obstacle to fighting and winning future wars.


Posted on Techdirt - 29 April 2026 @ 03:43pm

Leading Cancer Charity Stops Funding Open Access Publishing Because It’s Just Not Working

As numerous posts on this blog have emphasised, the underlying idea of open access (OA) – allowing anyone to read and share published academic research for free – is great in principle, but in practice has failed in important ways. That’s because traditional academic publishers have subverted the open access model to such an extent that the costs for research institutions of publishing in OA journals have barely changed at all. And yet one of the other key aims of open access was to save money while widening availability. Against that background, a natural question to ask is: if open access has failed to deliver savings, why bother supporting it? Cancer Research UK, the world’s leading cancer charity, has evidently asked itself that question and come up with an answer, which it explains in a post entitled “Why we won’t be funding open access publishing any more”:

We need efficient scholarly communications to spread scientific ideas via a fair economic model. We currently don’t have that. The open access movement was bold and promising, but ultimately disappointing. Now is the time to stop and call for a new way to make publishing work…

Ceasing to fund open access in the way we currently do will save us £5.2m of donors’ money over the next three years. That’s a substantial amount which can be put towards cancer research.

The post by Dan Burkwood, Director of Research Operations and Communications at Cancer Research UK, explains what exactly the problem is:

We currently fund open access publishing for our researchers in a number of ways. Despite hopes that this would enable a flourishing of open access dissemination of science, most of the growth has occurred in hybrid journals. These are publications that combine OA articles with those behind a paywall – this means the publishers will still charge for university and institute libraries to access them, even though researchers have paid for their work to be published. For us, this means we currently use donated money to fund our researchers, institutes and centres to publish OA research articles, yet they still have to pay to access the majority of journals in which those articles appear. The publishers are – so to speak – having their cake whilst also eating it.

These so-called “hybrid models” are discussed at length in Chapter 3 of Walled Culture the book (free digital versions available). They were presented as a transitional approach towards journals that were fully open access, but in many cases that transition hasn’t happened, not least because the hybrid model is so profitable for publishers, who therefore have little incentive to move to fully open access titles. Burkwood rightly points to a key reason why academic publishers continue to wield such power: the academic world’s insistence on using published articles in prestigious titles as a metric of success.

Cancer Research UK are working to widen the way we evaluate research in order to mitigate the heavy focus on publication outputs. It’s clear to us that a broader view of an applicant’s career is vital to gauge potential success. By signing up to DORA (San Francisco Declaration on Research Assessment), we encourage our reviewers to assess the quality and impact of research through means other than just journal impact factor. Additionally, we invite applicants to submit a narrative CV, allowing a more holistic view of their track record, research outputs and career progression.

But as he acknowledges, “Despite our, and others, attempts to limit the emphasis of the ‘publish-or-perish’ mindset, it will take time for the culture to change.” In the meantime, he suggests:

If researchers have no access to publishing funds they can publish their work for open access at no cost, but the publication will sit behind a paywall for 6 months (under embargo) before being deposited on Europe PMC open access – this is known as green open access.

Green open access provides full and free access to papers, but only after an embargo period, typically six months, but sometimes longer (gold open access provides instant access, but requires payment by researchers’ institutions). That makes green OA a poor substitute for real, immediate open access.

The problem here is that such embargo periods have long been accepted as the norm, but that is only because a terrible blunder was made over two decades ago by the Research Councils UK (RCUK). In 2005, the RCUK stipulated that the work it funded would require open access publication. However, when the final version of the RCUK’s policy appeared in June 2006, it had a significant flaw, expressed in the following provision: ‘Full implementation of these requirements must be undertaken such that current copyright and licensing policies, for example embargo periods or provisions limiting the use of deposited content to non-commercial purposes, are respected by authors.’ As the leading open access scholar Peter Suber wrote at the time, this was a completely unnecessary concession:

Researchers sign funding contracts with the research councils long before they sign copyright transfer agreements with publishers. Funders have a right to dictate terms, such as mandated open access, precisely because they are upstream from publishers. If one condition of the funding contract is that the grantee will deposit the peer-reviewed version of any resulting publication in an open-access repository [immediately], then publishers have no right to intervene.

At the root of the issue of embargoes lies copyright. If researchers retained full control of the copyright of their articles, rather than assigning it to publishers, they could prevent any embargoes being applied to them.

Cancer Research UK’s decision is regrettable but understandable. The fear has to be that others will follow suit. While the hybrid model is not universal, it is widespread enough to undermine the open access idea. Until researchers refuse to publish in such hybrid titles, publishers will continue to profit from them. Given the unnecessary embargoes imposed on articles released under green open access, that leaves alternatives such as diamond open access, where there are no charges for anyone, an approach that has long been espoused on this blog.

Originally posted to Walled Culture.

Posted on Techdirt - 27 April 2026 @ 03:19pm

The Risks Of Anonymity In The Age Of Generative AI

As its name suggests, generative AI is designed to generate material in response to prompts by drawing on its probabilistic database built up through analyzing huge quantities of training input. But it can draw on those patterns to analyze other files, and that’s also a widely used application. Writing in The Argument, Kelsey Piper encountered an interesting variant of that approach:

Recently, Anthropic released a new version of Claude, Opus 4.7. I did what I usually do when a new AI model is released by Google, OpenAI, or Anthropic and ran a bunch of tests on it to see what it can do. One of those tests is to paste in some text from unpublished drafts of mine and ask it to guess the author.

From only the above text [not shown here], 125 words, Claude Opus 4.7 informed me that the likeliest author is Kelsey Piper. This is an Opus 4.7-specific power; ChatGPT guessed Yglesias, and Gemini guessed Scott Alexander. I did not have memory enabled, nor did I have information about me associated with my account; I did these tests in Incognito Mode.

As Piper admits:

this is far from an impossible feat of style identification — a lot of my writing is public on the internet, and this is clearly the start of a political column, narrowing the possible authors down dramatically.

She went on to input less obvious material. For example, an “unpublished draft of a school progress report in a completely different register”:

“Kelsey Piper,” said Claude. (ChatGPT guessed Freddie deBoer. Gemini guessed Duncan Sabien.)

An unpublished fantasy novel produced a similar result, although:

in that case it took more like 500 words for Claude to inform me that it’s the work of Kelsey Piper (whereas ChatGPT flattered me by guessing that I’m real fantasy novelist K.J. Parker).

And finally, “a college application essay I wrote 15 years ago, when my prose style was vastly worse and frankly embarrassing to reread”:

“Kelsey Piper,” said Claude, and in this case, also ChatGPT.

Piper comments:

Right now, today’s AI tools probably can be used to deanonymize any writer who has a large public corpus of writing under their real name and also writes anonymously, unless they have been extremely careful, for years, to make sure that nothing written under their secondary account has the stylistic fingerprints of their primary one. Many academics and industry researchers, for instance, have reported being identified from a draft or in the middle of a chat.

And she concludes:

Whatever goods anonymity ever offered us, we will have to do without them. I don’t want the anonymous posters to all go away and for everyone to frantically delete all their old internet presence before it surfaces, but more than anything, I don’t want them to be surprised.

The other reported cases of unpublished material being recognized by AI that Piper cites show that her experience was not a one-off, although the results remain in the realm of anecdata. But even if imperfect, the ability of generative AI to carry out this kind of analysis quickly and often accurately represents an important new option for the well-established field of stylometry. Wikipedia explains:

Stylometry may be used to unmask pseudonymous or anonymous authors, or to reveal some information about the author short of a full identification. Authors may use adversarial stylometry to resist this identification by eliminating their own stylistic characteristics without changing the meaningful content of their communications. It can defeat analyses that do not account for its possibility, but the ultimate effectiveness of stylometry in an adversarial environment is uncertain: stylometric identification may not be reliable, but nor can non-identification be guaranteed; adversarial stylometry’s practice itself may be detectable.
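To give a concrete sense of what traditional computerized stylometry involves, here is a minimal sketch of one classic idea: comparing how often texts use common function words. It is an illustration under stated assumptions, not the method used by any of the AI services mentioned above or by any investigation described in this post. The short word list and the function names (`profile`, `cosine_similarity`, `most_similar_author`) are hypothetical; real studies use far larger feature sets and much longer text samples.

```python
import math
import re
from collections import Counter

# A tiny illustrative list of English function words (an assumption;
# serious stylometric work uses hundreds of features).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in",
                  "that", "is", "it", "but", "not", "with")


def profile(text):
    """Return the relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_author(unknown_text, known_samples):
    """Return (author, score) for the known sample closest to `unknown_text`.

    `known_samples` maps an author name to a sample of that author's writing.
    """
    target = profile(unknown_text)
    scores = {
        author: cosine_similarity(target, profile(sample))
        for author, sample in known_samples.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A call such as `most_similar_author(draft, {"Author A": sample_a, "Author B": sample_b})` returns the candidate whose function-word profile is closest to the draft. As the Wikipedia passage warns, a high score from a method like this is a hint rather than an identification.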

The limitations of stylometry were demonstrated in John Carreyrou’s attempt to reveal the true identity of Bitcoin’s pseudonymous creator, Satoshi Nakamoto, published in The New York Times a few weeks ago. Carreyrou concluded that various real-world coincidences plus linguistic evidence indicated that Bitcoin was created by the 55-year-old British computer scientist Adam Back, something Back denies. Carreyrou’s attempts to use computerized stylometry (not the AI services Piper drew on) were unsatisfactory, and he eventually adopted a more hands-on approach to text analysis, which involved looking at Satoshi’s vocabulary, grammatical hyphenation mistakes and the use of British spellings.

Despite Carreyrou’s lack of success, stylometric analysis by generative AI is likely to become more common in many disciplines for the simple reason that it is so quick, easy and cheap to carry out. Even if its results are unreliable, people may find it useful as a stimulus for further investigations. And as we know, the fact that generative AI systems can churn out nonsense hasn’t stopped hundreds of millions of people from using and trusting them anyway.


Posted on Techdirt - 3 April 2026 @ 01:08pm

Can Agentic AI Coding Tools Finally End Copyright For Software While Re-Inventing Open Source?

Most of the discussions about the impact of the latest generative AI systems on copyright have centered on text, images and video. That’s no surprise, since writers, artists and film-makers feel very strongly about their creations, and members of the public can relate easily to the issues that AI raises for this kind of creativity. But there’s another creative domain that has been massively affected by genAI: software engineering. More and more professional coders are using generative AI to write major elements of their projects for them. Some top engineers even claim that they have stopped coding completely, and now act more as a manager for the AI generation of code, because the available tools are now so powerful. This applies in the world of open source software too. But a recent incident shows that it raises some interesting copyright issues there that are likely to affect the entire software world.

It concerns a project called chardet, “a universal character encoding detector for Python. It analyzes byte strings and returns the detected encoding, confidence score, and language.” A long and detailed post on Ars Technica explains what has happened recently:

The [chardet] repository was originally written by coder Mark Pilgrim in 2006 and released under an LGPL license that placed strict limits on how it could be reused and redistributed.

Dan Blanchard took over maintenance of the repository in 2012 but waded into some controversy with the release of version 7.0 of chardet last week. Blanchard described that overhaul as “a ground-up, MIT-licensed rewrite” of the entire library built with the help of Claude Code to be “much faster and more accurate” than what came before.

Licensing lies at the heart of open source. When Richard Stallman invented the concept of free software, he did so using a new kind of software license, the GPL. This allows anyone to use and modify software released under the GPL, provided they release their own code under the same license. As the above description makes clear, chardet was originally released under the LGPL – one of the GPL variants – but version 7.0 is licensed under the much more permissive MIT license. According to Ars Technica:

Blanchard says he was able to accomplish this “AI clean room” process by first specifying an architecture in a design document and writing out some requirements to Claude Code. After that, Blanchard “started in an empty repository with no access to the old source tree and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code.”

That is, generative AI would appear to allow open source licenses like the GPL to be circumvented by rewriting the code without copying anything directly from the original. That’s possible because AI is now so good at coding that the results can be better than the original, as Blanchard proved with version 7.0 of chardet. And because it is new code, it can be released under any license. In fact, it is quite possible that code produced by genAI is not covered by copyright at all, for the same reason that artistic output created solely by AI can’t be copyrighted. If the license can be changed or simply cancelled in this way, then there is no way to force people to release their own variants only under the GPL, as Stallman intended. Similarly, the incentive for people to contribute their own improvements to the main version is diminished.

The ramifications extend even further. These kinds of “AI clean room” implementations could be used to make new versions of any proprietary software. That’s been possible for decades – Stallman’s 1983 GNU project is itself a clean-room version of Unix – but generally requires many skilled coders working for long periods to achieve. The arrival of highly capable genAI coding tools has brought down the cost by many orders of magnitude, which means it is relatively inexpensive and quick to produce new versions of any software.

In effect, generative AI coding systems make copyright irrelevant for software, both open source and proprietary. That’s because what is important about computer code is not the details of how it is written, but what it does. AI systems can be guided to create drop-in replacements for other software that are functionally identical, but with completely different code underneath.

Companies that license their proprietary software will probably still be able to do so by offering support packages plus the promise that they take legal responsibility for their code in a way that AI-generated alternatives don’t: businesses would pay for a promise of reliability plus the ability to sue someone when things go wrong. But for the open source world these are not relevant. As a result, the latest progress in AI coding seems a serious threat to the underlying development model that has worked well for the last 40 years, and which underpins most software in use today. But a wise post by Salvatore “antirez” Sanfilippo sees opportunities too:

AI can unlock a lot of good things in the field of open source software. Many passionate individuals write open source because they hate their day job, and want to make something they love, or they write open source because they want to be part of something bigger than economic interests. A lot of open source software is either written in the free time, or with severe constraints on the amount of people that are allocated for the project, or – even worse – with limiting conditions imposed by the companies paying for the developments. Now that code is every day less important than ideas, open source can be strongly accelerated by AI. The four hours allocated over the weekend will bring 10x the fruits, in the right hands (AI coding is not for everybody, as good coding and design is not for everybody).

Perhaps a new kind of open source will emerge – Open Source 2.0 – one in which people do not contribute their software patches to a project, as they do today, but instead send their prompts that produce better versions. People might start working directly on the prompts, collaborating on ways to fine tune them. It’s open source hacking but functioning at a level above the code itself.

One possibility is that such an approach could go some way to solving the so-called “Nebraska problem”: the fact that key parts of modern digital infrastructure are underpinned by “a project some random person in Nebraska has been thanklessly maintaining since 2003”. That person may not receive many more thanks than they have in the past, but with AI assistants constantly checking, rewriting and improving the code, at least the selfless dedication to their project becomes a little less onerous, and thus a little less likely to lead to programmer burnout.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to Walled Culture.

Posted on Techdirt - 1 April 2026 @ 11:04am

Copyright Industry Continues Its Efforts To Ban VPNs

Last month Walled Culture wrote about an important case at the Court of Justice of the European Union (CJEU), the EU’s top court, that could determine how VPNs can be used in that region. Clarification in this area is particularly important because VPNs are currently under attack in various ways. For example, last year, the Danish government published draft legislation that many believed would make it illegal to use a VPN to access geoblocked streaming content or bypass restrictions on illegal websites. In the wake of a firestorm of criticism, Denmark’s Minister of Culture assured people that VPNs would not be banned. However, even though references to VPNs were removed from the text, the provisions are so broadly drafted that VPNs may well be affected anyway. Companies too are taking aim at VPNs. Leading the charge are those in France, which have been targeting VPN providers for over a year now. As TorrentFreak reported last February:

Canal+ and the football league LFP have requested court orders to compel NordVPN, ExpressVPN, ProtonVPN, and others to block access to pirate sites and services. The move follows similar orders obtained last year against DNS resolvers.

The VPN Trust Initiative (VTI) responded with a press release opposing what it called a “Misguided Legal Effort to Extend Website Blocking to VPNs”. It warned:

Such blocking can have sweeping consequences that might put the security and privacy of French citizens at risk.

Targeting VPNs opens the door to a dangerous censorship precedent, risking overreach into broader areas of content.

Indeed: if VPN blocks become an option, there will inevitably be more calls to use them for a wider range of material. The VTI also noted that some of its members are considering whether to abandon the French market completely. That could mean people start using less reliable VPN providers, some of which have dubious records when it comes to security and privacy. The incentive for VPNs to pull out of France is increasing. In August last year the Paris Judicial Court ordered top VPN service providers to block more sports streaming domains, and at the beginning of this year, yet more blocking orders were issued to VPNs operating in France. To its credit, one of the VPN providers affected, ProtonVPN, fought back. As reported here by TorrentFreak, the company tried multiple angles:

The VPN provider raised jurisdictional questions and also requested to see evidence that Canal+ owned all the rights at play. However, these concerns didn’t convince the court.

The same applies to Proton’s net neutrality defense, which argued that Article 333-10 of the French sports code, which is at the basis of all blocking orders, violates EU Open Internet Regulation. This defense was too vague, the court concluded, noting that Proton cited the regulation without specifying which provisions were actually breached.

ProtonVPN also argued that forcing a Swiss company to block sites for the French market is a restriction of cross-border trade in services, and that in any case, the blocking measures were “technically unrealizable, costly, and unnecessarily complex.” Despite this valiant defense, the court was unimpressed. At least ProtonVPN was allowed to contest the French court’s ruling. In a similar case in Spain, no such option was given. According to TorrentFreak:

The court orders were issued inaudita parte, which is Latin for “without hearing the other side.” Citing urgency, the Córdoba court did not give NordVPN and ProtonVPN the opportunity to contest the measures before they were granted.

Without a defense, the court reportedly concluded that both NordVPN and ProtonVPN actively advertise their ability to bypass geo-restrictions, citing match schedules in their marketing materials. The VPNs are therefore seen as active participants in the piracy chain rather than passive conduits, according to local media reports.

That’s pretty shocking, and shows once more how biased in favor of the copyright industry the law has become in some jurisdictions: other parties aren’t even allowed to present a defense. It’s a further reason why a definitive ruling from the CJEU on the right of people to use VPNs how they wish is so important.

Alongside these recent court cases, there is also another imminent attack on the use of VPNs, albeit in a slightly different way. The UK government has announced wide-ranging plans that aim to “keep children safe online”. One of the ideas the government is proposing is “to age restrict or limit children’s VPN use where it undermines safety protections and changing the age of digital consent.” Although this is presented as a child protection measure, the effects will be much wider. The only way to bring in age restrictions for children is if all adult users of VPNs verify their own age. This inevitably leads to the creation of huge new online databases of personal information that are vulnerable to attack. As a side effect, the UK government’s misguided plans will also bolster the growing attempts by the copyright industry to demonize VPNs – a core element of the Internet’s plumbing – as unnecessary tools that are only used to break the law.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published on WalledCulture.

Posted on Techdirt - 24 March 2026 @ 03:37pm

An Open Training Set For AI Goes Global

As many of the AI stories on Walled Culture attest, one of the most contentious areas in the latest stage of AI development concerns the sourcing of training data. To create high-quality large language models (LLMs), massive quantities of training data are required. In the current genAI stampede, many companies are simply scraping everything they can off the Internet. Quite how that will work out in legal terms is not yet clear. Although a few court cases involving the use of copyright material for training have been decided, many have not, and the detailed contours of the legal landscape remain uncertain.

However, there is an alternative to this “grab it all” approach. It involves using materials that are either in the public domain or released under a “permissive” license that allows LLMs to be trained on them without any problems. There’s plenty of such material online, but its scattered nature puts it at a serious disadvantage compared to downloading everything without worrying about licensing issues. To address that, the Common Corpus was created and released just over a year ago by the French startup Pleias. A press release from the AI Alliance explains the key characteristics of the Common Corpus:

Truly Open: contains only data that is permissively licensed and provenance is documented

Multilingual: mostly representing English and French data, but contains at least 1[billion] tokens for over 30 languages

Diverse: consisting of scientific articles, government and legal documents, code, and cultural heritage data, including books and newspapers

Extensively Curated: spelling and formatting has been corrected from digitized texts, harmful and toxic content has been removed, and content with low educational content has also been removed.

There are five main categories of material: OpenGovernment, OpenCulture, OpenScience, OpenWeb, and OpenSource:

OpenGovernment contains Finance Commons, a dataset of financial documents from a range of governmental and regulatory bodies. Finance Commons is a multimodal dataset, including both text and PDF corpora. OpenGovernment also contains Legal Commons, a dataset of legal and administrative texts. OpenCulture contains cultural heritage data like books and newspapers. Many of these texts come from the 18th and 19th centuries, or even earlier.

OpenScience data primarily comes from publicly available academic and scientific publications, which are most often released as PDFs. OpenWeb contains datasets from YouTube Commons, a dataset of transcripts from public domain YouTube videos, and websites like Stack Exchange. Finally, OpenSource comprises code collected from GitHub repositories which were permissibly licensed.

The initial release contained over 2 trillion tokens – the usual way of measuring the volume of training material, where tokens can be whole words and parts of words. A significant recent update of the corpus has taken that to over 2.267 trillion tokens. Just as important as the greater size is the wider reach: there are major additions of material from China, Japan, Korea, Brazil, India, Africa and South-East Asia. Specifically, the latest release contains data for eight languages with more than 10 billion tokens (English, French, German, Spanish, Italian, Polish, Greek, Latin) and 33 languages with more than 1 billion tokens. Because of the way the dataset has been selected and curated, it is possible to train LLMs on fully open data, which leads to auditable models. Moreover, as the original press release explains:

By providing clear provenance and using permissibly licensed data, Common Corpus exceeds the requirements of even the strictest regulations on AI training data, such as the EU AI Act. Pleias has also taken extensive steps to ensure GDPR compliance, by developing custom procedures to enable personally identifiable information (PII) removal for multilingual data. This makes Common Corpus an ideal foundation for secure, enterprise-grade models. Models trained on Common Corpus will be resilient to an increasingly regulated industry.

Another advantage for many users is that material with high “toxicity scores” has already been removed, thus ensuring that any LLMs trained on the Common Corpus will have fewer problems in this regard.

The Common Corpus is a great demonstration of the power of openness and permissive copyright licensing, and how they bring benefits that other approaches can’t match. For example: “Common Corpus makes it possible to train models compatible with the Open Source Initiative’s definition of open-source AI, which includes openness of use, meaning use is permitted for ‘any purpose and without having to ask for permission’.” That fact, along with the multilingual nature of the Common Corpus, would make the latest version a great fit for any EU move to create “public AI” systems, something advocated on this blog a few months back. The French government is already backing the project, as are other organizations supporting openness:

The Corpus was built up with the support and concerted efforts of the AI Alliance, the French Ministry of Culture as part of the prefiguration of the service offering of the Alliance for Language technologies EDIC (ALT-EDIC).

This dataset was also made in partnership with Wikimedia Enterprise and Wikidata/Wikimedia Germany. We’re also thankful to our partner Libraries Without Borders for continuous assistance on extending low resource language support.

The corpus was stored and processed with the generous support of the AI Alliance, Jean Zay (Eviden, Idris), Tracto AI, Mozilla.

The unique advantages of the Common Corpus mean that more governments should be supporting it as an alternative to proprietary systems, which generally remain black boxes in terms of where their training data comes from. Publishers too would be wise to fund it, since it offers a powerful resource explicitly designed to avoid some of the thorniest copyright issues plaguing the generative AI field today.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to Walled Culture.

Posted on Techdirt - 13 March 2026 @ 01:07pm

Roblox Rolls Out AI-Powered Real-Time Rephrasing Of Profanity Within Chat

The power of the latest generation of AI systems is such that previously impractical applications are not just possible, but scalable. For example, moving beyond basic early AI text translation tools, it is now possible to use live translation to communicate in another language in real time. For many people that will be a real boon, especially when they are traveling. But here’s something that is likely to prove more controversial: real-time rephrasing of profanity within chat. It’s a new AI-powered feature from Roblox that is designed to “keep gameplay fluid while maintaining civility within chat”:

Roblox is leveraging AI to automatically rephrase profanity. Rather than displaying only hashmarks, filtered text will be translated into more respectful language that remains closer to the user’s original intent. For example, a message that violates Roblox’s profanity policies, such as “Hurry TF up!” would previously have appeared as “####” within experience chat. That will now be rephrased to “Hurry up!” This new layer is designed to maintain civility by rephrasing the language and replacing “stop signs” with real-time guidance.

Specifically:

When a message violates Roblox’s profanity policy, everyone in the chat is notified that the text has been rephrased to keep things civil. While rephrasing reduces some of the disruption in the chat, Roblox’s multilayered safety system remains in effect for more serious behavior. Rephrasing is available exclusively for in-experience chat between age-checked users in similar age groups and is supported in all languages currently available through Roblox’s automatic translation tools.

Alongside this new AI-based capability, Roblox is also tweaking its text filtering system:

Early results from Roblox’s testing show significant improvements in detecting leet-speak, or letters replaced with numbers or symbols, and more sophisticated attempts to bypass filters.
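The quote above does not explain how Roblox detects leet-speak, so the following is only a generic sketch of one common approach: map digit and symbol stand-ins back to letters before checking a message against a word list. The substitution table, the placeholder blocklist and the function names are all assumptions made for illustration, not anything Roblox has described.

```python
import re

# Hypothetical substitution table: common digit/symbol stand-ins for letters.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Placeholder blocklist for illustration; a real policy list would be far larger.
BLOCKED_WORDS = {"darn", "heck"}


def normalize(message: str) -> str:
    """Lower-case the message and undo simple leet-speak substitutions."""
    return message.lower().translate(LEET_MAP)


def contains_blocked_word(message: str) -> bool:
    """Return True if any normalized word appears in the blocklist."""
    words = re.findall(r"[a-z]+", normalize(message))
    return any(word in BLOCKED_WORDS for word in words)


print(contains_blocked_word("What the h3ck"))  # leet-speak variant of a blocked word
print(contains_blocked_word("Hello there"))    # nothing on the blocklist
```

A fixed table like this is easy to evade with spacing, repeated letters or new substitutions, which is presumably why the quote also refers to "more sophisticated attempts to bypass filters".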

Parents may applaud real-time rephrasing as a way for the service to nudge younger users away from bad language in their interaction with others, without stopping them playing altogether. But it creates a dangerous proof of concept that others may build on, particularly in jurisdictions that want stricter controls on what people say online.

It’s easy to imagine situations where Chinese AI systems, for example, rephrase people’s language on social media in real time to promote “social harmony”. Not only the style but even the content’s details could be subtly changed away from controversy towards conformity. It would be possible for rephrasing to be visible only to others, so the person making a comment might not even be aware that their words were being subverted in this way. Something similar is already happening with Chinese AI chatbots that censor their own answers, without acknowledging that fact. As Chinese AI companies become increasingly important players in the online world, this kind of covert rephrasing by them — and others — is another issue people will need to watch out for in our brave new AI world.

Follow me @glynmoody on Bluesky and on Mastodon.

Posted on Techdirt - 23 February 2026 @ 03:08pm

How Copyright Litigation Over Anne Frank’s Diary Could Impact The Fate Of VPNs In The EU

“The Diary of a Young Girl” is a Dutch-language diary written by the young Jewish writer Anne Frank while she was in hiding for two years with her family during the Nazi occupation of the Netherlands. Although the diary and Anne Frank’s death in the Bergen-Belsen concentration camp are well known, few are aware that the text has a complicated copyright history – one that could have important implications for the legal status and use of Virtual Private Networks (VPNs) in the EU. TorrentFreak explains the copyright background:

These copyrights are controlled by the Swiss-based Anne Frank Fonds, which was the sole heir of Anne’s father, Otto Frank. The Fonds states that many print versions of the diary remain protected for decades, and even the manuscripts are not freely available everywhere.

In the Netherlands, for example, certain sections of the manuscripts remain protected by copyright until 2037, even though they have entered the public domain in neighboring countries like Belgium.

A separate foundation, the Netherlands-based Anne Frank Stichting, wanted to publish a scholarly edition of Anne Frank’s writing, at least in those parts of the world where her diary was in the public domain:

To navigate these conflicting laws, the Dutch Anne Frank Stichting published a scholarly edition online using “state-of-the-art” geo-blocking to prevent Dutch residents from accessing the site. Visitors from the Netherlands and other countries where the work is protected are met with a clear message, informing them about these access restrictions.

However, the Anne Frank Fonds was unhappy with this approach, and took legal action. Its argument was that such geo-blocking could be circumvented with VPNs, and so its copyrights in the Netherlands could be infringed upon by those using VPNs. The lower courts in the Netherlands dismissed this argument, and the case is now before the Dutch Supreme Court. Beyond the specifics of the Anne Frank scholarly edition, there are important issues regarding the use of VPNs to get around geo-blocking. Because of the potential knock-on effect the ruling in this case will have on EU law, the Dutch Supreme Court has asked for guidance from the EU’s top court, the Court of Justice of the European Union (CJEU).

The CJEU has yet to rule on the issues raised. But one of the court’s advisors, Advocate General Rantos, has published a preliminary opinion, as is normal in such cases. Although that advice is not binding on the CJEU, it often provides some indication as to how the court may eventually decide. On the main issue of whether the ability of people to circumvent geo-blocking is a problem, Rantos writes:

the fact that users manage to circumvent a geo-blocking measure put in place to restrict access to a protected work does not, in itself, mean that the entity that put the geo-blocking in place communicates that work to the public in a territory where access to it is supposed to be blocked. Such an interpretation would make it impossible to manage copyright on the internet on a territorial basis and would mean that any communication to the public on the internet would be global.

Moreover:

As the [European] Commission pointed out in its written observations, the holder of an exclusive right in a work does not have the right to authorise or prohibit, on the basis of the right granted to it in one Member State, communication to the public in another Member State in which that right has ceased to have effect.

Or, more succinctly: “service providers in the public domain country cannot be subject to unreasonable requirements”. That’s a good, common-sense view. But perhaps just as important is the following comment by Rantos regarding the use of VPNs to circumvent geo-blocking:

as the Commission points out in its observations, VPN services are legally accessible technical services which users may, however, use for unlawful purposes. The mere fact that those or similar services may be used for such purposes is not sufficient to establish that the service providers themselves communicate the protected work to the public. It would be different if those service providers actively encouraged the unlawful use of their services.

That’s an important point at a time when VPNs are under attack from some governments because of concerns about possible copyright infringement by those using them.

The hope has to be that the CJEU will agree with its Advocate General’s sensible and fair analysis, and will rule accordingly. But there is another important aspect to this story. The basic issue is that the Anne Frank Stichting wants to make its scholarly edition of Anne Frank’s diary available as widely as possible. That seems a laudable aim, since it will increase understanding and appreciation of the young woman’s remarkable diary by publishing an academically rigorous version. And yet the Anne Frank Fonds has taken legal action to stop that move, on the grounds that it would represent an infringement of its intellectual monopoly in some parts of Frank’s work, in some parts of the world. The current dispute is another clear example of how copyright has become for some an end in itself, more important than the things that it is supposed to promote.

Follow me @glynmoody on Mastodon and on Bluesky. Republished from Walled Culture.
