Glyn Moody's Techdirt Profile

Glyn Moody

About Glyn Moody

Posted on Techdirt - 13 March 2026 @ 01:07pm

Roblox Rolls Out AI-Powered Real-Time Rephrasing Of Profanity Within Chat

The power of the latest generation of AI systems is such that previously impractical applications are not just possible, but scalable. For example, moving beyond basic early AI text translation tools, it is now possible to use live translation to communicate in another language in real time. For many people that will be a real boon, especially when they are traveling. But here’s something that is likely to prove more controversial: real-time rephrasing of profanity within chat. It’s a new AI-powered feature from Roblox that is designed to “keep gameplay fluid while maintaining civility within chat”:

Roblox is leveraging AI to automatically rephrase profanity. Rather than displaying only hashmarks, filtered text will be translated into more respectful language that remains closer to the user’s original intent. For example, a message that violates Roblox’s profanity policies, such as “Hurry TF up!” would previously have appeared as “####” within experience chat. That will now be rephrased to “Hurry up!” This new layer is designed to maintain civility by rephrasing the language and replacing “stop signs” with real-time guidance.

Specifically:

When a message violates Roblox’s profanity policy, everyone in the chat is notified that the text has been rephrased to keep things civil. While rephrasing reduces some of the disruption in the chat, Roblox’s multilayered safety system remains in effect for more serious behavior. Rephrasing is available exclusively for in-experience chat between age-checked users in similar age groups and is supported in all languages currently available through Roblox’s automatic translation tools.

Alongside this new AI-based capability, Roblox is also tweaking its text filtering system:

Early results from Roblox’s testing show significant improvements in detecting leet-speak, or letters replaced with numbers or symbols, and more sophisticated attempts to bypass filters.

Parents may applaud real-time rephrasing as a way for the service to nudge younger users away from bad language in their interaction with others, without stopping them playing altogether. But it creates a dangerous proof of concept that others may build on, particularly in jurisdictions that want stricter controls on what people say online.

It’s easy to imagine situations where Chinese AI systems, for example, rephrase people’s language on social media in real time to promote “social harmony”. Not only the style but even the content’s details could be subtly changed away from controversy towards conformity. It would be possible for rephrasing to be visible only to others, so the person making a comment might not even be aware that their words were being subverted in this way. Something similar is already happening with Chinese AI chatbots that censor their own answers, without acknowledging that fact. As Chinese AI companies become increasingly important players in the online world, this kind of covert rephrasing by them — and others — is another issue people will need to watch out for in our brave new AI world.

Follow me @glynmoody on Bluesky and on Mastodon.

Posted on Techdirt - 23 February 2026 @ 03:08pm

How Copyright Litigation Over Anne Frank’s Diary Could Impact The Fate Of VPNs In The EU

“The Diary of a Young Girl” is a Dutch language diary written by the young Jewish writer Anne Frank while she was in hiding for two years with her family during the Nazi occupation of the Netherlands. Although the diary and Anne Frank’s death in the Bergen-Belsen concentration camp are well known, few are aware that the text has a complicated copyright history – one that could have important implications for the legal status and use of Virtual Private Networks (VPNs) in the EU. TorrentFreak explains the copyright background:

These copyrights are controlled by the Swiss-based Anne Frank Fonds, which was the sole heir of Anne’s father, Otto Frank. The Fonds states that many print versions of the diary remain protected for decades, and even the manuscripts are not freely available everywhere.

In the Netherlands, for example, certain sections of the manuscripts remain protected by copyright until 2037, even though they have entered the public domain in neighboring countries like Belgium.

A separate foundation, the Netherlands-based Anne Frank Stichting, wanted to publish a scholarly edition of Anne Frank’s writing, at least in those parts of the world where her diary was in the public domain:

To navigate these conflicting laws, the Dutch Anne Frank Stichting published a scholarly edition online using “state-of-the-art” geo-blocking to prevent Dutch residents from accessing the site. Visitors from the Netherlands and other countries where the work is protected are met with a clear message, informing them about these access restrictions.

However, the Anne Frank Fonds was unhappy with this approach, and took legal action. Its argument was that such geo-blocking could be circumvented with VPNs, and so its copyrights in the Netherlands could be infringed upon by those using VPNs. The lower courts in the Netherlands dismissed this argument, and the case is now before the Dutch Supreme Court. Beyond the specifics of the Anne Frank scholarly edition, there are important issues regarding the use of VPNs to get around geo-blocking. Because of the potential knock-on effect the ruling in this case will have on EU law, the Dutch Supreme Court has asked for guidance from the EU’s top court, the Court of Justice of the European Union (CJEU).

The CJEU has yet to rule on the issues raised. But one of the court’s advisors, Advocate General Rantos, has published a preliminary opinion, as is normal in such cases. Although that advice is not binding on the CJEU, it often provides some indication as to how the court may eventually decide. On the main issue of whether the ability of people to circumvent geo-blocking is a problem, Rantos writes:

the fact that users manage to circumvent a geo-blocking measure put in place to restrict access to a protected work does not, in itself, mean that the entity that put the geo-blocking in place communicates that work to the public in a territory where access to it is supposed to be blocked. Such an interpretation would make it impossible to manage copyright on the internet on a territorial basis and would mean that any communication to the public on the internet would be global.

Moreover:

As the [European] Commission pointed out in its written observations, the holder of an exclusive right in a work does not have the right to authorise or prohibit, on the basis of the right granted to it in one Member State, communication to the public in another Member State in which that right has ceased to have effect.

Or, more succinctly: “service providers in the public domain country cannot be subject to unreasonable requirements”. That’s a good, common-sense view. But perhaps just as important is the following comment by Rantos regarding the use of VPNs to circumvent geo-blocking:

as the Commission points out in its observations, VPN services are legally accessible technical services which users may, however, use for unlawful purposes. The mere fact that those or similar services may be used for such purposes is not sufficient to establish that the service providers themselves communicate the protected work to the public. It would be different if those service providers actively encouraged the unlawful use of their services.

That’s an important point at a time when VPNs are under attack from some governments because of concerns about possible copyright infringement by those using them.

The hope has to be that the CJEU will agree with its Advocate General’s sensible and fair analysis, and will rule accordingly. But there is another important aspect to this story. The basic issue is that the Anne Frank Stichting wants to make its scholarly edition of Anne Frank’s diary available as widely as possible. That seems a laudable aim, since it will increase understanding and appreciation of the young woman’s remarkable diary by publishing an academically rigorous version. And yet the Anne Frank Fonds has taken legal action to stop that move, on the grounds that it would represent an infringement of its intellectual monopoly in some parts of Frank’s work, in some parts of the world. The current dispute is another clear example of how copyright has become for some an end in itself, more important than the things that it is supposed to promote.

Follow me @glynmoody on Mastodon and on Bluesky. Republished from Walled Culture.

Posted on Techdirt - 19 February 2026 @ 01:30pm

Wikipedia Grapples With New Challenges From AI

Wikipedia celebrated its 25th birthday last month. Given the centrality of Wikipedia to so much activity online, it is hard to remember (or to imagine, for those who are younger) a time without Wikipedia. The latest statistics are impressive:

  • Wikipedia is viewed nearly 15 billion times every month.
  • Wikipedia contains over 65 million articles across more than 300 languages.
  • Wikipedia is edited by nearly 250,000 editors every month around the world. Editors are defined by one edit or more every month; only editors with a username are counted.
  • Wikipedia is accessed by over 1.5 billion unique devices every month.

That’s testimony to the global nature of Wikipedia. But there’s something else, not mentioned there, that is of great relevance to this blog: the fact that every one of those 65 million articles is made available under a generous license – the Creative Commons Attribution-ShareAlike 4.0 license, to be precise. That means sharing and re-use are encouraged, in contrast to most material online, where copyright is fiercely enforced. Wikipedia is living proof that giving away things by relying on volunteers and donations – the “true fans” approach – works, and on a massive scale. Anil Dash puts it well in a post celebrating Wikipedia’s 25th anniversary:

Whenever I worry about where the Internet is headed, I remember that this example of the collective generosity and goodness of people still exists. There are so many folks just working away, every day, to make something good and valuable for strangers out there, simply from the goodness of their hearts. They have no way of ever knowing who they’ve helped. But they believe in the simple power of doing a little bit of good using some of the most basic technologies of the internet. Twenty-five years later, all of the evidence has shown that they really have changed the world.

However, Wikipedia is today facing perhaps its greatest challenge, which comes from the new generation of AI services. They are problematic for Wikipedia in two main ways. The first, ironically, is because it is widely recognized that Wikipedia’s holdings represent some of the highest-quality training materials available. In a post explaining why, “in the AI era, Wikipedia has never been more valuable”, the Wikimedia Foundation writes:

AI cannot exist without the human effort that goes into building open and nonprofit information sources like Wikipedia. That’s why Wikipedia is one of the highest-quality datasets in the world for training AI, and when AI developers try to omit it, the resulting answers are significantly less accurate, less diverse, and less verifiable.

That recognition is welcome, but comes at a price. It means that every AI company as a matter of course wants to download the entire Wikipedia corpus to be used for training its models. That has led to irresponsible behavior by some companies, when their scraping tools download pages from Wikipedia with no consideration for the resources they are using for free, or the collateral damage they are causing to other users in terms of slower responses.

Trying to stop companies drawing on this unique resource is futile; recognizing this, Wikimedia Foundation has come up with an alternative approach: Wikimedia Enterprise, “a first-of-its-kind commercial product designed for companies that reuse and source Wikipedia and Wikimedia projects at a high volume”. In 2022, its first customers were Google and the Internet Archive, and last month, Wikimedia Enterprise announced that Amazon, Meta, Microsoft, Mistral AI, and Perplexity have also signed. That’s important for a couple of reasons. It means that many of the biggest AI players will download Wikipedia articles more efficiently. It also means that the Wikipedia project will receive funding for its work.

This new money is crucial if Wikipedia is to remain a high quality resource. And that is precisely why every generative AI company that uses Wikipedia posts for training should – if only out of self-interest – pay to do so. What is happening here echoes something this blog suggested back in May 2024: that AI companies should pay artists to create new works, and give away the results, because fresh training material is vital. Helping to pay for Wikipedia to create more high-quality articles that are freely available to all is a variation on that theme.

The other problem that generative AI causes Wikipedia is more subtle. The Wikimedia Foundation explains that alongside financial support, the project needs proper attribution:

Attribution means that generative AI gives credit to the human contributions that it uses to create its outputs. This maintains a virtuous cycle that continues those human contributions that create the training data that these new technologies rely on. For people to trust information shared on the internet, platforms should make it clear where the information is sourced from and elevate opportunities to visit and participate in those sources. With fewer visits to Wikipedia, fewer volunteers may grow and enrich the content, and fewer individual donors may support this work.

Without fresh volunteers, Wikipedia will wither and become less valuable. That’s terrible for the world, but it is also bad for generative AI companies. So, again, it makes sense for them to provide proper attribution in their outputs. That requirement has become even more pressing in the light of a new development. According to tests carried out by the Guardian:

The latest model of ChatGPT has begun to cite Elon Musk’s Grokipedia as a source on a wide range of queries, including on Iranian conglomerates and Holocaust deniers, raising concerns about misinformation on the platform.

That’s potentially problematic because of how Grokipedia creates its entries. Research last year found that:

Grokipedia articles are substantially longer and contain significantly fewer references per word. Moreover, Grokipedia’s content divides into two distinct groups: one that remains semantically and stylistically aligned with Wikipedia, and another that diverges sharply. Among the dissimilar articles, we observe a systematic rightward shift in the political bias of cited news sources, concentrated primarily in entries related to politics, history, and religion. These findings suggest that AI-generated encyclopedic content diverges from established editorial norms-favouring narrative expansion over citation-based verification.

If leading chatbots starts drawing on Grokipedia routinely for their answers, it is less likely that there are independent sources where the information can be checked, something generally possible with Wikipedia. It therefore becomes even more urgent for generative AI systems to provide attribution, so at least users know where information is coming from, and whether there are likely to be further resources that confirm a chatbot’s claims. Not everyone will want to do that, but it is important to offer it as an option.

Wikipedia at 25 is an amazing achievement in multiple ways, one of which includes serving as a demonstration that material can be given away for free, supported directly by users, and on a global scale. It would be a tragedy if the current enthusiasm for generative AI systems led to that resource being harmed and even destroyed. A world without Wikipedia would be a poorer world indeed.

Follow me @glynmoody on Mastodon and on Bluesky. Republished from Walled Culture.

Posted on Techdirt - 4 February 2026 @ 01:32pm

OpenAI’s New Scientific Writing And Collaboration Workspace ‘Prism’ Raises Fears Of Vibe-Coded Academic AI Slop

It is no secret that large language models (LLMs) are being used routinely to modify and even write scientific papers. That’s not necessarily a bad thing: LLMs can help produce clearer texts with stronger logic, not least when researchers are writing in a language that is not their mother tongue. More generally, a recent analysis in Nature magazine, reported by Science magazine, found that scientists embracing AI — of any kind — “consistently make the biggest professional strides”:

AI adopters have published three times more papers, received five times more citations, and reach leadership roles faster than their AI-free peers.

But there is also a downside:

Not only is AI-driven work prone to circling the same crowded problems, but it also leads to a less interconnected scientific literature, with fewer studies engaging with and building on one another.

Another issue with LLMs, that of “hallucinated citations,” or “HalluCitations,” is well known. More seriously, entire fake publications can be generated using AI, and sold by so-called “paper mills” to unscrupulous scientists who wish to bolster their list of publications to help their career. In the field of biomedical research alone, a recent study estimated that over 100,000 fake papers were published in 2023. Not all of those were generated using AI, but progress in LLMs has made the process of creating fake articles much simpler.

Fake publications generated using LLMs are often obvious because of their lack of sophistication and polish. But a new service from OpenAI, called Prism, is likely to eliminate such easy-to-spot signs, by adding AI support to every aspect of writing a scientific paper:

Prism is a free workspace for scientific writing and collaboration, with GPT‑5.2⁠—our most advanced model for mathematical and scientific reasoning—integrated directly into the workflow.

It brings drafting, revision, collaboration, and preparation for publication into a single, cloud-based, LaTeX-native workspace. Rather than operating as a separate tool alongside the writing process, GPT‑5.2 works within the project itself—with access to the structure of the paper, equations, references, and surrounding context.

It includes a number of features that make creating complex — and fake — papers extremely easy:

  • Search for and incorporate relevant literature (for example, from arXiv) in the context of the current manuscript, and revise text in light of newly identified related work
  • Create, refactor, and reason over equations, citations, and figures, with AI that understands how those elements relate across the paper
  • Turn whiteboard equations or diagrams directly into LaTeX, saving hours of time manipulating graphics pixel-by-pixel

There is even voice-based editing, allowing simple changes to be made without the need to write anything. But scientists are already worried that the power of OpenAI’s Prism will make a deteriorating situation worse. As an article on Ars Technica explains:

[Prism] has drawn immediate skepticism from researchers who fear the tool will accelerate the already overwhelming flood of low-quality papers into scientific journals. The launch coincides with growing alarm among publishers about what many are calling “AI slop” in academic publishing.

One field that is already plagued by AI slop is AI itself. An FT article on the topic points to an interesting attempt by the International Conference on Learning Representations (ICLR), a major gathering of researchers in the world of machine learning, to tackle this problem with punitive measures against authors and reviewers who violate the ICLR’s policies on LLM-generated material. For example:

Papers that make extensive usage of LLMs and do not disclose this usage will be desk rejected [that is, without sending them out for external peer review]. Extensive and/or careless LLM usage often results in false claims, misrepresentations, or hallucinated content, including hallucinated references. As stated in our previous blog post: hallucinations of this kind would be considered a Code of Ethics violation on the part of the paper’s authors. We have been desk -rejecting, and will continue to desk -reject, any paper that includes such issues.

Similarly:

reviewers [of submitted papers] are responsible for the content they post. Therefore, if they use LLMs, they are responsible for any issues in their posted review. Very poor quality reviews that feature false claims, misrepresentations or hallucinated references are also a code of ethics violation as expressed in the previous blog post. As such, reviewers who posted such poor quality reviews will also face consequences, including the desk rejection of their [own] submitted papers.

It is clearly not possible to stop scientists from using AI tools to check and improve their papers, nor should this be necessary, provided authors flag up such usage, and no errors are introduced as a result. A policy of the kind adopted by the ICLR requiring transparency about the extent to which AI has been used seems a sensible approach in the face of increasingly sophisticated tools like OpenAI’s Prism.

Follow me @glynmoody on  on Bluesky and Mastodon.

Posted on Techdirt - 29 December 2025 @ 03:31pm

How Generative AI Is Enabling More Connections With True Fans

Walled Culture has written a number of times about the true fans approach – the idea that creators can be supported directly and effectively by the people who love their work. As Walled Culture the book explains (available as a free ebook), one of the earliest and best expositions of the concept came from Kevin Kelly, former Executive Editor at Wired magazine, in an essay he wrote originally in 2008. The true fans idea is sometimes dismissed as simply selling branded t-shirts to supporters. That may have been true decades ago, but things have moved on. For example, Universal Music Group has recently opened retail locations that cater specifically for true fans. In addition to shops in Tokyo and Madrid, there are new outlets in New York and London. Here’s what the latter will offer, as reported by Music Business Worldwide:

Located in Camden Market, the London-based space will “serve as a creative hub where music, fashion, and design collide,” UMG said.

The announcement added that the shop was “designed to capture Camden’s rebellious spirit and deep musical roots”.

The store will feature exclusive artist collections, immersive installations, and live performances, along with a Vinyl Lounge, DJ booth, and recording studio-inspired Sound Room that “allows fans to experience music like never before”.

That is a fairly conventional extension of the “selling branded t-shirts to supporters” idea. A post on the Midia Research blog points out a more radical development in the true fans space involving the latest generative AI technology:

AI is best considered as an accelerant rather than something entirely new, intensifying pre-existing trends. AI music absolutely fits this trend. Over the course of the last decade – including a super-charged COVID bump – accessible music tech has enabled ever-more people to become music creators. AI simply lowered the barriers to entry even further. The debate over whether a text prompt constitutes creativity will continue to run (just like the same debate still runs for sampling), but what is clear is that more people are now making music because of AI.

Thanks to genAI, true fans are not limited to a passive role. They can actively participate in the artistic ecosystem brought into being by their musical heroes, through the creation of new works based on and extending the originals they love. The fanfic world has been doing this for many years, so it is no surprise to find the use of generative AI there even more advanced there than in the world of music. For example, the DreamGen site lists no less than nine “AI fanfic generators”, including its own. It offers a good description of how these systems work:

1. You give it a prompt: This could be something like “Harry Potter and Hermione go on a space adventure” or “Naruto meets Spider-Man in New York.”

2. The AI takes over: It uses its knowledge of language and storytelling to write a story based on your idea. It fills in the details, such as dialogue, action, emotions,and plot twists.

3. You can guide it: Want more romance? More drama? A surprise ending? You can tweak the prompt or add instructions, and the AI will adjust the story.

4. You get a full fanfic: Some tools write it all at once, others let you build it paragraph by paragraph so you can shape the story as it goes.

As that indicates, the new AI-based fanfic generators are so easy to use, anyone can use them. The only limit is the imagination and the ability to put that into words. That’s an incredible democratization of creativity that takes the idea of participatory fandom to the next level. And, of course, it can be applied in other domains too, such as “fan art”, which Wikipedia defines as follows:

Fan art or fanart is artwork created by fans of a work of fiction or celebrity depicting events, character, or other aspect of the work. As fan labor, fan art refers to artworks that are not created, commissioned, nor endorsed by the creators of the work from which the fan art derives.

As with other uses of genAI, this raises questions of copyright, some of which have already found their way to court. Perhaps surprisingly, Disney has just announced its embrace of this use of AI by fans, in a partnership with OpenAI:

The Walt Disney Company and OpenAI have reached an agreement for Disney to become the first major content licensing partner on Sora, OpenAI’s short-form generative AI video platform, bringing these leaders in creativity and innovation together to unlock new possibilities in imaginative storytelling.

As part of this new, three-year licensing agreement, Sora will be able to generate short, user-prompted social videos that can be viewed and shared by fans, drawing from a set of more than 200 animated, masked and creature characters from Disney, Marvel, Pixar and Star Wars, including costumes, props, vehicles, and iconic environments. In addition, ChatGPT Images will be able to turn a few words by the user into fully generated images in seconds, drawing from the same intellectual property. The agreement does not include any talent likenesses or voices.

There’s a billion-dollar investment by Disney in OpenAI, as well as the following:

OpenAI and Disney will collaborate to utilize OpenAI’s models to power new experiences for Disney+ subscribers, furthering innovative and creative ways to connect with Disney’s stories and characters.

Presumably, Disney hopes to gain more Disney+ subscribers and drive more revenues with these short-form, fan-generated videos, plus whatever “creative ways” of using AI that it comes up with. OpenAI, meanwhile, gains some handy investment, and a showcase for its Sora genAI video platform.

Although this deal is a welcome sign that some major copyright companies are starting to think imaginatively and positively about genAI, and how it can actually boost profits, the new service will doubtless be rather limited, not least in terms of what kind of videos can generated. The press release emphasises:

OpenAI and Disney have affirmed a shared commitment to maintaining robust controls to prevent the generation of illegal or harmful content, to respect the rights of content owners in relation to the outputs of models, and to respect the rights of individuals to appropriately control the use of their voice and likeness.

That means that there will always be room for edgier, smaller sites producing fanfic, fan art and fan videos that don’t worry about things like good taste or copyright. As more fans discover the delights of building on and extending the creative ideas of their idols in novel ways using genAI, we can expect a corresponding rise in the number of legal actions trying to stop them doing so.

Follow me @glynmoody on Mastodon and on Bluesky. Originally posted to Walled Culture.

Posted on Techdirt - 23 December 2025 @ 08:01pm

40 Years Of Copyright Obstruction To Human Rights And Social Justice

One of the little-known but extremely telling episodes in the history of modern copyright, discussed in Walled Culture the book (free digital versions available), concerns the Marrakesh Treaty. A post on the Corporate Europe Observatory (CEO) site from 2017 has a good summary of what the treaty is about, and why it is important:

It sets out exceptions and limits to copyright rules so that people unable to use print media (including blind, visually impaired, and dyslexic people) can access a far greater range of books and other written materials in accessible formats. These exceptions to copyright law are important in helping to combat the ‘book famine’ for print-disabled readers. The Marrakesh Treaty is particularly important in global south countries where the range of materials in an accessible format – usually expensive to produce and disseminate – can be extremely limited.

Its importance was recognised long ago, as indicated by a timeline on the Knowledge Economy International (KEI) site:

In 1981, the governing bodies of WIPO and UNESCO agreed to create a Working Group on Access by the Visually and Auditory Handicapped to Material Reproducing Works Produced by Copyright. This group meeting took place on October 25-27, 1982 in Paris, and produced a report that included model exceptions for national copyright laws. (UNESCO/WIPO/WGH/I/3). An accessible copy of this report is available here.

And yet it was only in 2013 – 31 years after the original report – that the treaty was finally agreed. The reason for this extraordinary delay in making it easier for the visually impaired to enjoy even a fraction of the material that most have access to is simple: copyright. As KEI’s director, James Love, told Walled Culture in an interview three years ago: “the initial opposition was from the publishers, and the publishers did everything you can imagine to derail this [treaty]”. The CEO post explains why:

Industry’s lobby efforts have attempted to re-frame the Marrakesh Treaty away from being a matter of human rights, education, and social justice, towards a copyright agenda by portraying it as a threat to business’ interests.

Indeed, even industries well outside publishing lobbied hard against the treaty. For example:

Caterpillar, the machinery manufacturer, joined the campaign to oppose it, apparently convinced that the Treaty would act as a slippery slope towards weaker intellectual property rules elsewhere.

As the CEO article noted, after the Marrakesh Treaty was agreed, several EU member states insisted on it being watered down further:

contrary to the obvious benefits of the ratification and implementation of the Marrakesh Treaty for the 30 million blind or visually-impaired people in Europe (and 285 million worldwide), several EU member state governments have instead bought the business line that these issues should be viewed through the lens of copyright.

That was eight years ago. And yet – incredibly – the pushback against providing the visually impaired with at least minimal rights to convert print and digital material into forms that they could access has continued unabated. A recent post on the International Federation of Library Associations and Institutions (IFLA) blog analyses the ways in which the already diluted benefits of the Marrakesh Treaty have been diminished further:

it has become clear that there are a number of ways in which it is possible to undermine the goals and intent of the Marrakesh Treaty, ultimately limiting the progress of access to information than would otherwise be possible.

This article highlights examples from countries that are arguably getting Marrakesh implementation wrong. The list below illustrates provisions (or a lack of provisions) to avoid because they undermine the purpose of the treaty and create barriers to access for people with disabilities.

One extraordinary failure to implement the Marrakesh Treaty properly, a full 40 years after it was first discussed, is “where laws have set out that authorised entities need to be registered in order to use Marrakesh provisions, but then there is no way of registering.” According to the IFLA this is the case in Brazil and Argentina. Just slightly better is the situation where “only certain institutions and libraries should count as authorised entities.” Clearly, this “may have the effect of limiting the number of service providers, and place an additional burden on institutions.” Another problem concerns remuneration:

The Marrakesh Treaty includes an optional provision for remuneration of rightholders. This non-compulsory clause was added in order to secure support during negotiations, but undermines the Treaty’s purpose by allowing the payment of a royalty for an inaccessible work, and creates a financial and administrative burden, ultimately drawing resources away services to persons with disabilities.

Germany is a disappointing example of how new barriers can be placed in the way of the visually impaired by adding unjustified and exorbitant costs:

a fee of at least €15 is charged for each transfer of a book for each individual format. Fees (approx. 15 cents) are also charged for each download or stream of a book. Additionally, fees are charged for obtaining books from other German-speaking countries and for borrowing them. This leads to considerable costs, which inevitably result in a decline in purchases and the range of services offered.

Another obstacle is the requirement in some countries for “a commercial availability check for a work in an accessible format, when the very purpose of the Marrakesh Treaty was to address a market failure.” As the IFLA post rightly points out:

A commercial availability check is unnecessary – libraries will buy books in accessible formats where they can, as it is far more cost effective to purchase the work than produce it in accessible format. Yet Canada has introduced such a provision, and indeed even requires a second check when exporting books. It is burdensome to expect a library to conduct a search in a foreign market and be 100% sure that a book is not available in a given format there. Often the information simply is not available. Such provisions therefore create unacceptable liability, chilling the sharing of books. 

Finally, there are countries that have joined the Marrakesh Treaty, but have done little or nothing to implement it:

recent piece from Bangladesh highlights how delays in reforming domestic copyright laws, coupled with underinvestment, have meant that three years on from ratifying the Treaty, persons with print disabilities are still waiting for change. Similarly in South Africa, despite a judgement from the Constitutional Court, the necessary reforms to implement the Treaty are still being held up.

The Marrakesh Treaty saga shows the copyright industry and its friends in governments around the world at their very worst. Unashamedly placing copyright’s intellectual monopoly above other fundamental human rights, these groups have selfishly done all they can to block, hinder, delay and dilute the idea that granting the visually impaired ready access to books and other material is a matter of social justice and basic compassion.

Follow me @glynmoody on Mastodon and on Bluesky. Originally posted to Walled Culture.

Posted on Techdirt - 16 December 2025 @ 12:02pm

When People Realize How Good The Latest Chinese Open Source Models Are (And Free), The GenAI Bubble Could Finally Pop

Although the field of artificial intelligence (AI) goes back more than half century, its latest incarnation generative AI is still very new: ChatGPT was launched just three years ago. During that time a wide variety of issues have been raised, ranging from concerns about the impact of AI on copyright, people’s ability to learn or even think, job losses, the flood of AI slop on the Internet, the environmental harms of massive data centers, and whether the creation of a super-intelligent AI will lead to the demise of humanity. Recently, a more mundane worry is that the current superheated generative AI market is a bubble about to pop. In the last few days, Google’s CEO, Sundar Pichai, has admitted that there is some “irrationality” in the current AI boom, while the Bank of England has warned about the risk of a “sharp correction” in the value of major players in the sector.

One element that may not yet be factored in to this situation is the rising sophistication of open source models from China. Back in April, Techdirt wrote about how the release of a single model from the Chinese company DeepSeek had wiped a trillion dollars from US markets. Since then, DeepSeek has not been standing still. It has just launched its V3.2 model, and a review on ZDNet is impressed by the improvements:

the fact that a company — and one based in China, no less — has built an open-source model that can compete with the reasoning capabilities of some of the most advanced proprietary models currently on the market is a huge deal. It reiterates growing evidence that the “performance gap” between open-source and close-sourced models isn’t a fixed and unresolvable fact, but a technical discrepancy that can be bridged through creative approaches to pretraining, attention, and posttraining.

It is not just one open source Chinese model that is close to matching the best of the leading proprietary offerings. An article from NBC News notes that other freely downloadable Chinese models like Alibaba’s Qwen were also “within striking distance of America’s best.” Moreover, these are not merely theoretical options: they are already being put to use by AI startups in the US.

Over the past year, a growing share of America’s hottest AI startups have turned to open Chinese AI models that increasingly rival, and sometimes replace, expensive U.S. systems as the foundation for American AI products.

NBC News spoke to over 15 AI startup founders, machine-learning engineers, industry experts and investors, who said that while models from American companies continue to set the pace of progress at the frontier of AI capabilities, many Chinese systems are cheaper to access, more customizable and have become sufficiently capable for many uses over the past year.

As well as being free to download and completely configurable, these open source models from Chinese companies have another advantages over many of the better-known US products: they can be run locally without needing to pay any fees. This also means no data leaves the local system, which offers enhanced privacy and control over sensitive business data. However, as the NBC article notes, there are still some worries about using Chinese models:

In late September, the U.S. Center for AI Standards and Innovation released a report outlining risks from DeepSeek’s popular models, finding weakened safety protocols and increased pro-Chinese outputs compared to American closed-source models.

And the success of China’s open source models is prompting US efforts to take catch up:

In July, the White House released an AI Action Plan that called for the federal government to “Encourage Open-Source and Open-Weight AI.”

In August, ChatGPT maker OpenAI released its first open-source model in five years. Announcing the model’s release, OpenAI cited the importance of American open-source models, writing that “broad access to these capable open-weights models created in the US helps expand democratic AI.”

And in late November, the Seattle-based Allen Institute released its newest open-source model called Olmo 3, designed to help users “build trustworthy features quickly, whether for research, education, or applications,” according to its launch announcement.

The open source approach to generative AI is evidently growing in importance, driven by enhanced capabilities, low price, customizability, reduced running costs and better privacy. The free availability of these open source and open weight models, whether from China or the US, is bound to call into question the underlying assumption of today’s generative AI companies that there will be a commensurate payback for the trillions of dollars they are currently investing. Maybe it will be the realization that today’s open source models are actually good enough for most applications that finally pops the AI bubble.

Follow me @glynmoody on  on Bluesky and Mastodon.

Posted on Techdirt - 9 December 2025 @ 07:54pm

Public AI, Built On Open Source, Is The Way Forward In The EU

Aquarter of a century ago, I wrote a book called “Rebel Code”. It was the first – and is still the only – detailed history of the origins and rise of free software and open source, based on interviews with the gifted and generous hackers who took part. Back then, it was clear that open source represented a powerful alternative to the traditional proprietary approach to software development and distribution. But few could have predicted how completely open source would come to dominate computing. Alongside its role in running every aspect of the Internet, and powering most mobile phones in the form of Android, it has been embraced by startups for its unbeatable combination of power, reliability and low cost. It’s also a natural fit for cloud computing because of its ability to scale. It is no coincidence that for the last ten years, pretty much 100% of the world’s top 500 supercomputers have all run an operating system based on the open source Linux.

More recently, many leading AI systems have been released as open source. That raises the important question of what exactly “open source” means in the context of generative AI software, which involves much more than just code. The Open Source Initiative, which drew up the original definition of open source, has extended this work with its Open Source AI Definition. It is noteworthy that the EU has explicitly recognized the special role of open source in the field of AI. In the EU’s recent Artificial Intelligence Act, open source AI systems are exempt from the potentially onerous obligation to draw up a range of documentation that is generally required.

That could provide a major incentive for AI developers in the EU to take the open source route. European academic researchers working in this area are probably already doing that, not least for reasons of cost. Paul Keller points out in a blog post that another piece of EU legislation, the 2019 Copyright in the Digital Single Market Directive (CDSM), offers a further reason for research institutions to release their work as open source:

Article 3 of the CDSM Directive enables these institutions to text and data-mine all “works or other subject matter to which they have lawful access” for scientific research purposes. Text and data mining is understood to cover “any automated analytical technique aimed at analysing text and data in digital form in order to generate information, which includes but is not limited to patterns, trends and correlations,” which clearly covers the development of AI models (see here or, more recently, here).

Keller’s post goes through the details of how that feeds into AI research, but the end-result is the following:

as long as the model is made available in line with the public-interest research missions of the organisations undertaking the training (for example, by releasing the model, including its weights, under an open-source licence) and is not commercialised by these organisations, this also does not affect the status of the reproductions and extractions made during the training process.

This means that Article 3 does cover the full model-development pathway (from data acquisition to model publication under an open source license) that most non-commercial Public AI model developers pursue.

As that indicates, the use of open source licensing is critical to this application of Article 3 of EU copyright legislation for the purpose of AI research.

What’s noteworthy here is how two different pieces of EU legislation, passed some years apart, work together to create a special category of open source AI systems that avoid most of the legal problems of training AI systems on copyright materials, as well as the bureaucratic overhead imposed by the EU AI Act on commercial systems. Keller calls these “public AI”, which he defines as:

AI systems that are built by organizations acting in the public interest and that focus on creating public value rather than extracting as much value from the information commons as possible.

Public AI systems are important for at least two reasons. First, their mission is to serve the public interest, rather than focusing on profit maximization. That’s obviously crucial at time when today’s AI giants are intent on making as much money as possible, presumably in the hope that they can do so before the AI bubble bursts.

Secondly, public AI systems provide a way for the EU to compete with both US and Chinese AI companies – by not competing with them. It is naive to think that Europe can ever match levels of venture capital investment that big name US AI startups currently enjoy, or that the EU is prepared and able to support local industries for as long and as deeply as the Chinese government evidently plans to do for its home-grown AI firms. But public AI systems, which are fully open source, and which take advantage of the EU right of research institutions to carry out text and data mining, offer a uniquely European take on generative AI that might even make such systems acceptable to those who worry about how they are built, and how they are used.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to the Walled Culture blog.

Posted on Techdirt - 19 November 2025 @ 01:35pm

Fans Of Open Access, Unite: You Have Nothing To Lose But Your Chained Libraries

When books were rare and extremely expensive, they were often chained to the bookcase to prevent people walking off with them, in what were known as “chained libraries”. Copyright serves a similar purpose today, even though, thanks to the miracle of perfect, zero-cost digital copies, it is possible simultaneously to take an ebook home and yet leave the original behind. For a quarter of a century, the open access movement has been fighting to break those virtual chains for academic works, and to allow anyone freely to read and make copies of the knowledge contained in online virtual libraries.

The detailed history of the movement can be found in Chapter 3 of Walled Culture the book (free digital versions available). As the timeline there, and posts on this blog both make clear, the open access movement has made only limited progress despite the enormous effort expended by many dedicated individuals. Moreover the open access idea has been embraced and then subverted by the academic publishers whose greed and selfishness it was meant to fight.

One version of open access, known as the “diamond” variant, still offers hope that the goals of free access to knowledge for everyone could still be achieved. But even this minimalist approach to academic publishing requires funding, which raises questions about its long-term sustainability. Economic issues also lie at the heart of wider discussions about what could replace copyright, which was born in the analogue world, and whose dysfunctional nature in the digital environment is evident every day.

Walled Culture the book concludes with a look at perhaps the most promising alternative model, whereby “true fans” support directly the creators whose work they value. This approach can also be applied to open access. In this case, the “true fans” of the research work published in papers and books are the academic libraries, acting on behalf of the people who use them. There are various ways for them to support the journals their academics want to access, but one of the most promising is “subscribe to open” (S2O), which helps publishers convert traditional journals into open access. The idea was formalized by Raym Crow, Richard Gallagher, Kamran Naim in 2019. Here’s their explanation of how it works:

S2O offers a journal’s current subscribers continued access at a discount off the regular subscription price. If current subscribers participate in the S2O offer, the publisher opens the content covered by that year’s subscription. If participation is not sufficient – for example, if some subscribers delay renewing in the expectation that they can gain access without participating – then the content remains gated. Because the publisher does not guarantee that the content will be opened unless all subscribers participate in the offer, institutions that value access to the content – demonstrably, the journal’s current subscribers – must either subscribe conventionally (at full price) or participate in S2O (at a discount) to ensure continued access. The offer is repeated every year, with the opening of each year’s content contingent on sufficient participation.

As with the “true fans” model, supporting S2O journals is in the self-interest of libraries, which receive subscriptions to journals their academics want, and for a lower price. But there is a collateral benefit for society because everyone else also receives access to the knowledge contained in those titles. Publishers receive a guaranteed subscription income up front, and as a consequence of the open access route, they can also reach a larger audience. For example, when the Annual Review of Public Health publication tried out the S2O model, its monthly usage factor went up by a factor of eight. Since that successful trial, the S2O model has gone from strength to strength, as a review article published at the end of last year explains:

As of 2024, thanks to the Subscribe to Open model, over 180 journals have been able to publish entire volumes in open access, which would never have been possible otherwise because of the shortcomings of the [article processing charge] models for these journals and their respective disciplines. The S2O model continues to grow, with more publishers set to launch their S2O offerings in 2025.

In August, the prestigious Royal Society announced that it would be moving eight of its subscription journals to S2O. Among those titles is Philosophical Transactions of the Royal Society, the world’s longest-running scientific journal. In an article reflecting on that move, Rod Cookson, publishing director of The Royal Society, explained why he and other forward-thinking publishers are fans of S2O:

It is cost neutral and a relatively small change through which libraries can enable entire journals to become open access. This combination of simplicity and transparency has generated enthusiasm for S2O among librarians the world over. Publishers now need to demonstrate to those librarians that in addition to being aligned with their missions, S2O delivers a return on investment that justifies their expenditure. With sensible features that make the S2O proposition work well for both libraries and publishing houses—like multi-year agreements, “premium benefits” for S2O supporters, and collective sales packages—S2O will continue to grow as a trusted and durable model for delivering open access.

S2O represents a successful application of the true fans idea in the context of academic publishing. But perhaps supporters of open access should embrace even more of the true fan spirit and look to the example of fan fiction to help re-imagine scholarly publishing. That, at least, is the bold idea of Caroline Ball, who is the community engagement lead for the Open Book Collective, and whose advocacy work appeared in Walled Culture four years ago. Here’s why she thinks academic research should be more like fan fiction:

At first glance, fanfiction—non-commercial works created by fans who reimagine and remix existing stories, characters, and worlds—and academic research may seem worlds apart. But look closer, and both are practices of deep engagement, intertextual interpretation, and knowledge creation.

Fanfiction doesn’t just regurgitate stories; it interrogates, reinvents, and expands on them, often filling in gaps and exclusions left offscreen. Likewise, scholarship builds on prior work, challenges assumptions, and contributes new insights. Both are iterative, dialogic, and community based. And both, at their best, come from a place of passion and curiosity.

Her post explains how Archive of Our Own (AO3), a community-run digital repository for fan fiction, works, and why it could be a model for a new kind of open access:

Archive of Our Own (AO3) is a community-run digital repository for fanfiction. Launched in 2008 by the nonprofit Organization for Transformative Works (OTW), AO3 is entirely open access. It charges nothing to publish, nothing to read, and is powered by open-source code and volunteer labor. As of May 2025 (according to the OTW Communications Committee), it hosts over 15 million works across 71,880 fandoms and sees a daily average of 94 million hits.

Ball goes on to suggests ways in which scholarly publishing could learn from that evident success. Specific areas include AO3’s flexible metadata system; its innovative approach to reviews and comments; its “format agnosticism”, accepting any kind of contribution; and the way it re-imagines recognition and reputation. In summary, she writes:

AO3 reminds us that platforms can be built by and for communities, without extractive profit models or exclusionary hierarchies. It shows what’s possible when infrastructure is treated as a public good, and when participation is scaffolded, not gated. And crucially, AO3 demonstrates how practices that have been piloted in isolation across the scholarly landscape—open peer commentary, volunteer governance, flexible metadata, inclusive formats—can be woven together into a single, sustainable system.

The S2O model described above is a welcome addition to the ways in which sustainable open access can be brought in by publishers. But ultimately Ball is right in emphasizing that universal and unconstrained access to knowledge will only be achieved when the entire scholarly publishing system is re-invented with that goal in mind. It’s well past time for all the fans of open access to unite in this endeavor, and to do away with today’s digital chained libraries forever.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to Walled Culture.

Posted on Techdirt - 14 November 2025 @ 11:56am

Copyright Is The Wrong Tool To Deal With Deepfake Harms

A key theme of Walled Culture the book (free digital versions available) is that copyright, born in an analogue age of scarcity, works poorly in today’s digital world of abundance. One manifestation of that is how lawmakers struggle to adapt the existing copyright rules to deal with novel technological developments, like the new generation of AI technologies. The EU’s AI Act marks a major step in regulating artificial intelligence, but it touches on copyright only briefly, leaving many copyright-related questions still open. The process of aligning national copyright laws with the AI Act provides an opportunity for EU Member States to flesh out some of the details, and that is what Italy has done with its new “Disposizioni e deleghe al Governo in materia di intelligenza artificiale.” (Provisions and delegations to the Government regarding artificial intelligence). The Communia blog explains the two main provisions. The first specifies that only works of human creativity are eligible for protection under Italian copyright law:

It codifies a crucial principle: while AI can be a tool in the creative process, copyright protection remains reserved for human-generated intellectual effort. This positions Italian law in alignment with the broader international trend, seen in the EU, U.S., and UK, of rejecting full legal authorship rights for non-human agents such as AI systems. In practice, this means that works solely generated by AI without significant human input will likely fall outside the scope of copyright protection.

The second provision deals with the legality of text and data mining (TDM) activities used in the training of AI models:

This provision essentially reaffirms that text and data mining (TDM) is permitted under certain conditions, namely where access to the source materials is lawful and the activity complies with the existing TDM exceptions under EU copyright law

The Italian AI law is about clarifying existing copyright law to deal with issues raised by AI. But some EU countries want to go much further in their response to generative AI, and bring in an entirely new kind of copyright. Both Denmark and the Netherlands are proposing to give people the copyright to their body, facial features, and voice. The move is intended as a response to the rising number of AI-generated deepfakes, where aspects such as someone’s face, body and voice are used without their permission, often for questionable purposes, and sometimes for criminal ones. There are good reasons for tackling deepfakes, as noted in an excellent commentary by P. Bernt Hugenholtz regarding the proposed Danish and Dutch laws:

Fake porn and other deepfake content is causing serious, and sometimes irreversible, harm to a person’s integrity and reputation. Fake audio or video content might deceive or mislead audiences and consumers, poison the public sphere, induce hatred, manipulate political discourse and undermine trust in science, journalism, and the public media. Like misinformation more generally, deepfakes pose a threat to our increasingly fragile democracies.

The problem is not that new laws are being brought in, but that the Danish and Dutch governments are proposing to use the wrong legal framework – copyright – to do so:

If concerns over privacy and reputation are the main reasons for regulating deepfakes, any new rules should be grounded in the law of privacy. If preserving trust in the media or safeguarding democracy are the dominant concerns, deepfakes ought to be addressed in media regulation or election laws. The Danish and Dutch bills address and alleviate none of these concerns.

It’s a classic example of copyright maximalism, where wider and stronger copyright laws are seen as the solution to everything. As well as being a poor fit for the problem, taking this approach would bring with it a real harm:

both deepfake bills conceive the new right to control deepfakes as a marketable, exploitable right, subject to monetization by way of licensing.

The message both bills convey is not that deepfakes are taboo, but that deepfakes amount to a new licensing opportunity.

In other words, the copyright maximalist approach makes everything about money, not morals. Ironically, taking such an approach would weaken copyright itself, as Communia’s submission to the Danish consultation on the deepfake proposal explains:

the proposal risks undermining the coherence of copyright law itself by introducing doctrinal inconsistencies. Copyright protects original expressive works, not a person’s indicia of personal identity, such as their image, voice or other physical characteristics. It is awarded for a limited duration in order to incentivise the creation of new works, and the existing corpus of limitations and exceptions has been designed with this premise in mind. Extending copyright to subject matter of an entirely different nature, for which marketisation is not an intended objective, will inevitably create legal uncertainty.

Communia points out a further reason not to take the copyright route for protecting people against deepfakes. The Danish bill would grant performing artists a new and wide-ranging copyright in their performances that would have a negative impact on the public domain:

the proposed extension of protection to subject matter that does not constitute a performance of an artistic or literary work raises significant concerns as to scope and proportionality. The introduction of a new exclusive right with such a wide scope would unduly restrict the Public Domain, interfering with the lawful access and reuse of subject matter that is currently out-of-copyright and that should remain as such, in the absence of clear economic evidence that such expansion is needed.

Moreover, as Communia notes:

The recitals of the draft [Danish] bill themselves acknowledge that multiple legal bases for acting against deepfakes already exist, including within criminal law. If individuals face difficulties in asserting their rights under the current framework, the appropriate course of action would be for the legislator to clarify the existing legal position. Introducing an additional and conceptually flawed layer of protection risks creating confusion and may ultimately prove counterproductive.

There’s no doubt that the harms caused by AI-generated deepfakes need tackling. The situation is made worse by advanced AI apps explicitly designed to make deepfake generation as easy as possible, such as OpenAI’s Sora, which are currently entering the market. But introducing a new kind of copyright is the wrong way to do it.

Follow me @glynmoody on Mastodon and on Bluesky. Originally published to Walled Culture.

More posts from Glyn Moody >>