AI And Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead

from the the-wrong-approach dept

You shouldn’t need a permission slip to read a webpage—whether you do it with your own eyes, or use software to help. AI is a category of general-purpose tools with myriad beneficial uses. Requiring developers to license the materials needed to create this technology threatens the development of more innovative and inclusive AI models, as well as important uses of AI as a tool for expression and scientific research.  

Threats to Socially Valuable Research and Innovation 

Requiring researchers to license fair uses of AI training data could make socially valuable research based on machine learning (ML) and even text and data mining (TDM) prohibitively complicated and expensive, if not impossible. Researchers have relied on fair use to conduct TDM research for a decade, leading to important advancements in myriad fields. However, licensing the vast quantity of works that high-quality TDM research requires is frequently cost-prohibitive and practically infeasible.  

Fair use protects ML and TDM research for good reason. Without fair use, copyright would hinder important scientific advancements that benefit all of us. Empirical studies back this up: research using TDM methodologies is more common in countries that protect TDM research from copyright control; in countries that don’t, copyright restrictions stymie beneficial research. It’s easy to see why: it would be impossible to identify and negotiate with millions of different copyright owners to analyze, say, text from the internet. 

The stakes are high, because ML is critical to helping us interpret the world around us. It’s being used by researchers to understand everything from space nebulae to the proteins in our bodies. When the task requires crunching a huge amount of data, such as the data generated by the world’s telescopes, ML helps rapidly sift through the information to identify features of potential interest to researchers. For example, scientists are using AlphaFold, a deep learning tool, to understand biological processes and develop drugs that target disease-causing malfunctions in those processes. The developers released an open-source version of AlphaFold, making it available to researchers around the world. Other developers have already iterated upon AlphaFold to build transformative new tools.  

Threats to Competition 

Requiring AI developers to get authorization from rightsholders before training models on copyrighted works would limit competition to companies that have their own trove of training data, or the means to strike a deal with such a company. This would result in all the usual harms of limited competition—higher costs, worse service, and heightened security risks—as well as reducing the variety of expression used to train such tools and the expression allowed to users seeking to express themselves with the aid of AI. As the Federal Trade Commission recently explained, if a handful of companies control AI training data, “they may be able to leverage their control to dampen or distort competition in generative AI markets” and “wield outsized influence over a significant swath of economic activity.” 

Legacy gatekeepers have already used copyright to stifle access to information and the creation of new tools for understanding it. Consider, for example, Thomson Reuters v. Ross Intelligence, widely considered to be the first lawsuit over AI training rights ever filed. Ross Intelligence sought to disrupt the legal research duopoly of Westlaw and LexisNexis by offering a new AI-based system. The startup attempted to license the right to train its model on Westlaw’s summaries of public domain judicial opinions and its method for organizing cases. Westlaw refused to grant the license and sued its tiny rival for copyright infringement. Ultimately, the lawsuit forced the startup out of business, eliminating a would-be competitor that might have helped increase access to the law.  

Similarly, shortly after Getty Images—a billion-dollar stock images company that owns hundreds of millions of images—filed a copyright lawsuit asking the court to order the “destruction” of Stable Diffusion over purported copyright violations in the training process, Getty introduced its own AI image generator trained on its own library of images.  

Requiring developers to license AI training materials benefits tech monopolists as well. For giant tech companies that can afford to pay, pricey licensing deals offer a way to lock in their dominant positions in the generative AI market by creating prohibitive barriers to entry. To develop a “foundation model” that can be used to build generative AI systems like ChatGPT and Stable Diffusion, developers need to “train” the model on billions or even trillions of works, often copied from the open internet without permission from copyright holders. There’s no feasible way to identify all of those rightsholders—let alone execute deals with each of them. Even if these deals were possible, licensing that much content at the prices developers are currently paying would be prohibitively expensive for most would-be competitors.  

We should not assume that the same companies who built this world can fix the problems they helped create; if we want AI models that don’t replicate existing social and political biases, we need to make it possible for new players to build them. 

Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.  

Entertainment companies’ historical practices bear out this concern. For example, from the late 2000s to the mid-2010s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI. 

Threats to Free Expression 

Generative AI tools like text and image generators are powerful engines of expression. Creating content—particularly images and videos—is time intensive. It frequently requires tools and skills that many internet users lack. Generative AI significantly expedites content creation and reduces the need for artistic ability and expensive photographic or video technology. This facilitates the creation of art that simply would not have existed and allows people to express themselves in ways they couldn’t without AI.  

Some art forms historically practiced within the African American community—such as hip hop and collage—have a rich tradition of remixing to create new artworks that can be more than the sum of their parts. As professor and digital artist Nettrice Gaskins has explained, generative AI is a valuable tool for creating these kinds of art. Limiting the works that may be used to train AI would limit its utility as an artistic tool, and compound the harm that copyright law has already inflicted on historically Black art forms. 

Generative AI has the power to democratize speech and content creation, much like the internet has. Before the internet, a small number of large publishers controlled the channels of speech distribution, controlling which material reached audiences’ ears. The internet changed that by allowing anyone with a laptop and Wi-Fi connection to reach billions of people around the world. Generative AI magnifies those benefits by enabling ordinary internet users to tell stories and express opinions by allowing them to generate text in a matter of seconds and easily create graphics, images, animation, and videos that, just a few years ago, only the most sophisticated studios had the capability to produce. Legacy gatekeepers want to expand copyright so they can reverse this progress. Don’t let them: everyone deserves the right to use technology to express themselves, and AI is no exception.  

Threats to Fair Use 

In all of these situations, fair use—the ability to use copyrighted material without permission or payment in certain circumstances—often provides the best counter to restrictions imposed by rightsholders. But, as we explained in the first post in this series, fair use is under attack by the copyright creep. Publishers’ recent attempts to impose a new licensing regime for AI training rights, despite lacking any recognized legal right to control AI training, threaten to undermine the public’s fair use rights.  

By undermining fair use, the AI copyright creep makes all these other dangers more acute. Fair use is often what researchers and educators rely on to make their academic assessments and to gather data. Fair use allows competitors to build on existing work to offer better alternatives. And fair use lets anyone comment on, or criticize, copyrighted material.  

When gatekeepers make the argument against fair use and in favor of expansive copyright—in court, to lawmakers, and to the public—they are looking to cement their own power, and undermine ours.  

A Better Way Forward 

AI also threatens real harms that demand real solutions.  

Many creators and white-collar professionals increasingly believe that generative AI threatens their jobs. Many people also worry that it enables serious forms of abuse, such as AI-generated nonconsensual intimate imagery, including of children. Privacy concerns abound, as does consternation over misinformation and disinformation. And it’s already harming the environment.  

Expanding copyright will not mitigate these harms, and we shouldn’t forfeit free speech and innovation to chase snake oil “solutions” that won’t work.  

We need solutions that address the roots of these problems, like inadequate protections for labor rights and personal privacy. Targeted, issue-specific policies are far more likely to succeed in resolving the problems society faces. Take competition, for example. Proponents of copyright expansion argue that treating AI development like the fair use that it is would only enrich a handful of tech behemoths. But imposing onerous new copyright licensing requirements to train models would lock in the market advantages enjoyed by Big Tech and Big Media—the only companies that own large content libraries or can afford to license enough material to build a deep learning model—profiting entrenched incumbents at the public’s expense. What neither Big Tech nor Big Media will say is that stronger antitrust rules and enforcement would be a much better solution. 

What’s more, looking beyond copyright future-proofs the protections. Stronger environmental protections, comprehensive privacy laws, worker protections, and media literacy will create an ecosystem where we will have defenses against any new technology that might cause harm in those areas, not just generative AI. 

Expanding copyright, on the other hand, threatens socially beneficial uses of AI—for example, to conduct scientific research and generate new creative expression—without meaningfully addressing the harms.  

Originally posted to the EFF’s Deeplinks blog.

Comments on “AI And Copyright: Expanding Copyright Hurts Everyone—Here’s What to Do Instead”

21 Comments

Anonymous Coward says:

Comparing LLMs and other trash to the way that humans create hip hop remixes is insulting. It also feels tone-deaf to bring up an example of Black culture and ingenuity to try and prop up GenAI. It ignores how tech with the “AI” moniker on it has, in the past and the present, discriminated against Black people. Speech and content creation are already democratized, too.

The “Here’s what to do instead” segment was also a huge nothingburger. The same “privacy and media literacy” schtick that they always do when they don’t have any real answers. And they need to face facts that one facet of labor rights also means giving workers more control over the things that they produce, which would be awesome, but runs counter to the whole “computer models should be allowed to gorge themselves on the sum total of human creativity” thing that the EFF has a hard-on for. It feels like the EFF is stuck in the SOPA/PIPA era when it comes to talk of copyright, and it has them treating independent artists and authors like they’re the RIAA and MPAA.

ECA (profile) says:

TL;DR all of it.

Excuse me, I had eye surgery.
It's agreed that if you let an independent AI scan and correlate, it might be a good thing.
But it's too easy to program it to be WHAT THEY WANT, not what we are all looking for.
It's as bad as watching TV and the news on every channel, or every comic book you can find.
What's FED in, or what it finds the MOST of, will probably be taken to be the norm. Ask it a question and every answer is based on what it has found.
The key word is FOUND. Not always the truth, fact, a comparison of ideologies and concepts, or what is REALLY needed.
Walk into a room of debating consonants, and you may never hear a vowel. It may not understand that there IS a vowel, which means you must explain it, which goes back to programming. You might as well be teaching a child using words they don't understand yet.

The idea of competing is forbidden in the current climate of US capitalism. They don't like a FAIR ground to stand on; they want the advantage in any way, shape, or form.

Then we get to the area of understanding and learning. Instead of grabbing info off the net, books, news, and the idiocy of forums and debates, there is the creation of its own opinion and having a CHOICE, or a restriction a company wants to place that would force a choice of only sources it has chosen.

Then we can get into forbidden subjects. Think about Facebook demanding true names and requiring ID. Is THAT their responsibility? It seems that at FB that ideal has fallen into the dumpster a bit, with inbound adverts from companies(?) that are online sending out cheap notices, and as companies they don't need to ID themselves or PROVE where they are in the whole world.
Sorting facts and truth, as well as the WHOLE story, is like looking at US history and not wanting to pay for all the written books on the subject. If AI had that advantage, how many $50 history books on certain wars and battles would be needed?

If we forget all of that, let's ask a BASIC question. How much room/data are we talking about? How many remote links from one server to another, planted around the world with the data needed for SOME sort of fact checking, do you want? How about taking all the servers we have now and doubling them? The amount of data IS huge. It's like comparing the data on what was happening up to and including WWII, and whose side has an opinion about it. As in knowing that we were at war for 40 years before WWII with the islands in the Pacific; that the USA, with the Aussies, blockaded Japan from getting oil from the Middle East; that we had a volunteer fighting force in China; that we were fighting with China and fighting in the Philippines even before Japan attacked the USA(?). And for the 40 years prior, fighting Cuba, Central America, Panama, and the Pacific islands, we never claimed any of them as part of the USA? Even after we decided that the Americas were never to belong to another nation NOT part of the Americas. Too much in that storyline, and half of it we can never know (but there are books about some of it).
How much do you want to edit from an independent AI that can form a REAL opinion? I would love it.
Would it solve much of anything?
Go watch an old movie called Colossus: The Forbin Project. Very interesting if you can get through it.

Arianity says:

Without fair use, copyright would hinder important scientific advancements that benefit all of us.

This argument seems fatally flawed. It boils down to “if something benefits all of us, it’s worth taking”. While fair use takes public interest into account, it’s supposed to be a balancing test. (Never mind how this argument undermines copyright in general)

Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is,

The solution there is to address their bargaining power, not make the imbalance worse. The takeaway from “property rights alone are not sufficient because they have no bargaining power” is not “property rights don’t matter”.

But imposing onerous new copyright licensing requirements to train models would lock in the market advantages enjoyed by Big Tech and Big Media

Big Tech doesn’t seem to be arguing all that hard for that moat. Not to mention, the suggested antitrust actions would address this.

We need solutions that address the roots of these problems,

Notably lacking are any proposals that would even attempt to solve things creators are worried about. While things like antitrust would be good in their own right, they don’t even attempt to solve any of these problems in AI. Creators would still have no leverage even in a world of perfect antitrust. There’s a reason they reach for copyright, and it’s because you’re not actually offering anything to solve the problem.

I would also note, these arguments are often focused (reasonably) on the financial aspects of copyrightable works. But a nontrivial amount of it comes down to expressive reasons, as well. Which copyright does solve.

There’s also very little acknowledgement that this issue may be mooted by alternative solutions like synthetic data. Many cutting-edge models have literally already run out of content to train on, copyright completely aside. And there is a lot of ongoing work by companies to make datasets that can be practically licensed.

A random thought occurs to me: What about a compromise? ML models are free from copyright, as long as their models are fully open source and/or not for commercial use. What’s good for the goose is good for the gander, let society reap the benefits of open models.

terop (profile) says:

It’s easy to see why: it would be impossible to identify and negotiate with millions of different copyright owners to analyze, say, text from the internet.

It’s not any more difficult to negotiate with the authors than it is to pass along those text pieces from the original source material to the readers. In fact, the same mechanism (==text files) can be used to conduct the negotiation.

But given that copyright has not been properly enforced, society has not developed proper tools for conducting copyright negotiations. All tools we’ve seen are focused on cloning/copying the data, but none of them included the negotiation tools required by the law.

This lack of negotiation tools is a significant failure in the marketplace. The tools that do this are known to work for limited areas of the world, but ultimately work as a disincentive for users, since they would need to pay for the content instead of committing blatant copyright infringement.

It’s this lack of respect for the law and the requirements set by the law that is creating the current criminal environment around copyrighted works.

terop (profile) says:

Re: Re:

Thousands and thousands of copyrighted works are being pirated right now on the internet. The only thing preventing this from turning into court cases is the fact that many copyright owners are ignoring the blatant infringements as too minor. But a proper reading of the law would turn these cases into copyright infringement cases.

Here’s an example: my contract with the publisher prevents me from publishing my 1994 game further on the internet, but the publisher has already stopped distributing the work. Thus there isn’t anyone in the world who still has permission to use the work. Regardless of that, the following people are publishing the work on the internet:
https://amiga.abime.net/games/view/mega-motion
https://www.youtube.com/watch?v=qQygH64LBHU
https://superadventuresingaming.blogspot.com/2022/08/mega-motion-amiga.html
https://www.lemonamiga.com/games/details.php?id=2656
https://www.youtube.com/watch?v=qP3QmlFK_CA
https://www.mobygames.com/game/74019/mega-motion/
https://gamefaqs.gamespot.com/amiga/662900-mega-motion
https://www.uvlist.net/game-270996-Mega+Motion
https://www.gamespot.com/games/mega-motion/cheats/
https://www.myabandonware.com/game/mega-motion-7fj
http://janeway.exotica.org.uk/release.php?id=16110
https://www.planetemu.net/rom/commodore-amiga-games-adf/mega-motion-1993-black-legend-cr-hlm
https://www.amigareviews.leveluphost.com/megamoti.htm#megamotionaf
https://stare.e-gry.net/pomoc/mega-motion
https://www.youtube.com/watch?v=hLAWYe3WctE
https://www.youtube.com/watch?v=qnUIGFFjpu8

This is clear proof that copyrighted works are being pirated on the internet on a massive scale, and the only reason this doesn’t turn into a massive copyright war is because I’m such a nice guy and don’t want anything bad to happen to the people who genuinely believe that it’s ok to publish these reviews and stuff. But our contracts with the publisher kinda close that possibility.

terop (profile) says:

Re: Re: Re:

Of course, the real issue is that the money flow from users to the authors was closed down long ago, so even if millions of users found it a nice game, we as authors would not benefit at all from the distribution. Thus copyright law prevents this distribution, and eventually all activity around the game would stop and the product would be properly terminated. But some internet sites are not following the rules as they were designed to work, and internet distribution is still happening.

Note that none of the current internet publishers were among the original licensors of the work, and there are things like pirate group intros included in the product distributions. I have left out places that are known to be among the original licensors, e.g. some Amiga Future CD-ROMs came directly from our publisher.

Crafty Coyote says:

The point was to use robots to do the dangerous work like repairing deep sea cables, or fixing space stations or creating art that would technically infringe copyright but not actually get anyone in trouble because a machine doesn’t have a sense of morals. The art they create is copyright free and can be a teaching tool for artists- as long as robots aren’t given copyright over their work, this could be a boon for artists.

Anonymous Coward says:

This would result in all the usual harms of limited competition—higher costs, worse service, and heightened security risks—as well as reducing the variety of expression used to train such tools and the expression allowed to users seeking to express themselves with the aid of AI.

So the basic problem of LLM and generative AI commercial uses being identical to crypto, NFT and virtual reality grifts, then.

I’m not seeing the problem here with blanket opposition other than rule of law crumbling for researcher fair use exemptions due to factors beyond the technology itself.

PrivateFrazer says:

We are all creators?

As an artist (E level, = existing just about) I should be in favour of licencing, but I agree that giving it all to a few large companies is a bad idea. It needs to be open for all. Plus, thinking about the artist is not the whole story: think about all the art and output that is in the public domain. It now belongs to the people, and the people are creators too; they post photos, etc. I.e., LLM input belongs to all of us, so please don’t just think about the artist.
Maybe AI should be forced to be open source so it all belongs to everyone.

Practical Future says:

There's a reason why copyright hasn't expanded to AI (and won't)

Legal Precedents Supporting Training Use:
Google Books (Authors Guild v. Google, Inc.)
Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015)
The case where Google digitized entire libraries of books to create a searchable database, with only snippets available to users, was ruled as fair use. The court focused on the transformative nature of creating a searchable index, which was a new use of the text. This precedent strongly supports the argument that training an AI model, which results in new, original outputs, can also be seen as transformative.

Software Cases: Sega Enterprises Ltd. v. Accolade, Inc. and Sony Computer Entertainment, Inc. v. Connectix Corp.
Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992)
Sony Computer Entertainment, Inc. v. Connectix Corp., 203 F.3d 596 (9th Cir. 2000)

These cases allowed intermediate copying in software reverse engineering because the end goal was to create a new product. They support the argument that making intermediate copies during the training process, especially when the ultimate outputs are new and non-infringing, should be considered fair use.

Everything will be in limbo until people come to terms with the fact we’re in a new era for better or worse depending on the person’s viewpoint. But it won’t be expanded.
