The Copia Institute Tells The Copyright Office Again That Copyright Law Has No Business Obstructing AI Training

from the yes-we're-serious dept

A little over a month ago we told the Copyright Office in a comment that there was no role for copyright law to play when it comes to training AI systems. In fact, on the whole there’s little for copyright law to do to address the externalities of AI at all. No matter how one might feel about some of AI’s more dubious applications, copyright law is no remedy. Instead, as we reminded the Office in this follow-up reply comment, trying to use copyright to obstruct development of the technology creates its own harms, especially when applied to the training aspect.

One of those harms, as we reiterated here, is that it impinges on the First Amendment right to read that human intelligence needs to have protected, and that right must inherently include the right to use technological tools to do that “reading,” or consumption in general of copyrighted works. After all, we need record players to play records – it would do no one any good if their right to listen to one stopped short of being able to use the tool needed to do it. We also pointed out that this First Amendment right does not diminish even if people consume a lot of media (we don’t, for instance, punish voracious readers for reading more than others) or at speed (copyright law does not give anyone the right to forbid listening to an LP at 45 rpm, or watching a movie on fast forward). So if we were to let copyright law stand in the way of using software to quickly read a lot of material, it would represent a deviation from how copyright law has up to now operated, and one that would undermine the rights to consume works that we’ve so far been able to enjoy.

Which is why we also pointed out that using copyright to deter AI training distorts copyright law itself, and that distortion would be felt in other contexts where copyright law legitimately applies. And we highlighted a disturbing trend emerging in copyright law from other quarters as well: the idea that whether a use of a work is legitimate somehow depends on whether the copyright holder approves of it. Copyright law was not intended, or written, to give copyright owners an implicit veto over any or all uses of works – the power of a copyright is limited to what its exclusive rights allow control over, and even then only where fair use does not excuse the use.

A variant of this emerging trend also getting undue oxygen is the idea that profiting from the free use of a copyrighted work is somehow inherently objectionable and therefore ripe for the copyright holder to veto. But, again, it would represent a significant change for copyright law to work that way. Copyright holders are not guaranteed every penny that could potentially result from the use of a copyrighted work, and it has been independently problematic when courts have found otherwise.

Furthermore, to the extent that this later profiting may represent an actual problem in the AI space, which is far from certain, a better solution is to instead keep copyright law away from AI outputs as well. Some of the objection to AI makers later profiting seems to be based on the concern that certain enterprises might use works for free to develop their systems and then lock up the outputs with their own copyrights. But it isn’t necessary for copyright to apply to everything that is ever created, and certainly not to everything created by an artificial intelligence, so we should therefore also look hard at whether it is even appropriate for copyright to apply to AI outputs at all. Not everything needs to be owned; having works immediately enter the public domain after their creation is an option, and a good one that vindicates copyright’s goals of promoting the exchange of knowledge.

Which brings us back to an earlier point to echo again now, that using copyright law as a means of constraining AI is also an ineffective way of addressing any of its potential harms. If, for instance, AI is used in hiring decisions and leads to discriminatory results, such is not a harm recognized by copyright law, and copyright law is not designed to address it. In fact, trying to use copyright law to fix it will actually be counterproductive: bias is exacerbated when the training data is too limited, and limiting it further will only make worse the problem we’re trying to address.


Comments on “The Copia Institute Tells The Copyright Office Again That Copyright Law Has No Business Obstructing AI Training”

13 Comments
faffod (profile) says:

Can I use AI to distribute copyrighted material?

AI ends up storing the source material in a compressed format, but stores it. Just like a JPEG is compressed, this is a different compression algorithm. If I ran a server with MP3s I would be shut down for serving copyrighted material. I would not be able to argue that because it was compressed it was transformative. An AI server provides an interpolative representation of the source material, but by definition it is serving all of the source material.

Anonymous Coward says:

Re:

If AI were storing compressed versions, it would be useless at creating novel images. What it is capable of is generating images similar to the collections of images it has seen. For example, it does not need to store any specific cat image or garden image to generate an image of a cat in a garden, nor does it take much searching of cat images to find one similar to the image it produces. However, finding a similar image does not mean that image was in its training set, or that it was extracted from some database by decompressing the data.
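One rough way to see why the "it's just compression" framing fails is simple arithmetic: the trained model is orders of magnitude smaller than its training data. The figures below are illustrative approximations (parameter count at the scale of Stable Diffusion v1, dataset size at the scale of LAION), not numbers from this thread:

```python
# Back-of-the-envelope: could an image model "store" its training set?
# All figures are approximate and for illustration only.
params = 860_000_000              # ~860M parameters (Stable Diffusion v1 scale)
bytes_per_param = 4               # 32-bit float weights
training_images = 2_000_000_000   # ~2B image-text pairs (LAION scale)

model_bytes = params * bytes_per_param
bytes_per_image = model_bytes / training_images
print(f"{bytes_per_image:.2f} bytes of model capacity per training image")  # 1.72
```

At under two bytes of weights per training image, even the best compression algorithm could not reconstruct the images themselves; what the weights hold is statistical regularities across the whole collection.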

Anonymous Coward says:

Re:

AI ends up storing the source material in a compressed format, but stores it.

Wrong. It effectively stores a hash of the context of the source material. When a query is presented, the context thereof is determined, a hash is generated, and that is what is matched in the database of hashes. Reconstituting a basic context from those hashes is relatively easy, and thence the generation of a reply to the original query.

If the AI is merely a very sophisticated controller of some kind, then it’s nothing more than decision-making look-up table, albeit on steroids. Adding voice capability to ask a ‘higher authority’ (a human) for permission to do X, or for clarification, or whatever, that’s not intelligence of any kind, that’s simply getting fancy where a console (screen and keyboard) could’ve done the job.

Putting it bluntly, we’re going to find that underneath the hood, AI will do nothing more than imitate a human’s decision making process, i.e. “if-then-else” (or case, or switch, or loop-while, or… etc.). Originality and creativity are yet to come. See Asimov’s The Bicentennial Man for a better explanation than I can provide as to why that is true (so far).

Mike Masnick (profile) says:

Re:

AI ends up storing the source material in a compressed format, but stores it. Just like a JPEG is compressed, this is a different compression algorithm. If I ran a server with MP3s I would be shut down for serving copyrighted material.

Search engines store full copies of the pages they index, and that has been found to be fair use, as they are used in transformative ways.

So even if it were true that AI stores source material (which… is not actually how it works), it would still be fair use for the same reasons as search engine archival copies.

Anonymous Coward says:

Re:

The metagame of jobs and hiring has seen AI usage ramped up by both sides. Recruiters use AI to sort through the long list of applicants they get, so applicants turn to anything they think will give them an edge in standing out. In turn the hirers use AI to sort through the AI-generated resumes, and the applicants resort to stronger AI models to beat these hirer AIs. Which then turns into an arms race between both teams.

The truth is though, you could take AI out of the equation and the same thing would still happen – an arms race between the people looking for a job, and the people rolling a die to see who gets to not starve.

Anonymous Coward says:

So, I make 3 applications:

  • a book reader, which scans the page, translates the images to words, and “reads aloud” those words.
  • a word counting application, which scans the page, translates the images to words, then increases a count for each word encountered. It then prints the word counts.
  • an LLM, for which the book is scanned in as above, but which can tell me all the character names and their hair colors.

If LLM-style AI violates copyright, what is the critical difference between the first two applications and the third in how it does so?
