The Copia Institute Tells The Copyright Office Again That Copyright Law Has No Business Obstructing AI Training

from the yes-we're-serious dept

A little over a month ago we told the Copyright Office in a comment that there was no role for copyright law to play when it comes to training AI systems. In fact, on the whole there’s little for copyright law to do to address the externalities of AI at all. No matter how one might feel about some of AI’s more dubious applications, copyright law is no remedy. Instead, as we reminded the Office in this follow-up reply comment, trying to use copyright to obstruct development of the technology creates its own harms, especially when applied to the training aspect.

One of those harms, as we reiterated here, is that it impinges on the First Amendment right to read that human intelligence needs to have protected, and that right must inherently include the right to use technological tools to do that “reading,” or consumption in general of copyrighted works. After all, we need record players to play records – it would do no one any good if their right to listen to one stopped short of being able to use the tool needed to do it. We also pointed out that this First Amendment right does not diminish even if people consume a lot of media (we don’t, for instance, punish voracious readers for reading more than others) or at speed (copyright law does not give anyone the right to forbid listening to an LP at 45 rpm, or watching a movie on fast forward). So if we were to let copyright law stand in the way of using software to quickly read a lot of material, it would represent a deviation from how copyright law has up to now operated, and one that would undermine the rights to consume works that we’ve so far been able to enjoy.

Which is why we also pointed out that using copyright to deter AI training distorts copyright law itself, and that distortion would be felt in other contexts where copyright law legitimately applies. And we highlighted a disturbing trend emerging in copyright law from other quarters as well: the idea that whether a use of a work is legitimate somehow depends on whether the copyright holder approves of it. Copyright law was not intended, or written, to give copyright owners an implicit veto over any or all uses of works – the power of a copyright is limited to what its exclusive rights allow control over, and even then only where fair use does not excuse the use.

A variant of this emerging trend also getting undue oxygen is the idea that profiting from the free use of a copyrighted work is somehow inherently objectionable and therefore ripe for the copyright holder to veto. But, again, it would represent a significant change for copyright law to work that way. Copyright holders are not guaranteed every penny that could potentially result from the use of a copyrighted work, and it has been independently problematic when courts have found otherwise.

Furthermore, to the extent that this later profiting may represent an actual problem in the AI space, which is far from certain, a better solution is to instead keep copyright law away from AI outputs as well. Some of the objection to AI makers later profiting seems to be based on the concern that certain enterprises might use works for free to develop their systems and then lock up the outputs with their own copyrights. But it isn’t necessary for copyright to apply to everything that is ever created, and certainly not to everything created by an artificial intelligence, so we should therefore also look hard at whether it is even appropriate for copyright to apply to AI outputs at all. Not everything needs to be owned; having works immediately enter the public domain after their creation is an option, and a good one that vindicates copyright’s goals of promoting the exchange of knowledge.

Which brings us back to an earlier point to echo again now, that using copyright law as a means of constraining AI is also an ineffective way of addressing any of its potential harms. If, for instance, AI is used in hiring decisions and leads to discriminatory results, such is not a harm recognized by copyright law, and copyright law is not designed to address it. In fact, trying to use copyright law to fix it will actually be counterproductive: bias is exacerbated when the training data is too limited, and limiting it further will only make worse the problem we’re trying to address.


Comments on “The Copia Institute Tells The Copyright Office Again That Copyright Law Has No Business Obstructing AI Training”

13 Comments
faffod (profile) says:

Can I use AI to distribute copyrighted material?

AI ends up storing the source material in a compressed format, but stores it. Just like a JPEG is compressed, this is a different compression algorithm. If I ran a server with MP3s I would be shut down for serving copyrighted material. I would not be able to argue that because it was compressed it was transformative. An AI server provides an interpolative representation of the source material, but by definition it is serving all of the source material.

Anonymous Coward says:

Re:

If AI were storing compressed versions, it would be useless at creating novel images. What it is capable of is generating images similar to the collections of images it has seen. For example, it does not need to store any specific cat image or garden image to generate an image of a cat in a garden, nor does it take much searching of cat images to find one similar to the image it produces. However, finding a similar image does not mean that image was in its training set, or that it was extracted from some database by decompressing the data.
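One rough way to see why the "it's just compression" framing fails is simple arithmetic: the trained model is orders of magnitude smaller than its training data. The figures below are illustrative approximations (parameter count at the scale of Stable Diffusion v1, dataset size at the scale of LAION), not numbers from this thread:

```python
# Back-of-the-envelope: could an image model "store" its training set?
# All figures are approximate and for illustration only.
params = 860_000_000              # ~860M parameters (Stable Diffusion v1 scale)
bytes_per_param = 4               # 32-bit float weights
training_images = 2_000_000_000   # ~2B image-text pairs (LAION scale)

model_bytes = params * bytes_per_param
bytes_per_image = model_bytes / training_images
print(f"{bytes_per_image:.2f} bytes of model capacity per training image")  # 1.72
```

At under two bytes of weights per training image, even the best compression algorithm could not reconstruct the images themselves; what the weights hold is statistical regularities across the whole collection.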

Anonymous Coward says:

Re:

AI ends up storing the source material in a compressed format, but stores it.

Wrong. It effectively stores a hash of the context of the source material. When a query is presented, the context thereof is determined, a hash is generated, and that is what is matched in the database of hashes. Reconstituting a basic context from those hashes is relatively easy, and thence the generation of a reply to the original query.

If the AI is merely a very sophisticated controller of some kind, then it’s nothing more than decision-making look-up table, albeit on steroids. Adding voice capability to ask a ‘higher authority’ (a human) for permission to do X, or for clarification, or whatever, that’s not intelligence of any kind, that’s simply getting fancy where a console (screen and keyboard) could’ve done the job.

Putting it bluntly, we’re going to find that underneath the hood, AI will do nothing more than imitate a human’s decision making process, i.e. “if-then-else” (or case, or switch, or loop-while, or… etc.). Originality and creativity are yet to come. See Asimov’s The Bicentennial Man for a better explanation than I can provide as to why that is true (so far).

Mike Masnick (profile) says:

Re:

AI ends up storing the source material in a compressed format, but stores it. Just like a JPEG is compressed, this is a different compression algorithm. If I ran a server with MP3s I would be shut down for serving copyrighted material.

Search engines store full copies of the pages they index, and that has been found to be fair use, as they are used in transformative ways.

So even if it were true that AI stores source material (which… is not actually how it works), it would still be fair use for the same reasons as search engine archival copies.

Anonymous Coward says:

Re:

The metagame of jobs and hiring has seen AI usage ramped up by both sides. Recruiters use AI to sort through the long list of applicants they get, so applicants turn to anything they think will give them an edge in standing out. In turn the hirers use AI to sort through the AI-generated resumes, and the applicants resort to stronger AI models to beat these hirer AIs. Which then turns into an arms race between both teams.

The truth is though, you could take AI out of the equation and the same thing would still happen – an arms race between the people looking for a job, and the people rolling a die to see who gets to not starve.

Anonymous Coward says:

So, I make 3 applications:

  • a book reader, which scans the page, translates the images to words, and “reads aloud” those words.
  • a word counting application, which scans the page, translates the images to words, then increases a count for each word encountered. It then prints the word counts.
  • an LLM, for which the book is scanned in as above, but which can tell me all the character names and their hair colors.

If LLM-style AI violates copyright, what is the critical difference between the first two applications and the third in how it does so?
