Wherein The Copia Institute Tells The Copyright Office There’s No Place For Copyright Law In AI Training
These days everyone seems to be talking about AI, and the Copyright Office is no exception. It may make sense for it to weigh in, though, because people keep invoking copyright as a concept implicated by various aspects of AI, including, and perhaps especially, the "training" of AI systems. So the Copyright Office recently launched a study seeking feedback on the role copyright has, or should be changed to have, in shaping any law that bears on AI, and earlier this week the Copia Institute filed an initial comment in that study.
In our comment we made several points, but the main one was that, at least when it comes to AI training, copyright law needs to butt out. It has no role to play now, nor could it constitutionally be changed to have one. And regardless of the legitimacy of any concerns about how AI may be used, allowing copyright to be an obstructing force that prevents AI systems from being developed will only do damage. It would not just deter any benefits the innovation might provide but also undermine the expressive freedoms we depend on.
In explaining our conclusion, we first observed that one overarching problem poisoning any policy discussion on AI is that "artificial intelligence" is a terrible term that obscures what we are actually talking about. Not only do we tend to conflate the way we develop it (or "train" it) with the way we use it, which presents its own promises and potential perils, but we also all too often regard it as some new form of powerful magic that can either miraculously solve all sorts of previously intractable problems or threaten the survival of humanity. "AI" can certainly inspire both naïve enthusiasm prone to deploying it in damaging ways and equally unfounded moral panics that prevent it from being used beneficially. It can also prompt genuine concerns as well as genuine excitement. Any policy discussion addressing it must therefore be able to cut through the emotion and tease out exactly which aspect of AI we are talking about when we address those effects. We cannot afford to take analytical shortcuts, especially if they would lead us to inject copyright into an area of policy where it does not belong and where its presence would cause its own harm.
AI is not, in fact, magic. In reality it is simply a sophisticated software tool that helps us process the information and ideas around us. And copyright law exists to ensure that information and ideas exist for the public to engage with. It does so by granting the copyright owner certain exclusive rights in the hope that this exclusivity makes it economically viable for them to create the works containing those ideas and information. But these exclusive rights necessarily all focus on the creation and performance of their works. None of the rights limit how the public can then consume those works once they exist, because the whole point of helping ensure they could exist is so that the public can consume them. Copyright law wouldn't make sense, and probably would not be constitutional under the Progress Clause, if it worked to constrain that consumption and thus the public's engagement with those ideas and information.
It would also offend the First Amendment, because the right of free expression inherently includes what is often referred to as the right to read (or, more broadly, the right to receive information and ideas). This is a big reason why book bans are so constitutionally odious: they explicitly and deliberately attack that right. But people don't just have the right to consume information and ideas directly through their own eyes and ears. They have the right to use tools to help them do it, including technological ones. As we explained in our comment, the ability to use tools to receive and perceive created works is often integral to that consumption. After all, how could the public listen to a record without a record player, or consume digital media without a computer? No law could prevent the use of such tools without seriously impinging on the inherent right to consume the works at all. The United States is also a signatory to the Marrakesh Treaty, which addresses the unique need of people who are blind, visually impaired, or otherwise print disabled to use tools such as screen readers to consume the works they would otherwise be entitled to perceive. Of course, it is not only those with such impairments who may need such tools, and the right to format shift should allow anyone to use a screen reader to consume works if it helps them glean those ideas effectively.
What too often gets lost in the discussion of AI is that we are not talking about some exceptional form of magic but rather just fancy software. AI training must therefore be understood as simply an extension of the same principles that allow the public to use tools, including software tools, to help them consume works. After all, if people can direct their screen reader to read one work, they should be able to direct it to read many works. Conversely, if they cannot use a tool to read many works, that undermines their ability to use a tool to help them read any. It is therefore critically important that copyright law not interfere with AI training, so that it does not interfere with the public's right to consume works as they should currently be able to.
So at minimum such AI training needs to be considered a fair use, but the better practice is to recognize that copyright has no role to play in AI training at all. To say it is allowed as a fair use is to inflate the power of a copyright holder beyond what the statute or Constitution should allow. It suggests that using tools to consume works could ever potentially be an infringement, one that merely happens to be excused in this context. But copyright law is not supposed to give copyright owners such power over the consumption of their works, power that we would then have to depend on fair use to temper. It should never apply to limit the consumption of works in any context, and we should not let concerns about AI generally, or its uses or outputs specifically, open the door to copyright law ever becoming an obstacle to that consumption.
Filed Under: 1st amendment, ai, copyright, free speech, right to read, screen readers, us copyright office
Comments on “Wherein The Copia Institute Tells The Copyright Office There’s No Place For Copyright Law In AI Training”
Hear, hear.
Copyright infringement has only ever been claimed against two things:
1. if someone's final product is so like the copyrighted product that it was clearly copied from it.
2. if someone makes copies and distributes them without permission.
Training isn't either of these. Only the final product can be infringement, and only if you can show a side-by-side comparison of sameness.
Can someone please explain why it is legal to use, e.g., a copy of Stephen King's IT found on the internet to train an LLM?
Re:
See above article
Re: Re:
You download a database of text. It contains the hypothetical copy of Stephen King’s IT. You use that database to train the LLM. You violated the copyright by downloading the book, just as you would have if you borrowed a copy from the library and scanned each page.
Your scraping program browses deviantart. It downloads all the pictures. You feed the pictures into an image AI. You violated copyright by downloading the pictures.
Again, can someone explain why copyright has no role in a system built on mass copyright infringement?
Re: Re: Re:
A large language model does not copy a book into its database but analyses the book's use of language to build and modify its database. The training is done under the same rules as you or I reading content on the Internet, or downloading copies where that is allowed.
Re: Re: Re:
Infringement is infringement. If you downloaded an infringing copy of a novel, that’s infringement. Doesn’t matter if it is for an LLM training dataset or not.
No one looking at what is displayed on DeviantArt is infringing. It’s on display.
Next?
Re: Re: Re:2 Define infringement
If infringement is your concern, then non-infringement is by definition not something that would be of your concern.
If reading a book at a library isn't infringing, which it isn't, then a machine "reading" the same book cannot be infringing. That the machine can read all the books in the library in less time than a normal reader takes to read one book is also not infringing.
Clearly you are operating under a miasma of ignorance, or with notions that are based on invalid emotional arguments. They don’t keep the works, they train with them. That action is not significantly different than a human reading the book. Maintaining a database of everything they read is intractable, and unnecessary.
Seriously, how could you be so ignorant as to believe that the entire freaking web would be stored in individual databases for all the different AI projects? The cost alone would be prohibitive.
Re:
For the same reason that colleges have libraries for their students to use.
What about "training" students?
How is teaching (aka training) machines with an author’s works ANY DIFFERENT than training (aka teaching) students with her works?
It isn’t…and shouldn’t be treated differently.
Copyright isn’t a factor when a student reads a book. It shouldn’t be a factor when a machine “reads” a book.
As someone who's blind, I have been negatively impacted by copyright a lot. DRM in books is a particular problem. Oh, companies like Amazon have tried to make their books accessible, but it doesn't always go well. IMO section 1201 should either be repealed or be modified so that it is not a violation of that section (or of any other section of title 17) if the tampering, infringement, or deactivation of mechanisms is for accessibility purposes.
Maybe Fair Use, However AI Output Is Publishing
AI can take in everything just like human consumption; however, any output is inherently publishing. So if copyrighted material comes out of its virtual mouth, that's a violation, and the owner/operator must pay or "kill" it, as prison is useless to the undead.
Re:
Private use of AI is no more publishing than anything anybody you know shows or tells you.
Re:
AI can’t have copyright. Missed that bit.
While I totally agree with this approach to copyright, we have to remember that the DMCA exists, and part of its intent is to control what people (and machines) can do to consume copyrighted content.
So the Copyright Office is not above another similar carve-out for anything involving machine learning.