The Copyright Office Issues A Largely Disappointing Report On AI Training, And Once Again A Major Fair Use Analysis Inexplicably Ignores The First Amendment
from the did-an-LLM-write-this dept
We have issues with the third installment of the Copyright Office’s report on AI, and the rest of this post discusses them. But we note at the outset that, while the report deserves criticism, that criticism does not warrant or justify the firing of Register of Copyrights Shira Perlmutter from her duly appointed position. We will save comment on that news for other posts and use this one to discuss our chief concerns with the report itself.
At the end of last week the Copyright Office released a prepublication version of the third, and likely final, report of the study it did on the intersection of copyright law and AI. Earlier installments addressed the questions raised by copyright with respect to AI output and digital replicas, whereas this installment addressed whether and how copyright law is implicated by training an AI model on copyrighted works and, in particular, whether such use of works for this purpose was a fair use.
There are some good things to highlight from the report. For example, it acknowledged a concern raised by commenters, including one we raised as the Copia Institute, that if models could only be trained on licensed works it would inherently produce a distorted model tainted by bias and inaccuracy.
But there are also concerning aspects to the report, one of the most significant being that it contains not a single mention of the First Amendment. And we know commenters raised it as an important consideration, because we did in our comments. In particular, we discussed how allowing a copyright to bar AI training would interfere with the First Amendment’s protection of the right to read: if people are free to read directly, they should also be free to use tools (like crawlers and bots) to help them do their reading. And if they are not free to use tools to do their reading, then are they really free to read at all, as the First Amendment says they are supposed to be? It’s an important question to resolve, but one the report seems to have entirely ignored.
It is also odd to omit any discussion of the First Amendment in any significant fair use analysis because fair use is an important way copyright law is able to comport with the First Amendment. As we’ve explained before, the Progress Clause of the Constitution says that Congress has the authority to write copyright law, but the First Amendment tempers that authority, just as it tempers all of Congress’s authority to write laws, to ensure that it “makes no law” that abridges freedom of expression. Without fair use, expressive freedom is often abridged, so it is very odd to produce a major document addressing a potential fair use and not directly consider how the Constitution informs the analysis.
Worse, it seems to be part of a growing trend to skip right over that part, which we saw earlier when the Supreme Court issued its own major fair use decision in the case about whether Andy Warhol’s Prince picture was a fair use of the earlier Lynn Goldsmith photograph. Not only did the entire decision fail to mention the First Amendment even once, but its analytical approach, which has been echoed in this report, tends to overemphasize market concerns over transformative concerns.
True, this report did acknowledge that “training a generative AI foundation model on a large and diverse dataset will often be transformative.” Being transformative is also a significant way that fair use advances First Amendment interests: it recognizes how a later use adds something the original did not, and fair use is about saying yes to that new thing.
But, like the Supreme Court in Warhol, the Copyright Office invited the subordination of the transformative quality of the new use to concerns about market harm to the original works an AI model trains on. Yet, as in Warhol, and in the Second Circuit’s decision in Hachette v. Internet Archive, these concerns are often predicated on dubious evidence about what harm there actually might be, and on questionable presumptions about what copyright owners should be entitled to say no to when others want to use their works. (We also noted in our comments that if copyright owners could prevent reading, it would significantly expand the list of exclusive rights a copyright grants beyond what the statutory language currently includes.)
Unlike those decisions, however, the report does not make law itself; it only collected and collated public comments speaking to these issues. The copyright statute remains the statute as Congress has written it and as courts will interpret it. But the report is influential in how it guides courts and Congress, so it is important to note that while on its face it appears exhaustive, plenty of important analysis is missing from it, and its ability to effectively influence is commensurately limited.
Filed Under: 1st amendment, ai, copyright, copyright office, data, fair use, free speech, llms, right to read, training


Comments on “The Copyright Office Issues A Largely Disappointing Report On AI Training, And Once Again A Major Fair Use Analysis Inexplicably Ignores The First Amendment”
Companies with their machine models that are meant to become products that they seek to profit off of, moving to gorge said models on people’s labor in ways that inevitably seek to supplant said labor, whilst straining systems and creating insane levels of traffic that are damaging important projects like Wikimedia? You think that deserves to be Fair Use? Really?
Re:
None of your complaints is covered by copyright law.
Re: Re:
It’s not the entire discussion by any means, but profit motive (factor 1: the purpose and character of the use, including whether such use is of a commercial nature), supplanting the labor (factor 4: the effect of the use upon the potential market for or value of the copyrighted work), and to a lesser extent traffic (via public benefit, which is part of the fourth factor) are all pretty explicitly considered in the four factors for fair use.
Re:
Fair Use and the First Amendment don’t exist piecemeal.
I personally believe commercial “AI” can go die in a fire, but as for training, it inputs works in a far stupider way than any human does.
Fight AI. Don’t make copyright worse or fuck the First Amendment.
I still think it’s wrong, but here’s a compromise: copyright lasts 14 years with an optional 14-year extension, copyright is not automatic (you must register), and AI can’t use copyrighted works.
Re:
From what I’m reading, they’re thinking about the big picture. As they say, people should be free to use tools to read, and if that was deemed not fair use, it’s only a matter of time before a company lawyer will try to argue that tools used for reading violate the copyright of whatever work they’re used for.
And similar to what the article notes, you’re effectively throwing fair use discussion out of the window because of fears related to market harm.
Re: Re:
What the author is doing is drawing a false line of equivalency to what these scrapers and crawlers do when they “read”, and what humans do when they actually read things. This is so that the author can claim that people who disagree with her argument are copyright maximalists who’d be fine with banning human reading and learning.
Re: Re:
They already have. There have been cases (and precedent) for stuff like Google’s indexing, book snippets, crawlers, etc., before AI became a big deal.
Re: So I should be barred from reading copyrighted works too?
So I should be barred from reading copyrighted works because I might take some of another author’s labor and ideas to write my own work in hopes of supplanting the existing copyrighted works?
I have read hundreds of books in my life, and certainly many of them have shaped how I approach my writing, including possibly even my borrowing phrases or directly copying some concepts, pretty much the same way AI works now.
Obviously if I directly repeat existing works that should be stopped, but that should only happen after the new work is generated and determined to be too similar. You (and others) seem to be suggesting that no one should even look at copyrighted works if they are considering writing their own work, prior to that new work being generated.
Re: Re:
You’re a human and not a machine being built for corporations and such to profit off of, so you should be free to read and be inspired all you like. Humans being inspired is different from machines gorging themselves on countless websites and creative works they don’t have the rights to, so that someone else can profit off of the sale of the capabilities the machine gains through said gorging process.
Re: Re: Re:
What do you think being an employee is all about?
“Companies can ignore robots.txt and engage in what are essentially Denial Of Service Attacks, especially on smaller websites, and call it Fair Use and First Amendment Protected Speech.”
"Right to read"?
Saying that AI is a “right to read” issue is an argument that it IS an illegal copyright violation, for the same reason that I can’t scan all my paper books and make them available for download and claim that I’m just providing tools for people to read those books. That’s straight-up illegal redistribution.
As to the Fair Use question, the point of copyright law broadly, and Fair Use specifically, is, “To promote the Progress of Science and useful Arts.” Generative AI is anti-science and anti-art; what it creates is constantly false (in the case of science) and not even art (since the human input is de minimis). Allowing generative AI goes completely against the copyright clause of the Constitution.
And, since what generative AI produces is not art, is not expression, is not even copyrightable, it’s not even a First Amendment issue on that end. Computers do not have rights. I get that we gave corporations rights in the stupidest Supreme Court decision ever, but let’s not double-down on that mistake with computers.
Re: AI-based research won Nobel Prizes in Physics and Chemistry
AI-based research won Nobel Prizes in Physics and Chemistry. Thus, AI is at the core of natural science research and the “Progress of Science.”
You are totally wrong.
Re:
The term “useful arts” doesn’t mean artistic endeavors. It references applied science and, in the Supreme Court’s interpretation, refers specifically to inventions.
It seems like you’re myopically attacking popular LLMs and maybe aren’t aware that there are other models that are actually better than humans at tasks we value as a society, such as diagnosing medical conditions. Imagine a doctor telling you he can’t diagnose your condition because he couldn’t afford a thousand subscriptions to relevant medical journals. There is value in pooling human knowledge. We certainly need to improve the reliability of LLMs for this purpose to reduce hallucinations, but no human will be able to read and retain as much information, and that ability can be highly useful, especially if disconnected from a for-profit, capitalist intention.
Don’t forget that copyright serves greedy corporations far more than the actual artists and writers and scientists who produce works and has been extended for the benefit of those otherwise immortal institutions.
What are Techdirt’s thoughts about the GOP wanting to ban state-level regulation of AI companies?
Because I can only imagine that Cathy and Mike and Co. are excited about allowing a new era of Glorious Innovation to flourish without those dirty bureaucrats getting in the way.
Re:
False dilemma. It’s possible to have regulations and to have innovation at the same time. The only innovation that well-crafted regulations would stifle are the unethical ones that vulture capitalists want to use it for.
Re: Re:
That sounds like a case of the devil being in the details. Well-crafted regulations to do what exactly and who defines what is ‘unethical’?
Re: Re: Re:
The devil is always in the details. That is the nature of complex systems. Anyone selling you on a simple solution is playing you for a fool. Nothing is simple, even when it appears to be.
What is well-crafted and unethical are subjective, so I can’t tell you who you think would be qualified to make those judgments. My personal preference would be for well-educated academics and experts who aren’t captured by corporate greed or conservative think tanks. I don’t agree with them on absolutely everything, but the EFF tends to come up with decent takes on these types of issues.