I actually find this discrepancy super interesting. What I am inferring from this split is the philosophical perspective of whether non-human competition with human works of authorship is the type of “market harm” that would affect a finding of fair use. Alsup brushes off the market harm analysis as just being competition, while Chhabria takes the perspective that AI-originated competition on authorship is more than simple competition.
Well, the question is whether the competition is fair, not whether copyright should protect authors from competition. (The latter is an oversimplification of the problem that leads to the wrong conclusion.)
The original copyright law was enacted to stop printing companies from blatantly copying books published by someone else. It assumed that authors have exclusive rights over the reproduction of their books (although in practice it's the book publishers that exercise those rights). It did kill competition, but the aim was to kill unfair competition, not competition that does no harm to the authors' market.
Following the spirit of this, it's not hard to understand why the market factor is treated as the most important of the four "fair use" factors.
Claiming "fair use" by saying "it's just market competition" won't cut it for the purpose of copyright law. We need to address whether that "competition" was fair at all to the book authors. That's why Judge Alsup's reasoning about the market factor is flawed while Judge Chhabria's is correct.
Tools can’t encourage people to abuse. Tools have no agency or volition. People can do that. And even how a tool is configured doesn’t necessarily determine how a user will utilize it. This just speaks to your bias that you can’t imagine that an LLM is useful for anything other than “abuse.”
MGM v. Grokster. When a tool maker advertises the illegal use of its tool, that tool maker is liable. That's the ruling.
I'm not answering your unfounded assumption that I can't "imagine that an LLM is useful for anything other than 'abuse,'" because it's irrelevant. Your argument is like the one for marijuana: it does have good medical uses, but many average people would just buy it for bad uses.
Prove that software is intelligent, personified, and capable of expressing its will.
It's the maker of the software that encourages illegal uses, not the software itself, dammit!
Do you have reading disabilities?
But the key aspect what AI clearly infringes is the DERIVATIVE WORKS section. If plaintiffs would focus on derivative works, they could win all AI related lawsuit.
"Fair use" in U.S. law is also a defense against claims of infringing the derivative work right, so your proposed focus would not work. (See the Campbell v. Acuff-Rose Music case law.)
There will be platforms and publishers that require human authorship verification. There are already writers showing their writing process in online streams. There will be people who want to engage in the act of verification, so there will be curation lists and reviews.
Keep in mind that the drawing process can still be faked by AI:
https://www.reddit.com/r/aiwars/comments/1auoy3x/so_apparently_ai_can_generate_videos_showing_the/
Do others find it disturbing when judges write opinions that read like their own amicus briefs?
Many pro-AI people would want to sway your opinion by presenting an incomplete view of the facts, even after the lawsuit. When people have a broader view of the social impacts of AI, it could become clear that training AI with copyrighted material is unethical from the start (with a few exceptions).
News media will very likely just run the headline "AI Training is Fair Use" without letting readers grasp the details of the judgment. You might be thankful that the judges wrote detailed opinions for you to read so that the issues of AI can be further debated in public. I just don't think that's bad, as many people misunderstand the whole picture of AI and the copyright issues involved.
@MrWilson Why can't the creators pursue tool makers that encourage people to "abuse" then?
Not all generative AIs are neutral. Some of them do encourage abuse, like Midjourney and Suno. So it becomes close to the Napster case, where creators didn't pursue every pirate using Napster but aimed at Napster itself in order to break the piracy chain.
The very difficulty of judging market harm by AI is that much of the AI "slop" is indistinguishable from human-made work, thanks to training on copyrighted works that makes AI better and better at masquerading.
While I think there should be legislation requiring that all AI-generated content be labeled, it's too early to tell whether the AI "slop" will have an actual impact on book sales.
That's why the judges' opinions conflict.
Speaking of this, my position is that AI training with copyrighted works is unethical and should be illegal, except for AI deployments that serve very limited purposes (e.g. translators, summary generators, grammar fixers).
The fair use arguments about AI are getting absurd, by the way. By equating AI training with human learning we risk undermining humanity in the AI arms race (especially when they are aiming for "super-intelligence" that is far beyond what fair use was legislated for).
@terop
Mind you, I don't like the Google Books precedent at all. Even though a regurgitation of 50 words is not much, a malicious user could eventually extract the whole book out of the AI by repeatedly trying prompts and piecing many 50-word outputs together into a full version of the book, and that is infringement.
The Google Books case is a Second Circuit ruling. Theoretically it can be overturned by the Supreme Court, but the aforementioned malicious use has not been seen and the plaintiffs didn't cite any evidence of it. It isn't worth appealing this case - it's better to file suit again with different authors.
The recent paperwork claimed that meta AI output could only reproduce less than 50 words from each individual book, even if you carefully craft the prompt to look for info from that book.
And this fact was used to claim that google book scanning case applies to the situation..
=> so the small amount of infringing data in output is essential part of their case…
People often quote the content of a book to express their own opinions about it. Such "quoting for commentary" uses are definitely fair within copyright law, unless you quote so much that your commentary effectively substitutes for the book.
There is another weakness in the genAI fair use claim: the regurgitated portion may end up in another book for sale that serves the same purpose as the original author's work (e.g. a quote from a novel ends up in another commercial novel, or a quote from a news article is used to publish other news without crediting the original source). That could defeat fair use. Judge Chhabria might have had this "unfair use" in mind, yet the plaintiffs didn't argue it. And so he had to rule Meta's use marginally fair, though with a lot of warnings.
@terop You didn't read the Google Books case and made the wrong assumption. Google did index the full content of the books.
And as Judge Chhabria ruled, you need to point to evidence that generative AI "obfuscated" the sources before your infringement claim works. Note that it's not that I like AI; it's that the infringement claim needs stronger evidence in order to work. And hell, I know data laundering is a serious moral issue, but that doesn't lead to your conclusion.
But purchasing a CD, listening to the music, learning to play guitar and understand chord progressions and then writing your own music using the skills you learned is perfectly legal. Adding “with a computer” shouldn’t magically make that different.
This argument is fine for music maker applications, but probably not for music-generating AIs (Suno & Udio). AI training cannot be equated with human learning, because there are no so-called "skills" involved. Rather, it is more about "samples" and the quality of those samples.
Let’s take the machine out of the process. You sit a chimp in front of Bob Ross episodes and the chimp learns to paint. The chimp paints a painting that isn’t a copy of a Bob Ross painting. Is the chimp violating copyrights? No, of course not. If you sell the chimp’s painting, you’re also not violating copyrights just because the chimp learned from watching Bob Ross. The chimp being supplanted by a machine doesn’t change the legal foundations of the process.
I would say this is a good analogy, MrWilson, but details can matter.
If the chimp's painting is substantially similar to Bob Ross's (think of the chimp photographing the painting rather than redrawing it with a brush), then the chimp's output can still infringe copyright. The chimp can't be sued for infringement, though; it would be the human redistributor of the chimp's work who is liable for infringement.
If there's no substantial similarity (ignoring the aspect of "style" copying, which is out of scope of copyright law but may be in scope of trademarks), then of course there's no infringement. The chimp's work would be uncopyrightable, by the way, assuming it can draw decently.
So it's not always yes or always no; it depends on the details.
There’s a lot of money to be made in licensing deals if media companies can force everyone to license training data. That doesn’t mean it will be overturned on appeal. It could be. We’ll see.
Considering that Judge Chhabria has also ruled now, I won't debate this part further. Judge Chhabria's arguments are much better than Judge Alsup's.
such as a human clicking a camera button to take a picture, even though it’s the camera that is actually capturing the image – that’s a debate for another day
I don't think a debate is needed on this. The key is the amount of human creative control, which determines copyrightability. When the AI generates a significant part of the image/music/content that the human has no control over (most of the internal decisions are a black box), those parts are uncopyrightable. The USCO has recognised certain works made with AI and registered copyrights for them, although each of them carries a waiver (covering the parts that are uncopyrightable). (I would call these cases "partial copyright" protection, in contrast to full copyright.)
Well then. The ruling on Meta's fair use is out.
https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.598.0_1.pdf
Judge Chhabria grants fair use to Meta, but reluctantly. He complains that the plaintiffs brought the wrong arguments and warns that, if the evidence had been presented differently, the judgment would be different.
This fair use finding is limited to this case against the 13 plaintiff authors only. This is not a class action, so other authors can still sue Meta with different evidence.
Judge Chhabria rejects Judge Alsup's reasoning (Bartz v. Anthropic) that AI learning is like human learning (despite the conclusion granting fair use to Meta). In the opening section: "[W]hen it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis."
The judge also noticed that Meta's AI can regurgitate parts of the plaintiffs' works, but no more than 50 words. That limit is less than what Google Books could output in the Google Books case (Authors Guild v. Google).
The judge rejects the assumption that AI only copies unprotectable ideas during training (as in the U.S. Copyright Office report).
On the market factor, market dilution is addressed (as in the U.S. Copyright Office report), but the judge criticizes the plaintiffs' lack of evidence of harm to their books (lost sales, etc.), even though dilution is possible.
The judge warns that this fair use grant doesn't mean Meta's copying was lawful. In other words, AI companies should seek licenses for training data anyway.
@terop
In the U.S., "fair use" in copyright law is decided by the courts on a case-by-case basis. Rather than listing which particular uses are fair, the statute mandates four factors to consider (17 U.S. Code § 107). The judges evaluate the four factors separately and then weigh them together for the overall conclusion. The judges also reference precedents so that similar cases evaluate fair use in a similar way.
fair use should be limited to sentences of size 6 words or smaller. Currently they’re asking fair use to apply to terabytes of data, and they’re not considering the work amounts that went into collecting those databases(much less creating the material from scratch).
The sad fact is there was a case nicknamed "Google Books" (Authors Guild v. Google) that had ruled fair use even when Google scraped terabytes of data. It's a book search and indexing engine, and the courts gave that fair use. So it isn't about the amount of data scraped. Even terabytes can be fair for a search engine.
If companies paid proper money amounts for the data, the AI databases would cost significant amount of money, millions of dollars.
And yes this is why the AI companies try to lobby and try to gain fair use for everything they scraped. (They had fair use for search engines and are trying to push that for generative AI.)
Why should your company get access to huge database of data, when the same data is unavailable for use for everyone else who follows copyright law?
Good point. And this is why, in the recent Anthropic case, the judge denied fair use for the pirated books. (I totally agree with this part, even though the rest of the ruling is significantly flawed.)
Basically none of the AI companies executed the proper process of dividing the data to small pieces and obtaining separate license for each piece from its author. They think it’s too burdensome, but copyright law thinks that they should not use that much data, since creating it from scratch is also burdensome.
Note: in the case of book search engines, creating data from scratch wouldn't make sense. There is also another case (sorry, I can't find the case law for this) of a plagiarism detector, where the machine needs to keep full copies of the books so that they can be used to find plagiarism in users' inputs.
[W]e learned copyright law, the conclusion was that everything in internet is illegal to use in your own product. There simply wasn’t licenses available for the data.
That is partly true. Most content published on the internet is not allowed for commercial reuse. But there is a subset of data that comes with explicit licenses, such as Creative Commons, that permit you to use it without contacting the author. (I would argue that, with proper attribution, AI can be trained on Creative Commons licensed works. It's just that we haven't seen AI companies attribute their sources when they train AIs.)
They should develop technologies that use less data. Make their AI algorithms work with smaller datasets.
Or in the alternative, obtain licenses for all the datasets. This is how large, open-source software (such as Linux) thrives.
Why is training magically not fair use when other transformative uses are? If the material is obtained legally, why isn’t the use legal?
A better analogy is that when you buy a CD from a music store, it grants you a license for personal (and home) enjoyment of the music, but you are not allowed to play that music at your workplace or store.
Purchasing the CD does not imply a license for commercial use of that music.
Just to mention, I strongly expect this case will be appealed. Judge Alsup's reasoning is deeply flawed: it focused so much on "transformative"-ness that this engulfed the other fair use considerations. It also erroneously equated machine learning with human learning (I've argued that this equation shouldn't hold because machines have no legal personhood; it is not founded in the constitution of any country).
@terop
Regarding the Bartz v. Anthropic summary judgment, the opinions I've seen are mixed. In particular, creators are not happy.
The only good side of this judgment is that piracy is likely game over for AI companies now. (I'm talking about Meta and OpenAI, too.)
I have >50% confidence that the fair use judgment in this case will be appealed, because "training is fair use so long as you legally acquired a copy" would mean a green light for OpenAI and Google to scrape billions of web pages simply because they're available gratis (for free). This is a terrible precedent for, e.g., news companies that publish content mostly on the internet, and for independent bloggers and writers.
How is RIAA able to get contracts to top-level artists, if they’re doing nothing to the benefit of those artists? Copyright gives copyright ownership to the artists when the product is created, so riaa had to do something to get access to the copyright ownership.
Only top-level artists. The RIAA doesn't care about small artists along the way. So small artists have to file separate lawsuits against Suno and Udio (AI music generators) in order to demand a share from them. (And they have filed suits, Justice v. Suno and Justice v. Uncharted Labs.)
It’s a non-infringing use. There is no legal precedent that human learning from copyrighted content is a copyright infringement (and to be clear, we’re just talking about learning, not some unrealistic impractical convoluted scenario you’ve invented where infringement is assumed).
Define "learning". Because it looks like we define the term differently.
That has nothing to do with human learning. That case was about intermediate copies, not non-fixed human brain learning.
Again, define "learning", before I can go with this debate.
It is the process of the model learning, not a translation or encryption or compression of the training data. You can’t take the weights and faithfully reproduce the training data.
Again, define "learning" when it comes to machines as well.
The data the model contains is not a different form of the training data. It is different data. It is data created by the model.
Created through what?
There’s nothing to be derivative of in the model.
Then the model comes out of thin air?
And because it’s derivative work it needs copyright license to distribute, period.
You can’t cite case law or law that says this.
You didn't read any recent lawsuit about AI training, did you? There is one such case, Thomson Reuters v. Ross.
If no case law or law has yet found LLM training to always require licensing, then it is legal until such case or law becomes a precedent.
You can only say this before March, 2025.
And I can wait until summary judgment is made in another case, which will be no later than this year.
The big media corporations will get the licensing funds and very few creators will get much of anything. Creators will be pressured to license their work for almost nothing when signing new contracts. The future you think you’re railing against and that you think can be prevented by successful lawsuits will not be stopped by these methods.
It's marginally better for a creator to be "forced" to sign a licensing contract, than have their works taken without permission!
For two things: (1) the creator can get paid under the contract; (2) if the contract turns out to be unfair labor practice or anything, they can sue with sufficient evidence.
Generative AI output isn’t by default published to a market. That’s a false claim.
"By default"? What the fuck again? How can the companies be charged with secondary liability of copyright infringement only when the output is published "by default"?
By the way, there's a recent complaint. Midjourney keeps the user's image generation for public viewing in its "Explore" pages, and Disney (and Universal) cited the Explore pages for infringement proof. Say to Disney that it's not "published".
it’s a strong argument in favor of fair use. It frequently is.
Which argument?
I said the 10th Amendment was the source of the fact that it was legal before the expansion of the length of copyright.
And yet the expansion of copyright through Congress wasn't illegal. End of story.
I said US citizens were deprived of the ability. I didn’t even call it a right. You did.
'Oh! I am "deprived" of the ability to kill! I am "deprived" of the ability to commit adultery! I am "deprived" of the ability to steal!'
Seriously, what the fuck?
I do condemn corrupt SCOTUS decisions. They have made unconstitutional decisions at times. Would you argue that a SCOTUS justice isn’t capable of being corrupt or making a biased decision that contradicts the Constitution?
That is a political question. I won't reply to it because it's out of scope. Go persuade the politicians, because this has nothing to do with me.
and you’ll get pressured to accept a contract that includes the licensing for nothing or almost nothing.
It's already happening even without the suit! Google admits that it uses YouTube videos to train their video generating AI, Veo 3 (https://www.cnbc.com/2025/06/19/google-youtube-ai-training-veo-3.html)
And there's no way to opt out!
YouTube Terms of Service
"By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use that Content (including to reproduce, distribute, prepare derivative works, display and perform it) in connection with the Service and YouTube’s (and its successors’ and Affiliates’) business, including for the purpose of promoting and redistributing part or all of the Service." (Emphasis added)
You’re literally fighting for someone else’s profit at the expense of poor people.
I'm fighting for my profit. And I dismiss "poor people" again. (This "poor people" talk is bullshit spam, as not a single witness has shown up. Why the fuck should I assume "poor people" exist, or why should I care about them?)
You are saying you have no moral stance here. You’re just out for your own. You don’t care about others. That means no one is going to care about you or your pet issues.
My moral stance is to respect copyright, with no copyright exemption for AI training. There are no "others" like the ones you mentioned that I should care about. Bring a witness here, or else I dismiss it.
Why do the small creators need LLM to do anything?
Ask them.
I'm asking you, because you brought up this bullshit argument that "poor people" need access to LLMs.
You assume I’m speaking for them even though I’ve said and you’ve agreed that I represent myself. But you’re also ignoring that it’s not just about creators. It’s about all US citizens, including the majority whose existence you dispute.
I dispute even your use of the word "majority" here.
I would rather see polls like this before you claim your opinion is the majority's:
https://theaipi.org/poll-biden-ai-executive-order-10-30-5/
Except it’s not. You claimed children learning to write by reading copyrighted works was a copyright infringement. It is not.
There is no blanket fair use for children's learning, mind you.
The American Geophysical Union v. Texaco case shows that intermediate copies can constitute infringement. In that particular case it was employees of a for-profit corporation doing the learning, so it's a human learning case that was ruled not fair use.
It would have to have the sentences and paragraphs to be able to translate it. You can’t translate if you don’t have the original text. The model doesn’t have the original text!
Because in the AI "pre-training" phase the text has been translated to model weights! It does not need to be text to constitute "copies".
And you can literally legally learn to write from copyrighted content. This isn’t an analogy. This is literally my lived experience. If I remember how to write based on what I’ve read, it’s legal. I often don’t remember the specific sentences I’ve read but the method of writing is retained, such that the “data” that I’ve trained myself on isn’t even present in my head. I can’t quote word for word much of some writer’s expression despite remember plots, characters, and writing style.
Why the hell should I care about your learning experience? Are you AI and not human?
The models don’t contain the original trained data.
You keep presenting this as a fact even though I have disputed it many times! The model contains the data in a different form from the one the copyrighted content was originally "fixed" in. And for the purpose of copyright, the model itself is a derivative work!
And because it's derivative work it needs copyright license to distribute, period.
I’m not talking about these lawsuits you keep bringing up. You’ve brought them up, not me. You keep pretending I’ve been defending the actions taken by the AI companies. You really, seriously, definitely need to read what I first said at the start of this whole thing. It would save you so much time. Here, I’ll do it for you, again:
“My frustration with the arguments of people claiming it’s not fair use and that all training must be licensed is that many people seem to think they’re championing the little guy when they’re inadvertently advocating for the benefit of the wealthy and corporations.”
If you don’t disagree with every part of what I said there, you should stop responding.
You cited not a single case where AI training was found non-infringing or "fair use", because there is none (yet). Stop making the claim that you can train AI without a copyright license!
Not necessarily. The uses can be infringing without the model being infringing, the same way a VCR can record content off TV and the person making the recording can violate copyright by selling it without authorization or they can use it for time-shifting their fair use viewing. If your assertion were true, then any technology that can be used for an infringing use would be illegal. [...] Again, you don’t understand US copyright law.
Only if the model is not infringing (which is already a false premise, so I don't need to argue about the further what-if scenario).
(And I wish there were AI models with fully licensed data, damn it! But what we've seen is AI companies trying to defend their scraping as fair use while their arguments don't hold. Thomson Reuters v. Ross is an example case.)
You could reproduce a significant portion of the original work for a classroom assignment and it won’t be a market substitute at all.
No. You've confused the exemption for schools (§ 110(1) and (2)) with the fair use exemption (§ 107). I'm not talking about the § 110 case. Copying for school classroom use is already exempt, so I don't need to address the four fair use factors there.
[Factor one] That’s not the purpose of a book report. It’s education.
Factor one must be evaluated against the ultimate purpose of the use. So it's not the "education" in the intermediate step but the commercial publishing of the notes/book that matters. See the Texaco case law.
[Factor three] You’re saying they use a lot when they likely wouldn’t. This is like saying, “if someone owns a gun they’re a murderer because I made up a scenario where they murdered someone with their gun.” You are starting from the conclusion. That’s intellectually dishonest.
Except that you are refuting a straw man. Even if what you say is true, Factor Three would still come out neutral here.
[Factor four] It doesn’t create a market substitute. You’ve only claimed that a teacher would ask a student to publish their commentaries about Stephen King’s work, which isn’t likely. What publisher is going to print that? Are they self-publishing? Whose doing the formatting? Is this for an assignment? Does the curriculum for the class cover this purpose? Your scenario doesn’t make any sense.
Did I say the hypothetical scenario has to make sense? It's actually a metaphor for what generative AI has been doing. So there's no need to question whether there is a "publisher" who would print that, because there just is.
Wait, the data is transformed? Would you say the process of transforming something is… transformative?
USCO even said that generative AI outputs are transformative, so what's the issue here?
Your mistaken belief that "transformative = fair use" is the issue.
I said that the public was deprived of the use of the public domain works within their own lifetime. And I provided the source of that.
There is no such right of "using works within their lifetime"! Even your quote from the U.S. Copyright Act of 1790 doesn't say there is such a right.
The act you quoted only speaks of a copyright term of 14 years, but there is nothing unconstitutional about extending that term to "life + 50 years" or "life + 70 years". When you insist on a right that doesn't exist in statute, there is nothing to be "deprived" of.
Then link to the ruling and not to a propaganda organization.
The Supreme Court has been making some unconstitutional decisions. Many justices are openly corrupt now. Many of their appointments were the result of corruption and unlawful activity. I won’t be surprised if any particular SCOTUS decision goes a way I’d disagree with.
So you have more authority than the Supreme Court? Got it.
Setting a legal precedent that all LLM training requires licensing won’t stop that.
I would go for setting a legal precedent for that whether you like it or not. Call me a bad guy whenever you want, because you even disregard the Supreme Court.
You are siding bad guys who are exploiting the creative labor of humans. That’s literally what I’m railing against! I am the exploited. You’re cheering on the people who have exploited me! I know you aren’t the good guy.
At least I can get paid for my works! I don't care whether you are exploited! I even dismiss your claim about "poor people" as without evidence. Are you happy now?
You’re pretending like AI companies are the only exploiters out there. It’s the whole damn system!
And why the heck should I let AI companies keep exploiting anyway? I have no obligation to fight what you called the "whole damn system".
[If] LLM training data must be licensed, wealthy LLM companies will still be able to afford to license them (and small creators will get virtually nothing), and so the LLMs will get trained and used and pushed on us without our consent.
Why do the small creators need LLM to do anything? You didn't answer this question when I asked before, so the rest of the outcome you suggest is nonsense!
"LLM gets pushed on [me] without [my] consent"? What the fuck?
I’m talking about anyone who might ever want to train an LLM in the future in the US, you know, including those poor people and students and researchers you don’t apparently think exist.
Because an LLM is a luxury, and not everyone can have that luxury however hard you try. An LLM requires a server farm, which means smaller companies won't be able to deploy one unless they can rent the servers from someone else. This is unchangeable. A better world would be one where every personal computer can run SLMs (small language models) instead of relying on LLMs for most computing tasks that need AI. Assuming the SLMs' training materials are all licensed, by the way.
More evidence of "fake speedpaint" exists: https://www.youtube.com/watch?v=E9vryBnKVz8 The term to search for is "fake speedpaint".
@BeerOnTap
Many pro-AI people would want to sway your opinion by presenting incomplete view of facts, even after the lawsuit. When people have a broader view on the social impacts of AI it could become clear that training AI with copyrighted material is unethical at the start (with a few exceptions). News media will very likely just make the title "AI Training is Fair Use" without letting to grab onto the details of the judgement. You might be thankful that Judges wrote detailed opinion for you to read so that the issues of AI can be further debated in the public. I just don't think that's bad, as many people misunderstood the whole picture of AI and the copyright issue involved.@MrWilson Why can't the creators persue tool makers that encourage people to "abuse" then? Not all generative AIs are neutral. Some of them do encourage abuse, like Midjourney and Suno. So it becomes close to Napster case where creators didn't persue every pirate using Napster but aimed at Napster itself in order to break that pirate chain.
The real difficulty in judging market harm from AI is that much AI "slop" is indistinguishable from human-made work, because training on copyrighted works makes AI better and better at masquerading. While I think there should be legislation requiring that all AI-generated content be labeled, it's too early to tell whether AI "slop" will have an actual impact on book sales anyway. That's why the judges' opinions conflict. Speaking of which, my position is that AI training on copyrighted works is unethical and should be illegal, except for AI deployments that serve very limited purposes (e.g. translators, summary generators, grammar fixers). The fair use arguments around AI are getting absurd, by the way: by equating AI training with human learning, we risk undermining humanity in the AI arms race (especially when they are aiming for "super-intelligence", which is far beyond what fair use was legislated for).
@terop Mind you, I don't like the Google Books precedent at all. Even though the regurgitation of 50 words is not much, a malicious user could eventually extract the whole book out of AI by repeatedly trying prompts and piecing many 50-word outputs together into a full copy of the book, which would be infringement. The Google Books case is a Second Circuit ruling. Theoretically it can be overturned by the Supreme Court, but the aforementioned malicious use has not been seen, and the plaintiffs didn't cite any evidence of it. It isn't worth appealing this case; it's better to file a new suit with different authors.
@terop You didn't read the Google Books case and made the wrong assumption. Google did index the full content of the books. And as Judge Chhabria ruled, you need to point to evidence that generative AI "obfuscated" the sources before your infringement claim works. Note that it's not that I like AI; it's that the infringement claim needs stronger evidence in order to work. And hell, I know data laundering is a serious moral issue, but that doesn't lead to your conclusion.
- If the chimp's painting is substantially similar to Bob Ross's (think of the chimp photographing the painting rather than redrawing it with a brush), then the chimp's output can still infringe copyright. Except that the chimp can't be sued for infringement; it would be the human redistributor of the chimp's work who is liable for infringement.
- If there's no substantial similarity (ignoring "style" copying, which is outside the scope of copyright law but may fall within the scope of trademarks), then of course there's no infringement. The chimp's work would be uncopyrightable, by the way, assuming it can draw decently.
- So it's not always yes or always no, it's the details.
Considering that Judge Chhabria has also ruled on the case now, I won't debate this part further. Judge Chhabria's arguments are much better than Judge Alsup's. I don't think a debate is needed on this.
The key is the amount of human creative control, which determines copyrightability. When the AI generates a significant part of an image, piece of music, or other content that the human has no control over (as in, most of the internal decisions are a black box), those parts are uncopyrightable. And the USCO has recognised certain works made with AI and registered copyrights for them, albeit each of them carries a disclaimer (identifying which parts are uncopyrightable). (I would call these cases "partial copyright" protection, in contrast to full copyright.)
Well then. The ruling on Meta's fair use is out. https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.598.0_1.pdf
Remember there was a lawsuit between Universal and Nintendo about the King Kong vs. Donkey Kong trademarks?
@terop In the U.S., "fair use" in copyright law is decided by the courts on a case-by-case basis. Rather than listing which particular uses are fair, the statute sets out four factors to consider (17 U.S. Code § 107). The judges evaluate the four factors separately and then weigh them together to reach an overall conclusion. The judges also reference precedents so that similar cases evaluate fair use in a similar way.
The sad fact is that there was a case nicknamed "Google Books" (Authors Guild v. Google) in which the court found fair use even though Google had scraped terabytes of data. It's a book search and indexing engine, and the courts held that to be fair use. So it isn't about the amount of data scraped. Even terabytes can be fair for a search engine. And yes, this is why the AI companies lobby and try to gain fair use for everything they scraped. (They got fair use for search engines and are trying to push that for generative AI.)
Good point. And this is why, in the recent Anthropic case, the judge denied fair use for the pirated books. (I totally agree with this part, even though the rest of the ruling is significantly flawed.)
Note: in the case of book search engines, creating data from scratch wouldn't make sense. There is also another case (sorry, I can't find the case law for this) about a plagiarism detector, where the machine needs to keep full copies of the books so that they can be used to find plagiarism in users' inputs.
That is partly true. Most content published on the internet is not licensed for commercial reuse. But there is a subset of data that comes with explicit licenses, such as Creative Commons, that permit you to use it without contacting the author. (I would argue that, with proper attribution, AI can be trained on Creative Commons licensed works. It's just that we haven't seen AI companies attribute their sources when they train AIs.) Or, in the alternative, obtain licenses for all the datasets. This is how large open-source software (such as Linux) thrives.
@terop Regarding the Bartz v. Anthropic summary judgement, the opinions I've seen are mixed. In particular, creators are not happy. The only good side of this judgement is that piracy is likely game over for AI companies now. (I'm talking about Meta and OpenAI, too.) I have >50% confidence that the fair use judgement in this case will be appealed.
Because this "training is fair use so long as you legally acquired a copy" rule would mean a green light for OpenAI and Google to scrape billions of web pages simply because they're available gratis (for free). This is a terrible precedent for, e.g., news companies that publish content mostly on the internet, and for independent bloggers and writers.