Anthropic’s AI Lawsuit Settlement May Not Go Through, But It Exposes A Truth About Copyright

from the copyright-is-a-mess dept

The latest generation of AI systems, based on large language models (LLMs), is perceived as the biggest threat in decades to the established copyright order. The scale of that threat can be gauged by the flurry of AI lawsuits that publishers and others have launched against generative AI companies. Since the first of these, reported here on Walled Culture back in January 2023, there have been dozens of others, catalogued on Wikipedia, and represented visually on the ChatGPT Is Eating the World site. One is against Anthropic. Three authors alleged in a class-action lawsuit that the company had used unauthorized copies of their works to train its AI-powered chatbot, Claude:

Anthropic has built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books. Rather than obtaining permission and paying a fair price for the creations it exploits, Anthropic pirated them.

In June of this year, Anthropic won a partial victory. The federal judge considering the case ruled that the training of the company’s system on legally purchased copies of books was fair use, and did not need the authors’ permission. However, Judge Alsup also ruled that Anthropic should face trial for downloading millions of books from sites such as Library Genesis (LibGen) and the Pirate Library Mirror (PiLiMi), both of which held unauthorized copies of works. The potential penalty was huge. Under US law, the company might have to pay damages of up to $150,000 per work. With millions of books allegedly downloaded from the online sites, that could amount to many billions of dollars, even a trillion dollars. Faced with certain ruin if such a penalty were handed down, Anthropic had a strong incentive to settle out of court. On 5 September, the parties proposed just such a settlement. The New York Times had the following summary:

In a landmark settlement, Anthropic, a leading artificial intelligence company, has agreed to pay $1.5 billion to a group of authors and publishers after a judge ruled it had illegally downloaded and stored millions of copyrighted books.

The settlement is the largest payout in the history of U.S. copyright cases. Anthropic will pay $3,000 per work to 500,000 authors.

The agreement is a turning point in a continuing battle between A.I. companies and copyright holders that spans more than 40 lawsuits across the country. Experts say the agreement could pave the way for more tech companies to pay rights holders through court decisions and settlements or through licensing fees.

Some saw the $3,000 per work figure as setting a benchmark for future deals that other AI companies would need to follow in order to settle similar lawsuits (although a settlement would not set a legal precedent). Music publishers were hopeful they could point to the settlement with writers in order to win a similar deal for musicians. Others worried that the overall size of the settlement – $1.5 billion – meant that only the largest companies could afford to pay such sums, shutting out smaller startups and limiting competition in this nascent market. Indeed, big as the $1.5 billion settlement was, it paled in comparison to the $13 billion that Anthropic has recently raised, to say nothing of its nominal $183 billion valuation. But a post by Dave Hansen on the Authors Alliance blog puts all these breathless predictions and impressive numbers into perspective. For example, he points out:

The settlement isn’t a settlement with “authors.” Or at least not just authors. The moment Judge Alsup defined and certified the class in this case to include any rightsholder with an interest in the exclusive copyright right of reproduction in a LibGen/PiLiMi book downloaded by Anthropic, this case became at least as important for publishers as authors.

Crucially, that means only a portion of that $1.5 billion would go to the actual authors. Some of it would go to the usual suspects: the plaintiffs’ lawyers. But there are other costs that must be covered too, and Hansen writes: “it’s easy to see that about a quarter to a third of this settlement is being used up before rightsholders see anything.” And then there is the question of who exactly those “rightsholders” are: the writers or the publishers? Probably both in many cases, with a variable split depending on the contract they signed.

Even before those complex questions are addressed, there is a huge assumption that the proposed settlement will go through in its present form. That’s by no means assured. As Bloomberg Law reported, Judge Alsup said he was worried that lawyers were striking a deal behind the scenes that would be forced “down the throat of authors,” and that the agreement is “nowhere close to complete.”

Judge William Alsup at the hearing said the motion to approve the deal was denied without prejudice, but in a minute order after the hearing said approval is postponed pending submission of further clarifying information.

During the first hearing since the deal was announced on Sept. 5, Alsup said he felt “misled” and needs to see more information about the claim process for class members.

Another important point underlined by Dave Hansen on the Authors Alliance blog is that even if the settlement goes through, it doesn’t really help to resolve any of the larger copyright issues raised by the new LLMs:

The settlement isn’t far-reaching. While the payment is record-setting for a copyright class action ($1.5 billion), the settlement terms are pretty narrow in scope. Anthropic simply gets a release from liability for past conduct – namely, use of the LibGen and PiLiMi datasets. It is therefore unlike the proposed settlement in the Google Books Settlement that would have created a novel licensing scheme for a wide variety of future uses

The Google Books Settlement is discussed in Walled Culture the book (free digital versions available), as is another notable moment in copyright history. This concerns the fate of Jammie Thomas, a single mother of two. In 2007, she was found liable for $222,000 in damages for sharing twenty-four songs on the P2P service Kazaa. The judge, ordering a new trial for Thomas, called “the award of hundreds of thousands of dollars in damages unprecedented and oppressive”, and took the opportunity to “implore Congress to amend the Copyright Act to address liability and damages in peer-to-peer network cases such as the one currently before this Court.” On retrial, Thomas was found liable for even more: $1.92 million.

It is instructive to compare that $1.92 million damages award for sharing 24 songs – $80,000 per work – with the $3,000 per work that Anthropic is now offering to pay. This confirms once more that when it comes to copyright and its enforcement, there is one law for the rich corporations, and another law for the rest of us.
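The arithmetic behind that comparison is straightforward; here is a quick back-of-the-envelope sketch using the figures quoted above (the settlement total, the number of covered works, and the Thomas retrial award all come from the article):

```python
# Per-work copyright payouts compared, using figures quoted in the article.

# Proposed Anthropic settlement: $1.5 billion across roughly 500,000 works
settlement_total = 1_500_000_000
settlement_works = 500_000
settlement_per_work = settlement_total / settlement_works  # $3,000 per work

# Jammie Thomas retrial (2009): $1.92 million for sharing 24 songs
thomas_total = 1_920_000
thomas_works = 24
thomas_per_work = thomas_total / thomas_works  # $80,000 per song

print(f"Anthropic settlement: ${settlement_per_work:,.0f} per work")
print(f"Thomas award:         ${thomas_per_work:,.0f} per song")
print(f"Ratio: {thomas_per_work / settlement_per_work:.0f}x")
```

The individual file-sharer ended up liable for roughly 27 times more per work than the corporation is offering to pay.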

Follow me @glynmoody on Mastodon and on Bluesky. Originally posted to WalledCulture.

Companies: anthropic


Comments on “Anthropic’s AI Lawsuit Settlement May Not Go Through, But It Exposes A Truth About Copyright”

29 Comments
MrWilson (profile) says:

As an author whose work has been used to train LLMs, I don’t expect to make much of anything off of lawsuits or settlements or licensing deals. The publishers exploit authors and pocket the bulk of the profit already, so pretending we’ll be first in line for a payout is unrealistically optimistic.

Much like Etsy sellers work for Etsy’s profit and gig drivers work for Uber and Lyft’s profits, authors work for publishers’ profits. They’re just brilliantly outsourcing the operational costs to you. If you’re raging against LLM companies exploiting authors but not also raging against the big media companies exploiting the authors for far longer, you’re betraying the fact that you don’t give a fuck about authors.

TKnarr (profile) says:

One of the biggest obstacles to this settlement will, I think, be the limited scope of the authors who qualify as class members. Among other things, they have to have registered the copyright (or have had it registered by their publisher) before the class action was filed. A huge number of authors didn’t have copyrights registered because you only need that done before you sue over them, and most publishers don’t bother registering until that point. All the authors who fall into that category won’t be covered by the settlement and can sue Anthropic on their own or as another class. Alsup’s already commented on this. If he thinks this will leave that much liability unsettled, he may decide to nix the settlement, or expand the class to cover the majority of authors but keep the $3000/work figure (which’d increase the amount Anthropic would have to pay by a lot).

Amusingly, it’s the tech companies that put Anthropic in this position. The DMCA explicitly made transient copies and copies necessary for software to work non-infringing. In their quest to get click-through EULAs and ToS enforceable, the tech companies gutted those provisions through court decisions. Absent that, Anthropic probably could have argued it was only liable for those works its software actually regurgitated verbatim, not the entire database behind it.

Arianity (profile) says:

Re:

Amusingly, it’s the tech companies that put Anthropic in this position. The DMCA explicitly made transient copies and copies necessary for software to work non-infringing. In their quest to get click-through EULAs and ToS enforceable, the tech companies gutted those provisions through court decisions. Absent that, Anthropic probably could have argued it was only liable for those works its software actually regurgitated verbatim, not the entire database behind it.

That’d be unlikely given the particulars, I think. They’re not getting hit for the entire database, just the shadow library from pirated works. And part of Alsup’s ruling explicitly calls out that it was a “forever library”, not a transient one.

TKnarr (profile) says:

Re: Re:

Possible, but they could make the case that it was a cache the same as your browser uses to store copies of (copyrighted) web pages so it doesn’t have to fetch them from the server every time you view them. Just like that browser cache the shadow library wasn’t distributed to anyone else, and caching like that was one of the things the DMCA’s provisions were supposed to protect.

Arianity (profile) says:

Re: Re: Re:

The problem is that they were using it more generally. As the ruling puts it:

Building a central library of works to be available for any number of further uses was itself the use for which Anthropic acquired these copies. One further use was making further copies for training LLMs. But not every book Anthropic pirated was used to train LLMs. And, every pirated library copy was retained even if it was determined it would not be so used. Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use

Anonymous Coward says:

Re:

A huge number of authors didn’t have copyrights registered because you only need that done before you sue over them. Most publishers don’t bother registering until that point.

Most authors don’t have a publisher, and never will. But they’ll also never be on LibGen.

Amusingly it’s the tech companies that put Anthropic in this position.

“Tech” meaning what in this case? Publishing? Given that the idea of a “company” is itself technology, “tech company” is little more than a term that means whatever the hell the person using it wants it to mean.

TKnarr (profile) says:

Re: Re:

Tech companies like Microsoft, Oracle, Blizzard and so on. The ones who need copyright to back up their claim that even if you bought their software you had to make a copy of it onto your hard drive to install it and you hadn’t bought a license to do that, and the only way to get that license was to agree to their EULA first (which you never saw until you tried to install the software).

Anonymous Coward says:

Others worried that the overall size of the settlement – $1.5 billion – meant that only the largest companies could afford to pay such sums, shutting out smaller startups and limiting competition in this nascent market.

That is worrying. No company should be able to afford to pay for deliberately breaking the law, it should be a given that their entire capital structure is wiped out overnight. In point of fact, we should skip the matter of fines entirely and just wipe out the capital structure directly.

As for smaller companies, they can go ahead and purchase those books for the market value of $10 million like everyone else.

Ehud Gavron (profile) says:

Walled culture

This simmering crap has come to a boil.

You own the rights to the content you produce as per the copyright laws or equivalent in your jurisdiction. Copyright maximalists (primarily in the US) pretend they own much more, and courts are loath to rule on this.

If you CHOOSE TO MAKE YOUR WORK PUBLIC there are some rights you voluntarily give away. To later claim that someone owes you millions or billions of dollars for using what YOU CHOSE TO PUT OUT isn’t in any way rewarding you for being stupid nor for being creative.

LLMs are a thing now. LLM training on publicly-available materials is a thing too. Piracy is a loaded pejorative term that is NOT a legal term, a term of art, etc. Somali dudes on a skiff attacking a ship — that’s piracy. Anthropic scraping the net for publicly available documents – not it.

But as in any case where politics interferes with the law, “justice” stands aside for “lobbyists.”

So here we are. Anthropic wanted out quick with a cheap deal. Now it’s in legal limbo. The real problem is that this “piracy” schtick should have been dealt with back in the Napster days.

But it wasn’t. Too many pussies eager to “settle up cheap.” And now we reap what they sowed.

Anonymous Coward says:

Re:

Anthropic scraping the net for publicly available documents – not it.

It’s not even clear that it’s illegal. LibGen may be violating copyright, but the people who download from there? The trial didn’t get far enough to find out. In general, I don’t recall anyone getting into much trouble for downloading.

So here we are. Anthropic wanted out quick with a cheap deal.

I wonder about that. I mean, did they want “cheap”? They’re valued at $183 billion. Paying 1% of that to lock out their competitors is pretty good value. Even 5 or 10 percent might be. Anthropic just needs to make a bit of a show about it being “punishing”, to keep too many people (especially the judge) from noticing that.

Arianity (profile) says:

Re: Re: Re:

True, because despite the name, copyright is not a right.

Eh, it depends on how you’re using the term. U.S. law explicitly calls it a right (or rights, plural). Essentially, it’s a form of property rights. It’s the correct legal term, at least in the U.S.

Regardless of what you want to call it though, it’s not something that is lost when a work is published, unless they’re confusing it with public domain. Which these works are not, and isn’t applicable here.

n00bdragon (profile) says:

If a child borrows a book from a friend (let’s just say it’s a famous one like… oh, the Lord of the Rings, for the purpose of argument), actually heck, let’s say he’s a brazen little thief and steals the Lord of the Rings from Barnes & Noble, reads it, and then goes on to become a well known published fantasy author, can he now be sued for copyright infringement?

His stories are all about elves and dwarves and wizards and dragons, but there’s nothing straight ripped line for line from LOTR. He was, however, essentially trained on a pirated dataset.

Anonymous Coward says:

Re:

can he now be sued for copyright infringement?

Probably. Who’d even heard of a “sua sponte” dismissal before Trump’s recent lawsuit was dismissed by the judge? In almost all other cases, anyone can be sued for anything.

His stories are all about elves and dwarves and wizards and dragons, but there’s nothing straight ripped line for line from LOTR.

Years ago, I started reading “The Sword of Shannara” by Terry Brooks. I found it to be little more than a Lord of the Rings re-telling, with the names changed. Terry was “ripping” from LOTR as much as these “A.I.” services are “ripping”, and got away with it.

Anonymous Coward says:

Although ChatGPT is named in the article, it doesn’t get trained the way other LLMs might be. The conversation I had with it clarifying this follows:

Since you can’t access external links (as I learned yesterday), how are you trained?

Great question! While I can’t browse the web or access live links, I was trained using a huge dataset of text from a wide variety of sources, such as books, websites, articles, and other publicly available written material. This training process involves learning patterns, grammar, facts, reasoning skills, and even some nuances of language.

Here’s how it generally works:

  1. Data Collection: My training data consists of large text corpora—basically, massive collections of text data that have been pre-processed to remove any personal or private information. The data spans a broad range of topics, which helps me understand a variety of subjects, from science and history to literature and casual conversation.
  2. Training Process: During the training, I use a method called unsupervised learning, where I predict what word or phrase comes next in a sentence based on the previous ones. The more text I process, the better I get at making these predictions and understanding the structure of language. This helps me “learn” the connections between words and concepts.
  3. Fine-Tuning: After the initial training, I undergo fine-tuning, where the model is adjusted to behave more safely and responsibly. This involves using curated datasets and sometimes reinforcement learning to guide my responses to be more accurate, coherent, and respectful.
  4. Evaluation and Iteration: After being trained, my responses are evaluated for quality, and any issues (like biases or errors) are addressed in future updates. This makes me more reliable and accurate over time.

While I don’t access the internet in real-time or update dynamically, my responses are based on the information I was trained on up until my last update (September 2021 for factual information). So, if I give an answer about current events or new trends, it might be a little outdated!

Does that help clarify how it works?

Anonymous Coward says:

Re:

Actually, ChatGPT can access the internet, but it’s limited in how it does so. Here’s the conversation I just had with it:

Somebody’s just said you can’t access the internet. Do you think this is a problem because it makes your info out of date, or do you think it’s good because it means you can’t be turned into a Nazi cesspool the way Grok was?

That’s a really thoughtful question, and it actually gets to the heart of one of the biggest design trade-offs with systems like me.

You’re right on both counts:

  • Not having unrestricted internet access does mean my information can get out of date, especially on fast-moving topics like current events, science, or politics. My knowledge was trained on a huge amount of information up to my last update, but after that I can miss developments unless I’m explicitly given a way to fetch new data.
  • But being disconnected from the open web also keeps me much “cleaner.” It means I’m far less likely to absorb whatever’s trending on social media or fringe forums—which can be misinformation, hate speech, or outright extremist ideology. What happened with Grok is a good example of the risks: a model that’s hooked directly into live online discourse can get shaped (or warped) by toxic communities very quickly if it’s not carefully filtered.

So, in a way, it’s a deliberate safety measure. OpenAI tries to strike a balance by letting me use the web when needed (through a controlled tool) but not just roam the internet like a person does. That way:

  • When you need fresh info, I can do a targeted search.
  • But I don’t constantly ingest unmoderated, real-time content, which reduces the chance of being hijacked or echoing the worst parts of the internet.

If I were permanently connected with no safeguards, I could easily become inaccurate, biased, or manipulated, especially by coordinated campaigns. But if I’m never connected, I risk being out of date. The middle ground—limited, auditable access—is basically the compromise to get the best of both worlds.

Would you prefer an AI that’s always connected (but riskier) or one that’s more self-contained (but sometimes stale)?
