When Copyright Enters the AI Conversation
from the reading-by-robots dept
This series of posts explores how we can rethink the intersection of AI, creativity, and policy. From examining outdated regulatory metaphors to questioning copyright norms and highlighting the risks of stifling innovation, each post addresses a different piece of the AI puzzle. Together, they advocate for a more balanced, forward-thinking approach that acknowledges the potential of technological evolution while safeguarding the rights of creators and ensuring AI’s development serves the broader interests of society. You can read the first, second, third, fourth, and fifth posts in the series.
Whenever content is involved, copyright enters the conversation. And when we talk about AI, we’re talking about systems that absorb petabytes of content to meet their training needs. So naturally, copyright issues are at the forefront of the debate.
Interestingly, copyright usually only becomes an issue when there’s the perception that someone or something is successful—and that copyright holders are missing out on potential control or revenues. For decades, “reading by robots” has been a part of our digital lives. Just think of search engines crawling billions of pages to index them. These robots read far more content than any human ever could. But it wasn’t until AI began learning from this content—and, more crucially, producing content that appeared successful—that the rules inspired by the Statute of Anne of 1710 came into play.
The Input Side: Potential Innovation and the Garbage In, Garbage Out Principle
On the input side, generative AI relies heavily on the data it consumes, but under EU law, its access is carefully regulated. The 2019 EU Directive on Copyright in the Digital Single Market (DCDSM) sets the framework for text and data mining (TDM). Article 3 of the Directive permits TDM for scientific research only, while Article 4 allows it more broadly—provided the rightsholder hasn’t expressly reserved their rights.
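To make the Article 4 reservation more concrete: in practice, many publishers express their opt-out through machine-readable signals such as robots.txt rules aimed at training crawlers, and a crawler that wants to respect the reservation has to check those signals before mining a page. The snippet below, built on Python’s standard urllib.robotparser, is a minimal illustrative sketch: it assumes that robots.txt counts as an “appropriate machine-readable means” (a point that is still contested), and “GPTBot” and example.com are used only as stand-in names.

# Illustrative check of a robots.txt-based TDM reservation. Assumption: the site
# expresses its Article 4 opt-out via robots.txt; "GPTBot" is a stand-in user agent.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # hypothetical site
rp.read()  # fetch and parse the robots.txt file

page = "https://example.com/articles/some-page.html"
if rp.can_fetch("GPTBot", page):
    print("No reservation found for this crawler; mining could proceed under Article 4.")
else:
    print("Rights reservation detected; exclude this page from training data.")

Whether a signal like this is legally sufficient to trigger Article 4, and whether training crawlers actually honour it, remains an open question.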
With the AI Act adopted in 2024 referring to these provisions, we’re left with a raft of questions about the future of AI models. One of the key concerns is the potential for a data winter—a scenario where AI models face limited access to the data they need to evolve and improve.
This brings us to a fundamental concept in AI—Garbage In, Garbage Out. AI models are only as good as the data they are trained on. If access to high-quality, diverse datasets is restricted by rigid copyright rules, AI systems will end up training on lower-quality data. Poor-quality data leads to unreliable, biased, or outright inaccurate AI outputs. Just as a chef can only make a great dish with fresh ingredients, AI needs high-quality input to deliver reliable, innovative, and useful results. Restricting access due to copyright concerns risks leading AI into a “data winter” where innovation freezes, limited by the garbage fed into the system.
A data winter not only stifles technological advancement but also risks widening the gap between regions that enforce stricter copyright policies and those that embrace more flexible rules. Ultimately, Europe’s global competitiveness in AI hinges on whether it can provide an environment where AI can access the data it needs without unnecessary restrictions.
But access to diverse data is also important from a cultural perspective: if AI is trained predominantly on Anglo-Saxon or non-European content, it naturally reflects those cultures in its outputs. This could mean that European creativity becomes increasingly marginalised, with AI-generated content lacking in cultural relevance and failing to reflect the diversity of Europe. AI should be a tool that amplifies the diversity of human expression, not one that homogenises it.
Challenges on the Output Side: Copyright Protection for AI-Generated Content
Now let’s look at the output side of generative AI. The assumption that creative works, like movies, video games, or books, are automatically protected by copyright may not apply to AI-generated content. The traditional protection of creative expression hinges on human authorship, and while creative elements like prompt choices could be considered for copyright, the level of protection will likely be much lower than expected. This could mean that parts of a work—such as AI-generated backgrounds in video games or movies—could be freely copied by others.
This uncertainty could lead to increased pressure from creative industries to modify copyright law, pushing for more familiar levels of protection that might extend copyright to currently unprotected AI-generated content. If such changes happen, we could end up in a spiral where access to knowledge becomes more restricted, stifling creativity and innovation. We’ve seen similar debates before—most notably during the advent of photography, when early courts struggled to determine whether machine-created works could be protected.
The path forward requires a careful balancing act: we need copyright laws that protect human creativity and labour without hampering access to the data that AI—and society—need to innovate and grow. By avoiding a data winter and ensuring AI systems have access to diverse, quality inputs, we can harness AI’s potential to drive the creative industries forward, rather than allow outdated copyright rules to drag progress backward.
Caroline De Cock is a communications and policy expert, author, and entrepreneur. She serves as Managing Director of N-square Consulting and Square-up Agency, and Head of Research at Information Labs. Caroline specializes in digital rights, policy advocacy, and strategic innovation, driven by her commitment to fostering global connectivity and positive change.
Filed Under: access to data, ai, copyright, creativity, creativity and ai, incentives, right to read
Comments on “When Copyright Enters the AI Conversation”
Well, I’m not allowed to view that link, but I suspect a bot could get around it. Supposedly these “A.I.” services are already defeating CAPTCHAs; somebody got one to explain its steps, and got a result like “Now I’m clicking the ‘I’m human’ box to prove I’m not a robot.”
It’s not all that surprising, really, given that the operators of some of these CAPTCHA services have openly stated that the results are used to train bots (for optical character recognition, to identify school buses and fire hydrants, and so on). So I’m skeptical about a “data winter” being caused by bot-blocking—or by copyright rules, which the systems are already ignoring.
I suspect the actual limit is going to be finding data that’s not auto-generated slop. Already, these bots must be accidentally training, in large part, on the output of other bots. It’s not all that different from humans, though: humans are also learning from the badly-written text of other humans (and bots), and repeating their mistakes. Even books by major authors are often not well copy-edited (or perhaps I should say that happens especially once they become popular); see, for example, the ubiquitous comma-splicing in the later Harry Potter books.
I see two reasonably reliable ways to get non-computer-generated data. Companies like Facebook can probably guess which of their users are human, and can use that data under their terms of service; except, as just noted, most of those humans can’t write worth a damn. The other way is to use old data, especially out-of-copyright data. That has two problems: one is simply that it’s old, such that your “A.I.” might end up talking like a grizzled 1890s prospector (nevermind the biases noted by Caroline already). The other is that it’s subject to errors, such as JBIG2 character substitutions by scanners and just bad optical character recognition.
What an absolutely pathetic opinion from an industry shill. Even sadder if you aren’t even scamming the scammers out of their plagiarism money to defend their ruination of our planet and ability to think and create as humans.
1) What innovation? What wondrous new contributions to human knowledge and society has an LLM ever actually made? They hallucinate presidents and tell us to eat rocks.
2) LLMs _do_not_learn_. If they did, they could count the number of Rs in strawberry by now. Something that can learn doesn’t need hundreds of billions of dollars, countless billions of gallons of water, and grid-crushing amounts of electricity to figure out something every human on earth learned in elementary school.
3) A human manipulating a tool to produce a creative work based on their individual, human perceptions and choices bears no resemblance to a mouth breathing prompt jockey acting like a Problem Client continually yelling at a machine to copy more ideas from others more better more faster because they think learning a trade or skill is too boring or too hard. Why should we cater our entire Human Endeavor to the laziest and dumbest among us?
4) The path forward involves abandoning these resource monopolizing, planet destroying, brain atrophying monstrosities and never again taking seriously the opinions of anyone who ever defended them.
The real issue here is “Intellectual Property”, the infinite supply of human thoughts and ideas.
Do humans inherently ‘own’ their thoughts as an economic ‘Property Right’ that should be enforced by government ??
Until one can rationally answer this question, productive intellectual analysis of Copyright, Patents, Trademarks, etc is impossible.
Re:
No. Ideas can’t be copyrighted—only fixed expressions of ideas. For example: “Man travels through a post-apocalyptic world” is a broad idea, which means the expression thereof can range from the Mad Max franchise to the MST3K-worthy Warrior of the Lost World.
Re: Re:
According to the laws as written, sure. In practice, courts do uphold copyright on ideas. For example, someone using any attribute of Sherlock Holmes or Mickey Mouse from after the public-domain stuff can expect to get sued. The “unofficial sequel” to Catcher in the Rye was banned by a U.S. court on copyright grounds, despite copying none of the fixed expressions.
Re: Re: Re:
The argument in such cases is that the elements of a character not yet present in public domain works are part of the fixed expression of that character and are therefore off-limits so long as those elements remain in works covered by copyright. I don’t think it’s the best argument, but it’s the one that’s held up by law.
The creation of derivative works based on existing copyrighted material generally requires the permission of the copyright holder. Exceptions do exist—parody, for example—and sometimes derivative works are ignored because they help generate interest in the original work (e.g., fan art and fanfiction). But if you’re trying to publish an “unofficial sequel” to a book still covered by copyright, you’d do well to file off all the serial numbers before you pull the trigger. I mean, Fifty Shades of Grey started off as Twilight fanfiction before E. L. James rewrote it as an original piece.
Re: Re: Re:2
That’s a bit like when software patents are not allowed, so people patent a general-purpose computer running software that implements their idea. Which in practice is the exact same thing, but lawyers bullshit the courts into believing there’s a distinction.
U.S. law says: “A ‘derivative work’ is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted” (17 U.S.C. § 101).
A “sequel” is usually based only on the pure ideas of a copyrighted work, using none or almost none of the content. It’s therefore not a “derived work”, or at least isn’t supposed to be. But, again, lawyers are good bullshitters…
Re: Re: Re:3
Like I said: I don’t think it’s the best argument, but it’s the one that’s held up by law.
If a given work is a sequel to an existing work, chances are it’s using at least one of the characters from the existing work. Mad Max: Fury Road has no other characters from the other Mad Max films besides the titular Road Warrior himself, but it’s still a sequel to the other films. A sequel to Catcher in the Rye would necessarily have to involve the use of at least one character from that book, and chances are the sequel would reference events from that book. That would absolutely make it a direct derivative work because it is referencing a direct expression of ideas. Remember, copyright is about the expression of ideas, not the ideas themselves—and if you’re going to copy or build upon someone else’s expression of ideas, you should either have permission or have an argument for Fair Use (e.g., parody) ready to deploy in case you need it in a court of law.
Re: Re:
no, owned PROPERTY can only be something physical.
but politicians instead decided that some types of intangible human ideas were especially valuable to society … and could readily be encouraged by an artificial legal ‘grant’ of private-property status … in defiance of basic economics.
It’s been a huge mess ever since, loaded with inefficiency, corruption & severe negative consequences to society.
Does this mean that when a corporation infringes enough copyright to bankrupt the entire country with a DMCA style lawsuit, that it’s ok because they successfully torrented all that data?
Re:
While I have no love for “AI”, and have issues with seemingly thoughtful people who cannot see that “AI” is not going to get better and that nothing “AI” does can ever justify its costs, I can make no sense out of what you wrote.
Re: Re:
Fair. Let me try again.
Saying that it only matters when copyright holders think they’re owed money means that the AI companies think that they should be allowed to have free access to all data in the world. They’re wasting the bandwidth of everybody else to get access to all these petabytes of data that they feel like they shouldn’t have to pay for, but somebody else should give them even though it has real costs for the providers.
Meta torrented tons of data and will likely see no repercussions, while people have been bankrupted by the RIAA for much, much less.
Lines like this show that AI sympathizers seem to think that because it’s AI they should be allowed to break the laws the rest of us have to live by. “You only want money. We only downloaded a little data but we’re doing something good with it, so it’s ok. We promise.”
Re:
No, that’s not what it means.
It means that corporations only become concerned with copyright issues when there’s money to be made, or rather, when there’s money to be missed out on making.
Re:
The weasel term “infringement” was invented to conceal the fact that using someone else’s intangible ideas is NOT property THEFT under any previous concept of Common Law or statute law.
Hmmm, stupid meets vapor…
Anything in, garbage out.
The old Garbage In, Garbage Out maxim is not a useful expression here because it implies a converse (Truth In, Truth Out) that is not correct. Even fed 100% accurate content, LLMs are capable of hallucinations because they are just probability models and have no concept of either truth or garbage.
It’s also worth noting that the UK is ploughing its own furrow on this and is allowing copyright on machine-created work. Which is spectacularly dumb but it will be interesting to watch as a case study.
Re:
How’s about “DIGO”: Data in, garbage out?
Re: Re:
How about DInGO?
“DInGO ate my work.”
(Your regular reminder that the blurb at the bottom is a euphemism for lobbying, evaluate accordingly)
I’m not sure that’s true at all? There’s plenty of artists whose only art is posted on something like deviantart, who get annoyed at their art being stolen. Some are just thrilled to be noticed, but plenty will stand on principle. Especially if it’s uncredited. Financial success definitely gives more of an incentive and means (and perhaps most importantly, garners attention. No one notices when a small artist makes a fuss), sure, but creatives often have pretty strong feelings on how their work is used, one way or another.
(Similarly, people did try to use copyright to kill crawling- they just failed. Not for lack of trying.)
Can’t say this series has been terribly reassuring on this front. So far the main argument seems to be not to worry, because the human element is irreplaceable. Honestly not even clear what’s left to protect via the law, if that hope were true.
This is a non-sequitur. The amount of data consumed has no relevance to copyright law. Libraries do not become infringement centers if you read “too many” books out of them. Everything that follows this is absurd. AI has no interaction with Copyright law. It doesn’t make creative choices (it follows a repeatable algorithm) and it isn’t a person. It has all the copyright status of a road striper.
Re:
Don’t people who put stripes on roads deserve some kind of protection? What’s the point of even doing it if anyone (including A.I.) can copy their work for free?
Re: Re:
That’s what the hi-vis vests and hard hats are providing.
lose lose
Seems like AI is a loser on both sides. First they are liable for copyright infringement for the training, and then anything they create cannot be copyrighted so it can be taken without compensation.
I have no sympathy for any AI training which ignores robots.txt, and no sympathy for any website which objects to being crawled but doesn’t have a robots.txt forbidding it.
This comment has been flagged by the community.
haha her last name is cock
A new authority is needed to investigate this.