Anonymous Coward

May 31, 2024 at 1:46 pm

Almost like a handful of rich assholes owning all the imaginary property for 100+ years at a time is unworkable.

Crafty Coyotr

May 31, 2024 at 7:00 pm

Re:

And when people realize that the property is indeed imaginary, they’ll be ready to “steal” that property and get away with it. And those rich assholes will realize that when people are willing to be arrested to expose the lie, we’ll win.

Anonymous Coward

May 31, 2024 at 9:08 pm

Re:

For real property, the law recognized long ago that ownership has a huge problem: how do I know who, if anyone, owns the land on which I’d like to build a driveway (for example)? In almost any part of the world, some government keeps track of that and can answer the question. And if the owner is dead or defunct, or appears to have abandoned the land, there’s a process to deal with it.

With copyright, we seem to have none of that. We just kind of accept that maybe someone will come along out of nowhere, make claims we can’t really verify, and sue us or just intercept our advertising revenue.

Crafty Coyote

June 1, 2024 at 11:56 pm

Re: Re:

Life is too short to live enslaved by fear. Check through government records for the absence of an official copyright confirmation, and then have fun. Dead or defunct owners of this property should be treated like mines that might still have a vein of gems or precious metals in them. I’ll definitely go wildcatting around some forgotten music or images and see if I’m not genius enough to strike it rich.

Anonymous Coward

May 31, 2024 at 2:05 pm

AI training is not fair use

And that’s because it’s not a derivative work: it’s a copy. All of these “AI” systems are just massive exercises in linear algebra incorporating all the data they’ve ingested. They’re not “learning”, they’re just adjusting the values of the weights in the models until they produce the output desired by their makers. There’s no intelligence, no understanding, no comprehension in them. As the (justifiably) famous paper observes, they’re stochastic parrots.

The only difference between these systems and a system which ingested the same content and spit it all back in the same order that it was read is that this one does it when prompted.

Anonymous Coward

May 31, 2024 at 2:17 pm

Re:

The trained model is the work, clown shoe.

Anonymous Coward

May 31, 2024 at 2:49 pm

Re:

Lol, no. There is no copy.

If you saw a picture once, even stared at it for an hour, or studied it for elements or styles you’d use in something you might create in the future, is there a copy? (Spoiler: No.)

Anonymous Coward

May 31, 2024 at 3:25 pm

Re:

AI training is not fair use. And that’s because it’s not a derivative work: it’s a copy.

This does not follow. A full and exact copy can indeed be fair use. Libraries, for example, have been storing tiny copies of major newspapers for a hundred years. A CD rip of a disc you own is totally fine too, as much as the copyright maximalists hate it.

A US Copyright Office document was linked in comments on a recent story, confirming that short sentences are “de minimis” and not eligible for copyright. If that’s all that the “AI” models spit out, it might be fine, but I fully expect that’s gonna be disputed and eventually end up at SCOTUS.

bhull242 (profile)

May 31, 2024 at 4:00 pm

Re:

The only difference between these systems and a system which ingested the same content and spit it all back in the same order that it was read is that this one does it when prompted.

And that it doesn’t actually store the data. And that it’s not always a perfect copy. And that they have nothing in common.

Anonymous Coward

June 1, 2024 at 3:58 am

Re:

So if I read a bunch of research papers on a range of subjects, and then use most or all of the same words in a different order in a research paper of my own on a completely different subject, that’s a copy? How so?

Anonymous Coward

May 31, 2024 at 2:11 pm

That’s what is strange about the billions of images on the internet: they’re free to access but not free to use.
How many sunsets are on the internet? Judging by how many people I’ve seen photographing them with their phones every time I see one, a lot, but only a tiny fraction is really free to use.
And that’s certainly where Google and Facebook can shine, by playing dirty and using non-free (like pictures on Google Maps) and even private pictures until their AI becomes good enough to be retrained on free content (if that really matters one day), and where open-source models and data will struggle.
So maybe these copyright “issues” with generative AI content should be, in some way, ignored, to give everyone access to enough content (known as the “whole” internet) to make the technology affordable not only for the biggest companies.

Mamba (profile)

May 31, 2024 at 2:52 pm

Re:

How is Google using the photos they took to train an AI ‘playing dirty’? Or are you talking about the photos people upload to Maps, where you’re required to grant them a license to use the image for derivative works?

I also maintain that AI doesn’t even need to rely on fair use. If Google, for example, purchases a digital copy of a book and trains the AI on it…it’s just use.

Tom B

May 31, 2024 at 2:13 pm

Using Unreliable Training Data

I doubt that anyone is going to agree with me here, but so be it.
You shouldn’t be using the internet (or internet-accessible) public or private data to train your AI! The only solution that prevents the cost of lawsuits and guarantees control over the training process is to come up with your own training data, 100%. What’s that, you say? Too hard? Takes too long? Costs too much? My heart bleeds for you. Trying to make the big bucks without putting in the admittedly hard work is the corporate equivalent of school students cheating off each other’s test papers. Don’t complain if your student gets caught hallucinating the wrong answers! There’s a very good reason we’re about 50-100 years away from true, reliable Artificial Intelligence. There are no shortcuts, fair use or no fair use.

Anonymous Coward

May 31, 2024 at 2:19 pm

Re:

We’d all still be working from an abacus we had to build ourselves if you had your way.

Anonymous Coward

May 31, 2024 at 11:52 pm

Re:

come up with your own inspiration, 100%. What’s that, you say? Too hard? Takes too long? Costs too much? My heart bleeds for you.

come up with your own 3D modeling software, 100%. What’s that, you say? Too hard? Takes too long? Costs too much? My heart bleeds for you.

come up with your own literary genre, 100%. What’s that, you say? Too hard? Takes too long? Costs too much? My heart bleeds for you.

Yeah… I don’t think that statement works out the way you want it to.

Anonymous Coward

June 1, 2024 at 3:55 am

Re:

I will note YOU have been trained off other people’s works, and you are still using other people’s languages.

Anonymous Coward

May 31, 2024 at 3:14 pm

I know of a number of images that have been uploaded to Wikimedia with a CC0 attribution that, if put into a Google Image Search, turn out to be virtually identical to images with restrictive copyright licensing.

Possibly what this project could do, if they really wanted to ensure license compliance, is also do a reverse image search on every image in the collection, and prune anything for which others have claimed copyright on similar images.

This would in most cases be silly, as many corporations do exactly the opposite and claim copyright over images first published with fewer restrictions. But at least the training corpus would have a solid backing of good provenance and good-faith pruning. Copyrighted works will still sneak in, but are less likely to skew the output in any meaningful way.

Anonymous Coward

May 31, 2024 at 4:50 pm

Re:

One thing that could really help with the CC0 attribution would be for the AI to find the provenance of every image that it was trained with. That could really help clear the underbrush. Unfortunately you can’t really know that it didn’t just hallucinate the provenance. Oops.

Arianity

May 31, 2024 at 3:46 pm

should the fact that they thought they were using a CC0 image be relevant to their liability?

A good faith defense seems reasonable. Especially if there’s a process to remove it from the database.

Anonymous Coward

May 31, 2024 at 5:10 pm

It’s cute that you fucking brunchlords want the system to change only because it stands in the way of your fucking stock options.

SORRY NOT SORRY you lot are finally facing the problems of content creators writ large.

And I am enjoying every single fucking planck unit of it.

Enjoy the hell you created. We hope you’ll fix it, but you’ll only try to carve out exemptions for your ingroup, so from the rest of us in the hell you created:

WELCOME.

Anonymous Coward

June 2, 2024 at 12:41 pm

Re:

lolwut

Matthew M Bennett

June 1, 2024 at 1:46 am

My god, this is literally an article about nothing

“determining what is actually open license is hard. Why are AI companies so bad at this?”

Hilarious.

The absolute state of this shitty site.

Anonymous Coward

June 1, 2024 at 4:00 am

Re:

I imagine you’re so hard right now because you just made yet another point-free criticism.

Strawb (profile)

June 1, 2024 at 7:42 am

Re:

The absolute state of this shitty site.

Yet you keep coming back.

It’s almost as if you have nothing better to do with your life.

MrWilson (profile)

June 1, 2024 at 7:06 pm

Re:

“This food I’ve eaten for the twentieth time still tastes like shit!”

You’re just insulting yourself when you comment here. If you don’t like it but continue to return, either you’re actually getting something out of it and thus lying, or you’re a very dumb masochist. Either way isn’t a good look.

Anonymous Coward

June 2, 2024 at 12:48 pm

Re: Re:

i always go back to the same awful restaurant* so i can keep writing terrible reviews. Doesn’t everyone? What? Don’t they?

[*Really just a restaurant which serves cuisine that i don’t care for. It just makes me angry that other cuisines exist.]

Anonymous Coward

June 3, 2024 at 4:57 pm

AI is free to use my content as they see fit.

terop (profile)

June 10, 2024 at 1:25 am

For small entities like programmers or artists, it’s better to drop AI completely than start building your own non-infringing AI databases. The amount of data is just too large for the operation to be suitable for small teams like that. Dropping AI (like I have done) is clearly the right solution. Going with the small data storage is a significantly better solution.


Clearing Rights For A ‘Non-Infringing’ Collection Of AI Training Media Is Hard

from the public-domain-impossibility-theorem dept

Comments on “Clearing Rights For A ‘Non-Infringing’ Collection Of AI Training Media Is Hard”
