German Court: LAION’s Generative AI Training Dataset Is Legal Thanks To EU Copyright Exceptions

from the one-good-ruling dept

Fri, Oct 25th 2024 01:30pm - Glyn Moody

The copyright world is currently trying to assert its control over the new world of generative AI through a number of lawsuits, several of which have been discussed previously on Walled Culture. We now have our first decision in this area, from the regional court in Hamburg. Andres Guadamuz has provided an excellent detailed analysis of a ruling that is important for the German judges’ discussion of how EU copyright law applies to various aspects of generative AI. The case concerns the freely-available dataset from LAION (Large-scale Artificial Intelligence Open Network), a German non-profit. As the LAION FAQ says: “LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images.” Guadamuz explains:

The case was brought by German photographer Robert Kneschke, who found that some of his photographs had been included in the LAION dataset. He requested the images to be removed, but LAION argued that they had no images, only links to where the images could be found online. Kneschke argued that the process of collecting the dataset had included making copies of the images to extract information, and that this amounted to copyright infringement.

LAION admitted making copies, but said that it was in compliance with the exception for text and data mining (TDM) present in German law, which is a transposition of Article 3 of the 2019 EU Copyright Directive. The German judges agreed:

The court argued that while LAION had been used by commercial organisations, the dataset itself had been released to the public free of charge, and no evidence was presented that any commercial body had control over its operations. Therefore, the dataset is non-commercial and for scientific research. So LAION’s actions are covered by section 60d of the German Copyright Act.

That’s good news for LAION and its dataset, but perhaps more interesting for the general field of generative AI is the court’s discussion of how the EU Copyright Directive and its exceptions apply to AI training. It’s a key question because copyright companies claim that the exceptions don’t apply, and that when such training involves copyright material, permission is needed to use it. Guadamuz summarizes that point of view as follows:

the argument is that the legislators didn’t intend to cover generative AI when they passed the [EU Copyright Directive], so text and data mining does not cover the training of a model, just the making of a copy to extract information from it. The argument is that making a copy to extract information to create a dataset is fine, as the court agreed here, but the making of a copy in order to extract information to make a model is not. I somehow think that this completely misses the way in which a model is trained; a dataset can have copies of a work, or in the case of LAION, links to the copies of the work. A trained model doesn’t contain copies of the works with which it was trained, and regurgitation of works in the training data in an output is another legal issue entirely.

The judgment from the Hamburg court says that while legislators may not have been aware of generative AI model training in 2019, when they drew up the EU Copyright Directive, they certainly are now. The judges use the EU’s 2024 AI Act as evidence of this, citing a paragraph that makes explicit reference to AI models complying with the text and data mining regulation in the earlier Copyright Directive.

As Guadamuz writes in his post, this is an important point, but the legal impact may be limited. The judgment is only the view of a local German court, so other jurisdictions may produce different results. Moreover, the original plaintiff Robert Kneschke may appeal, and a higher court could overturn the decision. Furthermore, the ruling only concerns the use of text and data mining to create a training dataset, not the actual training itself, although the judges’ thoughts on the latter indicate that it would be legal too. In other words, this local outbreak of good sense in Germany is welcome, but we are still a long way from complete legal clarity on the training of generative AI systems on copyright material.

Follow me @glynmoody on Mastodon and on Bluesky. Originally posted to Walled Culture.

Comments on “German Court: LAION’s Generative AI Training Dataset Is Legal Thanks To EU Copyright Exceptions”

Crafty Coyote

October 25, 2024 at 5:20 pm

AI says that there can only be an unlimited amount of soulless computer manufactured “art”.

It’s Order versus Chaos.

What if the proper choice is not to decide? We are only given these choices, when we could tell both of them to leave us, the artists of this. If we make mistakes, they’ll be our mistakes. I’m sure we can find a way between the stultifying Order of Copyright and the complete Chaos and disarray of AI. Get the hell out, the both of you.

Anonymous Coward

October 26, 2024 at 5:45 am

Re:

I was unaware of copyright’s ability to vocalize.
Does AI say the same thing every time it is asked?

Anonymous Coward

October 26, 2024 at 2:30 am

It has been interesting how European courts have continued to poke holes in the Copyright Directive’s original maximalist purpose. It was obvious from the word ‘go’ that the Directive was squarely meant to crush online communication, so it’s been surprising to see that courts (so far) haven’t shared European legislators’ weird corporatist bent. Hopefully that sticks, and we see continued erosion (or a repeal, one can dream) rather than something like Piracy Shield expanding.

Anonymous Coward

October 26, 2024 at 6:17 am

A trained model doesn’t contain copies of the works with which it was trained, and regurgitation of works in the training data in an output is another legal issue entirely.

Um. The fact that the second half of that sentence can happen at all is a repudiation of the first half of that sentence.

Anonymous Coward

October 26, 2024 at 8:25 am

Re:

No one has said it happens; it was listed as another matter entirely, one which would merit scrutiny.
When similar things have occurred, it’s because prompters have asked a machine to “go get me this thing from the internet”, which is a search function.
Do you really think an LLM could literally store every work in its entirety any more than you can keep copies of everything you’ve seen in your head?

Anonymous Coward

October 26, 2024 at 5:55 pm

Re:

Or instead of reading this article, you could google the numerous sources out there, including short videos, that will explain to you how the first part of the sentence is true. Pretending your understanding of a single sentence will suffice for the extent of your understanding is really insulting to yourself.

N0083rp00f

October 28, 2024 at 7:40 am

Re: Logic?

The second sentence accepts the existence of the million monkey premise.
This is where through just randomization and enough time you could get an exact copy of something like Shakespeare.

As for a copyrighted image, copyright comes into play if there is even less than 90% correlation between images.

My guess is that European judges are far more math literate than domestic.

Just AI News

October 27, 2024 at 2:44 am

Clarity

It’s great to see some much-needed clarity on the intersection of copyright law and AI training datasets.
