German Court: LAION’s Generative AI Training Dataset Is Legal Thanks To EU Copyright Exceptions

from the one-good-ruling dept

The copyright world is currently trying to assert its control over the new world of generative AI through a number of lawsuits, several of which have been discussed previously on Walled Culture. We now have our first decision in this area, from the regional court in Hamburg. Andres Guadamuz has provided an excellent detailed analysis of a ruling that is important for the German judges’ discussion of how EU copyright law applies to various aspects of generative AI. The case concerns the freely-available dataset from LAION (Large-scale Artificial Intelligence Open Network), a German non-profit. As the LAION FAQ says: “LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images.” Guadamuz explains:

The case was brought by German photographer Robert Kneschke, who found that some of his photographs had been included in the LAION dataset. He requested the images to be removed, but LAION argued that they had no images, only links to where the images could be found online. Kneschke argued that the process of collecting the dataset had included making copies of the images to extract information, and that this amounted to copyright infringement.

LAION admitted making copies, but said that it was in compliance with the exception for text and data mining (TDM) present in German law, which is a transposition of Article 3 of the 2019 EU Copyright Directive. The German judges agreed:

The court argued that while LAION had been used by commercial organisations, the dataset itself had been released to the public free of charge, and no evidence was presented that any commercial body had control over its operations. Therefore, the dataset is non-commercial and for scientific research. So LAION’s actions are covered by section 60d of the German Copyright Act.

That’s good news for LAION and its dataset, but perhaps more interesting for the general field of generative AI is the court’s discussion of how the EU Copyright Directive and its exceptions apply to AI training. It’s a key question because copyright companies claim that they don’t, and that when such training involves copyright material, permission is needed to use it. Guadamuz summarizes that point of view as follows:

the argument is that the legislators didn’t intend to cover generative AI when they passed the [EU Copyright Directive], so text and data mining does not cover the training of a model, just the making of a copy to extract information from it. The argument is that making a copy to extract information to create a dataset is fine, as the court agreed here, but the making of a copy in order to extract information to make a model is not. I somehow think that this completely misses the way in which a model is trained; a dataset can have copies of a work, or in the case of LAION, links to the copies of the work. A trained model doesn’t contain copies of the works with which it was trained, and regurgitation of works in the training data in an output is another legal issue entirely.

The judgment from the Hamburg court says that while legislators may not have been aware of generative AI model training in 2019, when they drew up the EU Copyright Directive, they certainly are now. The judges use the EU’s 2024 AI Act as evidence of this, citing a paragraph that makes explicit reference to AI models complying with the text and data mining regulation in the earlier Copyright Directive.

As Guadamuz writes in his post, this is an important point, but the legal impact may be limited. The judgment is only the view of a local German court, so other jurisdictions may produce different results. Moreover, the original plaintiff, Robert Kneschke, may appeal, and the decision could be overturned. Furthermore, the ruling only concerns the use of text and data mining to create a training dataset, not the actual training itself, although the judges’ thoughts on the latter indicate that it would be legal too. In other words, this local outbreak of good sense in Germany is welcome, but we are still a long way from complete legal clarity on the training of generative AI systems on copyright material.

Follow me @glynmoody on Mastodon and on Bluesky. Originally posted to Walled Culture.

Companies: laion


Comments on “German Court: LAION’s Generative AI Training Dataset Is Legal Thanks To EU Copyright Exceptions”

8 Comments
Crafty Coyote says:

Copyright says that everything should be under control for a full century, freezing everything in place.

AI says that there can only be an unlimited amount of soulless computer manufactured “art”.

It’s Order versus Chaos.

What if the proper choice- is not to decide? We are only given these choices, when we could tell both of them to leave us, the artists of this. If we make mistakes, they’ll be our mistakes. I’m sure we can find a way between the stultifying Order of Copyright and the complete Chaos and disarray of AI. Get the hell out, the both of you.

Anonymous Coward says:

It has been interesting how European courts have continued to poke holes in the Copyright Directive’s original maximalist purpose. It was obvious from the word ‘go’ the Directive was squarely meant to crush online communication, and so it’s been surprising to see how courts (so far) haven’t shared in European legislators’ weird corporatism bend. Hopefully that sticks, and we see continued erosion (or a repeal, one can dream) rather than for something like Piracy Shield to expand.

Anonymous Coward says:

Re:

  1. No one has said it happens, it was listed as another matter entirely which would merit scrutiny.
  2. When similar things have occurred, it’s because prompters have asked a machine to go get me this thing from the internet – which is a search function.
  3. Do you really think an LLM could literally store every work in its entirety any more than you can keep copies of everything you’ve seen in your head?

N0083rp00f says:

Re: Logic?

The second sentence accepts the existence of the million monkey premise.
This is where through just randomization and enough time you could get an exact copy of something like Shakespeare.

As for a copyrighted image, copyright comes into play if there is even less than 90% correlation between images.

My guess is that European judges are far more math literate than domestic.
