The AI Copyright Hype: Legal Claims That Didn’t Hold Up

from the it's-still-early dept

Over the past year, two dozen AI-related lawsuits and their myriad infringement claims have been winding their way through the court system. None have yet reached a jury trial. While we all anxiously await court rulings that can inform our future interaction with generative AI models, in the past few weeks we have suddenly been flooded by news reports with titles such as “US Artists Score Victory in Landmark AI Copyright Case,” “Artists Land a Win in Class Action Lawsuit Against A.I. Companies,” and “Artists Score Major Win in Copyright Case Against AI Art Generators,” and the list goes on. The exuberant mood in these headlines mirrors the enthusiasm of the people actually involved in this particular case (Andersen v. Stability AI). The plaintiffs’ lawyer calls the court’s decision “a significant step forward for the case.” “We won BIG,” writes the plaintiff on X.

In this blog post, we’ll explore the reality behind these headlines and statements. The “BIG” win in fact describes a portion of the plaintiffs’ claims surviving a pretrial motion to dismiss. If you are already familiar with the motion to dismiss under Rule 12(b)(6) of the Federal Rules of Civil Procedure, skip ahead to Part II to find out which types of claims have been dismissed early on in the AI lawsuits.

Part I: What is a motion to dismiss?

In the AI lawsuits filed over the last year, the majority of the plaintiffs’ claims have struggled to survive pretrial motions to dismiss. That may lead one to believe that claims made by plaintiffs are scrutinized harshly at this stage. But that is far from the truth. In fact, when looking at the broader legal landscape beyond the AI lawsuits, Rule 12(b)(6) motions are rarely successful.

In order to survive a Rule 12(b)(6) motion to dismiss filed by AI companies, plaintiffs in these lawsuits must make “plausible” claims in their complaint. At this stage, the court assumes that all of the plaintiffs’ factual allegations are true and views them in the light most favorable to the plaintiffs. This allows the court to focus on the key legal questions without getting caught up in factual disputes. If, viewing the facts this way, the defendant AI companies’ liability can plausibly be inferred, the claims will survive the motion to dismiss. Notably, the most important issues at the core of these AI lawsuits, namely whether there has been direct copyright infringement and what counts as fair use, are rarely decided at this stage, because they raise questions of fact as well as law.

On the other hand, if the AI companies would prevail as a matter of law even when the plaintiffs’ well-pleaded allegations are taken as entirely true, the court will dismiss the plaintiffs’ claims. Merely stating that it is possible that the AI companies did something unlawful, for instance, will not survive a motion to dismiss; there must be a reasonable expectation that discovery will later reveal evidence supporting the plaintiffs’ claims.

Procedurally, when a claim is dismissed, the court will often allow the plaintiffs to amend their complaint. That is exactly what happened in Andersen v. Stability AI, the case mentioned at the beginning of this blog post: the plaintiffs’ claims were first dismissed in October last year, and the court allowed the plaintiffs to amend their complaint to address the deficiencies in their allegations. The newly amended complaint contains infringement claims that survived the new motions to dismiss, as well as breach of contract, unjust enrichment, and DMCA claims that were dismissed again.

As you may have guessed, a procedural tool like the motion to dismiss helps save time and money, so that parties don’t waste precious resources litigating meritless claims at trial. One judge dismissed a case against OpenAI earlier this year, stating that “the plaintiffs need to understand that they are in a court of law, not a town hall meeting.” The takeaway: plaintiffs need to bring claims that can plausibly entitle them to relief.

Part II: What claims are dismissed so far?

Most of the AI lawsuits are still at an early stage, and most of the court rulings we have seen so far are in response to the defendants’ motions to dismiss. From these rulings, we have learned which claims are viewed as meritless by courts. 

The removal of copyright management information (“CMI,” which includes information such as the title, the copyright holder, and other identifying information in a copyright notice) is a claim included in almost all of the plaintiffs’ complaints in the AI lawsuits, and this claim has failed to survive motions to dismiss without exception. DMCA Section 1202(b) restricts the intentional, unauthorized removal of CMI. Experts initially considered DMCA 1202(b) one of the biggest hurdles for non-licensed AI training. But courts so far have dismissed all DMCA 1202(b) claims, including in J. Doe 1 v. GitHub, Tremblay v. OpenAI, Andersen v. Stability AI, Kadrey v. Meta Platforms, and Silverman v. OpenAI. The plaintiffs’ DMCA Section 1202(b)(1) claims have failed because the plaintiffs could not plausibly allege that the AI companies intentionally removed their CMI. For example, in Tremblay v. OpenAI and Silverman v. OpenAI, the courts held that the plaintiffs had not plausibly alleged that OpenAI intentionally removed CMI when ingesting the plaintiffs’ works for training. Additionally, the plaintiffs’ DMCA Section 1202(b)(3) claims have failed thus far because they did not satisfy the identicality requirement. For example, in J. Doe 1 v. GitHub, the court pointed out that Copilot’s output did not tend to be a verbatim copy of the original ingested code. We now see plaintiffs voluntarily dropping DMCA claims from their amended complaints, as in Leovy v. Google (formerly J.L. v. Alphabet).

Another claim that has been consistently dismissed by courts is that AI models are infringing derivative works of the training materials. The law defines a derivative work as “a work based upon one or more preexisting works, such as a translation, musical arrangement, … art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.” To most of us, the idea that the model itself (as opposed to, say, outputs generated by the model) can be considered a derivative work seems to be a stretch. The courts have so far agreed. On November 20, 2023, the court in Kadrey v. Meta Platforms said it is “nonsensical” to consider an AI model a derivative work of a book just because the book is used for training. 

Similarly, courts have dismissed claims that all AI outputs should automatically be considered infringing derivative works, because the plaintiffs could not point to a specific instance of output that is substantially similar to an ingested work. In Andersen v. Stability AI, plaintiffs tried to argue “that all elements of … Anderson’s copyrighted works … were copied wholesale as Training Images and therefore the Output Images are necessarily derivative;” the court dismissed the argument because, besides the fact that the plaintiffs are unlikely to be able to show substantial similarity, “it is simply not plausible that every Training Image used to train Stable Diffusion was copyrighted … or that all … Output Images rely upon (theoretically) copyrighted Training Images and therefore all Output images are derivative images. … [The argument for dismissing these claims is strong] especially in light of plaintiffs’ admission that Output Images are unlikely to look like the Training Images.”

Several of these AI cases have raised claims of vicarious liability, that is, liability for the service provider based on the actions of others, such as users of the AI models. Because a vicarious infringement claim must be based on a showing of direct infringement, the vicarious infringement claims in Tremblay v. OpenAI and Silverman v. OpenAI were also dismissed when the plaintiffs could not point to any infringing similarity between AI output and the ingested books.

Many plaintiffs have also raised a number of non-copyright, state law claims (such as negligence or unfair competition) that have largely been dismissed based on copyright preemption. Copyright preemption bars duplicative state law claims when those claims are based on the exercise of rights equivalent to those provided for under the federal Copyright Act. In Andersen v. Stability AI, for example, the court dismissed the plaintiffs’ unjust enrichment claim because the plaintiffs failed to add any new elements that would distinguish their claim, based on California’s Unfair Competition Law or common law, from rights under the Copyright Act.

It is interesting to note that many of the dismissed claims in different AI lawsuits, such as Kadrey v. Meta Platforms, Andersen v. Stability AI, Tremblay v. OpenAI, and Silverman v. OpenAI, closely mimic one another. The similarities are no coincidence: all these lawsuits were filed by the same law firm. These mass-produced complaints not only contain overbroad claims that are prone to dismissal, they also have overbroad class designations. In the next blog post, we will delve deeper into the class action aspect of the AI lawsuits.

Reposted with permission from the Authors Alliance site. David Hansen is the Executive Director of the Authors Alliance. Yuanxiao Xu is a staff attorney at the Authors Alliance.


Comments on “The AI Copyright Hype: Legal Claims That Didn’t Hold Up”

61 Comments
This comment has been deemed insightful by the community.
MrWilson (profile) says:

One problem with the claims is that even if AI art generators worked as people falsely believe, it still wouldn’t constitute copyright infringement. This isn’t how AI art generators work, but if you put a thousand different paintings in a blender and then used the random shredded pieces to make a new work, it wouldn’t be derivative of any particular work or of all thousand. At a certain point, an element of a work is too small to identify the source and also too small to claim copyright on. Likewise, you could “copy” words from a book separately and form new sentences in completely different word orders, and it wouldn’t be derivative because individual words can’t be copyrighted and you’re not reproducing the sentences themselves.

What makes derivative claims for AI models not really work (not that this is how courts will ultimately rule) is that AI art models aren’t “using” the original works in the output. They’re not blending or collaging or copying. They learn to render outputs through denoising and word association in the prompt. The training works aren’t in the output. The model doesn’t retain a copy of them, only the lesson it learned in how to render via word association when it was trained. Imagine knowing how to paint a Da Vinci style painting but not remembering what any particular Da Vinci piece looks like.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

“if you put a thousand different paintings in a blender and then used the random shredded pieces to make a new work”

This sounds like what happens in a human brain. A human looks at a bunch of stuff and creates their own interpretation of what their senses detect.

Arianity says:

Re:

The training works aren’t in the output.

They often are. It’s an open problem that training data is often overrepresented in the output, and something researchers are actively working on suppressing. Although for newer/bigger models, you often need to hit it with just the right prompt.

See, e.g., 1, 2.

To quote:

It is well known that generative language models risk replication from their training set [9, 10] and the amount of replicated data is broadly proportional to the size of the model, amount of duplication of the data point in the training set, and the amount of prompting. Interestingly, such replication behavior occurs even for models that are not overfitting to their training data [33,60].

This comment has been deemed insightful by the community.
MrWilson (profile) says:

Re: Re:

Except it’s literally not. I’m not saying the output will not look like the original works. I’m saying it literally doesn’t include the training works because of the rendering method. Note that your citations say data, not works. This isn’t copying where the training works are stored, recalled, and then inserted into the result. We’re talking about rendering via denoising and word association. Output that looks like the training works is a testament to the quality of the rendering method, not to the presence of a copy of the training works in the model. The model’s size is too small to contain all the training works.

Alex Tolley says:

ChatGPT can violate copyright

It is established that one cannot write a novel using characters that another author created. You need to obtain permission from the author or their estate to do so.

And yet I can do this with ChatGPT:

Me: In the style of Raymond Chandler, write an opening scene where a homicide detective discovers Philip Marlowe examining a dead body of a woman.

ChatGPT: The fog was a sullen shroud, wrapping the city in a clammy embrace. Street lamps stood like sentinels, their dim light filtered through the haze, casting long shadows that twisted and turned as if in silent protest. The alley where I found myself was the kind of place where secrets went to die, and tonight it had swallowed a fresh one—a woman who would never see another dawn.

The stench of decay mingled with the sour tang of spilled whiskey and cheap perfume. My shoes squelched through the grime, each step a squish of unwelcome intimacy with the alley’s filthy embrace. I was here on the double; someone had phoned it in, and though the voice on the other end had been more gravelly than informative, the message had been clear: find the body, find Marlowe.

And there he was, like a figure cut from smoke and shadows, crouched beside the corpse with the detached curiosity of a man who had seen death’s numerous variations. Philip Marlowe, the private eye with a knack for finding trouble even when it wasn’t looking for him. His trench coat flared around him like a dark halo, and his fedora cast a shifty silhouette against the pavement. The woman’s lifeless form lay sprawled, her once-coifed hair now a tangle of despair. Her dress, a faded blue, clung to her in a way that spoke of both faded grandeur and violent ends.

Now clearly, a story could be written by ChatGPT that would violate copyright.

We know that software “look and feel” is copyrighted. Similarly, musicians have successfully sued for copyright infringement (but others have failed). Suppose I ask an LLM or other AI to paint a picture in the style of [X] with a subject based on X’s prior work. Would that not be a copyright infringement, despite claims I read here before that studying painting styles is not a copyright violation?

I sense that “art” is so subjective that whether a work is perceived as derivative or new is rather vague and subject to personal interpretation.

I think it will be interesting to see how this plays out as AIs embodied in machines will be capable of creating “art” that mimics various artists to some desired degree.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

Now clearly, a story could be written by ChatGPT that would violate copyright.

Yes? If that’s what you told ChatGPT to do, odds were pretty good that it might do so.

We know that software “look and feel” is copyrighted. Similarly, musicians have successfully sued for copyright infringement (but others have failed). Suppose I ask an LLM or other AI to paint a picture in the style of [X] with a subject based on X’s prior work. Would that not be a copyright infringement, despite claims I read here before that studying painting styles is not a copyright violation?

“Look and feel” is about as useful as arguing whether all smartphones violate Apple’s IP because most, if not all smartphones are rectangles with rounded corners.

As for musicians who succeed in their copyright suits and those who don’t, part of it is personal interpretation, and the other part is what judges and defenses bring to the table. “Blurred Lines” didn’t win the fight against Marvin Gaye’s estate, but Ed Sheeran did. “In the style of” isn’t, and shouldn’t be, arbitrarily protected by the harshest forms of copyright, because all forms of art take a little cue or inspiration from somewhere else. Otherwise you’d have Sheeran and other artists having to manually film all their songwriting sessions just to prove they didn’t commit copyright infringement.

I think it will be interesting to see how this plays out as AIs embodied in machines will be capable of creating “art” that mimics various artists to some desired degree.

Or we could simply do what has been previously suggested and make it so AI-generated works can’t be given copyright, just like the monkey selfie – and have to be attributed accordingly as AI-generated.

Remove the profit motive, and I imagine that goes quite a way into giving human artists a chance to compete on demand.

Alex Tolley says:

Re: Re: ChatGPT can violate copyright

I think you are missing the point. If you use LLMs to reduce paying for headcount to do the work, there is no reason not to use prompts that are style specific. “Write a Dr. Who episode in the same style with a similar plot to Chris Boucher in episode [X] about the [Y].” Of course you don’t have to specify it that way, just feed in a number of scripts you like and then ask it to create a new script with the same “feel”.

MAZON used to add commonly used phrases to a text that readers may recognize as a signature of a writer. LLMs could easily do the same, mimicking a writer’s style. However, unlike books that are written in the style of an author and acknowledged as such, companies will not acknowledge the imitation.

I don’t see how preventing LLM output from gaining copyright helps. It is already potentially IP theft by the owners of the LLMs.

There is a gradation from studying art by understanding paintings by artists, to painting in a style that is influenced by an artist, all the way to forging paintings and creating new works and labeling them as by the famous painter.

What is different is that machine output is far faster than humans, potentially far cheaper, and therefore industrial-scale output. IDK whether this is acceptable to democratize the output of the humanities, or just plain theft and not paying back the debt that produced the original works that the machines are trained on.

I could envision in a few years producing my own versions of “art” by having a machine churn out novels with plots in the style of favorite authors. I don’t know whether that is a good or bad thing in the long term.

One scenario is that human producers are pared down to relatively few original thinkers with truly novel output. The output is barred from being used for training in any way for a reasonable period of copyright, which is more in line with patent lives. After that, the output can be used for training AIs and producing derivative works. Wholesale classes of work will be automated and theoretically reduced in price as scarcity is removed.

Tanner Andrews (profile) says:

one other problem

One way that humans work is to read a bunch of stuff, then write down their views based on what they have read. For instance, I may read a newspaper article about the Governor, and then write commentary suggesting that he ought not say stupid stuff while wearing gogo boots.

Most of what I write is influenced in some wise by what I have read.

The detective story writer not influenced by Hammett is like the sci-fi writer not influenced by Heinlein or Pournelle. Sure, it is possible, but why bother?

The newspaper reporter not affected by Our Fair City is simply one with too little ambition. Give up, go home, and leave it to the AI clickbait generators.

Anonymous Coward says:

Re:

Saying the AI model is itself a copyright violation is like saying that if I watch all of Bob Ross’ “The Joy of Painting” over and over, I am a copyright violation. And that if I produce paintings similar to Ross’, using the methods he demonstrated, with lots of mountains and big old trees and happy little clouds, those are also copyright violations.

This comment has been flagged by the community.

terop (profile) says:

probably the biggest argument for ai violating copyrights comes from causality. the original work needed to exist in the world before ai could slurp it into the training set, thus making it a derivative work. remove “all the original works that were not correctly licensed” and your ai would fail to function. thus ai is based on pirated material and the end for ai is near legally. QED.

MrWilson (profile) says:

Re:

the original work needed to exist in the world before ai could slurp it into the training set, thus making it a derivative work.

That’s not the definition of a derivative work. Again, you don’t understand copyright law and you’re assigning your own meager understanding to it rather than researching to find out if your conclusions are correct.

remove “all the original works that were not correctly licensed” and your ai would fail to function.

Models can be and have been trained on public domain works. It’s also arguable that training on copyrighted works is fair use, but that hasn’t been decided in court or by law yet.

thus ai is based on pirated material and the end for ai is near legally.

Sure, if you don’t pay attention to reality, you might think that. Even if it were determined that the developers must get licenses, there are content owners who will gladly license their works and there are developers with enough investment funding to license some amount of works. So even if we take your uneducated perspective on copyright at face value, you’re still wrong.

This comment has been flagged by the community.

terop (profile) says:

Re: Re:

It’s also arguable that training on copyrighted works is fair use, but that hasn’t been decided in court or by law yet.

But you have to overcome one significant hurdle before you can declare it fair use: Why is it just not plain ordinary copyright infringement?

The copyright infringement determination is a significantly more common and plausible explanation of what is happening with AI technologies.

MrWilson (profile) says:

Re: Re: Re:

That’s not a significant hurdle to overcome. We just have to look at the four factors. Training a model on past art is functionally like training a human artist to emulate an art style – something art students have been doing for millennia. If training is infringement, then viewing art with human eyes is too.

Infringement is only a plausible explanation if you don’t understand copyright law or how the models are trained and how they function. So it’s entirely understandable that you would think so. The only other possible reason for such a conclusion is just disingenuous wishful thinking.

terop (profile) says:

Re: Re: Re:2

Training a model on past art is functionally like training a human artist to emulate an art style

to me, training an ai system consists of the following parts:
1) the original data
2) list of urls to the original data
3) a script for slurping all the urls to your hard disk
4) a copy of the original data
5) a training algorithm that uses (4) to modify some free parameters
6) significant amount of time to extract all the content from (4) via (5) to the free parameters
7) smaller storage space for free parameters
8) algorithm that generates images from (7).

There’s the causality pattern laid out more carefully. The AI’s whole idea is to generate (7), then cut (7, 8) from the system and offer that smaller system to users.

Basically this process is doing nothing but
a) compress the original data set
b) lose significant amount of details from the original data to make it smaller

But as you’ve seen, it involves copies of the system and the (7) can be considered as derived work of (1) and thus need a license from the content owners.

MrWilson (profile) says:

Re: Re: Re:3

to me, training an ai system consists of the following parts:

Let me stop you right there. “To you” in this case is completely useless unless you’ve actually read documentation on how they are trained.

list of urls to the original data

This isn’t necessary. You can train on local data.

3) a script for slurping all the urls to your hard disk
4) a copy of the original data

This is all fair use on its own and not related to the training. People download copyrighted images off the internet.

4) a copy of the original data
5) a training algorithm that uses (4) to modify some free parameters

You’re being vague here, which indicates you haven’t actually learned how the models are trained. “The original data” is absent from the trained data because of how the model learns to render using denoising. When you supply “Da Vinci” it incrementally denoises noise into what it associates with “Da Vinci.” Original Da Vinci works are not present in the result.

a) compress the original data set

This is where you’re absolutely wrong. The training process isn’t compression. It’s transformative in that it’s looking at existing works and learning a process to render its own works. It is the model learning how to render, not remember or reproduce.

But as you’ve seen, it involves copies of the system

You’re mixing up terminology here. What system are you referring to? The resulting model or training process or the original works?

and the (7) can be considered as derived work of (1)

No, it can’t because it’s not a copyrighted work in the same medium or similar medium and it’s not composed of any original parts of the original works.

If someone took a copyrighted sound recording and viewed the electromagnetic frequency of the sound in a digital readout and then used that digital readout as inspiration for an abstract painting, the painting would not be considered a derivative work of the original sound recording. The process and the result are so far removed that you wouldn’t be able to tell from whence the painting even came without being told. Despite the inspiration, none of the sound recording is in the painting.

and thus need a license from the content owners.

Given the fact that you’ve previously stated that all uses require a license and that fair use is criminal use, your conclusions are useless.

terop (profile) says:

Re: Re: Re:4

“The original data” is absent from the trained data because of how the model learns to render using denoising.

This is all complete bullshit.

At this juncture, you need to learn the alternative scenario of how AI models were created:
1) The AI authors had existing pirate boxes available storing tons of illegal data
2) Their only problem was that putting this data on the internet was legally very dangerous
3) To solve the problem, they devised a plan where some algorithm (==denoising) supposedly removes copyright owners’ ownership rights to the data
4) Now they’re trying to get the theory that AI stuff is copyright-free and free from copyright owner’s control accepted by the community
5) and finally they plan to profit from their pirate boxes

Basically when courts and the jury evaluate this plausible alternative theory against the reality of AI development, there is enough truth in the story that it gets accepted as what really happened => damage awards will be astronomical…

terop (profile) says:

Re: Re: Re:5

AI developers’ only hope against this theory, which results in astronomical damage awards, is to find facts that separate their activity from the theory:
1) claiming that the original data is not illegal will fail if it comes from known pirate services
2) claiming that the data is licensed from copyright owners will fail if there isn’t a significant money trail towards copyright owners
3) claiming that the algorithm isn’t meant for removing copyright owners’ ownership rights will fail if their conclusions require freedom from copyright law’s responsibilities
4) claiming that damage awards should not be astronomical will fail if (1), (2) and (3) failed.

so separating your activity from known illegal pattern is necessary, but difficult part of making your AI system legal. future will show how well they will succeed in it.

MrWilson (profile) says:

Re: Re: Re:6

1) claiming that the original data is not illegal will fail if it comes from known pirate services

[citation needed]

2) claiming that the data is licensed from copyright owners will fail if there isn’t a significant money trail towards copyright owners

This isn’t necessary if the act is fair use.

3) claiming that the algorithm isn’t meant for removing copyright owners’ ownership rights will fail if their conclusions require freedom from copyright law’s responsibilities

You don’t understand how the models work so this is just absurd. The algorithm can’t be meant for removing copyright ownership rights because the results aren’t copies of the original works. There’s no works in the results to remove rights from. It’s like seeing a painting, getting inspired, and then painting your own painting and then some daft IP maximalist such as yourself comes along and claims the 2nd work is clearly derivative of the 1st despite it involving no copying.

4) claiming that damage awards should not be astronomical will fail if (1), (2) and (3) failed.

I guess it’s a good thing you pulled 1, 2, and 3 out of your ass rather than any knowledgeable basis in how the models work, how copyright law works, or generally how anything works.

so separating your activity from known illegal pattern is necessary, but difficult part of making your AI system legal.

Not difficult if “known illegal pattern” just consists of your copyright fan fiction fever dreams.

future will show how well they will succeed in it.

It will indeed.

terop (profile) says:

Re: Re: Re:7

The algorithm can’t be meant for removing copyright ownership rights because the results aren’t copies of the original works. There’s no works in the results to remove rights from.

why do the original works need to be part of the training dataset, if the training process doesn’t use the data? Either your above claim is incorrect or the AI companies are unnecessarily keeping people’s copyrighted works in their training sets.

Here’s a test if the training set images were needed: remove the damn copyrighted works from the training set, re-execute the process and see how well your AI will work. If it fails, the training set images were actually needed and you will need to license the material.

MrWilson (profile) says:

Re: Re: Re:8

why do the original works need to be part of the training dataset, if the training process doesn’t use the data?

You’re confused here. The training process does use the original works, as I explained. The result doesn’t use them.

But the greater issue is that you still don’t understand how the models work and how they are trained. There are plenty of articles online that explain the process. Read one of them. Hell, ask ChatGPT to summarize one for you if you can’t be bothered to read a whole article.

The original works are only for training, the way you might look at Renaissance paintings before painting in the style of a Renaissance artist. You’re not looking at the original works and trying to reproduce them stroke for stroke. You’re remembering what you learned when you looked at them. What you “remember” is brush stroke styles, color palettes, lighting and contrast, etc. You didn’t learn to copy and paste Ginevra de’ Benci’s face. You learned how to render any subject matter like the artist. This is literally how many artists learn to render in their given mediums.

Here’s a test if the training set images were needed: remove the damn copyrighted works from the training set, re-execute the process and see how well your AI will work. If it fails, the training set images were actually needed and you will need to license the material.

You’re confusing copyright law issues with technical issues.

The models don’t need copyrighted works. They just need works to train on. Not all models are trained on copyrighted works. You would know this if you had done any useful research on the topic.

You can download copyrighted images off the internet, study them, and learn to render images in the same style without needing permission from the copyright owner. That is a fair use. Adding “with a machine” doesn’t change that.

terop (profile) says:

Re: Re: Re:9

The models don’t need copyrighted works. They just need works to train on.

Every author gets copyright when the work is created. There are no exceptions to this rule, so your 2nd statement doesn’t make sense since everything created by mankind is covered by copyright.

You can download copyrighted images off the internet, study them, and learn to render images in the same style without needing permission from the copyright owner.

This isn’t true. All the jpg/png images you can find on the internet are covered by copyright, and their use in the above process is illegal without express permission from copyright owners.

MrWilson (profile) says:

Re: Re: Re:10

Every author gets copyright when the work is created. There are no exceptions to this rule, so your 2nd statement doesn’t make sense since everything created by mankind is covered by copyright.

I should not be surprised by your ignorance anymore, but here you are being completely fucking ignorant of the existence of the public domain!

All works published in 1928 or earlier are in the public domain in the US. Some works published between 1929 and 1963 are also in the public domain because their copyrights weren’t renewed.

Digital reproductions of public domain works are also in the public domain.

In addition to these, many government works are automatically in the public domain.

So yes, there are significant exceptions to this rule.

You could train a model entirely on NASA photos that are in the public domain.

This isn’t true. All the jpg/png images you can find on the internet are covered by copyright, and their use in the above process is illegal without express permission from copyright owners.

No, not at all.

Your web browser downloads images when you visit a website. That is fair use. If you’re using images to learn to draw using them as inspiration, that is fair use.

For your assertion to be correct, Pinterest would be illegal. Google Image search would be illegal. Browser caches would be illegal. Art classes would be illegal.

Also, not all jpg/png images you can find on the internet are covered by copyright.

You have outdone yourself this time. The public domain is one of the most important aspects of US Copyright Law. You really don’t know shit, dude. This was already true, but this just cements it: Nothing you say about copyright will ever be useful. All of your claims are based on bullshit and ignorance.

terop (profile) says:

Re: Re: Re:11

Your web browser downloads images when you visit a website.

Downloading already had a bad reputation long before web browsers were invented. This is why web browsers do significant security work to prevent unauthorised copying of the downloaded material, in order to qualify for the caching exception in copyright law. The caching exception was a significant success for web browsers: even though downloading had a bad reputation in the marketplace, limited downloading was acceptable. Basically, a browser’s security features prevent downloading large amounts of material; the downloaded files are only displayed to users, further distribution and copying are prevented, and so web browsers just fetch the data again from the original location, letting the author of the material control who gets access to the data and who has not paid their fees yet.

This kind of serious security work is what gets them past copyright law’s requirements. Sadly for jpg/png files, their only purpose is to be displayed to users, so jpg/png authors were denied compensation when web browsers flooded the market with free image files. But authors can always choose a different platform or technology to build their content, so the freebie aspect only affects authors specializing in png/jpg creation.

MrWilson (profile) says:

Re: Re: Re:14

Sure, not all the details in the comment come from the lawbooks; instead, facts from the actual browser technology are used.

Except you addressed the functionality of browsers in the context of the law, so your knowledge should come from the law if you’re going to purport to explain how they function from a legal standpoint.

But I appreciate you admitting for once that your legal analysis is not based on the law. Not that it wasn’t already apparent, but admitting you have a problem is the first step to addressing it.

terop (profile) says:

Re: Re: Re:15

But I appreciate you admitting for once that your legal analysis is not based on the law.

Why are facts involving browser technology not important? The facts are what determine the legality of the solutions they offer. Technological solutions often try to escape the legal limits that lawyers have set to keep technological solutions under control. But failing to respect the legal limits will result in pirated and dangerous solutions.

Guess what the most requested feature is that the tech people are currently unable to implement? It’s Star Wars lightsabers. The tech people are failing to make the weapon safe enough for children to use. Legal limits on it ensure that attempts to build a working weapon will not be available to ignorant children. Why would we allow this horror?

You have not explained why implementing this children’s dream weapon would need to follow legal safety limits that keep kids from accidentally killing their classmates and parents, if your sloppy copyright-minimalist practices are so good that everyone must follow them.

MrWilson (profile) says:

Re: Re: Re:16

The facts are what determine the legality of the solutions they offer.

No, the law and caselaw determine legality.

the legal limits that lawyers have set

Lawyers don’t set legal limits. Lawmakers do.

So you’re proving that not only do you not understand how US copyright laws work, you also don’t know how all US laws work.

Guess what the most requested feature is that the tech people are currently unable to implement? It’s Star Wars lightsabers.

Again, you’re confusing copyright laws with product safety laws. They are very different.

The tech people are failing to make the weapon safe enough for children to use.

The tech people aren’t capable of making lightsabers because kyber crystals are fiction, much like your knowledge of laws.

You have not explained why

Why would I explain anything relating to a fictional product that isn’t possible to make in an analysis of your delusions about US copyright law that wouldn’t even be related if they actually existed?

terop (profile) says:

Re: Re: Re:17

Why would I explain anything relating to a fictional product that isn’t possible to make in an analysis of your delusions about US copyright law

The copyright-minimalist practices are your solution to the copyright problem, so you should be consistent in your techniques. Your sloppy minimalist principle is what forces these absurd results. While strict copyright people long ago rejected Star Wars lightsabers as extremely dangerous gadgets, you still keep insisting that the product must be allowed for fair use reasons, even though it has the potential to kill anyone who uses it and their family.

Strict copyright would prevent this because “cloning” products from TV shows is illegal copyright infringement. When it’s not allowed, all product developers will need to start from scratch and think of solutions on their own merits instead of under the delusion of how many people watch Star Wars on TV. And when they develop the product from scratch, they will notice during development that the gadget is too dangerous for the mass market, and thus need to scale down their operation or reject the product outright. This is the reason why the market currently doesn’t have (popular) lightsabers. These minimalist people who spend a minimal amount of time on product development might even fail to notice during development that their product has the potential to kill people.

MrWilson (profile) says:

Re: Re: Re:18

Not only are you wrong about realistic things like copyright laws, you’re even wrong about fictional things like lightsabers.

Real, functional lightsabers are not physically possible. If they were, someone would have already made one. They aren’t on the market because they aren’t possible to produce. Copyright law is also not stopping them from being made. First, copyright doesn’t cover the concept of a plasma sword. Trademark covers the word lightsaber. If you could produce a real lightsaber that runs on the magical physics portrayed in the movies, copyright wouldn’t stop you. You just wouldn’t be able to market it as a “lightsaber.” You’d just call it a different name. But it would also be classified as a weapon and likely subject to legal limitations of production and use, especially for a weapon that could melt through walls and doors and people very easily.

But here’s the thing: people already make lightsabers, albeit replica props and cosplay dueling models and they already sell them, sometimes under other names, but sometimes literally under the name “lightsabers.”

Look at https://ultrasabers.com/.

You can download 3D models of lightsabers from websites like Printables and print your own lightsaber. Or you could just design your own model in CAD software or Blender. You’d think someone who writes software relating to 3D models would already know this.

Copyright doesn’t stop any of this. So even notwithstanding your ignorance of real-world physics, you’re still wrong about copyright law.

Strict copyright would prevent this because “cloning” products from TV shows is illegal copyright infringement.

Apparently you’ve never heard of cosplay. People “clone” products from TV shows all the fucking time. They’ve been doing it for decades. Google “1977 Star Wars cosplay.”

And when they develop the product from scratch, they will notice during development that the gadget is too dangerous for the mass market, and thus need to scale down their operation or reject the product outright.

Nobody needs to make a lightsaber from scratch to know that it’s dangerous. It’s literally a weapon that can cut people in half with a simple stroke. They just have to have seen a Star Wars movie or show or video game.

These minimalist people who spend a minimal amount of time on product development might even fail to notice during development that their product has the potential to kill people.

Yes, people who want to make dangerous weapons may fail to notice that their weapons are weapons. That’s a very insightful statement. Brilliant.

You get less coherent as you go on. Stop while you’re behind.

terop (profile) says:

Re: Re: Re:19

These minimalist people who spend a minimal amount of time on product development might even fail to notice during development that their product has the potential to kill people.

Yes, people who want to make dangerous weapons may fail to notice that their weapons are weapons. That’s a very insightful statement. Brilliant.

You shouldn’t dismiss these concerns so easily. We created 150 million units of phones, and people staring at the phone screen watching interesting ads while walking through dangerous gaps in the sidewalk have been falling to their deaths because the content on the phone is more interesting to them than the dangerous surroundings they’re walking through.

It’s the responsibility of every product developer to ensure that these deaths do not occur at all. If they still happen after careful examination of the product’s safety aspects, extra effort needs to be spent to ensure that users know about the limitations of the products they buy.

When product volumes increase, these deaths are inevitable. They happen for various reasons, and we have the Darwin Awards to track their reappearance. But product developers must do everything in their power to prevent such occurrences, and withdrawing the whole product from the market is an efficient, although expensive, way to tackle that issue.

MrWilson (profile) says:

Re: Re: Re:20

people staring at the phone screen watching interesting ads while walking through dangerous gaps in the sidewalk have been falling to their deaths because the content on the phone is more interesting to them than the dangerous surroundings they’re walking through.

That’s not a smartphone problem. It’s not a new problem at all. People used to do that with Walkmans and portable CD players and Gameboys. People used to do that with newspapers for decades before that. That’s not a product design problem. That also has nothing to do with copyright.

It’s the responsibility of every product developer to ensure that these deaths do not occur at all.

If that were true then guns couldn’t be legally manufactured, yet there are more guns in the US than people. You have no grasp on reality.

But product developers must do everything in their power to prevent such occurrences

That’s not actually the law as it relates to product liability. Add one more area of US law you don’t know anything about.

terop (profile) says:

Re: Re: Re:21

That also has nothing to do with copyright.

You’re mistaken. When a company’s copyright gets diluted and loses its power, the money flow from software development activity disappears and the company needs to cut corners to keep producing the product. Cheaper parts, less testing, and more dangerous product behaviours are guaranteed to cause problems in the marketplace. Money is really what ensures that products are safe to use and operate. And copyright is all about moving money from the market to the correct entity that did the hard work of product development, instead of giving it to Chinese copycats.

MrWilson (profile) says:

Re: Re: Re:5

1) The AI authors had existing pirate boxes available storing tons of illegal data

This is complete bullshit. You’re asserting copyright violations without evidence based on your previous, completely incorrect, and completely debunked notion that all fair use is a copyright violation. And it’s not the start of the process at all. You’re asserting that image generation is just a scheme by people who intend to violate copyrights to have cover for their copyright violations. Image generation is a process developed by software engineers (who are far more successful than you). There’s also a techbropreneur aspect to it, but that’s not the entirety of the market for image generation.

2) Their only problem was that putting this data to internet was legally very dangerous

This isn’t a problem because you’ve made this up whole cloth.

3) to solve the problem, they devised a plan where some algorithm (==denoising) supposedly removes copyright owners’ ownership rights to the data

This is incorrect, even if we pretend you were accurately describing the process. They didn’t devise a plan to remove copyright ownership rights. They devised a method of generating completely new images. And the image generation doesn’t remove any copyright ownership rights because the original work is not copied in the generation process and the original work continues to exist.

4) Now they’re trying to get the theory that AI stuff is copyright-free and free from copyright owner’s control accepted by the community

Yes, because that’s an accurate description of the result of the process. How dare they accurately describe what’s actually happening!?!

5) and finally they plan to profit from their pirate boxes

The irony is that the “copyright owners” of the most profitable copyrights are themselves “pirates” who have ripped off the work of the actual artists with bad contract terms, fraudulent Hollywood accounting, and actual copyright violations shored up by their deep pockets to pay for lawyers to fuck over the little guy if they happen to get caught.

There are people who are going to profit from AI image generation. Many already have. But you’re complaining about grifters and rent-collecting philistines who have no actual part in the artistic, creative process not getting paid for their grifts.

Basically when courts and the jury evaluate this plausible alternative theory against the reality of AI development, there is enough truth in the story that it gets accepted as what really happened => damage awards will be astronomical…

That you think the truth is essential to how court cases play out just shows you know fuck all about what you’re talking about. Court cases can be decided by bias with a twisting of facts and even intentional ignorance of precedents and caselaw history. You’re writing fan fiction again. It’s not useful.

terop (profile) says:

Re: Re: Re:6

This is complete bullshit. You’re asserting copyright violations without evidence based on your previous, completely incorrect, and completely debunked notion that all fair use is a copyright violation.

You only need to evaluate (using AI people’s own image interpolation techniques) how near this alternative illegal theory is to what the companies are actually doing. The process above was explicitly chosen as an interpretation of the activity that is explicitly known to be illegal piracy. I leave it to the courts to evaluate how near AI’s real activity is to the known piracy process. But I’m afraid that the jury will declare the processes exact copies of each other, so much so that they’ll wonder whether AI folks borrowed the techniques directly from pirate sites and are violating copyrights in that process in alternative ways.

MrWilson (profile) says:

Re: Re: Re:7

You’ve entered the territory of some unhinged fan fiction. You’re asserting that software engineers are designing how software functions based on a desire to replicate unrelated illegal activity. Did these software engineers also murder JFK and Jimmy Hoffa? You’re asserting that you know the mindset and intentions of thousands of different people. Did the aliens give you mind-reading technology? You’re incoherent again.

terop (profile) says:

Re: Re: Re:8

You’re asserting that software engineers are designing how software functions based on a desire to replicate unrelated illegal activity.

That’s what the internet demands from these poor developers. If they have a little less stability in their development processes, they will clone all their competitors’ technology and kill the industries with their own technology + free products.

The only reason I haven’t gone crazy with the internet’s cloning demands is because I learned my development process before the internet could mess with it. If you let the internet change your development practices, you’ll be sitting in court responsible for large-scale copyright/patent infringement. The only way to avoid it is to reject the requests and take the risk that you cannot implement products as good as what competitors managed to pull off. But this is exactly what copyright law expects from you: instead of cloning other people’s products, you’ll have something original.

terop (profile) says:

Re: Re: Re:10

You continue to fail to learn how the models work and you apparently don’t know the origins of the various technologies and the people who have developed them.

None of the above matters one bit; the only important information is whether these people followed the strictest copyright-maximalist principles when developing their products. Only through a constant maximalist view of the world’s events can they extract enough maximalist propaganda from the fabric of the universe to implement products worthy of the RIAA’s and MPAA’s praise. Only by listening to every entity on the planet will they learn proper maximalist principles. Just recently we came to the realisation that HDMI cable compatibility failures are caused by the MPAA’s copy protection technology.

terop (profile) says:

Re: Re: Re:12

The irony is that you think you’re being absurd now, but you think your other claims are legitimate, but everything you say is actually absurd.

Why do you think that people who bring products to market are free from liability when users misuse the products for illegal operations? Companies usually prepare for this liability by taking out insurance that covers the horror involved, assuming the company can find an insurance company stupid enough to take on the responsibility. But the liability flows from the damage that these products can cause in the marketplace. The liability needs to hit the product developers and anyone involved who is able to control the misuse, so that there will be an incentive to bring better products, free from misuse, to market.

The world is a big place, and every entity in it deserves a voice at the tables where important product safety decisions are made.

MrWilson (profile) says:

Re: Re: Re:13

You’re conflating different concepts here. Product safety and liability for criminal misuse of a product are two different things. There are different laws and practices that address these different topics. Nobody in their right mind considers the ability of a product to be used for copyright infringement to be a “product safety” issue. You should research (in general, too, but especially) the laws and lawsuits that found that recordable videotapes and cassette tapes are legal.

Start with the Betamax case (Sony Corp. of America v. Universal City Studios) and the AHRA (Audio Home Recording Act of 1992).

terop (profile) says:

Re: Re: Re:14

Nobody in their right mind considers the ability of a product to be used for copyright infringement to be a “product safety” issue.

Well, the Finnish government explicitly considers Chinese pirate products a product safety issue, mostly because of illegal/dangerous electrical gadgets.

But even if we considered just the compensation issue of copyrights, it is a significant product safety issue if companies who create the products cannot get enough compensation from the market and need to cut corners/save money by killing product safety features. Electrical gadgets especially need enough compensation that companies do not need to use cheaper and more dangerous parts to develop the products.

MrWilson (profile) says:

Re: Re: Re:15

Well, the Finnish government explicitly considers Chinese pirate products a product safety issue, mostly because of illegal/dangerous electrical gadgets.

And that is not copyright-related. Non sequitur #5000!

But even if we considered just the compensation issue of copyrights, it is a significant product safety issue if companies who create the products cannot get enough compensation from the market and need to cut corners/save money by killing product safety features.

We don’t need to consider unrelated topics.

terop (profile) says:

Re: Re: Re:16

We don’t need to consider unrelated topics.

la la la la… why are you putting your fingers in your ears when the facts splash in your face? Is copyright really so terrible when it actually solves one of the biggest issues in product development, i.e. where to get the money to implement the law’s requirements for product safety features?

terop (profile) says:

Re: Re: Re:4

The process and the result are so far removed that you wouldn’t be able to tell from whence the painting even came without being told.

This isn’t the case with AI. The AI is reproducing exactly the same kind of output as your inputs. If your inputs are png images, you will get an image as the AI’s output. So there is no great distance between the input images and the output images, and the requirements for fair use’s transformative works do not apply. This kills your whole fair use theory.

You can figure out the similarity of the AI’s input and output by examining the TYPES of the programming language constructs needed to store the data. The whole AI process can be summarised as [IMAGE] -> IMAGE, which takes multiple images as input and outputs a single image. While there are math processes like interpolation and neural network connections inside the function, the process as a whole cannot overcome the legal transformative works requirements.

The same problem existed with the process that youtube-dl used to download files from YouTube and publish their .mp4 representation to users in a format which could be freely copied on the internet. While it sounded like a useful operation, it needed to bypass YouTube’s technological protection measures to download the data, and then IT DID NOT EXECUTE COMPLEX ENOUGH OPERATIONS TO REACH A FAIR USE DETERMINATION VIA TRANSFORMATIVE WORKS. For this, there is existing legal paperwork, and the RIAA’s position is perfectly clear on this issue. Thus if you listen to the RIAA’s position, you can only conclude that the transformative-use requirement in fair use is stricter than what ordinary byte transforms can do.

terop (profile) says:

Re: Re: Re:4

This is all fair use on its own and not related to the training. People download copyrighted images off the internet.

Downloading had a bad reputation long before internet browsers were invented. You have to understand that the download operation is legally dubious. Browsers can do the download without legal problems only because they invested a significant amount of time and effort to build a security system powerful enough to cut out secondary copies and widespread copy operations on the downloaded data. It’s the browser’s security that saves their legal story.

Now if you “bypass the technical protection measures”, i.e. the browser’s security features, by downloading the data via other methods unrelated to browser technology, the legal protections that the browser-using population enjoys are no longer available to you, and you’ll be a filthy pirate who swaps zip files on the internet illegally and hacks into the FBI, NSA, police, national libraries, school computer systems etc. I.e. you’re no longer an ordinary citizen trying to do good work; you become a filthy criminal.

Anonymous Coward says:

Re: Re: Re:2

Training does involve a copy. It just isn’t getting copied into the model.

And thus no copy is being made, just as AC said.

The person doing the training is using copyrighted material and that use is likely fair use.

Er, no. The copyrighted material in this case is being perceived (read, ‘seen’, etc.), which would be recognized as OK under the First Sale Doctrine if it were done by a human rather than a computer.

You’re right if we’re only talking about the model and not the act of training the model.

Again, no. AC is right in both cases. How do you get so many insightful quotes on this site when you’re so bad at arguing points of law?

MrWilson (profile) says:

Re: Re: Re:3

And thus no copy is being made, just as AC said.

You didn’t track the comments. My original comment referred to training specifically: “It’s also arguable that training on copyrighted works are fair use…” To which AC responded, “Even though no actual copying takes place?” But copying does happen at the beginning of the process, when the human is training the model. The model itself isn’t copying, though. The human is copying from other sources in order to have something to train the model on. And that is likely fair use, as I said.

Er, no. The copyrighted material in this case is being perceived (read, ‘seen’, etc.), which would be recognized as OK under the First Sale Doctrine if it were done by a human rather than a computer.

The first sale doctrine wouldn’t apply because we’re not even talking about purchased works. You can download copyrighted material from the internet without a purchase and that’s fair use. You’re adding unnecessary complications if you base it on the doctrine of first sale. That’s more useful if you’re talking about physical copies of works rather than digital copies.

Again, no. AC is right in both cases.

You and the AC are confused about what we’re talking about. Copying does happen during the training by the human because the human has to have a copy of a work to train the model on. That is fair use. The “no copying” part is in the model’s process of denoising noise to render an image via word associations, which is the function of the model rather than its training process.

How do you get so many insightful quotes on this site when you’re so bad at arguing points of law?

We’re not just arguing law. We’re also talking about technical aspects of generative models. But if you want to know why people vote for my comments, you’ll have to ask them.

terop (profile) says:

Re: Re: Re:2

The person doing the training is using copyrighted material and that use is likely fair use.

Why the fuck would this be fair use?

There’s a well-known precedent of illegal copyright infringement where people tried to execute cleanroom techniques by routing the data through a human brain and arguing that since it stayed in the human brain for 2 seconds, it would be different enough to qualify for fair use, because it would be scrambled…

Our position on the scrambling problem is that while the human brain is capable of scrambling the data, it only does so over a long period of time. So if you “remember” techniques from the 1980s, then it’s possible to rely on the scrambling defense, but shorter brain visits are just copyright infringement.

People are “free” to rely on their experience gathered through decades of work, but if management gave them “access” to a copyrighted work 2 weeks before asking them to implement a clone of it, then those defenses would simply fail.

This AI thing is comparable to giving the human brain 2 seconds to scramble the work…. it’s simply not different enough…
