MrWilson (profile)

September 5, 2024 at 4:25 pm

One problem with the claims is that even if AI art generators worked as people falsely believe, it still wouldn’t constitute copyright infringement. This isn’t how AI art generators work, but if you put a thousand different paintings in a blender and then used the random shredded pieces to make a new work, it wouldn’t be derivative of any particular work or all thousand. At a certain point, an element of a work is too small to identify the source and also too small to claim copyright on. Likewise, you could “copy” words from a book separately and form new sentences in completely different word orders and it wouldn’t be derivative because individual words can’t be copyrighted and you’re not reproducing the sentences themselves.

What makes derivative claims for AI models not really work (not that this is how courts will ultimately rule) is that AI art models aren’t “using” the original works in the output. They’re not blending or collaging or copying. They learn to render outputs through denoising and word association in the prompt. The training works aren’t in the output. The model doesn’t retain a copy of them, only the lesson it learned in how to render via word association when it was trained. Imagine knowing how to paint a Da Vinci style painting but not remembering what any particular Da Vinci piece looks like.

Anonymous Coward

September 6, 2024 at 5:45 am

Re:

“if you put a thousand different paintings in a blender and then used the random shredded pieces to make a new work”

This sounds like what happens in a human brain. Human looks at a bunch of stuff and creates their own interpretation of what their senses detect.

Arianity

September 6, 2024 at 9:46 am

Re:

The training works aren’t in the output.

They often are. It’s an open problem that training data is often overrepresented in the output, and something researchers are actively working on suppressing. Although for newer/bigger models, you often need to hit it with just the right prompt.

See e.g. 1 , 2

To quote:

It is well known that generative language models risk replication from their training set [9, 10] and the amount of replicated data is broadly proportional to the size of the model, amount of duplication of the data point in the training set, and the amount of prompting. Interestingly, such replication behavior occurs even for models that are not overfitting to their training data [33,60].
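
One rough way to look for the kind of replication the quote describes is to measure how much of an output appears verbatim in a training text. This is only a minimal sketch, assuming word-level n-grams as the unit of comparison; the function name `ngram_overlap` and the sample strings are made up for illustration and do not come from the cited papers.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(output, training_text, n=5):
    """Fraction of the output's n-grams that also occur in training_text."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(training_text, n)) / len(out)


# Made-up example strings (assumptions, not real training data).
training = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog"
fresh = "a slow red fox walks under the sleepy cat"

print(ngram_overlap(copied, training))  # 1.0: every 5-gram is in the training text
print(ngram_overlap(fresh, training))   # 0.0: no 5-gram is shared
```

A score near 1.0 suggests verbatim replication and a score near 0.0 suggests none. Real studies of image models would need a perceptual similarity measure rather than exact word matches.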

MrWilson (profile)

September 6, 2024 at 1:35 pm

Re: Re:

Except it’s literally not. I’m not saying the output will not look like the original works. I’m saying it literally doesn’t include the training works because of the rendering method. Note that your citations say data, not works. This isn’t copying where the training works are stored, recalled, and then inserted into the result. We’re talking about rendering via denoising and word association. Output that looks like the training works is a testament to the quality of the rendering method, not to the presence of a copy of the training works in the model. The model’s size is too small to contain all the training works.
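
The size argument can be put as back-of-the-envelope arithmetic. The figures below are illustrative round numbers chosen as assumptions (a model file of about 4 GB, about 2 billion training images, about 100 KB per compressed image), not measurements of any particular model.

```python
# Illustrative round numbers, NOT measured figures (all three are assumptions).
MODEL_SIZE_BYTES = 4 * 10**9        # a model file of about 4 GB
TRAINING_IMAGES = 2 * 10**9         # about 2 billion training images
TYPICAL_IMAGE_BYTES = 100 * 10**3   # about 100 KB per compressed image

# Average number of model bytes available per training image.
bytes_per_image = MODEL_SIZE_BYTES / TRAINING_IMAGES

# How much larger the dataset is than the model.
dataset_bytes = TRAINING_IMAGES * TYPICAL_IMAGE_BYTES
ratio = dataset_bytes / MODEL_SIZE_BYTES

print(f"Model capacity per training image: {bytes_per_image:.1f} bytes")  # 2.0 bytes
print(f"Dataset is about {ratio:,.0f}x larger than the model")            # 50,000x
```

An average of a couple of bytes per image says nothing about individual cases. It does not rule out a small number of heavily duplicated images being reproduced closely, which is what the replication research quoted above is concerned with.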

Anonymous Coward

September 5, 2024 at 5:35 pm

Need an LLM instance that does nothing but train on CMI plus some stats of and about the covered materials. Might be interesting.

Alex Tolley

September 6, 2024 at 7:04 am

ChatGPT can violate copyright

It is established that one cannot write a novel using characters that an author has used as their own. You need to obtain permission from the author or their estate to do so.

And yet I can do this with ChatGPT:

Me: In the style of Raymond Chandler, write an opening scene where a homicide detective discovers Philip Marlowe examining a dead body of a woman.

ChatGPT: The fog was a sullen shroud, wrapping the city in a clammy embrace. Street lamps stood like sentinels, their dim light filtered through the haze, casting long shadows that twisted and turned as if in silent protest. The alley where I found myself was the kind of place where secrets went to die, and tonight it had swallowed a fresh one—a woman who would never see another dawn.

The stench of decay mingled with the sour tang of spilled whiskey and cheap perfume. My shoes squelched through the grime, each step a squish of unwelcome intimacy with the alley’s filthy embrace. I was here on the double; someone had phoned it in, and though the voice on the other end had been more gravelly than informative, the message had been clear: find the body, find Marlowe.

And there he was, like a figure cut from smoke and shadows, crouched beside the corpse with the detached curiosity of a man who had seen death’s numerous variations. Philip Marlowe, the private eye with a knack for finding trouble even when it wasn’t looking for him. His trench coat flared around him like a dark halo, and his fedora cast a shifty silhouette against the pavement. The woman’s lifeless form lay sprawled, her once-coifed hair now a tangle of despair. Her dress, a faded blue, clung to her in a way that spoke of both faded grandeur and violent ends.

Now clearly, a story could be written by ChatGPT that would violate copyright.

We know that software “look and feel” is copyrighted. Similarly, musicians have successfully sued for copyright infringement (but others have failed). Suppose I ask an LLM or other AI to paint a picture in the style of [X] with a subject based on X’s prior work. Would that not be a copyright infringement, despite claims I read here before that studying painting styles is not a copyright violation?

I sense that “art” is so subjective that whether a work is perceived as derivative or new is rather vague and subject to personal interpretation.

I think it will be interesting to see how this plays out as AIs embodied in machines will be capable of creating “art” that mimics various artists to some desired degree.

Anonymous Coward

September 6, 2024 at 5:55 pm

Re:

Now clearly, a story could be written by ChatGPT that would violate copyright.

Yes? If that’s what you told ChatGPT to do, odds were pretty good that it might do so.

We know that software “look and feel” is copyrighted. Similarly, musicians have successfully sued for copyright infringement (but others have failed). Suppose I ask an LLM or other AI to paint a picture in the style of [X] with a subject based on X’s prior work. Would that not be a copyright infringement, despite claims I read here before that studying painting styles is not a copyright violation?

“Look and feel” is about as useful as arguing whether all smartphones violate Apple’s IP because most, if not all smartphones are rectangles with rounded corners.

As for musicians who succeed in their copyright suits and those who don’t, part of it is personal interpretation, the other part is what judges and defenses get brought to the table. “Blurred Lines” didn’t win the fight against Marvin Gaye’s estate, but Ed Sheeran did. “In the style of” isn’t, and shouldn’t be arbitrarily protected by the harshest forms of copyright, because all forms of art take a little cue or inspiration from somewhere else. Otherwise you’d have Sheeran and other artists having to manually film all their songwriting sessions just to prove they didn’t commit copyright infringement.

I think it will be interesting to see how this plays out as AIs embodied in machines will be capable of creating “art” that mimics various artists to some desired degree.

Or we could simply do what has been previously suggested and make it so AI-generated works can’t be given copyright, just like the monkey selfie – and have to be attributed accordingly as AI-generated.

Remove the profit motive, and I imagine that goes quite a way into giving human artists a chance to compete on demand.

Alex Tolley

September 6, 2024 at 9:00 pm

Re: Re: ChatGPT can violate copyright

I think you are missing the point. If you use LLMs to reduce paying for headcount to do the work, there is no reason not to use prompts that are style specific. “Write a Dr. Who episode in the same style with a similar plot to Chris Boucher in episode [X] about the [Y].” Of course you don’t have to specify it that way, just feed in a number of scripts you like and then ask it to create a new script with the same “feel”.

Amazon used to add commonly used phrases in a text that readers know may be a signature of a writer. LLMs could easily do the same, mimicking a writer’s style. However, unlike books that are written “in the style of” an author and acknowledged as such, companies will not acknowledge it.

I don’t see preventing LLMs from gaining copyright helps. It is already potentially IP theft by the owners of the LLMs.

There is a gradation from studying art by understanding paintings by artists, to painting in a style that is influenced by an artist, all the way to forging paintings and creating new works and labeling them as by the famous painter.

What is different is that machine output is far faster than humans, potentially far cheaper, and therefore industrial-scale output. IDK whether this is acceptable to democratize the output of the humanities, or just plain theft and not paying back the debt that produced the original works that the machines are trained on.

I could envision in a few years producing my own versions of “art” by having a machine churn out novels with plots in the style of favorite authors. I don’t know whether that is a good or bad thing in the long term.

One scenario is that human producers are pared down to relatively few original thinkers with truly novel output. The output is barred from being used for training in any way for a reasonable period of copyright, which is more in line with patent lives. After that, the output can be used for training AIs and producing derivative works. Wholesale classes of work will be automated and theoretically reduced in price as scarcity is removed.

MrWilson (profile)

September 6, 2024 at 7:13 pm

Re:

ChatGPT can violate copyright

ChatGPT is just a tool. You can violate copyright by prompting ChatGPT to use someone else’s characters. That’s what happened with your example.

Anonymous Coward

September 7, 2024 at 6:29 am

Re: Re:

ChatGPT can violate copyright

so can my typewriter

MrWilson (profile)

September 7, 2024 at 9:48 am

Re: Re: Re:

Let me know when your typewriter becomes sentient. ChatGPT isn’t.

Tanner Andrews (profile)

September 7, 2024 at 4:58 am

one other problem

One way that humans work is to read a bunch of stuff, then write down their views based on what they have read. For instance, I may read a newspaper article about the Governor, and then write commentary suggesting that he ought not say stupid stuff while wearing gogo boots.

Most of what I write is influenced in some wise by what I have read.

The detective story writer not influenced by Hammett is like the sci-fi writer not influenced by Heinlein or Pournelle. Sure, it is possible, but why bother?

The newspaper reporter not affected by Our Fair City is simply one with too little ambition. Give up, go home, and leave it to the AI clickbait generators.

Anonymous Coward

September 11, 2024 at 12:00 pm

Re:

Saying the AI model is itself a copyright violation is like saying that if I watch all of Bob Ross’ “The Joy of Painting” over and over, I am a copyright violation. And that if I produce paintings similar to Ross’, using the methods he demonstrated, with lots of mountains and big old trees and happy little clouds, those are also copyright violations.

MrWilson (profile)

September 11, 2024 at 8:11 pm

Re: Re:

The people saying the models are violating copyright are either ignorant of how the models work and/or they’re disingenuously attempting to claim that any use of a work is a copyright violation, including completely legal fair use written into the law itself and demonstrated a thousand times over in caselaw.

terop (profile)

September 7, 2024 at 10:25 am

probably the biggest argument for ai violating copyrights comes from causality. the original work needed to exist in the world before ai could slurp it into the training set, thus making it a derivative work. remove “all the original works that were not correctly licensed” and your ai would fail to function. thus ai is based on pirated material and the end for ai is near legally. QED.

MrWilson (profile)

September 8, 2024 at 5:47 pm

Re:

the original work needed to exist in the world before ai could slurp it into the training set, thus making it a derivative work.

That’s not the definition of a derivative work. Again, you don’t understand copyright law and you’re assigning your own meager understanding to it rather than researching to find out if your conclusions are correct.

remove “all the original works that were not correctly licensed” and your ai would fail to function.

Models can be and have been trained on public domain works. It’s also arguable that training on copyrighted works is fair use, but that hasn’t been decided in court or by law yet.

thus ai is based on pirated material and the end for ai is near legally.

Sure, if you don’t pay attention to reality, you might think that. Even if it were determined that the developers must get licenses, there are content owners who will gladly license their works and there are developers with enough investment funding to license some amount of works. So even if we take your uneducated perspective on copyright at face value, you’re still wrong.

terop (profile)

September 8, 2024 at 6:30 pm

Re: Re:

It’s also arguable that training on copyrighted works is fair use, but that hasn’t been decided in court or by law yet.

But you have to overcome one significant hurdle before you can declare it fair use: Why is it just not plain ordinary copyright infringement?

A copyright infringement determination is a significantly more common and plausible explanation of what is happening with the AI technologies.

MrWilson (profile)

September 9, 2024 at 3:23 pm

Re: Re: Re:

That’s not a significant hurdle to overcome. We just have to look at the four factors. Training a model on past art is functionally like training a human artist to emulate an art style – something art students have been doing for millennia. If training is infringement, then viewing art with human eyes is too.

Infringement is only a plausible explanation if you don’t understand copyright law or how the models are trained and how they function. So it’s entirely understandable that you would think so. The only other possible reason for such a conclusion is just disingenuous wishful thinking.

terop (profile)

September 9, 2024 at 5:22 pm

Re: Re: Re:²

Training a model on past art is functionally like training a human artist to emulate an art style

to me, training an ai system consists of the following parts:
1) the original data
2) list of urls to the original data
3) a script for slurping all the urls to your hard disk
4) a copy of the original data
5) a training algorithm that uses (4) to modify some free parameters
6) significant amount of time to extract all the content from (4) via (5) to the free parameters
7) smaller storage space for free parameters
8) algorithm that generates images from (7).

There’s the causality pattern laid out more carefully. The whole idea of the AI is to generate (7), then cut (7) and (8) out of the system and offer that smaller system to users.

Basically this process is doing nothing but
a) compress the original data set
b) lose significant amount of details from the original data to make it smaller

But as you’ve seen, it involves copies of the system, and (7) can be considered a derived work of (1) and thus needs a license from the content owners.
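
The numbered steps above can be sketched with a toy stand-in. This is an assumption-heavy illustration, not how any real image model is built: the "original data" is 1,000 synthetic points on a noisy line, the "training algorithm" is plain gradient descent on two free parameters, and `generate` is a hypothetical name.

```python
import random

random.seed(0)

# (4) A local copy of the "original data": 1,000 noisy points on a line.
data = [(i / 1000, 3.0 * (i / 1000) + 1.0 + random.gauss(0, 0.1))
        for i in range(1000)]

# (5)-(6) Training: gradient descent adjusts two free parameters.
a, b = 0.0, 0.0
learning_rate = 0.5
for _ in range(500):
    grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (a * x + b - y) for x, y in data) / len(data)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

# (7) What gets stored: two numbers instead of 1,000 points.
params = (a, b)


# (8) Generation reads only the parameters, never the data.
def generate(x, params):
    slope, intercept = params
    return slope * x + intercept


print(params)
print(generate(0.5, params))
```

The data is used in steps (5) and (6) and is absent from step (8). The two stored numbers capture the overall trend but cannot recover any single original point. Whether that should be called lossy compression or learning is exactly what is being argued in this thread; the code only shows where the data is and is not used.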

MrWilson (profile)

September 9, 2024 at 11:36 pm

Re: Re: Re:³

to me, training an ai system consists of the following parts:

Let me stop you right there. “To you” in this case is completely useless unless you’ve actually read documentation on how they are trained.

list of urls to the original data

This isn’t necessary. You can train on local data.

3) a script for slurping all the urls to your hard disk
4) a copy of the original data

This is all fair use on its own and not related to the training. People download copyrighted images off the internet.

4) a copy of the original data
5) a training algorithm that uses (4) to modify some free parameters

You’re being vague here, which indicates you haven’t actually learned how the models are trained. “The original data” is absent from the trained data because of how the model learns to render using denoising. When you supply “Da Vinci” it incrementally denoises noise into what it associates with “Da Vinci.” Original Da Vinci works are not present in the result.
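
The incremental denoising described here can be sketched as a loop. This is a heavily simplified toy under stated assumptions: a real model uses a trained neural network to estimate noise, while `predict_noise` below is a hypothetical stand-in that knows only one number per prompt word, and the table `LEARNED_ASSOCIATIONS` is invented for illustration.

```python
import random

random.seed(1)

# Hypothetical stand-in for what training leaves behind: one learned number
# per prompt word (a typical brightness), not any stored image.
LEARNED_ASSOCIATIONS = {"dark": 0.2, "bright": 0.8}


def predict_noise(pixels, prompt):
    """Stand-in for the trained network: estimate the noise in each pixel."""
    target = LEARNED_ASSOCIATIONS[prompt]
    return [p - target for p in pixels]


def generate(prompt, size=8, steps=20, step_size=0.2):
    """Start from pure noise and remove a fraction of the estimated noise per step."""
    pixels = [random.gauss(0.5, 1.0) for _ in range(size)]
    for _ in range(steps):
        noise = predict_noise(pixels, prompt)
        pixels = [p - step_size * n for p, n in zip(pixels, noise)]
    return pixels


print(generate("bright"))
print(generate("dark"))
```

Each call starts from different random noise and is steered step by step toward what the prompt word is associated with. The toy has no notion of individual training images at all, so it cannot show whether a real model ever reproduces one; it only illustrates the shape of the loop.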

a) compress the original data set

This is where you’re absolutely wrong. The training process isn’t compression. It’s transformative in that it’s looking at existing works and learning a process to render its own works. It is the model learning how to render, not remember or reproduce.

But as you’ve seen, it involves copies of the system

You’re mixing up terminology here. What system are you referring to? The resulting model or training process or the original works?

and (7) can be considered a derived work of (1)

No, it can’t because it’s not a copyrighted work in the same medium or similar medium and it’s not composed of any original parts of the original works.

If someone took a copyrighted sound recording and viewed the frequency spectrum of the sound in a digital readout and then used that digital readout as inspiration for an abstract painting, the painting would not be considered a derivative work of the original sound recording. The process and the result are so far removed that you wouldn’t be able to tell from whence the painting even came without being told. Despite the inspiration, none of the sound recording is in the painting.

and thus needs a license from the content owners.

Given the fact that you’ve previously stated that all uses require a license and that fair use is criminal use, your conclusions are useless.

terop (profile)

September 11, 2024 at 5:42 pm

Re: Re: Re:⁴

“The original data” is absent from the trained data because of how the model learns to render using denoising.

This is all complete bullshit.

At this juncture, you need to learn the alternative scenario of how AI models were created:
1) The AI authors had existing pirate boxes available storing tons of illegal data
2) Their only problem was that putting this data on the internet was legally very dangerous
3) to solve the problem, they devised a plan where some algorithm (==denoising) supposedly removes copyright owners’ ownership rights to the data
4) Now they’re trying to get the theory that AI stuff is copyright-free and free from copyright owner’s control accepted by the community
5) and finally they plan to profit from their pirate boxes

Basically when courts and the jury evaluate this plausible alternative theory against the reality of AI development, there is enough truth in the story that it gets accepted as what really happened => damage awards will be astronomical…

terop (profile)

September 12, 2024 at 4:28 am

Re: Re: Re:⁵

AI developers’ only hope against this theory that results in astronomical damage awards is trying to find facts that separate the activity from the theory:
1) claiming that the original data is not illegal will fail if it comes from known pirate services
2) claiming that the data is licensed from copyright owners will fail if there isn’t a significant money trail towards copyright owners
3) claiming that the algorithm isn’t meant for removing copyright owners’ ownership rights will fail if their conclusions require freedom from copyright law’s responsibilities
4) claiming that damage awards should not be astronomical will fail if (1), (2) and (3) failed.

so separating your activity from known illegal pattern is necessary, but difficult part of making your AI system legal. future will show how well they will succeed in it.

MrWilson (profile)

September 12, 2024 at 8:13 pm

Re: Re: Re:⁶

1) claiming that the original data is not illegal will fail if it comes from known pirate services

[citation needed]

2) claiming that the data is licensed from copyright owners will fail if there isn’t a significant money trail towards copyright owners

This isn’t necessary if the act is fair use.

3) claiming that the algorithm isn’t meant for removing copyright owners’ ownership rights will fail if their conclusions require freedom from copyright law’s responsibilities

You don’t understand how the models work so this is just absurd. The algorithm can’t be meant for removing copyright ownership rights because the results aren’t copies of the original works. There’s no works in the results to remove rights from. It’s like seeing a painting, getting inspired, and then painting your own painting and then some daft IP maximalist such as yourself comes along and claims the 2nd work is clearly derivative of the 1st despite it involving no copying.

4) claiming that damage awards should not be astronomical will fail if (1), (2) and (3) failed.

I guess it’s a good thing you pulled 1, 2, and 3 out of your ass rather than any knowledgeable basis in how the models work, how copyright law works, or generally how anything works.

so separating your activity from known illegal pattern is necessary, but difficult part of making your AI system legal.

Not difficult if “known illegal pattern” just consists of your copyright fan fiction fever dreams.

future will show how well they will succeed in it.

It will indeed.

terop (profile)

September 14, 2024 at 1:25 pm

Re: Re: Re:⁷

The algorithm can’t be meant for removing copyright ownership rights because the results aren’t copies of the original works. There’s no works in the results to remove rights from.

why do the original works need to be part of the training dataset, if the training process doesn’t use the data? Either your above claim is incorrect or the AI companies are unnecessarily keeping people’s copyrighted works in their training sets.

Here’s a test if the training set images were needed: remove the damn copyrighted works from the training set, re-execute the process and see how well your AI will work. If it fails, the training set images were actually needed and you will need to license the material.

MrWilson (profile)

September 14, 2024 at 10:02 pm

Re: Re: Re:⁸

why do the original works need to be part of the training dataset, if the training process doesn’t use the data?

You’re confused here. The training process does use the original works, as I explained. The result doesn’t use them.

But the greater issue is that you still don’t understand how the models work and how they are trained. There are plenty of articles online that explain the process. Read one of them. Hell, ask ChatGPT to summarize one for you if you can’t be bothered to read a whole article.

The original works are only for training, the way you might look at Renaissance paintings before painting in the style of a Renaissance artist. You’re not looking at the original works and trying to reproduce them stroke for stroke. You’re remembering what you learned when you looked at them. What you “remember” is brush stroke styles, color palettes, lighting and contrast, etc. You didn’t learn to copy and paste Ginevra de’ Benci’s face. You learned how to render any subject matter like the artist. This is literally how many artists learn to render in their given mediums.

Here’s a test if the training set images were needed: remove the damn copyrighted works from the training set, re-execute the process and see how well your AI will work. If it fails, the training set images were actually needed and you will need to license the material.

You’re confusing copyright law issues with technical issues.

The models don’t need copyrighted works. They just need works to train on. Not all models are trained on copyrighted works. You would know this if you had done any useful research on the topic.

You can download copyrighted images off the internet, study them, and learn to render images in the same style without needing permission from the copyright owner. That is a fair use. Adding “with a machine” doesn’t change that.

terop (profile)

September 15, 2024 at 11:03 am

Re: Re: Re:⁹

The models don’t need copyrighted works. They just need works to train on.

Every author gets copyright when the work is created. There’s no exceptions to this rule, so your 2nd statement doesn’t make sense since everything created by mankind is covered by copyright.

You can download copyrighted images off the internet, study them, and learn to render images in the same style without needing permission from the copyright owner.

This isn’t true. All the jpg/png images you can find on the internet are covered by copyright and their use in the above process is illegal without express permission from copyright owners.

MrWilson (profile)

September 16, 2024 at 8:16 pm

Re: Re: Re:¹⁰

Every author gets copyright when the work is created. There’s no exceptions to this rule, so your 2nd statement doesn’t make sense since everything created by mankind is covered by copyright.

I should not be surprised by your ignorance anymore, but here you are being completely fucking ignorant of the existence of the public domain!

All works published in 1928 or before are in the public domain in the US. Some works published between 1929 and 1963 are in the public domain if their copyright wasn’t renewed.

Digital reproductions of public domain works are also in the public domain.

In addition to these, many government works are automatically in the public domain.

So yes, there are significant exceptions to this rule.

You could train a model entirely on NASA photos that are in the public domain.

This isn’t true. All the jpg/png images you can find on the internet are covered by copyright and their use in the above process is illegal without express permission from copyright owners.

No, not at all.

Your web browser downloads images when you visit a website. That is fair use. If you’re using images to learn to draw using them as inspiration, that is fair use.

For your assertion to be correct, Pinterest would be illegal. Google Image search would be illegal. Browser caches would be illegal. Art classes would be illegal.

Also, not all jpg/png images you can find on the internet are covered by copyright.

You have outdone yourself this time. The public domain is one of the most important aspects of US Copyright Law. You really don’t know shit, dude. This was already true, but this just cements it: Nothing you say about copyright will ever be useful. All of your claims are based on bullshit and ignorance.

terop (profile)

September 19, 2024 at 8:19 pm

Re: Re: Re:¹¹

Your web browser downloads images when you visit a website.

Downloading already had a bad reputation long before web browsers were invented. This is why web browsers do significant security work to prevent unauthorised copying of the downloaded material, in order to qualify for the caching exception in copyright law. The caching exception was a significant success for web browsers: even though downloading had a bad reputation in the marketplace, limited downloading was acceptable. Basically, a browser’s security features prevent downloading of large amounts of material. The downloaded files are only displayed to users, while further distribution and copying is prevented, so web browsers just fetch the data again from the original location, and the author of the material can control who gets access to the data and who has not paid their fees yet.

This kind of serious security work is what lets them past the copyright law’s requirements. Sadly for jpg/png files, their only purpose is to be displayed to users, so the jpg/png authors were denied compensation when web browsers flooded the market with free image files. But the authors can always choose a different platform or technology to build their content, so the freebie aspect only affects authors specializing in png/jpg creation.

MrWilson (profile)

September 19, 2024 at 11:11 pm

Re: Re: Re:¹²

Nope. I’m not even going to pick apart all the bullshit in this comment. You didn’t even know what the public domain was. You don’t know enough about copyright to make any valid claims about it. You’re a dilettante.

terop (profile)

September 21, 2024 at 2:16 am

Re: Re: Re:¹³

I’m not even going to pick apart all the bullshit in this comment.

Sure, all the details in the comment are not coming from the lawbooks, but instead facts from the actual browser technology are used. Since you’ve used browsers, you should be able to verify the claims from your own web browser.

MrWilson (profile)

September 24, 2024 at 6:49 pm

Re: Re: Re:¹⁴

Sure, all the details in the comment are not coming from the lawbooks, but instead facts from the actual browser technology are used.

Except you addressed the functionality of browsers in the context of the law, so your knowledge should come from the law if you’re going to purport to explain how they function from a legal standpoint.

But I appreciate you admitting for once that your legal analysis is not based on the law. Not that it wasn’t already apparent, but admitting you have a problem is the first step to addressing it.

terop (profile)

September 25, 2024 at 9:43 pm

Re: Re: Re:¹⁵

But I appreciate you admitting for once that your legal analysis is not based on the law.

Why are facts involving browser technology not important? The facts are what determines legality of the solutions they offer. Technological solutions often have the property that they try to escape the legal limits that lawyers have set to keep technological solutions in control. But failing to respect the legal limits will result in pirated and dangerous solutions.

Guess what is the most requested feature that the tech people are currently unable to implement? It’s Star Wars lightsabers. The tech people are failing to make the weapon safe enough for children to use. Legal limits on it ensure that the attempts to build a working weapon will not be available to ignorant children. Why would we allow this horror?

You have not explained why implementing this children’s dream weapon would need to follow legal safety limits where kids are unable to accidentally kill their fellow students and parents, if your copyright minimalist sloppy practices are so good that everyone must follow them?

MrWilson (profile)

September 26, 2024 at 12:35 am

Re: Re: Re:¹⁶

The facts are what determines legality of the solutions they offer.

No, the law and caselaw determine legality.

the legal limits that lawyers have set

Lawyers don’t set legal limits. Lawmakers do.

So you’re proving that not only do you not understand how US copyright laws work, you also don’t know how all US laws work.

Guess what is the most requested feature that the tech people are currently unable to implement? It’s Star Wars lightsabers.

Again, you’re confusing copyright laws with product safety laws. They are very different.

The tech people are failing to make the weapon safe enough for children to use.

The tech people aren’t capable of making lightsabers because kyber crystals are fiction, much like your knowledge of laws.

You have not explained why

Why would I explain anything relating to a fictional product that isn’t possible to make in an analysis of your delusions about US copyright law that wouldn’t even be related if they actually existed?

terop (profile)

September 26, 2024 at 3:49 pm

Re: Re: Re:¹⁷

Why would I explain anything relating to a fictional product that isn’t possible to make in an analysis of your delusions about US copyright law

The copyright minimalist practices are your solution to the copyright problem, so you should be consistent in your techniques. Your sloppy minimalist principle is what forces these absurd results. When strict copyright people have long ago rejected Star Wars lightsabers as extremely dangerous gadgets, you still keep insisting that the product must be allowed for fair use reasons, even though it has potential to kill anyone who uses it and their family.

Strict copyright would prevent this because “cloning” products from TV shows is illegal copyright infringement. When it’s not allowed, all product developers will need to start from scratch and think of solutions on their own merits instead of under the delusion of how many people watch Star Wars on TV. And when they develop the product from scratch, they will notice during development that the gadget is too dangerous for the mass market, and thus need to scale down their operation or reject the product outright. This is the reason why the market currently doesn’t have (popular) lightsabers. These minimalist people who use minimal amount of time for product development might even fail to notice during development that their product has potential to kill people.

MrWilson (profile)

September 26, 2024 at 10:29 pm

Re: Re: Re:¹⁸

Not only are you wrong about realistic things like copyright laws, you’re even wrong about fictional things like lightsabers.

Real, functional lightsabers are not physically possible. If they were, someone would have already made one. They aren’t on the market because they aren’t possible to produce. Copyright law is also not stopping them from being made. First, copyright doesn’t cover the concept of a plasma sword. Trademark covers the word lightsaber. If you could produce a real lightsaber that runs on the magical physics portrayed in the movies, copyright wouldn’t stop you. You just wouldn’t be able to market it as a “lightsaber.” You’d just call it a different name. But it would also be classified as a weapon and likely subject to legal limitations of production and use, especially for a weapon that could melt through walls and doors and people very easily.

But here’s the thing: people already make lightsabers, albeit replica props and cosplay dueling models and they already sell them, sometimes under other names, but sometimes literally under the name “lightsabers.”

Look at https://ultrasabers.com/.

You can download 3D models of lightsabers from websites like Printables and print your own lightsaber. Or you could just design your own model in CAD software or Blender. You’d think someone who writes software relating to 3D models would already know this.

Copyright stops none of this already. So even notwithstanding your ignorance of real world physics, you’re still wrong about copyright law.

Strict copyright would prevent this because “cloning” products from TV shows is illegal copyright infringement.

Apparently you’ve never heard of cosplay. People “clone” products from TV shows all the fucking time. They’ve been doing it for decades. Google “1977 Star Wars cosplay.”

And when they develop the product from scratch, they will notice during development that the gadget is too dangerous for the mass market, and thus need to scale down their operation or reject the product outright.

Nobody needs to make a lightsaber from scratch to know that it’s dangerous. It’s literally a weapon that can cut people in half with a simple stroke. They just have to have seen a Star Wars movie or show or video game.

These minimalist people who use minimal amount of time for product development might even fail to notice during development that their product has potential to kill people.

Yes, people who want to make dangerous weapons may fail to notice that their weapons are weapons. That’s a very insightful statement. Brilliant.

You’re getting less coherent as you go on. Stop while you’re behind.

terop (profile)

September 29, 2024 at 1:24 pm

Re: Re: Re:¹⁹

These minimalist people who use minimal amount of time for product development might even fail to notice during development that their product has potential to kill people.

Yes, people who want to make dangerous weapons may fail to notice that their weapons are weapons. That’s a very insightful statement. Brilliant.

You shouldn’t dismiss these concerns so easily. We created 150 million units of phones, and people staring at the phone screen watching interesting ads while walking past dangerous gaps in the sidewalk have been falling to their death because the content on the phone is more interesting to them than the dangerous surroundings they’re walking through.

It’s the responsibility of every product developer to ensure that these deaths do not occur at all. If they still happen after careful examination of the product safety aspects, extra effort needs to be spent to ensure that users know about the limitations of the products they buy.

When product volumes increase, these deaths are inevitable. They happen for various kinds of reasons, and we have the Darwin Awards to track their reappearance. But product developers must do everything in their power to prevent such occurrences, and withdrawing the whole product from the market is an efficient way to tackle that issue, although an expensive decision.

MrWilson (profile)

September 30, 2024 at 10:19 pm

Re: Re: Re:²⁰

people staring at the phone screen watching interesting ads while walking past dangerous gaps in the sidewalk have been falling to their death because the content on the phone is more interesting to them than the dangerous surroundings they’re walking through.

That’s not a smartphone problem. It’s not a new problem at all. People used to do that with Walkmans and portable CD players and Gameboys. People used to do that with newspapers for decades before that. That’s not a product design problem. That also has nothing to do with copyright.

It’s the responsibility of every product developer to ensure that these deaths do not occur at all.

If that were true then guns couldn’t be legally manufactured, yet there are more guns in the US than people. You have no grasp on reality.

But product developers must do everything in their power to prevent such occurrences

That’s not actually the law as it relates to product liability. Add one more area of US law you don’t know anything about.

terop (profile)

October 19, 2024 at 1:04 pm

Re: Re: Re:²¹

That also has nothing to do with copyright.

You’re mistaken. When a company’s copyright gets diluted and loses its power, the money flow from software development activity disappears and the company needs to cut corners to keep producing the product. Cheaper parts, less testing, and more dangerous product behaviours are guaranteed to cause problems in the marketplace. Money is really what ensures that the products are safe to use and operate. And copyright is all about moving money from the market to the correct entity that did the hard work of product development, instead of giving it to Chinese copycats.

terop (profile)

December 17, 2024 at 6:04 pm

Re: Re: Re:¹¹

their use in the above process is illegal without express permission from copyright owners.

No, not at all.

You can consider “public domain” as a permission to use the work. The permission is still required, even if you have magic ways of obtaining it.

terop (profile)

December 17, 2024 at 6:48 pm

Re: Re: Re:¹¹

All works published from 1928 or before are in the public domain in the US.

You gotta be kidding us here. None of the AI companies’ AI databases pass the public domain test. The Books3 data set was directly downloaded from the known piracy sites LibGen and Z-Library.

terop (profile)

December 17, 2024 at 5:56 pm

Re: Re: Re:⁷

This isn’t necessary if the act is fair use.

In current legal practise, a quote with the length of 5 words can be considered at the edge of how far fair use extends. 3 Terabytes of data WILL NEVER QUALIFY AS FAIR USE.

terop (profile)

December 18, 2024 at 3:08 pm

Re: Re: Re:⁷

You don’t understand how the models work so this is just absurd.

So your defense is that the person highlighting problems in the system is just a crazy idiot?

Shouldn’t you be busy trying to find defenses that protect against the defects found instead of trying “proof by authority”…

MrWilson (profile)

September 12, 2024 at 8:02 pm

Re: Re: Re:⁵

1) The AI authors had existing pirate boxes available storing tons of illegal data

This is complete bullshit. You’re asserting copyright violations without evidence based on your previous, completely incorrect, and completely debunked notion that all fair use is a copyright violation. And it’s not the start of the process at all. You’re asserting that image generation is just a scheme by people who intend to violate copyrights to have cover for their copyright violations. Image generation is a process by software engineers (who are far more successful than you). There’s also a techbropreneur aspect to it, but that’s not the entirety of the market for image generation.

2) Their only problem was that putting this data to internet was legally very dangerous

This isn’t a problem because you’ve made this up out of whole cloth.

3) to solve the problem, they devised a plan where some algorithm (==denoising) supposedly removes copyright owners’ ownership rights to the data

This is incorrect, even if we pretend you were accurately describing the process. They didn’t devise a plan to remove copyright ownership rights. They devised a method of generating completely new images. And the image generation doesn’t remove any copyright ownership rights because the original work is not copied in the generation process and the original work continues to exist.

4) Now they’re trying to get the theory that AI stuff is copyright-free and free from copyright owner’s control accepted by the community

Yes, because that’s an accurate description of the result of the process. How dare they accurately describe what’s actually happening!?!

5) and finally they plan to profit from their pirate boxes

The irony is that the “copyright owners” of the most profitable copyrights are themselves “pirates” who have ripped off the work of the actual artists with bad contract terms, fraudulent Hollywood accounting, and actual copyright violations shored up by their deep pockets to pay for lawyers to fuck over the little guy if they happen to get caught.

There are people who are going to profit from AI image generation. Many already have. But you’re complaining about grifters and rent-collecting philistines who have no actual part in the artistic, creative process not getting paid for their grifts.

Basically when courts and the jury evaluate this plausible alternative theory against the reality of AI development, there is enough truth in the story that it gets accepted as what really happened => damage awards will be astronomical…

That you think the truth is essential to how court cases play out just shows you know fuck all about what you’re talking about. Court cases can be decided by bias with a twisting of facts and even intentional ignorance of precedents and caselaw history. You’re writing fan fiction again. It’s not useful.

terop (profile)

September 14, 2024 at 1:32 pm

Re: Re: Re:⁶

This is complete bullshit. You’re asserting copyright violations without evidence based on your previous, completely incorrect, and completely debunked notion that all fair use is a copyright violation.

You only need to evaluate (using AI people’s own image interpolation techniques) how close this alternative illegal theory is to the stuff that the companies are actually doing. The process above was deliberately chosen as an interpretation of the activity that is explicitly known to be illegal piracy. I leave it to the courts to evaluate how close AI’s real activity is to the known piracy process. But I’m afraid that the jury will declare the processes to be exact copies of each other, so much so that they’ll wonder if AI folks borrowed the techniques directly from pirate sites and are violating copyrights in that process in alternative ways.

MrWilson (profile)

September 14, 2024 at 10:07 pm

Re: Re: Re:⁷

You’ve entered the territory of some unhinged fan fiction. You’re asserting that software engineers are designing how software functions based on a desire to replicate unrelated illegal activity. Did these software engineers also murder JFK and Jimmy Hoffa? You’re asserting that you know the mindset and intentions of thousands of different people. Did the aliens give you mind-reading technology? You’re incoherent again.

terop (profile)

September 15, 2024 at 6:23 pm

Re: Re: Re:⁸

You’re asserting that software engineers are designing how software functions based on a desire to replicate unrelated illegal activity.

That’s what the internet demands from these poor developers. If they have a little less stability in their development processes, they will clone all their competitors’ technology and kill the industries with their own technology + free products.

The only reason I haven’t gone crazy with the internet’s cloning demands is because I learned my development process before the internet could mess with it. If you let the internet change your development practices, you’ll be sitting in court responsible for large-scale copyright/patent infringement. The only way to avoid it is to reject the requests and take the risk that you cannot implement products as good as what competitors managed to pull off. But this is exactly what copyright law expects from you: instead of cloning other people’s products, you’ll have something original.

MrWilson (profile)

September 16, 2024 at 8:28 pm

Re: Re: Re:⁹

You continue to fail to learn how the models work and you apparently don’t know the origins of the various technologies and the people who have developed them. The internet has this information available. This level of ignorance you’re displaying just reeks of laziness and confirmation bias.

terop (profile)

September 17, 2024 at 8:12 am

Re: Re: Re:¹⁰

You continue to fail to learn how the models work and you apparently don’t know the origins of the various technologies and the people who have developed them.

None of the above matters one bit. The only important information is whether these people followed the strictest copyright maximalist principles when developing their products. Only through a constant maximalist view of the world’s events can they extract enough maximalist propaganda from the fabric of the universe to implement products worthy of the RIAA’s and MPAA’s praise. It’s by listening to every entity on the planet that they will learn proper maximalist principles. Just recently we came to the realisation that HDMI cable compatibility failures are caused by the MPAA’s copy protection technology.

MrWilson (profile)

September 17, 2024 at 3:27 pm

Re: Re: Re:¹¹

The irony is that you think you’re being absurd now, but you think your other claims are legitimate, but everything you say is actually absurd. You’re beyond Poe’s Law at this point.

terop (profile)

September 22, 2024 at 3:05 pm

Re: Re: Re:¹²

The irony is that you think you’re being absurd now, but you think your other claims are legitimate, but everything you say is actually absurd.

Why do you think that people who develop products for the market are free from liability when the users misuse the products for illegal operations? Companies usually prepare for this liability by taking out insurance that covers the horror involved, assuming the company can find an insurance company stupid enough to take the responsibility. But the liability flows from the damage that these products can cause in the marketplace. The liability needs to hit the product developers and anyone involved and able to control the misuse, so that there will be an incentive to bring better products, free from misuse, to the market.

The world is a big place and every entity in it deserves a voice at the tables where important product safety decisions are being made.

MrWilson (profile)

September 24, 2024 at 6:59 pm

Re: Re: Re:¹³

You’re conflating different concepts here. Product safety and liability for criminal misuse of a product are two different things. There are different laws and practices that address these different topics. Nobody in their right mind considers the ability of a product to be used for copyright infringement to be a “product safety” issue. You should research (in general also, but) specifically the laws and lawsuits that found that recordable video tapes and cassette tapes are legal.

Start with the Betamax case and the AHRA.

terop (profile)

September 24, 2024 at 10:18 pm

Re: Re: Re:¹⁴

Nobody in their right mind considers the ability of a product to be used for copyright infringement to be a “product safety” issue.

Well, the Finnish government explicitly considers Chinese pirate products a product safety issue, mostly because of illegal/dangerous electrical gadgets.

But even if we considered just the compensation issue of copyrights, it is a significant product safety issue if companies who create the products cannot get enough compensation from the markets and need to cut corners/save money by killing product safety features. Electric gadgets especially all need significant enough compensation that the companies do not need to use cheaper and more dangerous parts to develop the products.

MrWilson (profile)

September 26, 2024 at 12:36 am

Re: Re: Re:¹⁵

Well, the Finnish government explicitly considers Chinese pirate products a product safety issue, mostly because of illegal/dangerous electrical gadgets.

And that is not copyright-related. Non sequitur #5000!

But even if we considered just the compensation issue of copyrights, it is a significant product safety issue if companies who create the products cannot get enough compensation from the markets and need to cut corners/save money by killing product safety features.

We don’t need to consider unrelated topics.

terop (profile)

December 17, 2024 at 5:51 pm

Re: Re: Re:¹⁶

We don’t need to consider unrelated topics.

la la la la… Why are you putting your fingers in your ears when the facts splash in your face? Is copyright really so terrible when it actually solves one of the biggest issues in product development, i.e. where to get the money to implement the law’s requirement for product safety features?

terop (profile)

December 17, 2024 at 6:17 pm

Re: Re: Re:⁴

The process and the result are so far removed that you wouldn’t be able to tell from whence the painting even came without being told.

This isn’t the case with AI. The AI is reproducing exactly the same kind of output as your inputs. If your input is PNG images, you will get an image as the AI’s output. So there is no great distance between the input images and output images, and the requirements for fair use’s transformative works are not met. This kills your whole fair use theory.

You can figure out the similarity of the AI’s input and output by examining the TYPES of the programming language constructs needed to store the data. The AI’s whole process can be summarised as [IMAGE] -> IMAGE, which takes multiple images as inputs and outputs a single image. While there are math processes like interpolation and neural network connections inside the function, the process as a whole cannot meet the legal requirements for transformative works.

The same problem existed with the process that youtube-dl used to download files from YouTube and publish their .mp4 representation to users in a format which could be freely copied on the internet. While it sounded like a useful operation, it needed to bypass YouTube’s technological protection measures to download the data, and then IT DID NOT EXECUTE COMPLEX ENOUGH OPERATIONS TO REACH FAIR USE DETERMINATION VIA TRANSFORMATIVE WORKS. For this, there is existing legal paperwork and the RIAA’s position is perfectly clear on this issue. Thus if you listen to the RIAA’s position, you can only conclude that the transformative operations requirement in fair use is stricter than what ordinary byte transforms can do.

terop (profile)

December 18, 2024 at 3:36 pm

Re: Re: Re:⁴

This is all fair use on its own and not related to the training. People download copyrighted images off the internet.

Downloading had a bad reputation long before internet browsers were invented. You have to understand that the download operation is legally dubious. Browsers can do the download without legal problems only because they invested a significant amount of time and effort to build a security system powerful enough to cut out secondary copies and widespread copy operations happening on the downloaded data. It’s the browser’s security that saves their legal story.

Now if you “bypass the technical protection measures”, i.e. the browser’s security features, by downloading the data via other methods unrelated to browser technology, those legal protections that the browser-using population enjoys are no longer available to you, and you’ll be a filthy pirate who swaps zip files on the internet illegally and hacks into FBI, NSA, police, national library, school computer systems etc. I.e. you’re no longer an ordinary citizen who tries to do good work, but you become a filthy criminal.

Anonymous Coward

September 9, 2024 at 3:41 am

Re: Re:

It’s also arguable that training on copyrighted works are fair use…

Even though no actual copying takes place? I don’t think so. No copying = no potential infringement = no fair use determination needs to be made.

MrWilson (profile)

September 9, 2024 at 11:26 am

Re: Re: Re:

Training does involve a copy. It just isn’t getting copied into the model. The person doing the training is using copyrighted material and that use is likely fair use. You’re right if we’re only talking about the model and not the act of training the model.

Anonymous Coward

September 10, 2024 at 4:09 am

Re: Re: Re:²

Training does involve a copy. It just isn’t getting copied into the model.

And thus no copy is being made, just as AC said.

The person doing the training is using copyrighted material and that use is likely fair use.

Er, no. The copyrighted material in this case is being perceived (read, ‘seen’, etc.), which would be recognized as OK under the First Sale Doctrine if it were done by a human rather than a computer.

You’re right if we’re only talking about the model and not the act of training the model.

Again, no. AC is right in both cases. How do you get so many insightful quotes on this site when you’re so bad at arguing points of law?

MrWilson (profile)

September 10, 2024 at 11:42 am

Re: Re: Re:³

And thus no copy is being made, just as AC said.

You didn’t track the comments. My original comment referred to training specifically: “It’s also arguable that training on copyrighted works are fair use…” To which AC responded “Even though no actual copying takes place?” But copying does happen at the beginning of the process when the human is training the model. The model itself isn’t copying though. The human is copying from other sources in order to have something to train the model on. And that is likely fair use, as I said.

Er, no. The copyrighted material in this case is being perceived (read, ‘seen’, etc.), which would be recognized as OK under the First Sale Doctrine if it were done by a human rather than a computer.

The first sale doctrine wouldn’t apply because we’re not even talking about purchased works. You can download copyrighted material from the internet without a purchase and that’s fair use. You’re adding unnecessary complications if you base it on the doctrine of first sale. That’s more useful if you’re talking about physical copies of works rather than digital copies.

Again, no. AC is right in both cases.

You and the AC are confused about what we’re talking about. Copying does happen during the training by the human because the human has to have a copy of a work to train the model on. That is fair use. The “no copying” part is in the model’s process of denoising noise to render an image via word associations, which is the function of the model rather than its training process.

How do you get so many insightful quotes on this site when you’re so bad at arguing points of law?

We’re not just arguing law. We’re also talking about technical aspects of generative models. But if you want to know why people vote for my comments, you’ll have to ask them.

terop (profile)

December 17, 2024 at 7:20 pm

Re: Re: Re:²

The person doing the training is using copyrighted material and that use is likely fair use.

Why the fuck would this be fair use?

There’s a well-known precedent of illegal copyright infringement where people tried to execute cleanroom techniques by routing the data through a human brain and trying to argue that since it stayed in the human brain for 2 seconds, it would be different enough to qualify for fair use, because it would be scrambled…

Our position on the scrambling problem is that while the human brain is capable of scrambling the data, this only occurs over a long period of time. So if you “remember” techniques from the 1980s, then there’s a possibility to rely on the scrambling defense, but shorter brain visits are just copyright infringements.

People are “free” to rely on their experience gathered through decades of work, but if management gave them “access” to copyrighted work 2 weeks before asking to implement a clone of it, then those defenses would simply fail.

This AI thing is comparable to giving the human brain 2 seconds to scramble the work… it’s simply not different enough…


The AI Copyright Hype: Legal Claims That Didn’t Hold Up

from the it's-still-early dept

Part I: What is a motion to dismiss?

Part II: What claims are dismissed so far?

Comments on “The AI Copyright Hype: Legal Claims That Didn’t Hold Up”
