You keep pretending that not naming a person means no poor people exist. You do realize that reality doesn’t change based on what a person argues, right? This is the most absurd demand. I’m not going to provide you with a list. I’ve given you an entire category of American citizens that consists of about 300 million people. Do a Google search, find references to poor Americans, there’s your name. Look at any public voter roll data in the US. There’s about an 88% chance the names you find match my definition.

I don't care about your "reality". I would disregard it, the way a judge dismisses a claim for lack of evidence. Even though this is a casual debate and not a court battle, I still expect the rule that whoever brings up a claim must supply evidence for it. So, dismissed.
check this article about meta’s AI branch to fight publishers about meta leeching their AI data via torrent from pirate sites, and meta lost the fight [URL omitted; Kadrey v. Meta case]

@terop It's not completely lost. The summary judgment is still pending, but it's unlikely that Meta will win. Meta's last remaining bet is claiming that torrenting to train AI is fair use, and the judge has expressed doubt on that. I'm also waiting to see the judgment come out. It would be the first case regarding generative AI training and fair use (and more authoritative than the USCO).
it’s because it learned to generate based on its analysis of the training data. Text LLMs predict the next word based on weights and tokens. Image generation works through a similar process with denoising.

Weight prediction = expression "fixed in a tangible medium" -> copyright issue. Denoising = a very aggressive compression technique, which does not change the copyright analysis. The judge in Andersen v. Stability AI has already disagreed with you: "That these works may be contained in Stable Diffusion as algorithmic or mathematical representations – and are therefore fixed in a different medium than they may have originally been produced in – is not an impediment to the claim at this juncture." (Judge Orrick) https://admin.bakerlaw.com/wp-content/uploads/2024/08/ECF-223-Order-Granting-in-Part-and-Denying-in-Part-Defendants-Motions-to-Dismiss.pdf
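For readers who want the "weights and tokens" point made concrete, here is a toy sketch of next-token prediction. Everything in it is hypothetical for illustration: the vocabulary, the hand-made logit table, and the function names are not from any real model; real LLMs use learned transformer weights over vocabularies of tens of thousands of tokens. The only claim it illustrates is that weights map a context to a probability distribution over the next token.

```python
import math

# Hypothetical toy vocabulary and logit table (NOT from any trained model).
# Each context token maps to one logit per vocabulary entry.
VOCAB = ["the", "cat", "sat", "mat"]
WEIGHTS = {
    "the": [0.1, 2.0, 0.2, 1.5],
    "cat": [0.3, 0.1, 2.5, 0.2],
}

def softmax(logits):
    # Convert raw logits into a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next(context_token):
    # Greedy decoding: pick the highest-probability next token.
    probs = softmax(WEIGHTS[context_token])
    best = max(range(len(VOCAB)), key=lambda i: probs[i])
    return VOCAB[best]

print(predict_next("the"))  # -> "cat"
print(predict_next("cat"))  # -> "sat"
```

Note that nothing in this sketch stores the training text itself; the legal dispute above is over whether the learned weights nevertheless count as the expression "fixed" in a different medium.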
LLMs don't contain any images. That's the whole point. The training dataset is not in the model.

Because the works exist not in the form of images but in the form of mathematical expressions! See the quote from the judge again.
do you think you’re “pirating” and violating copyrights when you come to Techdirt and the logo loads in your browser?

Straw man.
You literally can’t buy a lot of content that is legally available for free. There’s no purchase button on a Wikimedia page offering a free license on a photograph that the copyright holder has released with a permissive license. That your mind jumps immediately to copyright violations indicates your bias. There’s plenty of legally free content on the internet that nobody can or needs to pay for. Are you like Terop and think the public domain doesn’t exist?

No, my question is: why the fuck can't the AI models be trained with only public domain materials? When you suggested many things on the internet can be "legally downloaded for free", you ignored the reality that AI companies train with pirated materials, which means copyright infringement! Suggesting "legally downloaded for free" content won't help, because that's not the part the AI companies are accused of! Your replies suggest to me, again and again, that training the AI with only public domain content is the only non-infringing way to go! So fuck!
The right to reproduction and the right to derivatives cover AI training. And you dodged my original question of what the fuck is "the right to demand a license".

What the fuck is "the right to demand a license"?

There is no law or court case that has ruled definitively that all LLM training on copyrighted content requires a license, ergo, copyright owners do not possess such a recognized right.
listing actual names would take too long

I only required you to list one name. So "too long" is an excuse.
Anyone with a net worth less than a few million is on the list. 88% of the US population is a part of a household whose net worth is less than a million.

Why this definition? Why can't a household with a net worth below a million buy even one work (music, book, or movie)? Bullshit. (This definition of "poor people" is made up and does not reflect actual economic ability to purchase copyrighted works.)
Except the training dataset is not in the generated output

Then why can the AI generate output that is similar to copyrighted works? By magic? That's common sense. (Remember the example of reproducing the photo of Anne Graham Lotz that the USCO has cited.)
you’re pretending the internet doesn’t exist, and you’re pretending that it’s illegal to view images on the internet, and you’re pretending that LLMs contain all images on the internet, and you’re pretending that LLMs can just reproduce all images on the internet.
You're predicting internet users will forget the internet exists and just try to recreate all content with an LLM.

Another slip of your mouth. You support piracy, period. Before we even argue whether it's practical to recreate copyrighted content with LLMs, you simply suggest users download the content from the internet, which implies you pirate. (Otherwise you would say to buy it instead. Some creative works are exclusive to physical media; others allow downloading but are behind a paywall. You are worse than the AI companies claiming their training data is "publicly available" content, because you suggested pirate ways.)
There isn’t a license for AI training in most cases

Say what? https://authorsguild.org/advocacy/artificial-intelligence/ai-licensing-what-authors-should-know/
There also isn't an established legal right to demand licenses for training

What the fuck is "the right to demand a license"?
@terop
Pirates will find a way to insert their bullshit into software projects. Things like requiring software to be open source so that pirates can then modify it and disable all the protections against copyright infringements.

Open source software does not need protections against copyright infringement, since its license permits distribution almost anywhere and to anyone (except in jurisdictions where an open source license cannot be enforced, which is rare). This comment shows your misunderstanding of open source software. Open source software ≠ piracy. And BitTorrent has legal uses. What can get you in trouble is when you permit users to pirate materials through your platform (software) AND you benefit from that illegal use. Both conditions must be satisfied for you to become liable. Cases where there is no liability include: (1) You develop an open source BitTorrent client (e.g. Transmission), but you do not profit from the users using it (through subscriptions or other means), nor from secondary benefits such as ad revenue. (2) You develop a video game that supports the BitTorrent protocol as a way to download game updates, and your game client does not allow users to torrent arbitrary files (that is, it allows only game updates). So let's make it clear: BitTorrent has legal uses.
You've already accused poor people of not existing and of being freeloaders (positions which contradict each other). So you hate poor people.

I disregard the "poor people" arguments (bullshit) because you cannot name any single person or organization that fits your definition. In other words, the "poor people" you mentioned don't exist. And how would it make sense to claim that I "hate" people who don't exist at all? I could have compassion for poor people if you could name a single person or organization who is "poor". Otherwise, all of this is bullshit to argue about.
and Stable Diffusion didn’t attribute the source or the original photographer.

Should they post the attribution internally in their offices on a bulletin board? On a post-it note next to their computer?

Attribution next to the generated output, you idiot! Say "this AI-generated image incorporates materials from [author-name] [image-url], which is licensed under CC-BY 4.0".
But also, why would they waste so much time and effort trying to use an LLM to replicate copyrighted materials when they could just find it somewhere else, and in some cases, legally for free?

As I've said, you slipped it out of your mouth. You support piracy, period. You assume every image the AI model has been "trained" with was legally obtained for free (it was not; OpenAI, Meta and Anthropic all used pirated materials), otherwise you would not ask that question at all. A person who respects copyright would say: buy a license for AI training. Even in a fair use scenario like Google Books, the result page shows the user where to buy the book (e.g. from Amazon). You didn't consider that, because you support pirates. And I should stop making you smarter by pointing out your mistakes.
Except my analogy doesn't assume, as you do, that the intended use is copyright violation. You have a bias that assumes that's the primary use.

Well, that's what contributory copyright infringement liability is about (plus vicarious infringement)! If your damn company benefitted from the copyright infringements done by end users, why the fuck can you not be liable, even when it's not the primary use of the tech? (See also: the types of secondary copyright infringement listed by the Copyright Alliance https://copyrightalliance.org/education/copyright-law-explained/copyright-infringement/secondary-copyright-infringement/) MGM v. Grokster. As Grokster was ruled liable, you can't shield ChatGPT or whatever generative AI from this liability.
Search for all the articles about the tips people are giving each other about using ChatGPT or Claude. They're all saying stuff like, "here's how to be more productive by having ChatGPT write you a task list," "use these five prompts to improve your self-care regimen," "Seven prompts to make you more productive at work."

I'm ignoring this argument because it doesn't explain why it is necessary to train the AI model on copyrighted content in the first place. This is a distraction.
In the US, most of Picasso’s works are protected under copyright until 2043, which is 70 years after his death. You could have searched for this fact easily, but you chose not to.

What about countries whose copyright laws set the term at 50 years after death (the Berne Convention minimum)? If you're advocating for shorter copyright terms, you shouldn't bring up this straw man, as you would contradict yourself.
If humans don’t know they’re purchasing LLM-generated content, then the person claiming human authorship and copyright on the content is making a false copyright claim. That would make the competition argument moot.

No, it doesn't. One of the purposes of copyright is to protect the market from fake imitations, even when the buyer may know they are fake, because the fake painting or book or music may be sold at a much cheaper price than the genuine one, creating a huge temptation for buyers to look away from genuine goods. And that is the market competition in factor 4. Imagine this: you go to a video game store looking for a Nintendo Switch 2 Pro Controller, and there is an illegal clone (from an unnamed manufacturer) with the same packaging, priced US$10 cheaper (say, US$75 rather than US$85). Would you not be tempted to buy that cheaper knockoff?
As the article you didn't read notes, the US Copyright Office's opinion is non-binding and isn't law. Find case law that actually supports your position or else your assertions are moot.

You cannot cite case law that supports your fair use claims either, so why the hell should I listen? Or should I wait until one of the AI copyright cases gets appealed to the Supreme Court and see who ultimately wins?
You quoted the US Copyright Office: “When a model takes the prompt “Ann Graham Lotz”…” That image to which they referred is released under a permissive license. It’s not piracy to find and download it.

And you are arguing the wrong fucking thing! First, permissive licenses require attribution, and Stable Diffusion didn't attribute the source or the original photographer. Second, people will be tempted to extract copyrighted materials from the AI, like it or not, even when it takes millions of attempts (it's relatively easy to automate the generation of a million images these days). Third, you assumed everything on the internet can be downloaded "for free" when you made that argument. That shows your true colors! You advocate piracy, period. Don't refute me on the third point. You just slipped it out when you wrote the reply. Not my fault.
@terop

Torrenting is just a file transfer protocol. It’s not itself illegal.
User interfaces are needed before your file transfer protocol is useful to anyone and their mother. And there are strict rules that the user interface must not display pirated material in its user interface.

Which rule? From what I've seen in the Napster and Grokster cases, I don't remember any rule saying that user interfaces must not display pirated materials. The Napster and Grokster cases were not about that. The problem with both P2P platforms was that the companies making the P2P software benefitted from the illegal copying done by their customers, and that piracy was the primary use of the P2P software. Of course that doesn't make BitTorrent-as-a-protocol illegal. The real catch is: if you're developing a technology that can be used for illegal purposes, make sure you don't contribute to those activities or profit from them (or else you will be liable).
Wait. This post is mine (Explorer09). Due to some mistake in the comment system I posted it as an anonymous coward by accident.
You'd have to argue that people who could otherwise do a Google search for the "copied" content and find it easily would rather intentionally download an obsolete LLM model and spend hours and hours trying to get a reproduction of a widely available image that they could download easily from the internet for free.

Emphasis added. You showed your true colors! So you advocate piracy after all. All of your "fair use" claims about AI are just decoys to conceal this true intention of yours!
I think you've lost track of the conversation. I was responding to terop's false claim that to use free software, you still have to get it from an "authorized vendor."

I didn't argue out of context. But you mistook terop's argument about the "authorized vendor". For free software, you technically still need to get it from what terop called an "authorized vendor", except this "authorized vendor" is "everyone who can distribute this software legally". Note that in jurisdictions where the GPL cannot be fully enforced, distributing GPL software would also be illegal. This is the "liberty or death" clause, present since GPLv2.
This is about free software, not AI.

Free software does not always mean public domain. For example, training an AI on GPLed code and releasing the AI model under a non-GPL license would still be a copyright violation. Free software doesn't mean an always-green light for AI training (it's "mostly free", except when you release the result as proprietary or combine it with proprietary code).
Precisely speaking, the other way is the so-called "fair use" defense, but it requires court decisions before you are greenlit. Rather than wasting time arguing whether AI is "fair use", my best move is to wait for court decisions to come out and see you guys lose horribly. And even if the AI companies win the "fair use" defense (which I highly doubt), there is still DMCA section 1202(b), which requires users - including AI companies - to preserve copyright management information (CMI). That is pending appeal in the Doe v. GitHub case.

It’s explicitly stated in the copyright law that 1) some operations are exclusive to the author 2) you need an explicit license to do those operations 3) the only way to properly obtain the license is to find an authorized vendor and ask permission to do the stuff you want to do.

Um. You are wrong. Number 3 is not in copyright law at all. Why are you lying?
I do actually. That's why I made that hypothetical example. If you can't even tell what works the parts are from, it can't be derivative, because you must have identifiable copyrightable elements to have a derivative work.

That's a difficulty of proof for the plaintiffs, but it doesn't mean it's not infringing. And that's why in the UK the House of Lords is now pushing a bill that forces transparency on AI companies, requiring them to disclose all copyrighted data used in training. And I will no longer reply to bullshit claims that AI is not infringing merely because the copyright holders can't prove it.
If you bought a bunch of books and cut out singular words from each book and pasted them together into a completely different work that said completely different things than the books that the words came from, it would not be derivative because copyright doesn’t protect individual words but rather the expression that consists of more than just singular linguistic parts. If that resulting work qualified as a derivative work, then all written works would be derivative because people learn to write from reading other people’s work. We learn to speak by repeating what other people say. We repeat the phrases that our parents spoke when we were learning to speak.

Bullshit. The words are not copyrightable, but the specific arrangements of words form creative expressions that are copyrightable! And when the AI "learns" from those specific arrangements, it copies the protected expression. USCO report, pp. 47-48:
In providing this analysis, the Office rejects two common arguments about the transformative nature of AI training. As noted above, some argue that the use of copyrighted works to train AI models is inherently transformative because it is not for expressive purposes. We view this argument as mistaken. Language models are trained on examples that are hundreds of thousands of tokens in length, absorbing not just the meaning and parts of speech of words, but how they are selected and arranged at the sentence, paragraph, and document level - the essence of linguistic expression. Image models are trained on curated datasets of aesthetic images because those images lead to aesthetic outputs. Where the resulting model is used to generate expressive content, or potentially reproduce copyrighted expression, the training use cannot be fairly characterized as "non-expressive."

Refute this one, please, MrWilson.
Big Media companies exist by exploiting the creative works of others.

I disagree. That's your claim, not mine. And even when Big Media companies do exploit, it's irrelevant to the AI companies who are alleged to "steal" works.
They do need other people's IPs to profit, specifically the IPs that get assigned to them by the actual creators.

Technically the IPs produced by their own employees, yes, but not those of "people from other companies". Trying to argue the definition of "other people" isn't helpful.
You've actively advocated for copyright maximalist corporations to be able to exploit poor people.

F-ck you. As I've said, there are no "poor people" you can name! This argument is moot and useless.
You cited a video creator claiming to tell people how to do something. That doesn't prove that it can actually compete or that human audiences will pay for it over human-authored works.

Yes. And the only reason humans would buy AI-generated works is when they can't tell they're AI-generated. In other words, when the AI can deceive human audiences. In still other words: the Turing Test. So your idea is to let AI flood the book market with AI-generated "slop" and force the potential human buyers to participate in this giant Turing Test, which I haven't even argued is ethical to begin with. (An ethical Turing Test requires informed consent: human participants must be aware they are being tested and that the content they see during the test may be AI-made.)
I googled more examples and found someone who spent 3 hours coming up with a "book" that was 8000 words long.

And that supports my position that someone can make a book very quickly with AI, which in turn competes with the book authors the AI was trained on. It doesn't have to be a single prompt; the fact that people can sell AI-generated books is sufficient for this claim.
And they had to try millions of times to get something that looked close enough to make the claim.

A one-in-a-million chance is greater than zero, and that's sufficient for the claim. A purely coincidental resemblance to a copyrighted work would have roughly less than a quintillionth of a chance of occurring; by that I mean on the order of 2^(-64), or even less, because modern cryptographic hashes are at least 160 bits long. And the chance of a random monkey producing a Shakespeare chapter is much less than that (taking the infinite monkey theorem into account).
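To put those magnitudes side by side (the 2^(-64) bound and the 160-bit hash figure are taken from the comment above; the one-in-a-million rate is the deliberate-extraction scenario being argued), a quick sanity check:

```python
# Orders of magnitude from the argument above:
# a deliberate one-in-a-million extraction attempt, versus the comment's
# bound for a purely coincidental match (2^-64), versus the even smaller
# chance of a 160-bit cryptographic hash collision.
deliberate = 1e-6             # one-in-a-million successful extraction
coincidental = 2.0 ** -64     # the comment's 2^(-64) bound, ~5.42e-20
hash_collision = 2.0 ** -160  # 160-bit hash collision chance

# The deliberate rate exceeds the coincidence bound by ~13 orders of magnitude,
# which is the point: "millions of tries" is nothing like random chance.
ratio = deliberate / coincidental
print(f"coincidental ~ {coincidental:.3e}")  # ~5.421e-20
print(f"ratio        ~ {ratio:.3e}")         # ~1.845e+13
```

So even granting that it took millions of attempts, the success rate is astronomically far above what coincidence could explain.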
If it were reproducing copyrighted works (and perfectly, which isn't even being alleged)

Infringement doesn't require perfect reproduction. Imperfect copies can also constitute infringement.
you're just quoting the copyright office report instead of having researched this particular example yourself before citing it, proving that you are only looking for claims you agree with rather than researching all the nuances of the topic.

You are the one who should cite the counter-claim, not me. If you are more professional than the USCO, then show us your papers.
The results aren’t derivative. They can’t be. If you put every famous physical artwork in a giant blender and made a collage out of random, pulped, minute little pieces (which isn’t even what an LLM does, but I’m being generous here with the metaphor), you couldn’t legally argue that the resulting collage was a derivative work of every single artwork that was pulped to make it.

Unfortunately, yes. It IS derivative. It is the derivative of "every single artwork pulped to make it", as you say. At least that's what the copyright holders allege. You really have no idea how "derivative works" in copyright law work (no pun intended). That's why DJs mashing many popular songs together get into legal trouble if they don't seek bulk licenses first. And this derivative-work rule is also critical to how open source licenses like the GPL enforce their "copyleft", because large software projects like the Linux kernel are technically a large "collage" of many developers' contributions, each a small commit, until everything is merged together.
Big media might actually like AI generated content to not be copyrighted because they wouldn’t have to license it to use it in their own copyrighted works.

No, no, no. Big Media can get away with AI copyright pretty easily by making their own AI models from their own IPs. They just cannot exploit other companies' IPs with their AIs. And the motive for Big Media to use AI is not to "steal" (they already hold big IPs and don't need other people's IPs to profit); it's to cut labor costs by firing minor creative workers in the process.
AI generated art not being able to be copyrighted means it can be integrated, and everyone else would have to extract only the AI generated parts but couldn’t legally use the copyrighted parts that aren’t AI generated, which can be difficult in certain forms of media.

I don't see any problem with this.
Why not? Why should copyright be denied for AI generated works?

Why did you start arguing for a point that wasn’t being argued?

Because you are assuming I am a "copyright maximalist" and I am saying I'm not, and this is one of the reasons. You are making a wrong assumption that Big Media would want AI generated works to be denied copyright. The reality is actually different, as I mentioned above.
There is already evidence that ChatGPT generates work that competes in the same market as book authors,

[citation needed]

YouTube video: "How To Create And Sell E-books Using ChatGPT | How TO Earn Money Using ChatGPT"
AI pre-training involves reproduction.

Not in the model, not in the results. You're demonstrating, again, that you don't understand the technology.

Page 28 of the USCO report, part 3: 'As discussed in the Technological Background, the extent to which models memorize training examples is disputed. When, however, a specific model can generate verbatim or substantially similar copies of a training example, without that expression being provided externally in the form of a prompt or other input, it must exist in some form in the model's weights. When a model takes the prompt "Ann Graham Lotz" and outputs an image that is nearly identical to a portrait found in the training data, the expression in that image clearly comes from the model.' (Emphasis added.) Refute this one, please, and stop saying bullshit.
It's disgusting for you and Techdirt to keep the attitude of "steal it first, ask for forgiveness later". That's what the Big Tech companies are thinking. They gamble that they have more money to win in courts than the authors who would sue them. Look, I know of AI uses that could win fair use arguments, but many generic, big AI models probably won't. There is already evidence that ChatGPT generates work that competes in the same market as book authors, and that should be a warning sign when you build your AI application on top of it. If those commercial AI models used only licensed data (or public domain data) for training, then we would have no problem.

Even leaving aside the still undecided question of whether or not training on licensed work infringes, that wouldn't make any AI system "illegal" as the original comment suggested.
An infringing use does not make an entire product illegal.

Say that to Napster and Grokster, please. You guys ignoring the important ruling of MGM v. Grokster makes me feel you are intentionally deceiving the public.
An alternative is that no copyright-protected uses are implicated.

Even if this were true, how could the AI companies mount a fair use defense on this basis?
No reproduction is made, and no derivative works are created. So fair use is a defense, but hardly the only one.

AI pre-training involves reproduction, so the first argument is already false for generative AI. The next is fair use, which is currently Meta's only remaining defense for its Llama models.