The U.S. Copyright Office’s Draft Report On AI Training Errs On Fair Use
from the fair-use-matters dept
Within the next decade, generative AI could join computers and electricity as one of the most transformational technologies in history, with all of the promise and peril that implies. Governments’ responses to GenAI—including new legal precedents—need to thoughtfully address real-world harms without destroying the public benefits GenAI can offer. Unfortunately, the U.S. Copyright Office’s rushed draft report on AI training misses the mark.
The Report Bungles Fair Use
Released amidst a set of controversial job terminations, the Copyright Office’s report covers a wide range of issues with varying degrees of nuance. But on the core legal question—whether using copyrighted works to train GenAI is a fair use—it stumbles badly. The report misapplies long-settled fair use principles and ultimately puts a thumb on the scale in favor of copyright owners at the expense of creativity and innovation.
To work effectively, today’s GenAI systems need to be trained on very large collections of human-created works—probably millions of them. At this scale, locating copyright holders and getting their permission is daunting for even the biggest and wealthiest AI companies, and impossible for smaller competitors. If training makes fair use of copyrighted works, however, then no permission is needed.
Right now, courts are considering dozens of lawsuits that raise the question of fair use for GenAI training. Federal District Judge Vince Chhabria is poised to rule on this question, after hearing oral arguments in Kadrey v. Meta Platforms. The Third Circuit Court of Appeals is expected to consider a similar fair use issue in Thomson Reuters v. Ross Intelligence. Courts are well-equipped to resolve this pivotal issue by applying existing law to specific uses and AI technologies.
Courts Should Reject the Copyright Office’s Fair Use Analysis
The report’s fair use discussion contains some fundamental errors that place a thumb on the scale in favor of rightsholders. Though the report is non-binding, it could influence courts, including in cases like Kadrey, where plaintiffs have already filed a copy of the report and urged the court to defer to its analysis.
Courts need only accept the Copyright Office’s draft conclusions, however, if they are persuasive. They are not.
The Office’s fair use analysis is not one the courts should follow. It repeatedly conflates the use of works for training models—a necessary step in the process of building a GenAI model—with the use of the model to create substantially similar works. It also misapplies basic fair use principles and embraces a novel theory of market harm that has never been endorsed by any court.
The first problem is the Copyright Office’s transformative use analysis. Highly transformative uses—those that serve a different purpose than that of the original work—are very likely to be fair. Courts routinely hold that using copyrighted works to build new software and technology—including search engines, video games, and mobile apps—is a highly transformative use because it serves a new and distinct purpose. Here, the original works were created for various purposes and using them to train large language models is surely very different.
The report attempts to sidestep that conclusion by repeatedly ignoring the actual use in question—training —and focusing instead on how the model may be ultimately used. If the model is ultimately used primarily to create a class of works that are similar to the original works on which it was trained, the Office argues, then the intermediate copying can’t be considered transformative. This fundamentally misunderstands transformative use, which should turn on whether a model itself is a new creation with its own distinct purpose, not whether any of its potential uses might affect demand for a work on which it was trained—a dubious standard that runs contrary to decades of precedent.
The Copyright Office’s transformative use analysis also suggests that the fair use analysis should consider whether works were obtained in “bad faith,” and whether developers respected the right “to control” the use of copyrighted works. But the Supreme Court is skeptical that bad faith has any role to play in the fair use analysis and has made clear that fair use is not a privilege reserved for the well-behaved. And rightsholders don’t have the right to control fair uses—that’s kind of the point.
Finally, the Office adopts a novel and badly misguided theory of “market harm.” Traditionally, the fair use analysis requires courts to consider the effects of the use on the market for the work in question. The Copyright Office suggests instead that courts should consider overall effects of the use of the models to produce generally similar works. By this logic, if a model was trained on a Bridgerton novel—among millions of other works—and was later used by a third party to produce romance novels, that might harm series author Julia Quinn’s bottom line.
This market dilution theory has four fundamental problems. First, like the transformative use analysis, it conflates training with outputs. Second, it’s not supported by any relevant precedent. Third, it’s based entirely on speculation that Bridgerton fans will buy random “romance novels” instead of works produced by a bestselling author they know and love. This relies on breathtaking assumptions that lack evidence, including that all works in the same genre are good substitutes for each other—regardless of their quality, originality, or acclaim. Lastly, even if competition from other, unique works might reduce sales, it isn’t the type of market harm that weighs against fair use.
Nor is lost revenue from licenses for fair uses a type of market harm that the law should recognize. Prioritizing private licensing market “solutions” over user rights would dramatically expand the market power of major media companies and chill the creativity and innovation that copyright is intended to promote. Indeed, the fair use doctrine exists in part to create breathing room for technological innovation, from the phonograph record to the videocassette recorder to the internet itself. Without fair use, crushing copyright liability could stunt the development of AI technology.
We’re still digesting this report, but our initial review suggests that, on balance, the Copyright Office’s approach to fair use for GenAI training isn’t a dispassionate report on how existing copyright law applies to this new and revolutionary technology. It’s a policy judgment about the value of GenAI technology for future creativity, by an office that has no business making new, free-floating policy decisions.
The courts should not follow the Copyright Office’s speculations about GenAI. They should follow precedent.
Reposted from the EFF’s Deeplinks blog.


Comments on “The U.S. Copyright Office’s Draft Report On AI Training Errs On Fair Use”
The EFF and other tech groups are banging the table and yelling phrases from the same “Revolutionary technology! Democratization! Won’t somebody please think of the Innovation?!?” playbook that they used in embarrassing attempts to legitimize cryptocurrency and NFTs in the eyes of the public. Only this time, the yelling is in defense of these machine models.
Re:
Cryptocurrencies were initially a means of preventing governments from controlling currencies. They turned into techbro pump and dump schemes. EFTs have always been a grift. LLMs can actually do some things that humans actually value and benefit from.
Re: Re:
Were they, really? Every alt-scrip scheme is a scam.
Re: Re:
Oh? In what way are electronic funds transfers a grift?
Re: Re: Re:
Doh. I wish Techdirt comments had EFTs—editable fucking typos.
These words indicate a lack of intent to deceive. I like to imagine we all know better by now.
“To work effectively, today’s GenAI systems need to be trained on very large collections of human-created works—probably millions of them. At this scale, locating copyright holders and getting their permission is daunting for even the biggest and wealthiest AI companies, and impossible for smaller competitors.”
If you can’t compensate people adequately for feeding their stuff into your regurgitation engines, you shouldn’t be doing it.
Re:
Fully agreed. Maybe the best thing that can happen is for development to slow down, so we can properly assess the risks and benefits of this technology rather than moving fast and breaking things.
But then orgs like the EFF and Techdirt will take lines from Trump and talk about how we’re ceding the [insert technology here] race to China.
Re:
I’m just going to make up fictional rights and argue for compensation. If you’re reading this sentence, you owe me one million dollars!
Re: Re:
The right to be compensated for your work is pretty fundamental to this whole “capitalism” thing we’ve got going here.
Re: Re: Re:
No, it’s not. Capitalism involves the right to most of the value of someone else’s work if you’re the owner of the means of production. For workers, it means getting fucked. For copyright creators who aren’t big corporations, it means having significantly less income from your work than if only humans could own copyrights.
But even so, that’s still not an articulated right under the law. And it’s uselessly vague. The doctrine of first sale and fair use doctrine in copyright law cover the usage.
Re: Re:
“Pay people? For working? Could you imagine what that would cost? You’d never be able to make money growing cotton! What an imaginary right.”
-You, apparently.
Re: Re: Re:
I didn’t say people shouldn’t be paid for their work. That you have to make up a straw man to argue with me is telling.
You’re making an argument like a record company claiming that format shifting or making a mix tape is copyright infringement because you want to maximize profits.
Do you think libraries are copyright infringement hubs?
Re: Re: Re:2
If you think regurgitation engines are analogous to libraries you’re not worth further conversation with.
Re: Re: Re:3
Wow. Another straw man. You apparently can’t engage with anything I’m actually saying.
Libraries enjoy the same rights based on the first sale doctrine that anyone else does. It was a single example. Why do you think rights should be exclusive to a library and not to a patron of a library? Why should independent researchers trying to cure cancer have to pay millions just to conduct research with the same material the library paid significantly less to access?
Re: Re: Re:2
My frustration with the arguments of people claiming it’s not fair use and that all training must be licensed is that many people seem to think they’re championing the little guy when they’re inadvertently advocating for the benefit of the wealthy and corporations.
First, it is fair use whether or not you like the idea of corporations profiting off of the largely unpaid work of poor and creative people. That’s just capitalism in general. Argue against that instead of licensing costs for AI training if that’s what you want.
Second, the ability of a large corporation to train AI on publicly available data without paying every single copyright owner is the same ability you have to do the same. It’s the same ability that an independent university researcher relies on when trying to study potential treatments for rare and unprofitable diseases. You’re arguing, ultimately, that only wealthy corporations with large treasure troves of corporate profits should be allowed to build LLMs. You’re opposing the democratization of the technology. It’s like saying you don’t like Microsoft’s business practices so Linux should also be outlawed. Or that because a CEO drove drunk in his sports car and killed someone, ambulances should be illegal.
We don’t have individual power to control the use of AI. It will be used in systems that you will be forced to interact with. It is absolutely important to advocate for regulations and oversight and ethical laws that control how this happens. But arguing for licensed training is ensuring only the wealthiest, most profit-driven organizations will be able to afford to develop AI and that’s who will win government contracts for software systems that will be used against you.
Actual creators, such as myself, will get virtually nothing even if licensing is required. Other corporations hold the rights to the most profitable copyrighted content. You’re just arguing over which big pot of money gets smaller or bigger, neither of which you will ever have access to. And licensed training will do nothing to prevent ethical violations in the use of AI.
Re: Re: Re:2
Because yes it is infringement. I know there are exceptions, which are known as “fair use”, but it is the defendants that have the burden to prove their uses are “fair”. Fair use is never granted as a rule. Exceptions are exceptions.
U.S. Copyright Act, section 108, “Limitations on exclusive rights: Reproduction by libraries and archives”. Please read the law.
Re: Re: Re:3
No, it’s not. Just saying it is doesn’t make it true.
Ah, yes, the “fair use is only an affirmative defense” lie again. You are wrong. Fair use is written into Section 107 of the Copyright Act. It is the law. It is granted as a rule. It is a limitation on copyrights and it is something copyright owners must consider before claiming violations, before issuing DMCA takedowns, before filing a lawsuit. This is literally stated in many cases where courts chide copyright owners for suing over blatantly obvious fair use scenarios.
You should read the law. Section 107. Also case law, because the law as written isn’t the only functional aspect of the law. You should also look into the Doctrine of First Sale, which also covers what libraries do when they lend out copies, regardless of Section 108. It also covers media rental companies and loaning your vinyl collection to a friend and selling your old 8 track on Ebay.
Re: Re: Re:4
Do you think I had no idea that Section 107 specifies fair use? You are the one that should read it, because the U.S. Copyright Law doesn’t explicitly say which use in particular is fair and which is not.
Instead, it mandates “four factors” for courts to evaluate what is fair use and what is not. And there have been cases where higher courts ruled differently from lower courts even when the mandated criteria were the same.
No. In the U.S., fair use is a defence raised by the defendant only. The copyright owner does not have the burden of proving fair use. And so your claim of “this can be fair use, you can’t sue me” is blatantly false.
It’s actually “I can sue you, but you tell the judge to dismiss it by convincing the judge it’s fair use”.
Re: Re: Re:5
You apparently don’t understand it. I’m also saying you should read case law as well because the statute isn’t the full law.
It provides four factors which are used to determine which particular uses are fair use.
Four factors are decided by courts because that’s how lawsuits work, but four factor analysis should and can be done by anyone, including users and copyright holders.
Yes, different humans interpret the same thing differently. We learned this in kindergarten.
No, it’s not. Fair use isn’t only a defense. It is a positive, legal use.
Review the Lenz decision from the Ninth Circuit.
Lenz held that copyright holders must do a fair use analysis prior to issuing a DMCA takedown notice.
Of course not. Why would they prove something that goes against their claim? They do have a burden to use a four factor analysis prior to issuing a takedown though.
Quote me where I said you can’t be sued because of fair use. You can be sued for just about anything.
It’s actually “I can sue you, but the judge might rule in your favor and admonish me for not considering fair use in advance.”
Re: Re: Re:6
Good point for citing the Lenz case. I suggest it’s Lenz v. Universal Music you are talking about.
The problem: The Plantiffs have no obligations to prove fair use on behalf of the defendant. The decision was only to address false takedown notices as the judges warn the copyright owners not to abuse it. It had nothing to do with the plantiffs’ ability to sue (for copyright infringment).
The case didn’t say plantiffs have burden of proof on fair use, especially in court. It said that if there is a chance of the use being fair, don’t issue a takedown – because that’s the wrong tool – instead, sue.
By the way, I found several criticisms on the decision on the Web, and it’s worth linking here just for information
(1) https://www.aei.org/technology-and-innovation/intellectual-property/splitting-dancing-baby-9th-circuits-lenz-decision-may-mostly-meaningless/
(2) https://truthonthemarket.com/2015/09/23/a-takedown-of-common-sense-the-9th-circuit-overturns-the-supreme-court-in-a-transparent-effort-to-gut-the-dmca/
Re: Re: Re:7
I wasn’t going to continue to respond but I’m a sucker for people who keep doubling down on their ignorance with arrogance.
It’s plaintiffs, not plantiffs. Spell check isn’t hard.
Nobody asserted that plaintiffs have any obligation to prove fair use. You’re arguing with a straw man. I said plaintiffs are obligated to consider fair use prior to issuing a DMCA takedown because that’s what the Lenz decision said.
It was a false takedown notice because the use was fair use and the plaintiffs hadn’t considered it first! You either didn’t read it or didn’t understand it.
Nobody said the plaintiffs couldn’t sue. I literally said “Quote me where I said you can’t be sued because of fair use. You can be sued for just about anything.”
You aren’t reading or understanding anything you respond to.
Re: Re: Re:3
This article is specifically about US law, which holds that format shifting of copies of works owned by the format shifter for their personal use is not copyright infringement even if no licensing fees are paid. You were saying?
Re: Re:
There’s no fictional right here. AI “reading” or “training” or whatever you call it involves copying the works in digital form from one computer memory to another, and that’s “prima facie” copyright infringement, as the law would call it.
Analogizing machine reading things with humans reading is useless in analysing the copyright issue with AI training.
Re: Re: Re:
Hell no. You just declared the most basic function of the Internet to be copyright infringement. You reading these words on your computer would be the result of copyright infringement according to your analysis.
Just saying that without justification doesn’t make it true.
Re: Re: Re:2
“You just declared the most basic function of the Internet to be copyright infringement”.
That was an actual legal discussion at the time and I’m old enough to’ve been around for it. It had to get resolved and we were able to draw a distinction between the necessary processes to transmit and display something as people intended and copying and distributing it in a way that wasn’t.
And then people tried to argue that ‘you can’t copyright a number’ so’s to justify piracy because all data is fundamentally a number, and they got their asses kicked too, because the law is in fact capable of drawing distinctions.
People trying to treat legal concepts like they’re immutable mathematical theorems is a peeve of mine.
Re: Re: Re:3
As am I. Pulling the old person card doesn’t work against other old people.
I’d love to see your case law citations rather than this vague summary based on trust me bro. In many cases, it didn’t get resolved because some cases were settled and other cases were only addressed in a district court and not taken to the Supreme Court and actually decided as a precedent. In many cases, those making bullshit copyright claims simply changed tactics because it wasn’t profitable to continue suing children.
You appear to be misremembering the DeCSS case. Separately, you actually can’t copyright a number.
Re: Re: Re:4
“Separately, you actually can’t copyright a number.”
And yet, digital works are copyrighted. Because laws can draw distinctions that apparently completely elude people who think they’ve found One Weird Trick.
“I’d love to see your case law citations rather than this vague summary based on trust me bro.”
Direct from statute:
“Temporary Reproductions for Technological Processes
30.71 It is not an infringement of copyright to make a reproduction of a work or other subject-matter if
Copyright Act (R.S.C., 1985, c. C-42)
Re: Re: Re:5
You missed the point. You claimed a history and I’m asking for citations of that history. Which cases?
Your statutory citation doesn’t prove or explain the claim that “It had to get resolved and we were able to draw a distinction between the necessary processes to transmit and display something as people intended and copying and distributing it in a way that wasn’t.”
Re: Re: Re:6
A direct statutory citation of the statute that unambiguously resolved the issue isn’t enough for you to grasp that there was an issue to be resolved and that without that statute ‘the most basic functions of the internet’ would be copyright infringement?
Like, that’s why that statute was passed. You can look it up yourself. I’m not a big fan of the “You have no proof” brings proof “Not that kind of proof!” dance; I’ve seen it far too often for it to appeal to novelty.
Re: Re: Re:7
I didn’t even google it before because I was looking for case law, not statutes, but I just noticed that your statute is Canadian law, not US law.
I’m not a big fan of the “I have proof!” “What kind of proof is it?” “proof in a different country” dance. I haven’t seen it often, but it’s both hilarious and exhausting.
Once was perhaps forgivable. But citing Canadian law twice in an article explicitly about US law is beyond sloppy. This undermines everything you can say on the topic.
Re: Re: Re:4
Nope. It’s numbers that lack human originality that are uncopyrightable. Numbers generated in a mostly random manner are uncopyrightable (such as crypto hashes and encryption/decryption keys). But a number representing an ASCII-encoded chapter of a Harry Potter novel is copyrightable.
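What "a number representing an ASCII-encoded chapter" means can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `text_to_int` and `int_to_text` are hypothetical, the sample sentence is a public-domain placeholder, and nothing here bears on the legal question. It shows that the same text maps to different integers under different encodings, and that each integer can be turned back into the text.

```python
def text_to_int(text: str, encoding: str = "ascii") -> int:
    """Interpret the encoded bytes of `text` as one big-endian integer."""
    return int.from_bytes(text.encode(encoding), byteorder="big")


def int_to_text(number: int, byte_length: int, encoding: str = "ascii") -> str:
    """Invert text_to_int. The byte length is passed explicitly because
    leading zero bytes are not recoverable from the integer alone."""
    return number.to_bytes(byte_length, byteorder="big").decode(encoding)


# Hypothetical sample: a short public-domain sentence, not text from the thread.
sample = "Call me Ishmael."

as_ascii = text_to_int(sample, "ascii")      # 16 bytes read as one integer
as_utf16 = text_to_int(sample, "utf-16-be")  # 32 bytes read as one integer

print(hex(as_ascii))
print(hex(as_utf16))
print(int_to_text(as_ascii, len(sample.encode("ascii")), "ascii"))
```

The two printed integers differ even though they stand for the same sentence, so the number is tied to the chosen encoding while the text it decodes to is the same in both cases.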
Re: Re: Re:5
Show me a copyright registration that includes an ASCII encoded numeric representation of a Harry Potter book chapter.
Re: Re: Re:6
Copyright vests whether ‘registered’ or not. Registered copyright used to be a thing, but it was replaced by automatic copyright a long time ago and for good reasons.
More to the point, for digital stuff:
“3 (1) For the purposes of this Act, copyright, in relation to a work, means the sole right to produce or reproduce the work or any substantial part thereof in any material form whatever”
Copyright Act (R.S.C., 1985, c. C-42)
ASCII encoding is a material form. If you think you can distribute Harry Potter online by virtue of it being ‘a number’, feel free to test your sovereign-citizen style theory in a court.
Re: Re: Re:7
I wish people would research before pretending to correct others.
Registered copyright is still a thing. You must register a copyright in order to get statutory damages and attorney’s fees. Large corporations register their copyrights.
Since we’re discussing Harry Potter, here’s the search page for copyright registrations:
https://cocatalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First
Find me that copyrighted number.
I never said you could distribute Harry Potter online as a number legally. I said you can’t copyright a number, which you can’t. Just because you can convert data into numbers doesn’t make numbers copyrightable.
You’re also ignoring fair use. Depending on how you use the numbers, it could very well pass the four factors test.
But beyond all that, you need to cite the case law. The statute is interpreted by the courts. What have the courts said?
Re: Re: Re:8
Idiot. If that registration page shows which “number” (here I mean the full text of the novel, not the registration number) is copyrighted, then you can copy it fully, defeating the purpose of copyright which is meant to prevent you from copying the whole thing.
You surely had no idea about numbers being copyrightable. I suggest that you read the Wikipedia article about the “infinite monkey theorem” and correct yourself.
Re: Re: Re:9
These responses just keep getting dumber.
The point is that you can’t find a registration for it BECAUSE YOU CAN’T COPYRIGHT A NUMBER!!! And just because you’re going to continue to be obtuse, I’ll also play along with your literal interpretation and say that even if you could copyright a number, you wouldn’t have to list the entire number in the registration. For instance, we generally abbreviate pi as 3.14 or 3.14159… The abbreviation could be sufficient for registration. So you’re even wrong while being very wrong.
The purpose of copyright isn’t to prevent you from copying a text. Copyright is meant to secure for a limited time the right for creators to control their work. You can own the copyright on a work and release it under a permissive license that allows someone to copy the entirety of the text. Copyright doesn’t prevent that. It just empowers the creator or their assignees to determine in what circumstances it can be copied, with the exception of scenarios where their rights of control are limited, such as in the case of fair use.
You are admitting here that you don’t even understand what copyright is. If your bootlicking of wealthy corporations, lack of understanding of the doctrine of first sale, fair use, and roughly every other issue relating to copyright weren’t already disqualifying, this alone would render your assertions laughably dismissible.
I surely did and still do know that you can’t copyright a number.
Why would I need to read an article about a concept I’m already familiar with and that has no bearing on the topic? Randomization has not been shown to produce the exact character patterns of large text works such as Hamlet. The theorem is stipulated on having infinite time, which we do not have. Non sequitur!
Re: Re: Re:10
Do you have any idea what you are saying? In the registration website you linked, I can find about 25 entries when I search the title “Harry Potter” there.
And why the fuck must I release my copyrighted work in a permissive license?
You are now making an assumption that not every creator would agree with: that works must be released for free.
Look, even Techdirt didn’t promote that idea. Techdirt encourages voluntarily releasing content for free, which can make fans want to fund new content under a new kind of business model, and yet it was never mandatory to release content for free.
First sale doctrine doesn’t cover the reproduction (i.e. copying) of the work even if you own a legal copy of it.
Fair use is a court ruling, not automatic when you say it is fair use.
Well, it’s not about infinite time. It’s about your denial of the fact that all digitized data is a number. You wanna prove it?
hexdump some_random_document.pdf and see it!
Re: Re: Re:11
AND NONE OF THEM ARE A REGISTRATION FOR A NUMBER THAT REPRESENTS THE TEXT!!! This ignorance seems willful at this point. At least, that’s the most graceful interpretation.
You don’t have to. I didn’t suggest you did. I just pointed out that copyright isn’t what you think it is. You seem to have a very myopic perspective on a complex topic. You seem to think copyright is just a security to make sure you make money off of your content. That’s not what it actually is.
I did not say that. Again, you are not reading or understanding what I am writing. I am correcting your misunderstandings and you’re just misunderstanding more.
Nobody said it should be mandatory to release content for free! Quote me where I claimed that. Except don’t bother because you can’t because I didn’t.
I didn’t say it was automatic when you say it is. It is fair use whether you say it is or not when it is actually fair use and copyright owners have an obligation to consider it before taking action. Yes, you can just sue, but you can also lose and have to pay legal fees for your loss. That doesn’t mean court is the only place fair use comes up.
Yes, actually the infinite monkey theorem is about infinite time. You literally referenced it and suggested I should look it up, but you’re demonstrating you don’t even understand it…which, at this point, seems expected.
I have never denied that data can be depicted as a number. I have only ever pointed out, correctly, that you cannot copyright a number as a creative work. Copyrighted works require human authorship and creativity. Converting text into numerical data is not creative. Nobody writes whole novels in binary.
You keep trying to fight straw men so hard. Are you a crow or something?
Re: Re: Re:8
“Registered copyright is still a thing. You must register a copyright in order to get statutory damages and attorney’s fees.”
That is incorrect.
“Remedies
Civil Remedies
Infringement of Copyright and Moral Rights
Turning to the question of ‘but you can’t find a number in a registered copyright database’, you’re missing the point. All digital works are necessarily numbers, but they can be different numbers depending on encoding, etc. The intent of the copyright law isn’t that you can copyright the number but that you can’t copy the work that number represents. Thus what you’d find in a list of registered works are works.
Which is why the argument that ‘well since numbers can’t be copyrighted, I can copy this number as much as I want, and if that number happens to be a way to represent this copyrighted work, I’ve found the One Neat Trick to not be hit for copyright infringement’ was so bloody stupid.
You’re really quick to accuse people of not understanding stuff, Mr. Regurgitation Engines Are Like A Library. Maybe slow down and try reading.
Re: Re: Re:9
You do realize that you quoted Canadian, not US law, right? This is the greatest own-goal I’ve seen in a long while.
And just so you don’t waste time trying to find something to support your claim in US law, here’s the relevant statute:
17 U.S. Code § 412 – Registration as prerequisite to certain remedies for infringement
In any action under this title, other than an action brought for a violation of the rights of the author under section 106A(a), an action for infringement of the copyright of a work that has been preregistered under section 408(f) before the commencement of the infringement and that has an effective date of registration not later than the earlier of 3 months after the first publication of the work or 1 month after the copyright owner has learned of the infringement, or an action instituted under section 411(c), no award of statutory damages or of attorney’s fees, as provided by sections 504 and 505, shall be made for—
(1)any infringement of copyright in an unpublished work commenced before the effective date of its registration; or
(2)any infringement of copyright commenced after first publication of the work and before the effective date of its registration, unless such registration is made within three months after the first publication of the work.
That’s exactly what I’m saying. Encoding is not creative effort. It is not able to be copyrighted. You seem to be missing the point.
And I never supported that argument, so I don’t know why you’re arguing with that straw man.
Yes, because you didn’t understand at all. You didn’t even understand that you were quoting Canadian law when you thought you had a one-up on me.
And you prove you don’t understand because I’ve never said that an LLM is like a library. I said reading is like reading. A library is an example of where you can read. It was a pretty straightforward analogy. That you misunderstood it, along with the other points, is a pattern.
Sage advice. Heal thyself, doctor!
Re: Re: Re:10
Let me rephrase that anonymous user’s comment a bit:
Copyright protects not the numbers themselves, but what the numbers represent as creative works. This in turn also makes that number protected. The claim about “numbers not being copyrightable” oversimplifies the facts in a way that becomes misleading and useless in copyright debates.
If you still don’t get it, you can try testing out your theory in court and see who wins.
Re: Re: Re:11
The thing that the number represents is the copyrighted work. The number is just an incidental result of encoding by a particular method.
“Protected” is doing a lot of heavy lifting in this sentence. It doesn’t mean the number is copyrighted. If you randomly wrote a string of numbers and posted it and it just so happened in a random encryption scheme to correspond to a copyrighted work, that doesn’t mean that posting that number would be a copyright violation.
It’s an accurate statement. The irony is that this was the AC’s own straw man they made up. I never even endorsed it. I’m just arguing with the finer points because it shows lack of knowledge on the topic.
It’s not my theory! Reread the thread. I didn’t bring this up at all.
You’re so desperate to argue and assume I’m wrong about something that you’re arguing with other people’s straw men instead of just the ones you’ve invented. You are desperate.
Re: Re: Re:10
I am well aware I quoted Canadian law. I’ve been quoting Canadian law the whole time, because a. I’m Canadian, b. it’s actually drafted clearly unlike the godawful U.S. equivalents, and c. it has been a test of whether you are actually at all interested in the citations you kept demanding.
Which you’ve spent a whole-ass week failing. Thanks for playing.
Re: Re: Re:11
Aha! My ineptitude was just a ruse, by george, and you fell for it! I really intended on arguing about things I knew were completely different to prove the point that you weren’t interested in the things you didn’t ask for!
This is rule of goats territory. It doesn’t matter why you did it or why you pretend you did it. You have undermined all your arguments. They are useless here. That’s not a gotcha.
But also, if you’ll notice, I asked for citations of a type and you responded with a different type.
For example:
I said: “Show me a copyright registration that includes an ASCII encoded numeric representation of a Harry Potter book chapter.”
You quoted a statute. Even if you had quoted a US law statute, it would still not be a copyright registration that includes an ASCII encoded numeric representation of a Harry Potter book chapter.
So even your excuse doesn’t hold water.
Thanks for playing indeed. You’ve spent a whole-ass week being provably wrong. I guess that’s worth something to you?
Re:
How much you want for me reading your comment? Cuz you ain’t gettin’ it.
How much you want for an LLM reading it? Because guess what.
[In no way do I endorse the hot garbage that is commercial “AI”, and as to “AI” being as important and central to life as electricity, fuck off, dime store Kurzweils.]
“at the expense of creativity and innovation?”
Yeah, because nothing says creativity like thirty million shrimp Jesus flight attendants.
“It would be hard” is not a factor in fair use. (Never minding that licensing companies/solutions are popping up)
It pretty explicitly considers them separately. To quote:
“The use of a work in initial pre-training, for instance, may be distinct from its use in subsequent training or RAG. A number of commenters opined that the fair use analysis requires treating these different uses separately.” Similarly, “Because generative AI models may simultaneously serve transformative and non-transformative purposes, restrictions on their outputs can shape the assessment of the purpose and character of the use.” As well as, “some uses of copyrighted works for generative AI training will qualify as fair use, and some will not. On one end of the spectrum, uses for purposes of noncommercial research or analysis that do not enable portions of the works to be reproduced in the outputs are likely to be fair.” etc.
The report explicitly covers this. One might argue that although copyright owners do not have a right to charge for fair uses as such, they do have a right to charge for access to their works. Which is true. While an author can’t control how you use their book, you do generally have to actually buy the book (or go to a library or equivalent).
There are literally already examples of people turning to AI work. This is also a bit of a red herring, since it only considers existing fans (or bestselling authors; people who aren’t bestsellers are also covered by copyright). New potential fans would have no reason to know, love, or be loyal to that author. (It also affects things like licensing.) The only speculation is how far it will go, which will be highly dependent on how good it gets as it continues to improve.
The situation is literally unprecedented. While it’s nice to have precedent to look back at, novel situations do sometimes require new precedent. As the report notes, it does get back to the fundamental reasons why copyright law exists in the first place.
This is a massive strawman. It’s not claiming that all works are substitutes.
“[a]dvise Congress on national and international issues relating to copyright,”
However, we are equally as interested in what the law should be in the future.
Love that the Electronic Frontier Foundation is choosing to have conniptions about hypothetical threats to creativity, innovation and freedom in order to defend technology and corporations that are currently causing actual harms to people’s ability to use the internet to communicate freely.
This is also causing zero-click searches which don’t profit any content creators.
Content quality will suffer and we’ll return to a patronage model.
This suggests that the trained model itself is the thing of value — but that is demonstrably not true. No one wants the model — the trillions of parameters and weights are not meaningful or useful by themselves. What they want is the ability to produce outputs from that model.
So I might grant you that the training is fair use, but the moment you use the model, it sure feels like that’s infringement.
Re:
The model would have been useful if it were an aggregation of all human knowledge of the kind Wikipedia has been building. And yet no AI companies insist their models are such a thing. I would have granted them fair use if those AI companies really made the models for the benefit of the general public, but that’s not the case. Big Tech as we see it made AI for private profit. They are not charities, they are not nonprofits that can argue about fair use in this manner.
These CEOs ought to be in prison
Aaron Swartz died while facing prison for “stealing” tens of gigabytes of copyrighted data to give it freely to the world for the benefit of us all. These executives “steal” petabytes purely to fill their own bulging wallets and rather than throwing them in prison everyone is bending over backwards to justify and excuse it!
CEOs are not gods. They do not deserve such worship.
EFF on fair use, they are wrong this time
This is one of the times I disagree with EFF on the fair use argument, seriously. Generative AI isn’t just an “innovative” tech but also a tech that exploits the creative labor of other people. The evidence is clear enough that Meta torrented books (read: pirated) to train their Llama AI models, and EFF can’t explain why Meta couldn’t just buy legal copies of books and train the AI with just those. That’s one of the failures in EFF’s argument.
I do agree that BitTorrent and other P2P platforms should not be illegal by themselves (as EFF argued), but as Meta used that pirated content in commercial applications, that is a strong factor to rule against fair use for AI training, even though many AI advocates suggest the opposite.
And note that the arguments that generative AI could promote free speech for minority groups are nonsense. AI-generated content is not protected speech in many countries that deal with issues with AI (US and EU included). Therefore it could be legal for web platforms to force users to disclose whether the content they upload is AI-generated without violating, e.g., the First Amendment in the US.
As for generative AI training (the main topic here), what EFF argued about it being fair use largely ignored the ruling in Warhol Foundation v. Goldsmith. The decoupling EFF suggested, separating the fair use analysis of AI training from the “ultimate use” of AI, is erroneous. The Warhol ruling is the opposite of what EFF is suggesting right now, and EFF didn’t learn.
Yes, inevitably this would be a corporations vs. corporations fight. Specifically, Big Hollywood vs. Big Tech. And like it or not, I have to pick a side.
And I admit I picked the Hollywood side, not because I like them or they are all moral, but because they simply respect creative workers and their unions, while Big Tech mostly ignored them.
Warhol v. Goldsmith can question your claim about fair use, and before you reply, let me tell you I have read all the amici curiae briefs in Kadrey v. Meta regarding the summary judgement motion. In other words, I know all the arguments of both sides.
Nay. The fair use analyses in US courts don’t work that way.
(1) University researchers doing research on copyrighted data would more likely be found to be making fair use than for-profit corporations, as university research serves more purposes, including profit and non-profit (educational) ones.
(2) An AI for researching medical treatments is very unlikely to need data about fiction books, music and visual arts. Yet general-purpose AIs like ChatGPT are trained on those artistic works without justification. Saying that it is fair use undermines common sense.
This argument is flawed for two reasons. (1) It is Big Tech we are fighting and they are already “wealthy corporations with large treasure troves of corporate profits” and yet they don’t pay content creators a single penny. You assume only Hollywood is wealthy, which is far from reality. (2) There is no “democratization” at all with AI. You still have a Big Tech monopoly, like it or not, and the rest of the “democratization” argument is a straw man.
Linux didn’t copy creative works of others without consent. False analogy.
Does the Trump administration ever want to regulate AI when they are trying to pass a bill that restricts states from regulating AI for 10 years?
At least they would no longer train on your work without consent.
Re:
You don’t.
This is a case where both sides deserve to lose.
Re:
Hollywood is movies. We’re talking about content/media companies that are more than just film. It’s film, music, video games, books, podcasts, audiobooks, et al.
Sounds like you have a personal bias that clouds your judgement. Also, you absolutely don’t have to pick either of those sides. That’s a false dilemma. If you pick either of those sides, they will both continue to win and you will always lose.
Holy fuck! Have you been awake for the last thirty years? Writers’ strike mean anything to you?
Hollywood is the industry that has violated and exploited my copyrights as a creator more than anyone else. You sound like you’re saying you like one abusive boyfriend more than another because your preferred boyfriend beats you less often. That’s fucked up.
No, not really.
Oh, this is definitely going to be a non sequitur…
“Knowing” all arguments of both sides doesn’t mean you’re right. Also, there are more sides than just the amicus briefs.
You missed the point of the statement. Rights are universal. Meta’s right to moderate its website is the same right I enjoy to moderate my website. If it’s fair use for me to copy public data (e.g. use a web browser), it’s fair use for a corporation and vice versa. And that act is functionally the same as a deaf person downloading the same data and having a screen reader read it out loud or a blind person using a device to convert it to braille or a person with ADHD copying a long article and having an LLM summarize it for them or a person with a good memory being able to read and recite an entire work.
Except university researchers have smaller legal funds to defend against corporate lawsuits, so this analysis is useless in the face of punishment-by-lawsuits. Also, some corporate LLMs can be used by researchers, so the issue is more gray than you’re pretending. Just look at the case law surrounding copyright issues with academic journals.
Except the arguments against corporations training data isn’t against the purpose but rather the process. The arguments include the claim that the copying is necessarily copyright infringement, so it doesn’t matter what the content is that is being copied and it doesn’t matter what purpose it is being copied for. You’re saying there are exceptions, but I’m specifically critiquing the people who aren’t allowing for any exceptions in their analyses.
Saying it’s not fair use undermines the law.
Common sense is often claimed by myopic people who think everyone does or should think as they do.
I don’t assume only Hollywood is wealthy. Not sure where that came from. You’re also not only fighting big tech. You’re fighting anyone who would make the same fair use argument, which includes researchers, libraries, college kids, and precocious middle schoolers playing with technology in their mom’s basement. You’re using a nuclear argument to eliminate an enemy city but also every member of innocent wildlife living in the adjacent rural areas. Big tech will be fine regardless of how the lawsuits play out. The little guys will not. Research will suffer. Some kid is going to get sued for training an LLM the way his dad was sued for downloading MP3s on Limewire. Poor people will be further crushed because you’re handing corporations a cudgel.
That just shows you don’t know anything about the AI field. There are a lot of open source and independent projects. There are people training specifically anti-corporate LLMs and public advocacy LLMs. That you are proudly ignorant of them speaks to the limited basis of your argument.
First, you don’t seem to understand what analogies are. If analogues were exactly the same as what they’re compared to, they wouldn’t need to be analogized. Second, fair use doesn’t require consent, so this argument is pointless. But also, the fact that Linux didn’t improperly copy the works of others (hello SCO Group!) is why the analogy is meaningful. I’m saying you’re targeting innocent people because you are blindly attacking them while thinking you’re attacking large corporations.
I would never suggest that the Trump Administration would ever produce anything ethical. Also, the Trump Administration isn’t the legislature actually responsible for passing laws.
My copyrighted works get infringed all the time. LLM training isn’t my problem. In ten years when a middle schooler asks their AI teacher about a topic I’ve produced content in relation to, I’d love for my contributions to the field to come up rather than be completely forgotten because I was butthurt I didn’t get a five dollar settlement from a useless lawsuit that only made corporations and lawyers wealthier.
You seem ignorant about the inevitable nature of this. Read op-eds about smart phones from twenty years ago or rants about the internet or the telephone or the newspaper and how they were going to ruin civilization. It’s all just tools. The wealthy will always use available tools to make themselves wealthy. You can’t stop that. You can avail yourself of your own tools though. Killing your own tools to spite the wealthy who won’t feel the minor prick is shortsighted and arrogantly stupid.
Re: Re:
Unfortunately that’s the situation. It’s indeed f__ked up whether you like it or not.
That’s the big IF there. I can explain the case of Warhol Foundation v. Goldsmith if you are willing to listen. Your fair use assumption no longer holds after the Supreme Court decision in that case.
Copyright groups already call this “Data Laundering” and specifically oppose this. It’s not my position. Yet I agree with their reasoning. The academics can make their own LLMs from scratch that are transparent about which data they have been trained on and credit the copyright owners appropriately. There’s no justification for using a commercial LLM that might be illegal in the first place to do it.
Fair use depends on the ultimate purposes of the AI models and it would not be “all yes” or “all no”. My position on this is the same as the USCO’s. It’s dangerous to greenlight all of them because some of the training is really unethical to begin with.
Why not? Nuke all LLMs until one is built up that respects people’s rights. What’s the problem with that?
Because LLMs are quite a new tech, I don’t see any problem with forcing everyone back to the pre-LLM ways of working and lifestyle. It’s your problem if you made the unethical tech your life.
Unfounded argument.
In other words, the Napster era.
I do not share compassion for people pirating music even though tech like Limewire had other legal uses. This argument is exactly the Napster case. It’s not the VCR (Betamax) case. And I’ve had enough of such arguments that ignore the court rulings. Sorry, don’t try to change my mind.
The software being open source is independent of the data model being legal.
I don’t advocate for blocking them. Why do you think that I do? As long as the models are all trained from scratch.
Saying you can buy a commercial pre-trained model (which might be copyright infringement in the first place) and augment that model and claim it’s all yours is dishonest. I specifically condemn this kind of lying.
The SCO v. IBM case, no?
That case didn’t rule anything about fair use according to Wikipedia. The question was whether SCO was entitled to sue over Linux, as SCO’s copyright claim over so-called Unix code was unclear. Nothing to do with AI training or fair use.
You have no idea what kind of rights we are fighting for, for you. You are so “innocent” that you got tricked into throwing away rights that should be yours to hold. Pathetic.
Feel free to license your works under a free license such as CC-BY. There’s nothing stopping you from doing so. Also you can give explicit permission for AI training if you want to. You are simply not forced to do it.
Off-topic. But look at the issues of young children getting addicted to phones and the internet in general. And with the modern landscape of the internet loaded with tons of misinformation and disinformation, the fear of smartphones ruining civilization isn’t without merit.
Re: Re: Re:
Except that’s not the situation. You don’t have to side with any wealthy people at all. That you think you do is what’s fucked up.
Except Warhol is about a photograph, not text. Warhol is about licensing a derivative work, not LLMs producing non-derivative works. Warhol is about two entities attempting to use the same content for the same commercial purpose. If you can cite an author who wrote their text with the intent to train an LLM, I’m interested in seeing that.
That doesn’t make you or them correct. The training is fair use regardless of who does it. Making up a term for it doesn’t make it illegal.
Not according to the people whose arguments I’m referring to. They are claiming that all training is a de facto copyright violation. If you’re not arguing that, then there’s no reason for you to respond to my post.
No, that’s only one of the four factors. You’re being myopic.
That’s for a court to decide.
Unethical isn’t a legal argument. We can talk about ethics all day but it has little to do with our laws or government. We live in an unethical system where laws are bought by corporations. If you’re only making an ethical argument, then copyright laws are themselves unethical by nature and the whole discussion is moot.
Nuking all LLMs means there’s no until. If you successfully argue that LLMs require paid licensing for training, then no ethical LLMs can ever be trained by anyone because only unethical corporations will be able to afford the entry fee of licensing costs.
Except it’s not me at all. I’m not even using LLMs much. I’ve played with them to see what they’re capable of and been encouraged by their failures in performance to provide me with the confidence that they won’t as yet be able to replicate my creativity. I’m not afraid to test them to find out their limitations and there are many limitations to document. But that doesn’t mean that LLMs can’t improve our collective performance in areas we actually value. I’m saying ignore the cute but absurd LLMs generating twelve finger art and look at the progress of the LLMs diagnosing medical patients better than experienced doctors. You’re obsessed with getting one over on Big Hollywood so much that you’re willing to fuck over a cancer patient.
[citation needed]
Yes, an era where big corporations fucked over poor people. Why are you cheering on a return to that?
You don’t share compassion with poor people who can’t afford to purchase overpriced luxury commodities in a capitalist system that undervalues worker contributions such that they resort to illegal means that then subject them to disproportionate penalties based entirely on legislation passed by corrupt legislators bribed by the very democracy-undermining corporations who benefit from said lawsuits? Color me completely surprised and definitely sarcastic in this particular sentence. I feel like this says everything I’ve been trying to say. You don’t mind the little guy getting crushed because you’ll defend to the death corrupt corporate-bribed laws that magically change when they want another unethical payday for work they never performed themselves.
The open source projects benefit from the same fair use analysis. LLMs are significantly less useful without large data sets. Independent development will never be able to afford arbitrary licensing fees.
What the fuck does “trained from scratch” mean? Are you suggesting that an LLM only be trained from the 40,000 words a researcher uses to define parameters? Do you not understand at all the technology we’re talking about?
Straw man. Who said anything like this?
So you’re saying you weren’t previously aware that SCO was claiming that Linux did in fact contain “stolen” code from Unix? So you’re saying you don’t understand the point and your analysis is limited by your limited understanding? I’ll accept that.
You don’t get to represent me without my consent. The sheer arrogance of this statement is absurd. I’m not throwing away any rights I’m able to retain.
I have already licensed many of my works using a Creative Commons Attribution license. I love that you think you’re educating me on this topic. A CC-BY license doesn’t mean your works aren’t subject to copyright violations. If there’s no attribution, it’s a violation. Also, big media corporations don’t care about that. I have first hand experience.
Not off-topic at all. Highly relevant. There’s a moral panic with every new technology that pops up. Everything gets labeled as bad for children. I learned math from playing DnD as a kid despite being told it would cost me my soul. I learned programming from editing computer games as a teenager despite being told video games would rot my brain. Telephones meant people would no longer visit each other in person and we’d all become disconnected entirely. Newspapers meant people would read in public instead of engaging socially with the people around them. Clutch those pearls!
This isn’t a technology issue. Before phones it was video games. Before video games it was TV. Before TV it was radio. Before radio, it was marbles and jacks. Before marbles and jacks, it was billiards and pool and that rhymes with T and that stands for trouble right here in River City. Anyone, especially kids, looking for a distraction or an obsession, will find it, regardless of whether it’s a complex machine or a stick that they can pretend is a sword or a gun. That you think phones and the internet are the problem rather than the symptom is, again, telling about your own myopic perspective.
You should be more concerned with the lack of education on how to be skeptical of uncited claims (including yours). There will always be mis- and disinformation regardless of the medium of communication. Critical thinking and analysis is important, but you’re advocating for a dumbed down logic of “copying bad.” Human behavior will always be the problem regardless of the medium of communication or any tools or technology. You’re abetting human behavior by pretending technology is the problem.
Re: Re: Re:2
Your assumption is that an LLM is not a derivative. I disagree. And yet there’s no court ruling on that yet. We can wait and see how the AI copyright lawsuits go.
Court case citation needed. (I have cited mine, this is your burden of proof.)
Look at Fairly Trained, an organisation proving the opposite of your claim.
What the heck does this have to do with AI? The AI companies are not poor people. They made millions or billions of dollars exploiting the creative works of others.
The first thing you said is disputed. As for the second, as I replied in another post, fair use is a bad solution for AI startups because it still exploits the works of artists, big or small.
What you were advocating is the legalisation of exploitation by Big Tech companies under the shield of small companies. You don’t know what you are defending against.
There is an initial state for the neural network parameters before the network starts getting fed with human content for the parameters to automatically adjust themselves. That is what they call “pre-training” and I know that pretty well. When Adobe can train a model from scratch, without using a pre-trained model from others (which might be accused of copyright infringement), there is no reason small companies can’t do it.
What does this have to do with AI? Hell, if this case mattered at all, the AI companies would have been citing it as a defense. SCO v. IBM has even lower importance than, say, Google v. Oracle (the accusation of Google copying Java code in its Android operating system).
I say pathetic. You don’t know we are fighting for your rights too, and you mistook us for enemies out of that ignorance.
Re: Re: Re:3
You literally just said there wasn’t a case yet, so why are you asking for one?
Except they’re not. It’s run by a guy who thinks training is stealing. They claim to certify ethical AI training. That’s just a claim. I can form an organization that claims the opposite. That doesn’t make the claim true.
Also, organization is typically spelled with a Z in the US. Are you an American?
You don’t share compassion with poor people who can’t afford to purchase overpriced luxury commodities in a capitalist system that undervalues worker contributions such that they resort to illegal means that then subject them to disproportionate penalties based entirely on legislation passed by corrupt legislators bribed by the very democracy-undermining corporations who benefit from said lawsuits?
You’re admitting here that you haven’t understood anything about my position. This has everything to do with AI.
Literally the first thing I said to which you responded was: “My frustration with the arguments of people claiming it’s not fair use and that all training must be licensed is that many people seem to think they’re championing the little guy when they’re inadvertently advocating for the benefit of the wealthy and corporations.”
I’m not defending AI companies at all. I’m defending the little people. The AI companies have enough money or can get enough money to license their content and the little people will still be fucked. What you’re advocating for is making it difficult for the little guy to do things by putting a massive price tag on the activity that only a wealthy corporation will be able to afford. Have you not read anything I’ve written?
Experiment yourself. Test the performance of an LLM trained on an extremely small dataset and one on a large dataset. The difference should be obvious. “Disputed” here is like “vaccine skeptic.”
That’s an entirely subjective claim. Do you consider it exploitation of an author if someone reads their work and the reader becomes a better writer because of the experience? Should the reader ask for permission to learn from their experience?
You don’t know what I’m fighting for, even when I explain it to you. I’m fighting for the little guy who needs open source and independent AI to provide alternatives to the inevitable corporate AI dominance. And your position is that both the little guy and the corporation should pay millions of dollars to develop their models, therefore, only the corporation will be able to offer anything useful and then AI will be dictated by profiteering corporations rather than democratized and more useful for securing and enhancing rights and freedoms.
Also, there’s the Z = S conversion again in “legalisation.”
Yeah, so you don’t understand the technology. Pre-training is the point at which the large dataset of “human content” is fed into the LLM.
“Train from scratch” in the context of an LLM just means that you’re providing your own chosen dataset rather than copying from someone else’s. That doesn’t have any effect on the legality of the process. You can “train from scratch” with copyrighted material.
You don’t seem to be able to follow the conversation.
You said “Linux didn’t copy creative works of others without consent. False analogy.” I pointed out that SCO actually accused Linux of copying the work of others without consent. I was pointing out that you didn’t understand the history of what you were referring to, so your analysis that it was a “false analogy” is useless. If I make a knitting metaphor and you don’t know what a knitting bobbin is, your analysis of my metaphor is useless. But you still missed the entire point of the analogy. It was to say that you are attacking big corporations but hurting innocent independent non-profits, researchers, students, and poor individuals. That’s the whole point here.
Yes, trying to represent someone without their consent is pretty pathetic. Telling that person that they don’t understand their own interests when you clearly don’t understand their interests is patronizing.
You aren’t fighting for my rights. You are fighting against my right to train an LLM based on content I can find in the world, the entire breadth of human knowledge that is available online. You’re saying I should have to pay millions of dollars to a big media company in order to scan copies of works I already own a copy of (hello, first sale doctrine). You’re saying I should be stuck with only having access to LLMs that profitable corporations develop, that future authoritarian administrations will try to adjust with “official” takes that erase actual history and human rights violations. Your inadvertent position is that only large corporations will be able to shape the future.
Did you know that I’ve tried to get LLMs to output my work and they haven’t gotten close to it at all. That’s a useless right if the output isn’t a derivative work.
You seem to think that means something contrary to my position. Again, you’re arguing against a straw man because you don’t understand what I’m saying, despite my explicit statements.
Re: Re: Re:4
Thomson Reuters v. ROSS Intelligence: District court ruled that Ross’s use is not fair use. The decision is pending appeal.
Kadrey v. Meta: This is the case closest to receiving a decision on whether generative AI training is fair use. I’m keeping watch on this one.
It is your argument that there can be no “ethical” training when you use “ethical” in my definition. When I gave you an example of the opposite, you then denied the concept of “ethical”. WHAT THE F*CK is wrong with you?
(Before you reply, let me tell you Meta made the same arguments in their defense in Kadrey v. Meta. So I know how f*cking evil it is.)
Re: Re: Re:5
That wasn’t fair use because of the four factor analysis finding that it was for the purpose of building a competing service so it failed the market effect factor. That’s not universal to all LLM training.
You haven’t given me an example of the opposite. You referenced an organization apparently headed by biased individuals who operate on unproven bases for their approach. You take their claims at face value. I do not. I will wait for actual evidence, not claims.
Before you reply, let me tell you that everything you keep saying after “before you reply” hasn’t changed how incorrect you are. Your claimed knowledge apparently isn’t helping you make sound arguments. Trying to pre-empt my arguments when you don’t even understand my position doesn’t seem fruitful.
Except you said: “I don’t advocate for blocking them. Why do you think that I do? As long as the models are all trained from scratch.” Since it’s possible to “train from scratch” with copyrighted material, this distinction is useless. You’re saying it’s both copyright infringement to train an LLM on copyrighted material, but also ethical if you do the copying yourself rather than copying someone else’s copied dataset. That doesn’t make any sense.
Yes, and large corporations can afford to either license content or pay lawsuit settlements. Independent researchers and students and precocious middle schoolers can’t. You’re saying the little guy should be financially crushed should he train an effective LLM on a large dataset.
Yes, because it is fair use, the same as if I read a book at the library and remember every word in it and that memory informs my ability to write, but not just regurgitate the exact same text. It is fair use for children to learn to write by reading what others have written. For your position to be consistent, you would have to insist that children should pay to learn to read.
And yet you pretend to judge the ethics of others. This is really all you have to say to prove you have no moral stance here.
I’ve literally been referencing little guys. You are proving you haven’t actually understood what I’ve said.
Again, false dilemma. You don’t have to pick either side. You can oppose both.
Here you’re just admitting you don’t understand how case law and precedents work. I have already explained this, but you still miss this point: If corporations lose cases and the result is a legal precedent that all training requires financial compensation, poor people will not be able to afford to train LLMs and therefore only wealthy corporations will be able to. Full stop.
No, I didn’t. I literally said: “Except that’s not the situation. You don’t have to side with any wealthy people at all. That you think you do is what’s fucked up.”
I’m saying they’re both “evil,” though it’s clearer to say greedy and unethical. You aren’t protecting your own value. You’re selling your soul to the company store while pretending corporations are just opposing sports teams you have to pick from to align yourself with. You don’t have to align with any of them. It’s fucked up that you think they’re evil but you’re still picking one of them. That really kills any pretense of a moral argument from you.
This is where I don’t think you’re American again, or maybe not an American English first speaker. In the US, “pay tuitions to your teacher” isn’t a thing. You pay tuition at college or a private school, and not directly to an individual teacher who is more commonly called a professor or instructor at that level, but most K-12 schools where students actually learn to read are public schools that don’t involve tuition. Also, tuition is plural, not tuitions, unless you’re talking about different types of tuition. And you completely avoided answering my question of whether you’re an American or not, which likely indicates you think the answer would weaken your argument.
Not in the US where public libraries are tax funded and little free libraries are giving away books on every other street corner. Not on the internet where most websites are free to read.
I’m going to stop responding at this point. You don’t appear to be an American or have an understanding of American institutions, so arguing about American laws isn’t useful.
Re: Re: Re:6
Thumbs up for pointing out a fact, and I agree with it, too.
And, is it not true that some of the LLMs would compete with the original author on the written works? Asking this in another way: How can an LLM not compete with the original authors in the market of written works?
By “copying yourself” I mean when you own the copyright in the content. Did I not explain that clearly enough?
Since I assume the LLMs are “derivative works” of the training data in terms of copyright, this is all logical to me.
Why not make the large corporations pay? That’s the whole point of the authors suing!
Why shouldn’t pirates be “financially crushed”? Look, being poor is never an excuse for doing illegal things. Except when you are advocating to legalize things that should have been illegal.
Two issues: (1) Did you legally buy the book or rent one? (2) Writing a thing that you remembered in every word does not make it free from infringement.
It sounds like you are trying to “read” a chapter of Harry Potter, English version, and remember it word by word, and write down a whole chapter in French, or Spanish. Yeah, you didn’t “regurgitate” technically, and yet you still reproduce the creative expressions J.K. Rowling made in her novels.
My position is protecting small authors (writers, painters and musicians) from AIs quickly generating works “in their style” and potentially copying the original authors’ expression, which is infringement. That expression is what copyright law was originally designed to protect.
No public school teacher would teach for free. The question of whether I’m an American is not relevant. The point is that even in public schools, the teachers get paid with the government’s money, which in turn comes from your pockets through “taxes”. That’s enough. A question on this argument is a straw man and wastes my time.
Do you really think the knowledge people put on the internet for you to read for free is truly “free”? There Is No Free Lunch, as the economists always say.
Most websites get revenue through advertising, a few others put content behind paywalls if their advertising revenue can’t cover their costs, and yet others, like Wikipedia, rely on donations.
Imagine what could happen when AI takes the knowledge from Wikipedia and generates content without citing Wikipedia as the source. Fewer people would read or edit Wikipedia. There would be fewer contributions from volunteers and, eventually, fewer donations to cover server and staff costs.
Re: Re: Re:7
That’s for a court to decide in individual cases. Plaintiffs should make that claim if it’s actually true. I’m arguing against the broad generalization claim that all training must be licensed.
Easily. It doesn’t. I’m not sure why you need clarification on that point. Do you think LLMs are only meant to replace the authors of the content used to train them? That might be your problem. You’re imagining the worst case scenario and ignoring or are just unimaginative regarding all the other possible uses. LLMs are useful not to replace artists or human creativity. They’re useful for doing the shit work we don’t want to do, like writing an email to your boss that summarizes recent progress on a project you don’t even want to be working on. Again, again, I will say, again, you don’t seem to understand the facets of the technology you’re blindly frothing at the mouth over.
That said, I’d actually recommend you type these conversations into an LLM. A machine that hallucinates random data would likely understand it better than you.
No, because “train from scratch” doesn’t imply you own the copyrights of what you use to train the LLM with, only that you’re doing the training yourself. But this is absurd. Perhaps only someone as prolific as Stephen King would have enough self-generated text to train a decent LLM.
You’re also saying that researchers shouldn’t be able to train LLMs since they didn’t do all the research themselves. That would cripple scientific research. You’re fussy over which billionaire corporations are profiting off of your content while you kick actual human progress in the face. You’re trying to kill a technology in its infancy because, as happens with all technology, it inevitably gets used for profit. Automobiles put so many horse-related businesses out of business and created so many new jobs around manufacturing and maintenance and customization and delivery.
And since your assumption is wrong, this is a useless claim.
Again, you don’t understand legal precedents. If a precedent is set that you must pay to license copyrighted material for training an LLM, then the wealthy corporations will just license it. But if any poor person, non-profit, middle schooler, etc. wants to do the same thing, they will not be able to afford it. Therefore, the only effective LLMs will be owned and controlled by wealthy corporations. And they will win government contracts to teach your children and rewrite your history curriculum and censor things authoritarian governments want censored.
Fair use doesn’t make one a pirate.
Actually, it’s quite often the only available alternative to starving for many people because we live in a corrupt system where the wealthy have seized the means of production and the executive, legislature, and judiciary, whereby laws are written for the benefit of the wealthy.
No, because the library is tax funded! It is free at the point of service and people who don’t pay taxes like homeless people are still able to read in the library. I’m sorry this isn’t a thing in whatever place you live, but in the US, the libraries are free to read in.
You’re changing the scenario. I didn’t say you remember every word and rewrite every word. I’m saying you use your knowledge and experience of the text to become a better writer, to write your own words. This is an analogy about how human beings learn to read. It’s odd I have to say that.
That your immediate assumption is that any scenario is an intention to commit a copyright violation is telling. You’re paranoid about everything being “piracy.” It informs your misguided assertions.
Copyright law is currently meant to protect the profits of the wealthy, especially the corporations that can retain copyright ownership after a creator dies. It’s extended long after the creator’s death now such that few people living during the author’s lifetime will be alive when it hits the public domain. Those same corporations, the ones you are siding with, violate the copyrights of small authors all the time. Your stance is a moral contradiction.
Actually, many of them do teach for free. They often spend their own money on classroom supplies. They’re often underpaid for what they do. They often spend extra time before or after class that isn’t paid. They grade papers at home after hours. Your ignorance of the American education system makes all of your assertions on this topic useless.
It’s the most relevant of anything anymore. It speaks to your ignorance of our laws and how they work, of our courts and how they work. It speaks to your ignorance of how our libraries and schools work. It speaks to your inability to vote for representatives who vote on these laws, who vet the nominated justices who analyze the constitutionality of these laws, etc, etc, et al. Your opinion is too uninformed to matter.
Poor and homeless children are entitled to a free public education even if their parents can’t afford to pay taxes.
You missed the point. It’s free to the reader. You didn’t directly pay money to Mike or the EFF to read this article (though at this point I doubt you actually read the article). Free has multiple meanings. Libre. Gratis. Free as in beer. Free as in kitten. Don’t be obtuse.
If you want to suggest the US Congress pass a law stating that LLMs providing encyclopedic knowledge must cite a source, go for it. But are you suggesting that human beings haven’t already been copying Wikipedia without citation? I’ve found plenty of people who have just copied and pasted Wikipedia text without citation. Why would an LLM be any different?
You do realize that Wikipedia editors are mostly unpaid volunteers, right? And uncited copying already happens. You don’t understand this at all.
I value Wikipedia’s significant contribution to humanity. You apparently don’t because it’s a non-profit and you’re all about locking up copyrighted content for profit, so your mock concern is transparent. But also, the proliferation of LLMs will make human verification of LLM content more important, not less. There might even be an increase in paid content reviewers once major hallucinations lead to major disasters. We’re probably less than 15 years away from some notable crisis occurring because someone trusted LLM content at the wrong time. The Hollywood script about the hapless nuclear power plant programmer using ChatGPT to fix an error and cause a meltdown writes itself. I’m guessing it’ll actually be something more banal like a massive internet outage because a junior developer trusted generated code to fix a server. Any such crisis will prompt further distrust in unverified content.
Re: Re: Re:4
It’s just a terminology difference, but the same concept.
And it’s your liability for training it with copyrighted material then. My idea was clear: Training with copyrighted materials = infringement. It is you who insists it is “fair use” and keeps denying the infringement.
And I didn’t deny that I help the big corporations here. Because there is no “little guy” that you claimed to be defending. It’s corporations vs. corporations, like it or not. So your “frustration” with the issue was based on a misunderstanding of it. Not my fault.
You thought the Big Tech side were the good guys. I say no they aren’t. It’s better to admit they are both evil, but I have my own value to protect.
You should pay tuitions to your teacher. Everyone pays someone when they learn. There may be people who can self-teach, but when they read a book they must either buy or rent a legal copy of it. Does this not make sense?
Re: Re: Re:2
(continued)
Did you know you can sue AI companies when they output your paper without citing you as the author? That’s your right.
The Doe v. GitHub case, now pending on appeal in the Ninth Circuit, is fighting for this. And in case you didn’t know, the Doe v. GitHub plaintiffs are open source software developers.
Re: Re:
The authors of web browsers have worked very hard to make this possible. It’s not based on fair use. Instead, they got a new law passed, under which temporary copies of data transferred over the internet are allowed. So it does not rely on fair use any longer.
But when you consider what work web browsers needed to do to get this passed:
1) browser security sandbox prevents unauthorised large scale copying of web site data
2) internet downloaded data is stored in two places: computer memory, and persistently in user’s local encrypted cache files.
3) internet downloaded data is never stored in “plaintext” in the user’s computer
4) the download bar, which gives users the impression that unrestricted downloads are allowed, in reality heavily limits the amount of data transferred, the number of files given to users, and the user operations needed to start and execute the downloads.
So when you’re wearing your torrent leeching hat, you need to think of web browsers as limiting your download habits.
Re: Re: Re:
Hey, you made a claim that purports to be a fact. Which “new law” are you referring to? Surely such a law has a citation available. I eagerly wait for you to prove you’re actually talking about reality and not just making up bullshit.
So here you demonstrate that you don’t understand US law or how internet protocols work. Huzzah!
Re: Re: Re:2
https://www.eff.org/files/filenode/temporary_copies_fnl.pdf says relevant cases you should look at are CoStar v. LoopNet and Cablevision remote DVR case.
Re: Re: Re:3
You said new law, not new case.
But that citation does reference a law, or specifically an update to the 1976 Copyright Act that was made in 1998, which makes it hardly “new.”
You continue to demonstrate that you don’t understand what you’re talking about.
Re: Re: Re:4
So the year number is the only bit of information you managed to find from the case law references? Shouldn’t you be examining the “temporary” keyword and the actual limits of what browsers are allowed to do under these cases?
Basically, the important things you should check are the limits of the decision, i.e. scraping is probably outside of the scope. While the browser is able to download the material, the browser is not allowed to give the downloaded material to the user. Browsers obviously display the web page to the user, but the files stay locked inside the browser sandbox.
Persistent caching has been a significant issue, since it saves the data to persistent storage and makes a copy that doesn’t disappear. Browsers have implemented timeouts for caching and allow reloading from the original source.
I kinda expected this level of information from the case law reference, but I guess the year number is a good enough find.
Re: Re: Re:5
No, you should be providing citations that prove the claim you made. But the claim you made is “new law” not cases.
You’re describing technical functions, not legal requirements.
Browsers don’t have agency. They haven’t implemented anything. Also, it’s not illegal to use an older browser that does permanently cache what it loads from a server. Netscape Navigator/Communicator used to do this specifically. It’s not illegal.
No, you’re just confused about what I said earlier about how I’m not doing your legwork for you. You made a claim, you offer the proof. But it’s a trick question anyway because there is no proof. You’re just going to continue to make false claims about copyright. You’ve admitted to not wanting to actually understand US copyright law. That says all you need to say and undermines any claims you’ve made.
Re: Re: Re:6
There’s nothing that can undermine all my claims. Basically my info is based on international treaties and established copyright cases as reported by news media.
Re: Re: Re:7
No, your info is based on your misunderstanding about applicability of international treaties to US copyright law and your misunderstanding of what copyright cases actually established as precedent.
Re: Re: Re:8
I don’t care if the case is a precedent or not. If some poor soul is subjected to the ruling, then the same could happen to anyone, and thus it is the law as established by the courts.
My pattern is such that I listen to all players in the marketplace. This gives me the widest possible exposure to the rules that govern our world. Closing out some players (like RIAA) from the analysis is not my way. I instead use the information to my advantage, even if I don’t agree with anyone’s position.
Re: Re: Re:9
That’s what a precedent is!!! That’s the whole point! A precedent is a judicial case ruling that does apply to other instances and creates case law that courts will consider in later cases. You’ve admitted to not paying attention while pretending you absorb pertinent information. You’re demonstrating that your casually ignorant stance is not informed.
Not everyone is correct. Not everything said is a rule that actually dictates policy or practice or law. You can’t use some random person’s uninformed opinion in lieu of an expert’s learned perspective.
Re: Re: Re:10
Of course you can. You just need to be a rule expert like myself, who can follow thousands of rules simultaneously and find logical inconsistencies in the rules. If you only follow the experts, you never get the information about what is actually happening in the marketplace. Experts have an idealistic view of the situation, and the ground level information is also needed.
Re: Re: Re:10
@terop @MrWilson
Just a reminder: I don’t like the RIAA, as they had a bad reputation for pushing an anti-copying technology (SCMS) that didn’t work (to stop illegal copying) and hurt independent musicians who use consumer equipment for legal copying. (See this: https://en.wikipedia.org/wiki/Audio_Home_Recording_Act)
This copy restriction went beyond the Sony (Betamax case) safe harbor. It restricted a function on consumer equipment that has perfectly legal uses.
Re: Re: Re:11
RIAA’s position in the marketplace is significantly better than the position of random pirates. Mainly because RIAA and the music publishers worked hard to get working products to the consumers on a large scale. Pirates have no such defense.
While I don’t like RIAA’s sue-grandmother-for-swapping-music-files-on-Kazaa lawsuits, RIAA’s position is still significantly better.
Re: Re: Re:12
The RIAA is not a music publisher. It doesn’t work hard to get working products to consumers. It’s a lobbying organization. That you conflate them is telling.
Re: Re: Re:13
RIAA’s large music collection and contacts with top-level artists mean they did the work that was expected from them. The only reason they are able to speak for the artists is because they have contracts with many artists. And if RIAA didn’t do their job, those contracts (and thus RIAA’s position in the marketplace) would not exist.
Re: Re: Re:14
You don’t understand what the RIAA is. It doesn’t have a large collection of music. It is not a music publisher. It is not a record company.
Re: Re: Re:15
The lawsuits the RIAA has filed in court say otherwise. They had no problems claiming copyright ownership of songs from top-level artists in the court paperwork and they used those copyright bits to harass single mothers and elderly people and some pirates. See recordingindustryvspeople for more info.
Re: Re: Re:16
Suing on behalf of artists isn’t the same thing as being the owner of the copyrights. RIAA is a membership organization composed of big record companies. The companies sign the artists, not the RIAA.
Re: Re: Re:12
No. The point is RIAA only works for the best interests of the big record labels, and doesn’t care about independent artists. RIAA can lobby for laws that make independent artists’ lives harder.
Re: Re: Re:13
How is RIAA able to get contracts with top-level artists if they’re doing nothing for the benefit of those artists? Copyright gives copyright ownership to the artists when the product is created, so RIAA had to do something to get access to the copyright ownership.
Re: Re: Re:14
Only top-level artists. RIAA doesn’t care about small artists along the way. So small artists have to file separate lawsuits against Suno and Udio (AI music generators) in order to demand a share from them. (And they have filed suits, Justice v. Suno and Justice v. Uncharted Labs.)
Re: Re: Re:15
When these small artists are rejected early in their career, how long do you think these artists remember this treatment? If RIAA doesn’t support small artists, when the artists are further in their career, I bet many of them don’t want to take RIAA’s contract simply because of how they were treated when they were starting their career.
This is what I do with Steam. They let my beginner’s game rot in Greenlight for 2 years, which made it completely outdated. Thus I have bad experiences with Steam, and now that I’m more experienced, I’m not giving my products to Steam at all. Instead, the people who supported me in the early days and let my product get published (this would be itch.io) get my business. This way companies are just digging their hole downwards if they treat starting artists/developers badly, and it takes significant perks before that hole is filled.
Re: Re: Re:11
Yet you link to their propaganda ministry, the Copyright Alliance…
Re: Re: Re:12
Copyright Alliance isn’t just RIAA, mind you. You shouldn’t treat their opinions as mere propaganda before you actually go read them and understand what they are talking about. Only if you are entirely anti-copyright can you disregard them.
Re: Re: Re:13
It was founded by Jack Valenti for dogs sake!
I’m not anti-copyright. I’m against large corporations that abuse and exploit creators and workers and customers for profit.
That you turn a blind eye to them while decrying AI companies is a monumental hypocrisy.
Re: Re: Re:14
As if the AI companies don’t exploit creators or put workers out of their jobs…
Re: Re: Re:15
Sure, and…?
You keep pretending like it’s okay when big media exploits people because you can find examples of AI companies exploiting people. Maybe big corporations exploiting people is wrong on principle regardless of the products or services they profit from…
[On the discussion of Linux and SCO case]
Please name an “innocent independent non-profit, researcher, student, or poor individual” you are talking about.
Or is it just me that I sense no one but some “bad students” who just want to freeload and use ChatGPT to complete their homework, ignoring academic ethics?
TRUE. Because you are not a registered non-profit that is entitled to an exemption on infringement. Libraries have that exemption, but not you.
Personal copy of the book does not imply a license for commercial use of it. The “first sale doctrine” does not permit reproduction of a work (precisely speaking, the exception is only given for personal archival in the US Copyright law as far as I remember).
Sigh. Another misunderstanding. I’ve said that granting fair use isn’t the solution.
The solution is allowing rental of licensed, legally-trained LLMs to smaller businesses, and allowing small businesses to tweak the models without the need to negotiate licenses from original authors. The details of this are to be discussed in the future. You can’t say authors’ works should be exploited by big corps when you claim to protect small businesses along the way. ChatGPT and Google are f*cking big. That’s the reality.
Derivative work isn’t judged by similarity alone. It’s ironic, because all the debates about AI “regurgitating” were really about proving the works are contained in the model (which is copyright infringement), and yet they all argue that regurgitation is a bug, while the true goal of that argument was to avoid copyright infringement claims.
When GitHub Copilot sucked up most of the open source code it found on the internet, it didn’t obey the licenses of the free and open source code by crediting the authors or releasing the entire model under the GPL. That was the reason for the lawsuit.
Re:
I’d hit the character limit before I got beyond just the names of people I know personally. I am literally talking about every American who isn’t a wealthy corporation. That’s how rights work in the US. Everyone (theoretically) has the same rights.
Your lack of imagination is your problem, not anyone else’s. But it’s also your lack of research. Do some research into academic use of LLMs and get back to me. And by research, I don’t mean just googling your confirmation bias.
It’s fair use. Everyone has that right.
Training an LLM isn’t necessarily a commercial use. You’re not arguing against commercial use elsewhere. You’re broadbrushing all training as a copyright violation without regard to the purpose of the LLM. I’ve literally cited non-profit research.
It’s not granting. It is fair use. This isn’t a negotiation. This is interpretation of the law as it is.
Exactly. You’re saying only wealthy corporations can develop LLMs. Any attempt you make to pretend that you care about “small authors” is bullshit. You’re advocating for enriching the already wealthy. Full stop.
At this point, I’m guessing you work for a big media company.
You aren’t an American. You don’t have a say in US law. And if you actually do, that’s bribery and corruption and you should be prosecuted.
I’m saying author’s works shouldn’t be exploited by big corporations, whether it’s big media or big tech. You’re advocating for big media to continue to fuck over the little guy.
Again, you don’t understand the technology. The original trained text is not included in the model. It can’t be. The size of the model is too small to contain all the text that it was trained on. You don’t understand tokenization.
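To put rough numbers on that size argument, here is a back-of-envelope sketch. Every figure in it is a hypothetical round number picked for illustration (parameter count, weight precision, training token count, bytes of text per token), not a measurement of any real model.

```python
# Back-of-envelope comparison of model size vs. training text size.
# All numbers below are hypothetical, chosen only to illustrate the arithmetic.

n_params = 7_000_000_000               # assumed: 7 billion parameters
bytes_per_param = 2                    # assumed: 16-bit weights
n_training_tokens = 2_000_000_000_000  # assumed: 2 trillion training tokens
bytes_per_token_text = 4               # assumed: ~4 bytes of raw text per token

model_bytes = n_params * bytes_per_param                 # size of the weights
corpus_bytes = n_training_tokens * bytes_per_token_text  # size of the raw text
ratio = corpus_bytes / model_bytes                       # text bytes per weight byte

print(f"model:  {model_bytes / 1e9:.0f} GB")
print(f"corpus: {corpus_bytes / 1e12:.0f} TB")
print(f"corpus is roughly {ratio:.0f}x larger than the model")
```

Under those assumptions the training text is hundreds of times larger than the weights, so the model can’t be holding a verbatim copy of all of it. The arithmetic says nothing about whether individual passages get memorized; that is a separate, per-work question.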
You seem to assume that I’ve somehow claimed that every lawsuit against an LLM company is the same or that every lawsuit against an LLM company is without merit. There are some scenarios where an LLM can be trained illegally. Reread my first post to which you responded. Note what I didn’t say that you’ve pretended I’ve said. Stop arguing with straw men.
Re: Re:
I can type a “John Doe” name here. And I requested you to name just one person or organization. Your claim that you would hit the character limit is a lie. You simply don’t want to, because you just want to “freeload”.
No, I won’t. The burden of proof on this part is yours, and while I can suspect that one such academic use exists, this doesn’t mean I should let commercial uses of these LLMs go. Fair Use Factor One (in the U.S. Copyright law), you know that.
When an LLM has both commercial and non-commercial uses, it’s the commercial part that would be argued in court in the aspect of fair use. You can’t shield commercial LLMs from liability simply because they have non-commercial benefits. (And I have argued this before. Same position as in that USCO draft report.)
At least the “wealthy corporations” can be forced to pay me when they use my work for training; do you like that? I simply don’t want big corporations to take my works for free! Even if I would hurt what you called “poor people” who are lazy and just want to freeload.
Because they are already f*cking, you get it? And how would it be any better if Big Tech f*cked you rather than Big Media doing it?
Data compression. Information entropy. There is no such thing as free intelligence. I know all these.
I didn’t assume that. But you seem to have no idea who you are standing with. You are blinded by the idea that AI can give you free energy, or free intelligence. There’s no such thing, and that’s why the creators are fighting. To protect the sources of intelligence from unlawful stealing (read: copying, piracy).
Note: I’m even against AI training with Creative Commons-licensed content because, AFAIK, no LLM bothers to attribute the original sources of what the AI has been trained with.
Re: Re: Re:
What the hell are you even talking about? First, your request for names doesn’t entitle you to names. But, as I said, “I am literally talking about every American who isn’t a wealthy corporation.” Do you know any non-wealthy Americans? Add them to the list yourself. Google “Joe Smith” and an American city and you’ll find random people with names and ages and addresses. They’re all included. It would be easier to name the people who I’m not talking about, such as Elon Musk, Jeff Bezos, Mark Zuckerberg, and other wealthy assholes who can afford to license as much content as they want.
But the burden of education is on you. You’re ignorant on the topic. And as you said, teachers should be paid. You’re not paying me to educate you.
Except, as you would know if you actually understood fair use and the case law relating to it, commercial use doesn’t bar a fair use determination because a single factor in favor of the copyright owner doesn’t necessarily negate a court finding a use to be fair use.
False. Copyright owners have opposed training as not fair use regardless of the purpose. It’s just that the most prominent uses have been commercial. But again, the arguments are that training is not fair use. The arguments have not been that non-profit, personal use is fair use but commercial use is not. You’re moving the goalposts here.
Literally the first line of the first comment I made on this topic to which you responded was: “people claiming it’s not fair use and that all training must be licensed…”
If you think non-commercial training is fine, you haven’t said so until now and you’ve been responding to my assertions about non-commercial training as if they are also copyright violations, so at best you’re backtracking because you know I’ve caught you in a contradiction, but at worst, and more likely, you don’t have a clue what you’re talking about.
Actually, you could. That has actually been a determination in some courts in copyright cases. Your ignorance harms your assertions.
They don’t have to take it for free. They can license it. What is your content worth? You’ll get pennies.
Keep showing that contempt for poor people and making assumptions about their character and intentions. It says so much about your position.
Yes, that’s literally what I’m saying. You don’t seem to get it. And you’re cheering them on while they make it worse.
The answer to your question is literally in the line you’re responding to: “I’m saying author’s works shouldn’t be exploited by big corporations, whether it’s big media or big tech.” It wouldn’t be better if Big Tech exploited it over Big Media. Neither is good. But you’re advocating for one. I’m advocating for neither.
So you admit you don’t understand how LLMs are trained. You could just have said that from the beginning.
You seem to have no idea what my position is, even after I correct your misperception.
Huh what? Quote me where I said anything like that. Intelligence isn’t something a machine is capable of giving you, unless you’re using it in the sense of information like an “intelligence report” that a military unit might receive from scouting units.
Copyright violations aren’t stealing or piracy. Stealing involves a rivalrous, scarce commodity and the deprivation of the owner of that commodity. Copying data only creates more data. Piracy is something that happened a lot in the late 1600s and in poorer parts of the world like the coasts of Somalia. Your scare words and moral equivocating are just biased propaganda that shows your intellectual dishonesty.
Not all Creative Commons licenses require attribution. Holy fuck you don’t understand what you’re talking about.
Re: Re: Re:2
In other words, the right to freeload. I got it. This has never been a right before. As if, the right to a “free lunch” where in economic reality there is no such thing.
Note: It is a different matter to advocate public access to knowledge with taxpayers’ money. And yet you don’t seem to be doing that. You just want to enjoy the copyrighted works for free without paying the creators. Advocacy for legalizing things that were illegal.
“Google Books” case and Perfect 10 v. Amazon. Did you think I have no idea about the courts finding fair use?
While I’m in this position, it’s their right to argue it’s not fair use.
Training cannot be granted as a fair use regardless of purpose, including commercial and non-commercial AI. What’s the problem here? Because you are suggesting an extreme end in this spectrum: that ALL training must be fair use regardless of the AI being commercial or not.
I didn’t. I’ve said my position on AI training is the same as the USCO’s: that the “fairness” of AI training with copyrighted content depends on the ultimate purpose of the AI.
In case you are still confused, I can name (hypothetical) examples:
(1) An AI-powered article summary system: Generated content mostly depends on the article the user provides as input. Almost no “regurgitation” of the training data is possible. Training this AI with copyrighted materials could be fair.
(2) A machine translation system (Google Translate and like): Generated content again depends on user input (i.e. text to translate) and not training data. In this case the training with copyrighted materials has little effect on the books’ market and thus could be fair use.
(3) AI image upscaling: Again generated content mostly depends on user input. Almost no regurgitation possible. Training could be fair use.
But (4) general purpose generative AI, including ChatGPT, Gemini and Grok, does NOT fall into these categories.
No, because “non-commercial” training could be unfair when the model had a use that could be commercial. Like, how Internet Archive (a non-profit) lost the Hachette v. Internet Archive case about the “digital lending” of books. Being “non-commercial” isn’t a sufficient criterion to rule for fair use.
In fact they took it for free. You didn’t read the case of Meta, and are making a wrong assumption.
In Andersen v. Stability AI, the judge denied the defendant’s motion to dismiss the copyright infringement claim. The plaintiffs (many visual artists) cited a statement about compression by Stability AI’s CEO (Emad Mostaque: “We took 100,000 gigabytes of images and compressed it to a two-gigabyte file that can recreate any of those [images] and iterations of those.”), and so the judge ruled in favor of the plaintiffs.
It’s useless trying to accuse me of not knowing how LLMs are trained, because you simply have no clue about it either and suggest it’s still magic. There’s no magic. By aggregating lots of pictures of apples and compressing the aggregation aggressively and in a very lossy manner, a “ridiculous” size decrease can be achieved.
This technology is incredible by itself, but cannot rule out the claim of copyright infringement.
Intelligence as in “intellectual property” and the “artificial intelligence” word itself. So do you agree there is no intelligence in AI?
So you don’t believe the personal data leak is an issue, including password leaks and the cracking of people’s secrets?
CC0, the public domain dedication, does not require attribution. Other CC licenses have the BY clause.
Re: Re: Re:3
You’re calling everyone who isn’t a wealthy billionaire a freeloader? What is wrong with you?
There are plenty of rights that allow people to do things for free. Gratis isn’t the only kind of “free.”
I’m literally saying we already do that. They’re called libraries! I don’t have to advocate for what is already happening and has been for hundreds of years!
Quote me where I said that. You keep arguing with straw men. You seem to think anyone who disagrees with you is only interested in getting things for free. You are paranoid and obsessed.
Fair use is legal, as I’ve been saying. But even then, changing laws is also a thing that happens in societies.
I think you citing cases or claiming to have knowledge of issues relating to the topic doesn’t do a damn bit of good in your flawed analysis because, as you’ve pointed out, you’re just spewing a biased, paranoid, profit-driven perspective in which apparently everyone owes you something.
They can argue whatever they want.
Another contradiction. Earlier, you stated: “Because you are not a registered non-profit that is entitled to an exemption on infringement.”
If you were keeping up, you would have noticed that I noted that Thomson Reuters was decided based on the competing service aspect. I said that people who claim all training is a copyright violation and requires licensing are wrong.
Holy fuck dude. Again, “Literally the first line of the first comment I made on this topic to which you responded was: “people claiming it’s not fair use and that all training must be licensed…””
You are literally changing your position here. If you think not all training must be licensed because some of it is fair use, then you shouldn’t have started arguing with me! I have said multiple times you haven’t understood anything I’m saying and you just keep proving it.
You’ve literally conceded my point here. If there are examples, then not all training is a copyright violation or requires licensing. That was my whole point!
You missed the point (again). I’m saying that if it is determined that training is not fair use and all training must be licensed, the wealthy corporations can pay for the licensing, but poor people can’t. I didn’t make a claim about what happened in a particular scenario. I’m talking about the future implications of these cases. You know- legal precedents, something you don’t seem to understand.
We’ve been discussing text LLMs, not art generators. You’re shifting the goalposts again. Also, Andersen v. Stability AI hasn’t been decided yet.
I haven’t said anything about magic. You cannot stop yourself from making up straw men to argue with.
You missed the point again. An AI cannot give a person intelligence. An AI can be intelligent or a human can be intelligent, but unless you’re doing some kind of science fiction mindlink between the AI and the human, the AI cannot give you intelligence.
Another straw man. I said a copyright violation isn’t theft and I explained why. I didn’t say leaking personal data isn’t a problem. That is a non sequitur that has nothing to do with it. Hacking data is a separate crime from copyright violations. But you don’t understand US law, so here we are again.
So you concede the point.
What’s wrong with you for wanting the content for free?
Then why not pay for the content you use for AI training, then?
Not everyone, but someone, and that’s enough.
Because YOU are not, period. A non-profit can weigh in favor of fair use, but it’s just one factor out of the four. You’re not the non-profit that can even argue on the Factor One. Is it not clear enough?
I think I no longer need to reply on this straw man, because you simply don’t want to pay any dime to the authors, and simply want to use the copyrighted content for free. And that’s why you are dodging questions and try to impose your own “fair use” theory to others, ignoring the USCO that had debunked it.
Core question: Isn’t generative AI able to create content that compete with the authors that created the works that made the training data?
You’re not answering this question, and while you argued that “not all AI trainings are copyright violation”, you suggested instead that “all AI trainings are not copyright violation”. F*ck off with the logic trick.
There is no contradiction. All training must be licensed. Because fair use is legislated as an exception, not as a rule. If you run a company that uses someone else’s copyrighted works for profit, you must seek license for them first. Only if the licensing deal fails could you seek for fair use arguments in court. Not the other way around.
Stop playing with the law.
Yes, set up a legal precedent that all AI training must seek license! Your assumptions of the “poor people” are nonsense, so f*ck!
“Future” + “poor people” -> cases that are not happening now and are moot to discuss.
Same. Text LLMs do regurgitate and those are proofs that the copyrighted works are in the data. Don’t ask me for concrete proofs, because those are parts of the legal discovery processes. It would be the AI companies that disclose the training data and training process, not me.
As for the discovery of particular cases, I know one fact, that Meta did torrent books through “shadow libraries”, i.e. pirate sources. Whether the pirated book content would end up in Meta’s Llama model is irrelevant, as the plaintiffs have already moved for a summary judgement that Meta infringes copyright.
Except no authors are buying your theory. The discussion of this part is moot because there is no definition of “theft” in copyright laws; it’s what laypeople call copyright infringement outside of the legal context.
And you didn’t seem to want to know why they call copyright infringement “theft”, so be it.
Re:
I have never claimed to want content for free. Why are you arguing with a straw man? Further, why have you extended this straw man to every one in the US who isn’t wealthy?
You missed the point, I think because you don’t understand the concept of gratis versus other forms of “free,” such as libre. I’m referring to libre.
That’s not justification for creating a new right for copyright owners just because you’re paranoid and greedy.
You have admitted to contradicting yourself here. Is that not clear enough? Non-profits are included in the non-wealthy people you have called freeloaders.
That is indeed a straw man. I have never claimed this at all. I am an author. I’m also a designer. The most egregious violation of my copyrights has been perpetrated by the Big Media companies you have sided with. Accusing me of wanting stuff for free when Big Media exists to profit off of poor creators is rich. You’re propping up your own abuse at the hands of wealthy corporations and taking it out on others.
I’m not dodging questions. I’m directly addressing all of your bullshit.
The USCO hasn’t debunked it. They issued a non-binding opinion. This is something that gets decided in courts or the legislature.
Compete is a subjective term. I don’t personally think so. As I already said, I literally attempted to get an LLM to read a vast amount of my work (I am a published author, dude, not some rando who wants free shit), just to test it, to see what everyone is afraid of. Not only could it not replicate my writing style, but it was also full of boring prose. So no, I don’t think it can compete with authors.
You literally misquoted me there. Search the page for the phrase “all AI trainings are not copyright violation”. You are the only one to have said that in this discussion. You are creating a straw man here.
That is a contradiction. All training doesn’t have to be licensed. You’ve said so yourself.
The exception of fair use is a rule. It is built into the law. It is not just a defense in court but an actual part of the law itself. You seem to be running into a confusion about the concept of exception vs rule, which is a common English linguistic juxtaposition, but that doesn’t apply to this scenario.
This is not true in many cases!!! Plenty of companies use someone else’s copyrighted works for profit without needing a license. Fair use allows many uses that don’t require a license. Parody doesn’t generally require a license. Commentary doesn’t generally require a license. There’s a world of content out there generated using other copyrighted content that doesn’t require a license. Again, this claim just demonstrates how your bias is limiting you to a myopic viewpoint.
Not at all. I’ve already cited the case law that proves this wrong. You aren’t even arguing against me at this point. You’re arguing with reality. People have used fair use prior to and in lieu of going to court.
Start reading the law and the case law so you’re not so wrong.
You’re admitting here that it isn’t yet. Which means you’re admitting you’re wrong.
There are no assumptions. It’s pattern recognition. “So fuck” isn’t a complete sentence.
No, they aren’t. Current actions have future repercussions. That’s how laws and legal precedents work.
Sure, you don’t have to actually prove any claims you make. I bet you’re a Nigerian prince too!
So you’re admitting you haven’t seen the proof, but you believe it anyway. That’s magical thinking.
You just changed the scenario again. We’re talking about results, not training. And I don’t support Meta. Zuck can get fucked for all I care (as I have already said).
It is relevant to the discussion we’re having.
I am an author. Cory Doctorow also thinks the same. Plenty of others too.
I know why they call copyright infringement “theft.” It’s because they want to make a moral equivocation to charge the discussion and depict copyright violators as petty thieves. You seem to be in the same boat.
Re: Re:
If you didn’t want content for free, then you should please STFU in these AI lawsuits because you really had no idea what those AI companies have done.
Bullsh-t. (1) Those literary works AI companies have taken have no “libre” things to talk about. (2) The “libre” idea, advocated by the Free Software Foundation, Creative Commons and similar groups, has nothing to do with AI scraping works; the works have been “non-libre” from the start. (I am talking about LLM scraping here, not the GitHub case that scraped the open source software, but even with the open source software, attribution is a minimum requirement before the licensee receives any freedom to distribute the software.)
Whether the AI can compete with YOU is not important in the lawsuits.
What’s important is that you are against the authors who want to bring a suit because you are selfish and disregarding their works and creative labor.
I think I need to remind you one important point: Fair use was not enacted to protect technological innovations. Fair use was enacted to protect free speech.
Therefore fair use is traditionally granted for parodies and commentaries. Technological innovations themselves are not reasons for fair use. Saying that AI is innovative enough so it can be “fair use” is clearly a misunderstanding of fair use.
Generative AI does not fit the cases of parodies or commentaries, therefore the fair use argument of this part is useless. (I didn’t say this. This is mentioned in an amicus brief of the Kadrey v. Meta case, by “copyright law professors”.)
I didn’t say I’m on the position of a judge.
The fairness of the AI training depends on the ultimate uses of the model. And you admitted that it’s all f-cked up when you pirate books for training. Anthropic (the company behind Claude AI) is also f-cked up because they also pirate.
True. And it’s even truer as the AI companies never attribute the authors when they use the data.
Re: Re: Re:
Again, you’ve missed the entire point of everything I’ve said, including the first post that you responded to. If you didn’t want to argue about this, you didn’t have to respond, so you can STFU yourself.
As I said from the very beginning, I am not defending the AI companies. But in the course of broadbrush claiming that all training is a copyright violation, you are actively trying to deprive Americans of rights. You are advocating for making the big media companies wealthier and more powerful at the expense of corrupting my democracy. You’re not even an American. Stop lobbying for changes to our laws. If anyone should butt out, it’s you!
[citation needed] They have trained on vast amounts of content. Some of that content was libre. Hell, some of the content is already in the public domain, and you can’t get more libre than that. This is a weird claim.
So you just admitted that there was some libre content and you’re just pretending it doesn’t exist for the sake of your argument. That’s some intellectual dishonesty right there.
It is important to me and to other creators. You are arguing entirely from a personal bias, but you think my opinion shouldn’t matter because I’m arguing from my own perspective. That’s some hypocrisy right there.
No, not at all. That you think that is your straw man in action again. You keep thinking I’m arguing against the people who filed the lawsuits. I’m arguing against people like you who make these broad statements while being ignorant of the implications and being contradictory in the intentions and effects of your support for wealthy corporations while blindly thinking you’re helping the little guy. You keep citing the lawsuits as if I’ve claimed everything in all the lawsuits should be decided in favor of the AI companies. I have said no such thing, but you keep making up straw men like that.
Of course not. Fair use as a concept originated in the 18th century. Quote me where I claimed that.
The origin of the concept in the UK was to protect the right of a publisher to make an edit to a treatise that it had published and had nothing to do with free speech.
Sometimes the technological innovation is indistinguishable from a transformative use, so a technological innovation can be a reason for fair use.
I haven’t claimed that. You continue to argue with straw men.
Parody and commentaries aren’t the only aspects of a fair use analysis.
So you are admitting that you’re just taking it on faith. You want it to be true, so you believe it. That’s a terrible basis for a belief.
Not necessarily. That can be a factor in the analysis of fair use.
I didn’t actually say that. I also don’t support it either. But you keep claiming I’ve said things I haven’t said. You’re not arguing with me at all. You’re arguing with some shadow you think represents my position. And no matter how many times I try to tell you or point out that you’re attributing things to me that I have not said or supported, you just ignore those statements and make up new straw men.
Re: Re: Re:2
Which rights?
Re: Re: Re:3
Reread everything I have said on this post and try to understand it at least once.
Two problems with the draft
1) training infringes reproduction right
2) fair use is not available if the AI outputs substitute the original
These two rules together pretty much mean that if you want to follow the cutting edge of AI research, you need to move your focus from western societies to communist China.
Re:
@terop
China is a bad excuse for AI companies attempting to legalize the exploitation of creative labor. Nothing in the USCO draft says that you can’t train AI purely for research purposes. What it says is that many commercial AI training uses are not likely to pass the fair use test, because what these AIs generate has the potential to compete with the original works, which would put them at a disadvantage in the Factor 4 analysis under the fair use section of US copyright law.
Re: Re:
That’s rich considering you yourself have called non-profit researchers freeloaders and pirates who just want free content.
Re: Re: Re:
Since the draft explicitly says that AI practices are illegal according to the Copyright Office, I’ve decided to disable AI features in the gameapi builder tool and meshpage.org. Listening to every entity in the marketplace is necessary to make correct decisions, but the Copyright Office’s opinion matters more than the opinion of random pirates. Thus AI features are disabled for the future of gameapi.
Re: Re: Re:2
In no way does it actually say that.
Re: Re: Re:3
To be precise, the training of AI models with copyrighted data is prima facie copyright infringement. Only after such prima facie infringement has happened can the court evaluate whether the accused infringement is fair use. “Fair use” is evaluated with several factors together, and there is no blanket saying that AI training is fair use or not fair use. EFF has been misleading the general public (and the AI users and AI companies) that AI training has always been fair use. The U.S. Copyright Office warned that it’s not. In other words, don’t expect courts to rule in favor of AI.
Re: Re: Re:4
If training is fair use, then it is not copyright infringement. A fair use is de jure not infringement.
You should reread the article.
No really, you should reread the article. “Though the report is non-binding, it could influence courts…”
Re: Re: Re:5
There is a distinction between “prima facie” infringement and the infringement after court’s judgement.
The “prima facie” infringement refers to the action that constitutes infringement before the court can find fair use for the defendant. The defendant only needs fair use arguments after the plaintiff has successfully alleged the “prima facie” infringement action.
You are deliberately confusing the court ruling process. When the plaintiff has alleged the infringement action, it’s only the “prima facie” infringement, and the courts need such distinctions, or else the definition of “copyright infringement” in the court ruling process will become a circular loop.
Re: Re: Re:6
You’re just parroting the Copyright Office Report, which multiple people disagree with, which is the point of this article.
Fair use isn’t just an affirmative defense.
I’m guessing at this point that your confusion isn’t deliberate and you genuinely just don’t understand what you’re talking about, or what anyone else is talking about for that matter. Honestly, an LLM could do a better job of arguing.
Re: Re: Re:7
I’m now treating MrWilson’s argument more like spam and will not reply any further.
Re: Re: Re:8
It’s weird to refer to me in the third person when you’re responding to my comment.
Re: Re: Re:7
1) courts only consider fair use if you
a) raise the issue beforehand in your court paperwork
b) and you were found violating copyrights and have no other place to go than damages calculation
If you haven’t followed the Google v. Oracle paperwork, Google tried the fair use argument but needed to spend millions on lawyers’ fees before the court considered the fair use arguments.
2) court’s decision is required before you can consider something fair use. This 2nd rule kinda negates all your bullshit that you can violate copyright first and then claim fair use. You need to obtain courts decision before it can be declared fair use.
Re: Re: Re:8
Incorrect. A rightsholder can determine fair use and not sue or issue a takedown. The court doesn’t make it fair use, it determines that it already has been fair use.
Re: Re: Re:9
There is no automatic-fair-use-just-because-you-say-so. The courts and judicial authorities are there for a reason.
For DMCA takedowns, what you are referring to is “potentially fair use” but being “potentially fair use” does not mean it always is. Especially it’s the courts who have the authority to declare fair use. Not you.
Re: Re: Re:10
Yes, the court is there to chide you for not considering fair use in the first place before filing and rule in favor of fair use when you just thought you could squeeze money out of someone for a right you don’t actually possess.
“Potentially fair use” is not a legal term.
Courts have the authority to declare fair use in a litigated determination. But I can still say something is fair use based on an analysis of the four factors and a reading of related case law and that can hold up in court if the court agrees. You’re pretending like everything goes through the court. Plenty of lawyers advise their “potentially” litigious clients that some instances are fair use and don’t suggest suing unless you want to lose and “potentially” be liable for court costs. Plenty of copyright holders determine fair use on their own and don’t sue. I’ve literally determined that some of the use of my works is fair use and I haven’t sued. Oh look, I do have the authority to declare something to be fair use!
You seem to think every copyright holder should sue every user to find out if the court agrees with a fair use analysis. That’s not realistic or sensible.
Re: Re: Re:11
Yes. Four factors. Then why the hell do you disagree with the Copyright Office’s analysis and instead pick only the case laws that you think would rule for fair use for your particular scenarios?
It isn’t that I don’t understand case laws, but you have not refused any point about how USCO is wrong, and so such an argument of you is a waste of my time.
No really. I’ve read the common defenses, Authors Guild v. Google a.k.a. Google Books case (which is about book search engines, not generative AI), Sony v. Universal a.k.a. Betamax case (which ruled for fair use only for personal video copying and notably does not apply to cases like Napster and Grokster; the Grokster case is important here as the Supreme Court pretty much denied there can be fair use for P2P copying), Google v. Oracle (which is limited to software code copying only and does not apply to other kind of works such as books). You guys who tried to defend fair use on AI pretty much need to notice the Warhol Foundation v. Goldsmith case, because that one is the closest to generative AI on fair use. That the copyright holders would cite to rule against you. Rather than I explain what that is about, you should study yourself. Make your “fair use” arguments able to win on that case, or else you won’t win.
Re: Re: Re:12
Sorry I missed another one. Campbell v. Acuff-Rose Music (often cited by AI companies for fair use but that’s limited to parodies, and generative AIs are obviously not parodies for the case to apply).
Re: Re: Re:12
Because that’s how human thought processes work. We recognize patterns, such as the fact that courts have declared that the four factor analysis is not a numbers game where each factor is weighed the same, but rather different estimations of different factors can tip the balance one way or another. I guarantee you that I have read more case law relating to copyright than you have.
It’s also funny that you’re asking me why I favor case law that agrees with me when you’ve deliberately cited unsettled cases that may possibly indicate one court might potentially agree with your assertions and have chosen to ignore cases where fair use was determined.
It’s not “case laws,” plural. It’s case law. And it is definitely that you don’t understand it at all. If you did understand it, you wouldn’t be making claims that contradict case law.
The EFF pointed out how the US Copyright Office got their analysis wrong. I mostly agree with their perspective on this topic. I also offered my own perspectives, which you seemingly intentionally or just clumsily interpreted completely differently than anything I actually wrote.
And yet you continue to respond, thus admitting that you are voluntarily wasting your time. Why are you wasting your own time?
This is literally you interpreting case law to support your prejudiced conclusion. These are notable cases, and not all relevant, but they aren’t all of case law relating to fair use.
I’m only one person. Here you’re admitting that you’re grouping me with other people. This is perhaps one explanation why you make up straw men I haven’t uttered. You think multiple people who you disagree with all think the same. This is lazy thinking. You refuse to actually engage with what I’ve said and you just fight false positions you imagine. You’re really wasting your own time here and looking silly while doing it.
Not close enough to set a relevant precedent. The Warhol work was definitely derivative of the Goldsmith work. An LLM trained on a work among millions of others can’t necessarily reproduce that one work and in the vastness of its training data a single work can’t be significantly influential on the results without intentional human intervention. This, again, demonstrates that you don’t understand how LLMs work.
This isn’t a complete or coherent sentence.
I’m not litigating anything so I’m not going to win or lose a case. Do you understand that I’m not a defendant in these lawsuits? Have you lost all sense of reality here?
Re: Re: Re:3
if it does not say that, what does the following quote mean:
“””The steps required to produce a training dataset containing copyrighted works clearly implicate the right of reproduction. Developers make multiple copies of works by downloading them; transferring them across storage mediums; converting them to different formats; and creating modified versions or including them in filtered subsets. In many cases, the first step is downloading data from publicly available locations, but whatever the source, copies are made—often repeatedly”””
To me, this explicitly states that reproduction right is infringed when AI companies prepare training data for the training process.
Re: Re: Re:4
For all of your comments on copyright, it appears you have no fucking clue how copyright works. “Implicating the right of reproduction” ≠ “AI practices are illegal”
How stupid are you?
Re: Re: Re:5
It’s you that needs to explain why it’s not, and it won’t help to accuse other people of being stupid.
If those commercial AI models use only licensed data (or public domain data) for training, then we have no problem. Otherwise, they are infringing either the author’s right on reproduction or derivative works or both. The only possible defense here is fair use, but, as Thomson Reuters v. Ross case has shown, the AI companies are not likely to win.
Re: Re: Re:6
It’s been explained to you. You just have a profit-motive that prevents you from understanding.
You are new here. Terop is stupid. He has a history of showing up and hallucinating new aspects of US laws that he wants to be true but have no basis in legislated or case law or reality. He comments here in an attempt to promote software no one is interested in.
Who’s we? You’re just one person. Are you a party to a lawsuit? Which one? What group of people have you been designated as the representative of?
Or they’re not because copyright isn’t unlimited and results that don’t include source material aren’t derivative. If I read Harry Potter and write a story about a wizard that contains nothing from Harry Potter, my story isn’t a derivative work. Otherwise Harry Potter is a derivative work of Tolkien’s. You’re creating rights that don’t exist. Copyright protects expression, not ideas.
You are losing the nuance again. Reuters hasn’t been decided yet. And the issue is greater than just these lawsuits.
Re: Re: Re:7
So your awesome analysis of the message I wrote consists of “proof by authority”. Even wikipedia says that it’s a bad idea, see https://en.wikipedia.org/wiki/Argument_from_authority
“Scientific knowledge is best established by evidence and experiment rather than argued through authority”
So basically if you wanted to dismiss my bullshit, you should examine the contents of the messages, instead of just looking at who wrote it.
Re: Re: Re:8
No, that’s not an argument from authority. You didn’t understand the wikipedia entry at all. There was no appeal to any authority figure. But even if there was and it was actually such a fallacy, you would be suffering from the fallacy fallacy.
https://en.wikipedia.org/wiki/Argument_from_fallacy
There’s also no need to prove you’re wrong. You’re making claims (that are wrong), therefore you need to provide proof that those claims are correct, but you can’t because you’re wrong. This is pattern recognition. You’ve been proven wrong in the past, with citations proving you wrong. You don’t get to just keep making new incorrect claims and expect everyone else to do the legwork to prove you wrong every time. If you prove yourself a person who can’t be bothered to learn about a topic before spewing your incorrect magical thinking, then nobody else has to waste their time on it.
Re: Re: Re:9
If I cannot make incorrect claims and expect legwork from you, why can you do the same? You have also been proven wrong, when you support pirate areas and illegal options and practices.
In fact, this illegality is what I’m trying to save you from. The original reason why I entered techdirt in the first place was because it was clear that the site is full of propaganda/illegal practices/wrong statements about copyrights. It was so bad that we had to declare you the worst violators of copyright we could find from the internet. Even 4chan is not that bad, but you were focused on copyright issues but you took the wrong side in the argument.
Re: Re: Re:10
I’m not doing the same. I’m not making wild counter-factual claims. I’m not making up fake parts of the law. I have provided actual citations.
This is another unsupported claim from you. [citation needed]
You’re not trying to save me from anything. You don’t know me. You don’t know if I engage in illegal activity or not. Also, your self-interest paints everything you talk about. You clearly don’t care about anyone else.
Our savior has come! Except you don’t understand copyright law, as has been proven several times. So you can’t teach us anything. You’ve actually come to be humiliated and to demonstrate your wishful ignorance.
Do you suffer from multiple personalities? Who’s we? Also, if you have proof of copyright violations on Techdirt, feel free to point them out. If they’re the worst, they should be quite easy to prove and link to.
Apparently you’ve never been to 4chan or any of the worse sites out there if you’re going to make this silly claim. You haven’t even taken the wrong side of the argument because you don’t understand US copyright law well enough to make a coherent argument.
Again, you’re making a bunch of claims without proof. Provide evidence or else prove that you are as ignorant as you appear.
Re: Re: Re:11
This is the real problem. You think that understanding copyright is required to stay legal when interacting with existing products. This isn’t the case. You just need to avoid clearly illegal areas. Things like AI (where training is broken), or torrenting (where lots of pirate material is available), or swapping pirate movies (where money doesn’t go to the authors but some illegal middlemen)…
My position is that none of the “understanding” the fine details about copyright is required, if you even do the minimal stuff and avoid clearly illegal areas.
But your position is always that it’s necessary for you to go to the illegal area and swap the damn movies…. then you’re in big trouble and need to scream “fair use will save my ass” when you forgot to obtain the licenses…
None of the fair use bullshit is even required, if you followed the default behaviour expected from you. Find the authorised vendor, and purchase the products you need. Don’t go to the pirate area.
Re: Re: Re:12
No, I’m asserting that you don’t understand US copyright law and I’ve proven it several times. Thus rendering all of your assertions about its nature to be unreliable. Some people can accidentally remain legal in their uses of copyrighted material. But you’re purporting to have actual knowledge you don’t have and, worse, you’re purporting to educate others on a topic you yourself still need to understand.
Nothing’s actually clear when you’re ignorant. You’re just so simplistic in your thinking that you see everything (incorrectly) in black and white.
Not all AI training is illegal, clearly or otherwise. Not all torrenting is illegal. Some software developers release torrents of their free software. You can torrent Linux flavors legally. There are public domain works available via torrents.
Your position is incorrect and explains why you can’t argue coherently.
I have literally never said this. Not only do you hallucinate false parts of copyright law, you’re now hallucinating claims you pretend I’ve made. I will say again, quote me where I said that.
Fair use is legal behavior. That’s the whole point!
Who is the authorized vendor for free software?
Re: Re: Re:13
Authorised vendor is always the person or group of persons who has permission to decide the license for the user. So for free software, authorised vendor is all the contributors who have collectively decided to use LGPL or GPL license. Any one of them can publish the material in their web page, and pass license to use and prepare derived works further to the next guy who then becomes a contributor (and obtains copyright) for their own contributions.
Re: Re: Re:14
No, you said you have to purchase the products you need from an authorized vendor. If the software is free, you don’t have to find an authorized vendor. It was a rhetorical question. You didn’t understand that I was pointing out that your position is nonsensical. It’s the entire issue about fair use. You don’t have to ask for permission or pay a copyright holder if your use is a fair use.
Re: Re: Re:15
That’s wrong. You still need to find authorized vendor. It’s explicitly stated in the copyright law that 1) some operations are exclusive to author 2) you need explicit license to do those operations 3) only way to properly obtain the license is to find authorized vendor and ask a permission to do the stuff you want to do.
Free software is no different in this respect. They just make it easier to find authorized vendor by looking at some text files distributed together with the software.
Re: Re: Re:16
Um. You are wrong. Number 3 is not in copyright law at all.
Why are you lying?
Re: Re: Re:17
Precisely speaking, the other way is so called “fair use” defense, but it requires court decisions before you are greenlit.
Rather than wasting time arguing whether AI is “fair use”, my best way is to wait for court decisions to come up and see you guys lose horribly.
And even if the AI companies win the “fair use” defense (which I highly doubt), there is still DMCA section 1202(b), which requires users – including AI companies – to preserve the copyright management information (CMI). That is pending appeal in the Doe v. GitHub case.
Re: Re: Re:18
No, you are absolutely wrong. You don’t have to ask permission for all uses and you don’t have to go to court for them to be legal.
If this were true, you’d still have to contact every single person who releases their works under a permissive license. FOSS developers would be up to their ears in emails asking for permission and they’d never get around to formally approving each individual request.
You really don’t understand US copyright law and neither does Terop.
This is why I call you a copyright maximalist. You invent copyright holder rights that don’t exist and insist they have more power than they legally have.
Re: Re: Re:19
This is the biggest bullshit I’ve heard in a long time. We already declared “With enough eyeballs, Bugs are shallow” as bullshit simply because we received no emails about any problems in our software. In my whole life, I’ve received exactly one email asking for permission. So if people are asking for permission, it’s definitely not via email. Other permission requests (where I received a magnificent $6) came via itch.io web pages. That’s about it. I think the number of requests is too low rather than too high.
So I would say the number of people who are doing things properly and asking for permission is very small. Now you all will be bashing my product as not useful and I should die horrible death simply because passing money downstream normally works as a permission request and getting money from the society is declared necessary. But creating copyrighted works is not the way to go.
Maybe you have better idea how to fulfill the requirements, since copyrighted works are not working?
Re: Re: Re:20
This is how copyright works. You don’t have to ask for permission for many uses, especially not open and free licenses. That’s the whole fucking point. And the worst part of all of this is that you yourself have released your software under permissive licenses that don’t require asking for permission!
So you’re claiming that you think everyone has to ask you permission, but you admit only one person ever has. So either you’re saying only one person has ever used your software or else you’re not enforcing your copyrights. But the reality is, under the license you’ve chosen, they don’t have to ask for permission. That defeats one of the purposes of the license!
I’m talking about FOSS developers whose software gets used a lot, like Linus Torvalds and Richard Stallman.
Except you’re completely wrong about “properly.” And “properly” isn’t even the same as legally because it’s not required to ask for permission with such licenses.
I’m not here to provide you with career advice.
Re: Re: Re:21
Well, I have tried multiple different licensing systems, including proprietary, open source, free software, custom, Creative Commons, eat your own dogfood, etc.
But the one person who asked for permission was doing it for solely commercial game software.
Re: Re: Re:18
I think you’ve lost track of the conversation. I was responding to terop’s false claim that to use free software, you still have to get it from an “authorized vendor.”
That has fuck all to do with fair use.
I am beginning to think that you are not a good faith debater.
This is about free software, not AI.
Re: Re: Re:19
I didn’t argue out of context. Rather, you mistook terop’s argument about “authorized vendor”. For free software you technically still need to get it from what terop called an “authorized vendor”, except this “authorized vendor” is “everyone that can distribute this software legally”.
Note that in jurisdictions where GPL cannot be fully enforced, distributing GPL software would also be illegal. This is the “liberty or death” clause since GPLv2.
Free software does not always mean public domain. For example, if you train AI with GPLed code and release the AI model under a license other than the GPL, it’s still a copyright violation. Free software doesn’t mean an always green light regarding AI training (it’s “mostly free”, except when you release it as proprietary or combine it with proprietary code).
Re: Re: Re:20
Bullshit. You’re omitting that Terop claims “only way to properly obtain the license is to find authorized vendor and ask a permission to do the stuff you want to do.”
You don’t have to ask for permission and your classification of the unofficial term “authorized vendor” as “everyone that can legally distribute the software” is still incorrect because you wouldn’t have to ask for permission from a person who doesn’t own the copyright and didn’t decide to release the work under the open license but merely passed it on via the permission automatically granted by the license itself.
Re: Re: Re:21
This sounds very wrong. The law does not work this way. Authorized vendor requirement in the law is there because the default behavior is that you need to be able to pass some money to the author, and not everyone in the world is authorized to sell you permission to use the software. Authors have various ways to pass the authorization forward in their sales organisations, but none of those authorization passing techniques allow you to skip the part where users find authorized vendor. None of the sales organizations can reach user’s home, so it is user’s responsibility to travel to the authorized vendor who is able to take your money and give a permission to use the software in return.
Re: Re: Re:22
This is the weirdest theory of copyright ever. You have to be trolling at this point. It’s impossible to be this dense. You’d be violating your own claims every day. You haven’t paid Mike to use Techdirt or asked for his permission to use the website, so by your own determination, you’re constantly violating copyright laws.
Or… You’re an idiot who doesn’t understand what you’re talking about.
Re: Re: Re:23
This is why my messages are being delayed or rejected outright, when Mike wants to forward all information about my products to the advertisement department and try to extort money from me.
Re: Re: Re:24
This claim is probably rising to the level of defamation.
Re: Re: Re:25
You can’t have defamation, if the information is the truth. And there’s no indication that techdirt suddenly stopped running ads on the site or that they stopped delaying the messages.
Re: Re: Re:26
Yet there is no proof that there’s a connection between your messages getting caught in the spam filter entirely because you’re spamming and any intent for Mike to extort or even ask you for advertising dollars. Claiming your unfounded claims are the truth doesn’t make them true. You’re just doubling down on false claims without proof.
Re: Re: Re:27
This is only because I’ve explicitly stated that I’ve received $6 from my 10 year software project/don’t have extra money to pay for the advertisements. But since the $6 is coming from 10 years of work, those dollars should be more valuable currency than your ordinary dollars where you spend less time obtaining them.
Re: Re: Re:28
You’re a paranoid conspiracy theorist at this point.
Re: Re: Re:29
So this is the best reason you can think of, why my technology is not worth exploring. Sounds like you’re running out of reasons and have to invent some bullshit that doesn’t make sense.
Re: Re: Re:30
This has nothing to do with your software. You keep bringing it up as if anyone cares. You’re a nut.
Re: Re: Re:31
Lets look at it this way:
1) copyright office published some paperwork
2) the paperwork contains info about how AI should be handled by software authors
3) the conclusion is that training is violation of copyright
4) thus every software developer worth their salt will examine their copyright bullshit and modify it to match the changing legal environment
But my software has the following aspects:
a) I publish some computer source code/binaries
b) it has AI included in it
c) thus copyright office paperwork is relevant to the AI aspects of the software
d) it turns out that AI training area is composed of a black box that is bound to explode after copyright office opinion marks it as copyright infringement
e) thus there are some changes needed in the software, and consequences will be passed to the USA copyright office
f) but in either case, the copyright office paperwork is relevant to my software
QED.
Re: Re: Re:32
You’re confused. The topic may be relevant to your software. Your software, however, isn’t relevant to the topic for anyone else except you.
Re: Re: Re:33
Everyone else who creates software will need to follow the same copyright office rules… that makes it interesting to everyone.
Re: Re: Re:34
Again, that’s the topic, not your software. If you hadn’t commented on this, nothing would be lost. You have added no insight and in fact have spewed mistruths and ignorance. You have contributed only confusion and bluster and bullshit.
Re: Re: Re:35
That’s only because you didn’t read the full story. In short, the copyright office’s opinion matters more than the opinion of fair use pirates.
Re: Re: Re:36
Except it doesn’t if the Copyright Office is wrong. And you wouldn’t know who’s right or wrong because you admit and demonstrate ignorance about US copyright law.
Re: Re: Re:37
No one in the marketplace is wrong. They just have a different view of the same problem. Authors cannot get their products to the market. Publishers don’t have money for extensive sales activity. Customers cannot find the product that solves their problems. And the copyright office cannot make pirates stop pirating.
Re: Re: Re:38
If no one is wrong, then contradictory perspectives would both be right, which means truth doesn’t matter and aardvark smartphones postulate Freudian underpants in the cold void of your mom’s basement.
Re: Re: Re:39
This can be resolved by attaching context to the perspective. Then both can be simultaneously true, even if they’re logically contradictory. The context resolves the contradictions.
Re: Re: Re:40
Except both being true isn’t the only possibility. They could both be false. They could both be so wrong that they don’t make any sense whatsoever. And you clearly aren’t qualified to judge what’s true or false or even the context in which they might be either.
Re: Re: Re:41
Sure. Pigs can also fly.
Re: Re: Re:42
Yeah, you’ve devolved into complete nonsense.
Re: Re: Re:43
You haven’t even seen what level of abstract nonsense I’m capable of. I’ll give you some reading to do, we’ll return to this once you’ve read the following books:
1) Sets for Mathematics, Lawvere
2) Category Theory, Awodey
3) Categories for the Working Mathematician, Mac Lane
The real nonsense is significantly worse than you think. I have significant trouble finding other people who can understand the bullshit, so there’s some ivory tower problems with the material. But hope you read it, so we can talk real bullshit and nonsense in 2 years.
Re: Re: Re:13
You know how I get information about how copyright works?
By creating copyrighted works.
Computer games, user interface libraries, intros and demos for demoscene, phone user interfaces and 3d engines. Things that are very common in today’s world as software products.
Creating copyrighted works and watching your products fail in the marketplace one after another is a good way to learn the fine details about copyright.
I think you fail in copyright, because you rely on lawbooks and bullshit from the internet to base your knowledge and you have no idea how software is being written.
It’s the experience of watching it get invented, designed, written to software source code, submitted to version control, tested and bugfixed and then sales droids will try to turn it to money and failing miserably.
This whole process is such a revelation of why copyright is actually important part of societies and how much damage pirates are doing to the bottom line of companies and individual developers.
Re: Re: Re:14
Creating and publishing any creative work just generates a copyright in the US. It doesn’t give you any understanding of how copyright works at all.
I appreciate you admitting that you really have no understanding of copyright because you have some kind of folklore/cargo cult kind of belief about it. I had pretty much assumed this, but I appreciate you acknowledging that you’re neither well versed or even interested in understanding the law that you make assertions about.
So I don’t understand US copyright law because I base my knowledge on the law? Should I instead consult a ouija board? Cast bones to divine some magical understanding?
I have a good understanding of how software is being written. Also, software isn’t the only copyrightable content. Your focus on it only is a further demonstration of your ignorance. I produce copyrighted works also.
You’re talking about producing and selling a product. That’s not specific to US copyright law. That’s not even specific to software development.
You are just articulating how myopic your perspective is on US copyright law here.
You’re in Plato’s cave describing your deep understanding of a single shadow on the wall and admitting you’ve never been outside to see what’s actually going on.
Re: Re: Re:15
I think the above claim is blatantly wrong. Copyright was created to support authors whose work was ripped off by publishers who did not have authorization to sell the author’s product. As such, copyright law recognizes what kind of activity is detrimental to the success of product development. It’s the activities like creating and publishing copyrighted works that must be continued even when money from the effort goes to some unrelated copycats.
Re: Re: Re:16
You have admitted to not studying US copyright law, so your claims about it aren’t just wrong, but completely ignorant.
Re: Re: Re:17
Copyright laws are supposed to work the same everywhere in the western world, so you cannot hide behind your USA pond when the same rules apply to a larger area of the world. This is why we can sue pirate sites operated from the USA, if they decide to infringe our copyright.
Re: Re: Re:18
No, they aren’t. That isn’t the law at all.
Re: Re: Re:13
If you give this BS to the courts, they will laugh you out of the courtroom. I bet you can’t even download the damn torrent client without violating copyrights.
Re: Re: Re:14
No, they won’t. It won’t even get to court because a FOSS developer won’t sue you for legally torrenting their free software that they themselves will often seed or offer to mirror sites to seed because they are interested in distributing their free software. That you aren’t aware of this fact is damning to you as a software developer. This is a giant gaping hole in your understanding of your own proclaimed field of expertise.
I’d ask for a citation, but I know you’re not good for one. And it’s not necessary because the claim is false on its face. Torrenting is just a file transfer protocol. It’s not itself illegal. Game platforms have used torrenting to release content. FOSS developers have used it to release their software. Public domain content is perfectly legal to transfer over the internet. The mythology you’ve invented about copyright law is absurd.
Re: Re: Re:15
User interfaces are needed before your file transfer protocol is useful to anyone and their mother. And there are strict rules that the user interface must not display pirated material. Popular user interfaces like web browsers are struggling to meet the strict requirements set by the law. Basically, you can check this information by looking at the lawsuits brought by the RIAA and the copyright lobby against software vendors who provide user interfaces (things like Napster)… The lawsuits always have things like “our trademarks are infringed when (napster) displays the name of the song in its user interface”…
Re: Re: Re:16
@terop
Which rule? From what I’ve seen in the Napster and Grokster cases, I don’t remember there being a rule saying that user interfaces must not display pirated materials. The Napster and Grokster cases were not about that.
The problem with both P2P platforms is that the companies making the P2P software benefitted from the illegal copying done by their customers, and that piracy was the primary use of the P2P software.
Of course it doesn’t make BitTorrent-as-a-protocol illegal. The real catch of it is, if you’re developing a technology that can be used for illegal purposes, make sure you don’t contribute to those activities or profit from them (or else you will be liable).
Re: Re: Re:17
https://copyrightalliance.org/wp-content/uploads/2016/09/AM-Records-v.-Napster.pdf
“””Napster provides technical support for the indexing and searching of MP3 files, as well as for its other functions, including a “chat room,” where users can meet to discuss music, and a directory where participating artists can provide information about their music.”””
This “technical support for the indexing and searching mp3 files” explicitly points towards the user interface.
Re: Re: Re:16
This isn’t a thing at all. There is no law that says you can’t list non-copyrightable names and titles in a display. And the names themselves aren’t the material itself. You have no idea how US copyright law works at all.
No, they aren’t.
Those lawsuits didn’t set any kind of precedent that makes displaying the names of content a copyright violation. You didn’t actually research this topic.
Trademark and copyright are two distinctly different types of “intellectual property.” You made a claim that copyrights were violated merely by the display of words in an interface, but now you’re shifting the goalposts to trademark.
The absurdity is that this entire argument is useless because it’s perfectly legal to torrent legal torrents, so therefore there wouldn’t be any “pirated material” displayed in order to run afoul of this entirely made-up false aspect of copyright law you just conjured. So you’re even wrong inside your own wrong fantasy.
Re: Re: Re:17
Good luck with that. The day you implement support for torrents, the users will demand that they can add their own protocol entry points to the system, so that their pirated file collections can be included in the system. Then you as an author can’t see anything wrong in your software, but users are leeching and pirating material like crazy. When you finally figure out that users’ only interest in your software is that it allows pirate data to be used, it’s too late and RIAA/MPAA is just a few weeks away from sending a DMCA notice about your software, or if it’s a blatant enough violation, giving you paperwork for a lawsuit.
This is significant issue for copyrights. Pirates will find a way to insert their bullshit to software projects. Things like requiring software to be open source so that pirate can then modify it and disable all the protections against copyright infringements.
Re: Re: Re:18
Many developers have implemented support for their own torrents. Notably, Blizzard has used the Blizzard Downloader that included BitTorrent protocols as a means of downloading legal game updates.
First, this isn’t the case. The Blizzard Downloader was only used for authorized Blizzard downloads. Users didn’t demand it be able to include copyright infringing uses. There are real world examples proving your random speculation is completely incorrect.
Software is typically agnostic as to its uses. Legal software being used for illegal means is on the user, not the developer, unless you can prove in court that the developer intended for it to be used for copyright infringement.
Blizzard was quite certain that its own customers wanted to use the downloader to…download Blizzard’s games, that they legally paid for. Blizzard was only too happy to provide the service people were paying it for.
Your ignorance of the legal uses of torrenting is appalling, but not surprising.
Cite the lawsuit from these organizations or their member corporations against Blizzard’s torrenting software. Oh wait, it doesn’t exist. Also, the MPAA is now the MPA and has been for 6 years, again demonstrating that you don’t know what you’re talking about. Your talking points are stale.
Your weird fever dreams and hallucinations don’t reflect reality. You should get your blood pressure checked.
Re: Re: Re:19
Check this article about Meta’s AI branch fighting publishers over Meta leeching its AI data via torrent from pirate sites, and Meta lost the fight:
https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/
Re: Re: Re:20
@terop
It’s not completely lost. The summary judgment is still pending, but it’s unlikely that Meta will win. Meta’s last bet is claiming that torrenting to train AI is fair use, but the judge has expressed doubt about that.
I’m also waiting to see the judgement coming out. It would be the first case regarding generative AI training and fair use (and more authoritative than USCO).
Re: Re: Re:21
Looks like Meta won their lawsuit because the plaintiffs focused on the wrong aspect.
Re: Re: Re:18
@terop
Open source software does not need protections against copyright infringements, since it’s part of the license to permit distribution almost anywhere and to anyone (except in jurisdictions where an open source license cannot be enforced, which is a rare case).
This comment shows your misunderstanding of open source software. Open source software ≠ piracy. And BitTorrent has legal uses. What can get you in trouble is when you permit users to pirate materials through your platform (software) AND you benefit from those illegal uses. Both conditions must be satisfied in order for you to become liable. Cases where there is no liability include:
(1) You develop an open source BitTorrent client (e.g. Transmission). But you do not profit from the users using it (through subscriptions or other means) nor secondary benefits such as ad revenues.
(2) You develop a video game that supports BitTorrent protocol as a way to download game updates, and your video game client does not allow users to torrent arbitrary files on it. (That is, it allows only game updates.)
So make it clear. BitTorrent has legal uses.
Re: Re: Re:19
Holy shit. You actually got one thing right! You deserve a gold star!
Re: Re: Re:19
This isn’t true. Open source still needs to respect copyright of other people, even if they allow their own code to be copied freely.
This means that if users are giving URLs to your software and the software loads some data from the web, the software author needs to ensure that the data loaded was not pirated. Since users provided the URL/internet location, checking whether the material is pirated becomes slightly more difficult, or currently impossible to do programmatically. For this reason, open source software that loads data from user-defined URLs needs to have a section in its terms of service stating that pirated material must not be used in any URLs typed into the software’s input slots. Basically the software’s legality fails only if users break the conditions described in the TOS.
Re: Re: Re:20
This TOS solution is only used because no other solution is available. There are indications in court paperworks that the TOS solution is simply not enough to prevent large scale piracy happening through software you write. And thus courts are unwilling to accept it as a solution to the piracy problem. In my software, there’s additional tricks that need to be used as defense in addition to the TOS trick: namely, copy-pasting url to the software is designed to be burdensome enough that “manual steps are required” before piracy can happen, thus limiting significantly how large scale piracy users are able to do through the software.
But these same steps were tried by court cases where the defendant was paying significant damage amounts to the content owners. Thus the solutions we have for preventing user’s piracy might not be enough for the lawyers and they will just declare the software illegal.
It takes significant amount of research and effort to figure out these solutions that allow the software to operate properly, but still prevent piracy that users are trying to do. But if the research is not being done, the situation is significantly worse. The people who create software, but do not care about respecting other people’s copyright, will be blinking targets for copyright infringement lawsuits in the content owner’s copyright tracking system…
Re: Re: Re:20
The TOS isn’t necessarily binding legally if the terms of the TOS aren’t legal. A TOS, for example, could require someone to break the law in order to have permission to use the software, and under US contract law, that isn’t a binding contract.
That said, a TOS can explicitly state that verbal or written permission is not required, meaning that your entire theory that users must contact copyright owners is completely false.
I’ve literally seen copyright holders who release content under permissive licenses complain that people contact them to ask for permission when they specifically used permissive licenses so that no one would need to contact them and waste their time asking permission for something already granted in the permissive license.
Re: Re: Re:21
But this requires that author explicitly decides to do this. I’m usually talking about the default behaviour, i.e. what happens when authors decide nothing…
Re: Re: Re:22
When the authors decided nothing then they don’t make a TOS so your point is entirely moot.
Re: Re: Re:23
Sadly for you, when some copyright infringement cannot be detected reliably by software algorithms, the only solution that makes the software legal is a section in the TOS that declares it an illegal area for the user. Thus it is a legal requirement that the TOS contains this section. Authors do not need to make an explicit decision to include it, since the law forces their hand.
Re: Re: Re:11
I posted citations, but they were filtered out by the spam filter, and it seems the maintainers are not willing to give you the information you need to check the facts.
Re: Re: Re:12
This is a flat out lie.
The only comment of yours that we blocked was in reply to me calling you out for lying. Let’s be clear on this: you FALSELY claimed that we mention your software all the time.
I responded, pointing out that this was an outright lie, and saying I don’t know your software, have never mentioned it…
And YOU responded with a spam ad about your software saying “Oh, I can fix that…” and went off on some nonsense about your software.
You provided no “citations” to the things that MrWilson was requesting.
So, not only did you lie first of all, but you’re lying again now, and trying to use the fact that you tried to spam my comments with an ad to your software (which looks stupid, derivative, and pretty fucking useless), as an excuse for failing to provide actual citations.
Dude: fuck off.
Re: Re: Re:13
The damn software is the citation I’m relying on. If using my own copyrighted work is not enough for your standards to work as a citation for copyright issues, then I don’t know what is.
Relying on case law means just copy-pasting some keywords which refer to stuff we have no access to. It’s just guesswork whether some paperwork hidden in some lawyer’s office matches the technological environment the software relies on. These case law papers probably have some cool limitations on when the stuff is valid and when it’s not valid, but since the actual text is not available (and no one would bother reading it anyway), it’s just useless.
My own software, on the other hand, is significantly better to rely on, since I know every detail of it after spending 10 years writing the copyrighted work. But if I rely on stuff that I know, other people obviously have problems following it.
So it’s either spouting bullshit about case law that I know nothing about, or spouting stuff that I know well but other people have trouble understanding. It’s your call. Which would you prefer?
Re: Re: Re:14
No it’s not. Your post was literally an advertisement for your software. That’s not what MrWilson (or anyone) was asking for.
That’s not a citation.
I’m beginning to think your problem in life is not copyright, but that you may be the dumbest fucking idiot ever to visit this site.
That makes no sense. Again, the one post of yours we didn’t let through was not a citation. It was you promoting your software.
Why do you lie so fucking much?
Re: Re: Re:15
Yeah, that’s why I have to create these copyrighted works and receive no compensation for my efforts. Some would call it dumb to repeatedly do the same failure of creating a product when previous one didn’t make me richer than Bill Gates. But compensation issues aside, I think it benefits the society if people are spending their time creating something useful, instead of spray-painting the neighbour’s garage or government buildings. If I was real dumb, I would do damage to the environment. But no, I’d rather create software that looks cool and would be useful if someone would take the time and actually use it.
Re: Re: Re:16
This is a false dilemma. It’s entirely possible to create something useful and not be completely wrong about how US copyright law works. Plenty of people do it every day. You’re literally chatting in the comments section of a website that Mike has created and that you seem to find valuable enough to spend time on.
Re: Re: Re:17
Have you noticed that this only works in the land of the free? 95% of the world’s population doesn’t work that way. The world long ago moved away from the USA-centric platform you’re promoting, and looked the other way. When you bow to the east, your ass points to the west. This is what you need to learn. The world isn’t in your small pond.
Re: Re: Re:18
Dude, we’re talking about US copyright law! That’s all we’ve ever been talking about here. This article is literally about the US Copyright Office issuing an opinion on US copyright law. Of course this only works in the US! You are so obtuse.
Re: Re: Re:19
You really don’t get it. The laws and regulations made by the USA’s Copyright Office are being copied by companies all over the world. The countries outside of the USA must allow those stupid rules whether they want them or not, simply because global companies are following the USA’s rules. Even when the USA’s rules are completely ridiculous, the horror is being copied all the time all over the world and then we have to suffer the consequences.
This is why the USA’s rules do not just apply to the USA, but have a wider impact. We should just require that the USA respects its dominant position and keeps the rules stable and does not bring in stupid stuff that doesn’t work everywhere in the known universe.
Re: Re: Re:20
You really don’t get it. I can vote for legislators who will change US law. I have no influence over what companies outside of the US do. Talking to me about that is as useless as asking a fish at the bottom of the ocean to help address climate change.
Re: Re: Re:21
I can help with that. I know where the fish we eat comes from, i.e., Norwegians are responsible for 60% of all fish eaten in Finland, so we just need to pass the information through our retailers to the Norwegian fish farms and they can then start fixing climate change as requested.
Re: Re: Re:22
The fish themselves aren’t capable of consciously helping with climate change. You didn’t understand the simile at all.
Re: Re: Re:23
We can just delegate our copyright issues to the fish. What better confirms our software legality than getting our copyright bullshit from the sea.
Re: Re: Re:24
Well, you seem to conjure your understanding of copyright out of nothing, rather than actual study of the law and case law, so sure, why not?
Re: Re: Re:25
Now you’re saying my experiences as an author are “nothing”. Can’t I base my copyright bullshit on my own experiences dealing with publishers? Things like companies insisting on exclusive licenses so that authors are forbidden from publishing the same material via other channels? And then publishers stopping selling the product after a month of sales activity?
Re: Re: Re:6
If you post stupid shit, I call you stupid. Don’t like it? Don’t post stupid shit.
Even leaving aside the still undecided question of whether or not training on licensed work infringes, that wouldn’t make any AI system “illegal” as the original comment suggested.
If you want to go down that path, you’re still wrong. An infringing use does not make an entire product illegal.
But also, no, even if one is using unlicensed materials, it’s difficult to see how either of those rights are impacted.
Also wrong. An alternative is that no copyright protected uses are implicated. No reproduction is made, and no derivative works are created. So fair use is a defense, but hardly the only one.
One ruling, generally seen as ridiculous among the copyright bar, and being appealed. Wouldn’t hang your hat on that just yet.
Re: Re: Re:7
It’s disgusting for you and Techdirt to keep the attitude of “steal it first, and ask for forgiveness later”. That’s what the Big Tech companies are thinking. They gamble that they have more money to win in court than the authors who would sue them.
Look, I know of AI uses that could win the fair use argument, but many generic, big AI models probably won’t. There is already evidence that ChatGPT generates work that competes in the same market as the book authors, and that should give you a warning sign when you build your AI application on it.
Say that to Napster and Grokster, please. You guys ignoring the important ruling of MGM v. Grokster makes me feel you guys are intentionally deceiving the public.
Even when this is true, how can the AI companies argue fair use on this?
AI pre-training involves reproduction. So the first argument is false for generative AI already. The next is fair use, which is the only defense for Meta for its Llama AI models, currently.
Re: Re: Re:8
Quote Mike where he’s advocated for this.
What about the Big Media companies you’ve admitted to allying with?
They do have more money than the authors that would sue them. And licensing works won’t pay out to smaller creators much at all. Lawyers are going to be the biggest winners of any lawsuit on this issue, regardless of who wins.
This contradicts your previous assertions.
Based on your flawed analysis…
[citation needed]
Define LLMs in such a manner that they are comparable to a peer-to-peer network.
You don’t understand legal strategies.
Not in the model, not in the results. You’re demonstrating, again, that you don’t understand the technology.
Re: Re: Re:9
Youtube video “How To Create And Sell E-books Using ChatGPT | How TO Earn Money Using ChatGPT”
Page 28 of the USCO report part 3:
‘As discussed in the Technological Background, the extent to which models memorize training examples is disputed. When, however, a specific model can generate verbatim or substantially similar copies of a training example, without that expression being provided externally in the form of a prompt or other input, it must exist in some form in the model’s weights.
When a model takes the prompt “Ann Graham Lotz” and outputs an image that is nearly identical to a portrait found in the training data, the expression in that image clearly comes from the model.’
(Emphasis added)
Refute this one, please, and stop saying bullshit.
Re: Re: Re:10
You cited a video creator claiming to tell people how to do something. That doesn’t prove that it can actually compete or that human audiences will pay for it in preference to human-authored works. Notably, the video demonstrates the limited ability of ChatGPT to produce a large amount of content in a single prompt. The video creator asked for a “book” that was 8 pages long. I googled more examples and found someone who spent 3 hours coming up with a “book” that was 8000 words long. You’re refuting your own arguments with this citation. And that’s not even addressing issues of quality in the rendered content.
As I’ve said, I’ve asked LLMs to compete with my own works and it can’t. It writes like a child repeating concepts it’s read but not understood.
This statement alone refutes your claim. It’s not definitively decided. It’s disputed.
This example was taken from research that hasn’t been reproduced by peers. And if you look at the details of the “research,” they used an older model trained on relatively few images that included duplicates of the image that was supposedly (but imperfectly) reproduced. And they had to try millions of times to get something that looked close enough to make the claim. They were trying to achieve this result and they put biased efforts towards it. And this is a single example. If it were reproducing copyrighted works (and perfectly, which isn’t even being alleged), they’d have a million examples instead of one.
But you’re just quoting the copyright office report instead of having researched this particular example yourself before citing it, proving that you are only looking for claims you agree with rather than researching all the nuances of the topic. You’re not interested in learning or changing your mind. You’re just spewing your magical thinking and pretending like your confidence is a sufficient substitute for knowledge or morality.
Re: Re: Re:11
Yes. And the only reason that humans would buy AI-generated works is when they can’t tell they’re AI-generated. In other words, when the AI can deceive a human audience. And also, in other words, the Turing Test.
So your idea would be to let AI flood the book market with AI-generated “slop” and force the potential human buyers to participate in this giant Turing Test, which I haven’t even argued is at all ethical to begin with. (An ethical Turing Test requires informed consent: the human participants are aware they are being tested and that the content they see during the test may be AI-made.)
And that supports my position that someone can make a book very quickly with AI, which in turn competes with the book authors whose work the AI was trained on.
It doesn’t have to be a single prompt, but the fact that people can sell AI-generated books is sufficient for this claim.
One in a million chance is greater than zero, and that’s sufficient for the claim.
A purely coincidental resemblance to a copyrighted work would have less than a quintillionth chance of happening. By quintillionth I mean 2^(-64). Or even less, because modern cryptographic hashes have been at least 160 bits long. And the chance of a random monkey making a Shakespeare chapter is much less than that (taking the infinite monkey theorem into account).
Infringement doesn’t require perfect reproduction. Imperfect copies can also constitute infringement.
You are the one who should cite the counter-claim, not me. If you are more professional than the USCO, then show us your papers.
Re: Re: Re:12
So advocate for laws that require LLM-generated content to be labeled. I’m fine with that.
That doesn’t require outlawing LLMs by expanding copyright.
You’re generalizing here. The Turing Test involves a machine convincing a human that it’s also human through direct interaction. The human has the opportunity to interrogate the machine. Just reading text generated by an LLM doesn’t qualify as the Turing Test.
No, not at all. I have not advocated for this at all. Again, QUOTE ME WHERE I SAID THAT! I said you’re just wrong about what you claim your citation proves. I didn’t advocate for anything in refuting your bad example.
Humans aren’t being tested in the Turing Test. You continue to show your lack of understanding for nuance.
Except it explicitly doesn’t compete with the books that the LLM was trained on. The LLM is trained on a wide variety of books. If an LLM is trained on a fantasy novel but is prompted to write a self-help book, the result doesn’t compete at all with the fantasy novel. And, here’s the kicker: it doesn’t even compete with other self-help books.
You’re A) conflating training on copyrighted content by the developer with the results prompted by a user, which are two different things, and B) you don’t seem to understand what “compete” means. You seem to think offering a product in the same general market as another product is competition, but that’s not it.
I can ask Christie’s or Sotheby’s to offer a child’s five second crayon drawing in the same auction as a $500 million Picasso, but the child’s work doesn’t compete with the Picasso. The audience is different. The ability to fetch a similar price is different. The ability to affect the market for the Picasso is non-existent.
You’d have to prove that humans knew it was LLM-generated, that they wanted to purchase LLM-generated content, and that it replaced their interest in buying a human-authored work for it to actually compete.
What you’re also missing is that LLM-generated content isn’t copyrightable under US copyright law, so the “market” for the content goes away when one person purchases it and then legally releases it for free online.
That’s a function of what a publishing platform allows, not a legal claim. You should take it up with Amazon since the video referenced the KDP platform.
It’s not just one in a million. It’s one in a million attempts to get that specific content from an unused model disproportionately trained on the targeted content. You’d have to argue that people who could otherwise do a google search for the “copied” content and find it easily would rather intentionally download an obsolete LLM model and spend hours and hours trying to get a reproduction of a widely available image that you can download easily from the internet for free.
You’re hanging your entire argument on the possibility that some people will waste their lives trying to get LLMs to reproduce already available content that isn’t even for sale and that will somehow lead to the downfall of human-authored content. That’s absurd.
[citation needed] More specifically, legal citation needed.
Not this again. The infinite monkey theorem is irrelevant here. In reality, a monkey doesn’t have infinite time to type. It’s not a realistic measure of anything. It would never produce an entire work of Shakespeare (you said “chapter,” but Shakespeare wrote plays divided into acts and wrote poems, and the theorem is about whole works, not subsections of works). That you even mention this proves, again, that you cannot be taken seriously. You’re a dilettante pretending to know something about this topic.
If it’s not discernible as a reproduction, it can’t constitute infringement.
You haven’t cited a credible source yet for your claims. I’m not going to do the legwork to prove you wrong when you haven’t even supported a claim.
Re: Re: Re:13
“China Releases New Labeling Requirements for AI-Generated Content”
https://www.insideprivacy.com/international/china/china-releases-new-labeling-requirements-for-ai-generated-content/
There is no outlawing. But training AI with copyrighted data is already illegal (most of the time). So it’s actually about enforcing existing copyright laws, not adding new ones.
And the DMCA is included. AI companies must not remove copyright management information (CMI) during training. (If keeping the CMI would make the AI “regurgitate” lots of copyright information in its outputs, that’s the AI’s problem, not a problem with copyright law.)
Does this even matter? Because the human “interrogators” aren’t told they are to interrogate, and the same ethical problem occurs. (By the way, if I can tell MrWilson is a machine or human when I’m not told that I am to interrogate in this Turing Test…)
Yes they are. https://en.wikipedia.org/wiki/Turing_test#Should_the_interrogator_know_about_the_computer?
If humans aren’t told they are interrogators, they become part of the test and it’s the ethical problem.
Bullshit. Unless the LLMs are significantly limited in the types of outputs they produce, end users will go and write a fantasy novel with them.
A direct proof: https://www.reddit.com/r/GPT3/comments/zgg3y9/how_do_prompt_chatgpt_to_write_a_fantasy_novel/
To go into detail, it’s the precedent of the Grokster case (and perhaps Napster, too): even when P2P has many legal, non-infringing uses, as long as (1) users primarily use it for infringement and (2) the company making the technology benefits from illegal uses, the company is liable (contributory copyright infringement).
It’s the ruling, so don’t blame me.
See the contributory copyright infringement liability.
Warhol v. Goldsmith. Like it or not. I don’t want to argue on this anymore.
Bad analogy. A better one would be: an imitation artist at an auction of Hayao Miyazaki’s work who says he can replicate someone’s portrait or photo “in the style of Miyazaki” and collects money from those imitations.
And I use Miyazaki instead of Picasso because Picasso’s copyrights have long since expired, making all infringement claims invalid and “fair use” arguments moot.
No. My assumption was humans don’t know they are LLM-generated when they purchase. So no “[replacement of] interests in buying a human-authored work” nonsense.
You are thinking of factor 4 of fair use too narrowly. That factor was made to be broad, covering not only the existing market but also potential markets that authors deserve exclusive rights to. And that’s why the USCO mentioned “market dilution” (which people called controversial) and why that’s an important factor to consider.
Emphasis added. You showed your true colors! So you advocate piracy after all. All your arguments about “fair use” of AI are just decoys to cover your true intention, which is encouraging people to pirate.
Enough said.
Re: Re: Re:14
Wait. This post is mine (Explorer09). Due to some mistake in the comment system I posted it as an anonymous coward by accident.
Re: Re: Re:14
I meant in the US. We’re discussing US copyright law here. You and Terop are both non-Americans trying to lecture Americans about US copyright law. You’d think you could stay on that topic.
This hasn’t been determined at all.
You’re inventing a new right that hasn’t existed.
That’s not actually what the DMCA says.
Yes, you’re misunderstanding an unrelated computer science concept relating to machine intelligence while arguing about a copyright issue.
This isn’t always the case with the Turing test. Your citation literally states: “Turing never makes clear whether the interrogator in his tests is aware that one of the participants is a computer.”
This isn’t a complete or coherent sentence.
Per that article: “The Turing test, originally called the imitation game by Alan Turing in 1949,[2] is a test of a machine’s ability to exhibit intelligent behaviour equivalent to that of a human. In the test, a human evaluator judges a text transcript of a natural-language conversation between a human and a machine.”
Humans are the interrogators and evaluators. They are not the test subject. A human cannot pass or fail a Turing test. You don’t understand what you’re talking about. You even provide citations that prove you don’t understand.
You mean the LLM will write the fantasy novel. If the user writes the fantasy novel, then it’s a non-issue.
That’s not proof. That’s people looking to do something they can’t already do and discussing failed strategies. And if you read the comments, they admit it’s not actually doing what you claim. They’re pointing out that it’s not capable of what you claim.
“I have been able to get chatGPT to write 6 or 7 mediocre chapters of a teen novel.”
“It doesn’t have the memory to create an entire novel.”
“Of course I had to read each chapter to make sure they were good (and retry when it was not)”
“keep it in a notepad that way you can refer to it when you need it and be able to paste parts of it or the entirety of it in AI just in case it forgets somehow the basic framework will remain in its memory, and it will utilize that to keep the alignment of the story”
“AI can’t do that yet. It doesn’t have enough memory in its head to understand relationships between text that is far apart from each other.”
You would have to prove that most uses of LLMs are primarily for infringement for this argument to work. I look forward to seeing your proof.
I’ll blame you for misunderstanding the ruling and how it applies to LLM training or use.
Provide a citation of law or case law that explicitly states that using an LLM is an infringement of a copyright.
You shouldn’t argue on this anymore. You continue to be wrong. That case is irrelevant to the argument.
Except my analogy doesn’t assume as you do that the intended use is copyright violation. You have a bias that assumes that’s the primary use. You are wrong. Search for all the articles about the tips people are giving each other about using ChatGPT or Claude. They’re all saying stuff like, “here’s how to be more productive by having ChatGPT write you a task list,” “use these five prompts to improve your self-care regimen,” “Seven prompts to make you more productive at work.”
To reiterate for the seemingly thousandth time—We are discussing US copyright law. In the US, most of Picasso’s works are protected under copyright until 2043, which is 70 years after his death. You could have searched for this fact easily, but you chose not to. This is why you are useless on this topic. You have a world of data available to you and you choose not to fact check your own incorrect assumptions, making all your claims invalid and your arguments moot.
If humans don’t know they’re purchasing LLM-generated content, then the person claiming human authorship and copyright on the content is making a false copyright claim. That would make the competition argument moot.
As the article you didn’t read notes, the US Copyright Office’s opinion is non-binding and isn’t law. Find case law that actually supports your position or else your assertions are moot. Your characterization of any aspect of US copyright law isn’t reliable.
No, you dumb fuck. You are so dense. You don’t understand the reference you made.
You quoted the US Copyright Office: “When a model takes the prompt “Ann Graham Lotz”…”
That image to which they referred is released under a permissive license. It’s not piracy to find and download it.
https://commons.wikimedia.org/wiki/File:Anne_Graham_Lotz_(October_2008).jpg
“This work is free and may be used by anyone for any purpose.”
And beyond that permissive license, it’s fair use to download the photo for personal use anyway, so the license itself isn’t even the only permissiveness. Copyright law itself allows such use.
You thought you had a gotcha moment, but it ironically only proves your eagerness to make bad assumptions and your complete and utter ignorance of US copyright law.
You can’t pirate what’s actually released for free, you complete dumbass. That said, who the hell wants to “pirate” a random photo of Billy Graham’s daughter anyway?
I absolutely agree. You have said quite enough, and you’ve been proven wrong multiple times, yet you keep spewing more laughable bullshit.
Re: Re: Re:15
Well, that’s what contributory copyright infringement liability is about (plus vicarious infringement)! If your damn company benefitted from the copyright infringements done by end users, why the fuck can you not be liable? Even when it’s not the primary use of the tech.
(See also: Types of secondary copyright infringements as listed by Copyright Alliance
https://copyrightalliance.org/education/copyright-law-explained/copyright-infringement/secondary-copyright-infringement/)
MGM v. Grokster. As Grokster was ruled to be liable, you can’t shield ChatGPT or whatever generative AI there is from this liability.
I’m ignoring this argument because it doesn’t explain why it is necessary to train the AI model with copyrighted content in the first place. This is a distraction.
What about countries whose copyright laws set the lifespan to be 50 years after death (Berne Convention)?
If you’re advocating for a shorter lifespan of copyright, you shouldn’t bring up this straw man, as you would contradict yourself.
No it doesn’t. One of the purposes of copyright is to protect the market from fake imitations, even when the buyer may know they are fake, because the fake paintings or books or music may be sold at a much cheaper price than the genuine ones, creating a huge temptation for buyers to look away from genuine goods. And that is the market competition in factor 4.
Imagine this: You go to a video game store and look for a Nintendo Switch 2 Pro Controller to buy, and there is an illegal clone (from an unnamed manufacturer) with the same packaging and priced US$10 cheaper (say, US$75 rather than US$85). Would you not be tempted to buy that cheaper knockoff?
You cannot cite case law that supports your fair use claims either, so why the hell should I listen?
Or should I wait until one of the AI copyright cases gets appealed to the Supreme Court and see who ultimately wins?
And you are fuckingly arguing the wrong thing! First, permissive licenses require attribution, and Stable Diffusion didn’t attribute the source or the original photographer. Second, people will be tempted to extract copyrighted materials from the AI, like it or not, even when it takes millions of attempts to do it (it’s relatively easy to automate generations of a million images these days). Third, you are assuming everything on the internet can be downloaded “for free” when you made that argument. That’s why it’s your true colors! You advocate piracy, period.
Don’t refute me on the third point. You just slipped it when you wrote the reply. Not my fault.
Re: Re: Re:16
This citation says it all. You’re getting your propaganda from Big Media corporations that make their wealth exploiting the creativity of other people.
ChatGPT isn’t peer-to-peer software.
So you assert that there’s a difference between training and use when you want to ignore an argument you can’t refute, but you don’t think there’s a difference when you want to conflate the two for a different argument about liability. Curious.
We’re talking about US copyright law. We are always talking about US copyright law. That’s what the article is about. That’s what every comment I’ve made is about. If you want to talk about the laws of other countries, you won’t be doing it with me. That’s goalpost shifting and employing non sequiturs.
You seem confused about the difference between stating a fact about the current state of the law and stating a wish that the law could be different. That’s not a contradiction. Unlike you, I don’t think my preference is magically the law because I really want it to be true.
Also, you don’t seem to understand what a straw man is. A straw man is arguing against a claim that the other person didn’t make. You did make the argument that Picasso’s works were no longer under US copyright, which was absolutely false.
That’s trademark, not copyright. Holy shit, you’re an idiot.
That’s a trademark violation, not a copyright violation.
No, I don’t buy consoles.
The major cases on the topic haven’t been decided. Of course that didn’t stop you from trying to cite undecided cases. I’m not intellectually dishonest like you that way.
Yes, you should. It won’t really matter though. As I’ve already predicted, big corporations will win, whether it’s Big Media or Big Tech. Some will get slightly wealthier. Others will get slightly poorer. But the little guy is the one who will suffer and lose out.
Not all permissive licenses require attribution. And not all attribution licenses can legally require attribution for all uses. If you download a CC-BY licensed song and play it on your own speakers, you don’t have to attribute it anywhere. That’s just a personal use. Regarding LLMs, if the training dataset isn’t published, which it isn’t, you can’t attribute it. The trained material isn’t in the resulting model, so there’s nothing to attribute in the result. Again, you don’t understand how the technology works.
Should they post the attribution internally in their offices on a bulletin board? On a post-it note next to their computer?
First, that’s not actually possible. Even this one example was super blurry and not an adequate substitute for the real image. But also, why would they waste so much time and effort to try to use an LLM to replicate copyrighted materials when they could just find it somewhere else and in some cases, legally for free? That argument doesn’t make any sense. Again, you really don’t understand the technology or how people are actually using it. You’re just paranoid about a fever dream moral panic you’ve had.
No, I’m not assuming that at all and such an assumption wouldn’t serve my argument at all. I said that one particular picture which is the one particular example you provided referencing a single “study” that tried really hard to find that one particular picture is available for free on the internet so no one would even logically bother to try to reproduce it via an LLM (except in the scenario that you’re a researcher desperate to prove a point with deliberately manipulated results). You aren’t vetting the people or the content you’re cribbing your propaganda from, so you’re ignorant of the implications. If you think I’m referring to “everything on the internet,” then you need to prove that an LLM can actually reproduce “everything on the internet.” If it can’t, which it can’t, then your argument is pointless. Not everything can be downloaded for free, nor can everything (or even much of anything) be easily, reliably, authentically reproduced by an LLM. Again, again, you don’t understand the technology.
It’s not piracy to copy a free image in a legal manner, dumb shit. That’s the whole fucking point. You are so obtuse. That you think you’re scoring points by trying to accuse me of advocating for piracy really shows your “true colors.” You’ve already accused poor people of not existing and being freeloaders (positions which contradict each other). So you hate poor people. You admit to siding with wealthy corporations. You share propaganda from Big Media companies. Honestly, getting falsely accused of advocating for piracy is far better than what you actually are—a shill for the wealthy and powerful.
So you know your argument is bullshit and you don’t want me to explain why because you feel silly that you didn’t even understand the example you referenced. Well, too bad. As we’ve already established, you don’t get to tell me what I can and cannot say. We’ll add your ignorance of the 1st Amendment to the list of US laws you don’t understand, which already includes US copyright law, the DMCA, trademark law, et al.
Re: Re: Re:17
I disregard the “poor people” arguments (bullshits) because you cannot name any single person or organization that fits your definition. In other words the “poor people” you mentioned don’t exist. And how would it make sense to claim that I “hate” people who don’t exist at all.
I could compassionate poor people if you can name a single person or organization who is “poor”. Otherwise, all of these are bullshit to argue about.
Re: Re: Re:18
Every single (American) person who is not wealthy is on the list. As I have already said, listing actual names would take too long. Do you know an American citizen who isn’t wealthy? They’re on the list. I’m on the list. Everyone I know is on the list since I’m not friends with any billionaires or even millionaires. Anyone with a net worth less than a few million is on the list. 88% of the US population is a part of a household whose net worth is less than a million. That’s an easy enough line to draw. 88% of the US population is on the list. What do you think naming names will accomplish that this very broad inclusive statement doesn’t?
You’re just using a bullshit excuse to ignore the bulk of the American populace. Do you think there are only millionaires and billionaires in the US or is that just the only people you care about?
Also, bullshit is a mass or uncountable noun, so it’s just bullshit, not “bullshits.” If you add bullshit to bullshit, you just get more bullshit.
You being skeptical about the existence of poor people in the US is the weirdest made up hill to die on. That you can’t even accept that poor people exist shows how irrational your position is.
You apparently hate them so much that you deny their very existence despite the fact that you see them on the internet every day and have the proof of their existence at your fingertips. You’ve been chatting with them on this website. Are you just talking to yourself here?
Let’s not pretend you have any compassion. If you can’t even imagine they exist without a citation as if you’re completely unaware of the state of the world, then you’ve decided on a weird, judgmental stance that deliberately erases them. Are there no poor people in your country? Is the concept of a poor person foreign to you? How wealthy are you?
Re: Re: Re:19
I only required you to list one name. So “too long” is an excuse.
Why this definition? Why would a household with a net worth of less than a million be unable to buy even one work (music, book or movie)?
Bullshit. (This definition of “poor people” is made up and does not reflect the actual economic ability to purchase copyrighted works.)
Re: Re: Re:20
You keep pretending like not naming a person means that no poor people exist. You do realize that reality doesn’t change based on what a person argues, right? This is the most absurd demand. I’m not going to provide you with a list. I’ve given you an entire category of American citizens that consists of about 300 million people. Do a google search, find references to poor Americans, there’s your name. Look at any public voter roll data in the US. There’s about an 88% chance the names you find match my definition.
This has already been explained to you. You lost track of what we’re even discussing. I’m advocating for people who aren’t wealthy and powerful. Anyone who isn’t a millionaire, billionaire, or stockholder in a large amount of a big corporation qualifies. It also includes the vast majority of creators you pretend to care about, including me.
I didn’t say that they can’t buy media. You claimed poor people were freeloaders. I never said anything about that. They are the primary consumers of media. They pay a lot of money for their media. And you’ve been saying they don’t exist or they’re freeloaders and pirates.
And if you needed this explained to you again, then you clearly aren’t tracking the argument and your positions are irrelevant. You’re admitting you don’t know what we’re talking about.
Calling them poor was never about their ability to purchase copyrighted works. You made the claim that they didn’t. I never said that at all.
Re: Re: Re:21
I don’t care about your “reality”. I would disregard it as if the judge dismisses a claim due to lack of evidence. Even though this is a casual debate and not a court battle. I still expect the rule of whoever brings up a claim must supply it with evidence.
So, dismissed.
Re: Re: Re:22
I’m not referring to my “reality.” I’m referring to the actual reality we both exist in.
Are you denying the existence of US citizens who are worth less than a million dollars? To deny the existence of the categorization of people I’ve provided, you’re tacitly saying you think every US citizen is at least a millionaire. That isn’t an intellectually honest position to take. You’re saying homeless people don’t exist.
If you want to pretend this is a debate, you can’t take absurd positions and expect any third party to take you seriously. You’re just using this as an excuse to ignore the fact that you completely misunderstood my argument and you’ve been fighting in favor of wealthy corporations against the very people you pretend you’re championing.
I provided better than names. I’ve cited the majority of US citizens. You’re pretending your demand for names supersedes the categorization I’ve provided. Providing one name isn’t greater than providing 300 million people.
Re: Re: Re:23
I don’t deny the existence of “US citizens who are worth less than a million dollars”. I deny you who claim to be representing them.
This is like a “class action” where you define your class improperly. I challenge your ability to represent the class you defined.
Re: Re: Re:24
(Oops. Another accidental comment as signed out where I didn’t mean that.)
Re: Re: Re:24
I didn’t claim to represent them. You’ve lost track of the discussion again.
I had stated: “It was to say that you are attacking big corporations but hurting innocent independent non-profits, researchers, students, and poor individuals. That’s the whole point here.”
To which you responded: “Please name an “innocent independent non-profit, researcher, student, or poor individual” you are talking about. Or is it just me that I sense no one but some “bad students” who just want to freeload and use ChatGPT to complete their homework, ignoring academic ethics?”
I don’t represent them. I’m challenging your assertions as a non-US citizen to lie about US law to deny those people their rights. You’ve created them as a class of people by saying they should lose out and wealthy corporations should benefit from their loss of rights.
Again, your inability to follow the discussion points is concerning on top of your proven ignorance of the law and the technology. You don’t even remember what you’ve said in this thread on top of arguing often with straw men I never uttered.
Re: Re: Re:25
Yes! Name someone who is hurt by me. Because it’s you that suggest that I hurt someone. You didn’t name any person (but rather a class that’s improperly defined), so I just disregard this claim. (Or should I say “dismiss”?) Anyway this claim is nothing to talk because you didn’t bring evidence.
Re: Re: Re:26
Bob Smith. There is at least one person in the US named Bob Smith who isn’t worth more than a million dollars. There you go. I named someone.
I look forward to seeing what new bullshit you will spin up now that we’ve gotten over the absurd barrier of your imagination to consider the existence of people without their names being mentioned.
Re: Re: Re:27
Which one? https://en.wikipedia.org/wiki/Robert_Smith_(disambiguation)
Re: Re: Re:28
Wikipedia is more likely to feature famous, wealthy, or deceased people. You should search a white pages listing instead. But also, pinpointing a single person seems really weird since the category includes 300 million other people. Are you planning to harass Bob or something?
Re: Re: Re:29
I want you to bring that Bob to show up on this website as a witness. So you can’t just frame a random Bob and make a claim that you represent him (or them).
Re: Re: Re:30
Again, I didn’t claim to represent them. You have claimed that. So you’re arguing with a straw man. And in fact, you’ve claimed to represent my interests since you said you were arguing on behalf of rightsholders and I’m a rightsholder. And I told you that you didn’t represent me. So not only is this straw man a confession of your own hypocrisy and stupidity, it’s just pointless, like most of your attempts at arguing.
This entire obstinacy in refusing to acknowledge the existence of poor people was just you waiting to assert that I didn’t represent people I didn’t purport to represent because you again didn’t understand what was being said. You create these weird barriers to your own understanding and spin out so much you don’t even remember what you’ve said, much less what I’ve said.
Re: Re: Re:13
Emphasis added. You showed your true colors! So you advocate piracy after all. All of the “fair use” claims about AI from you are just decoys to conceal this true intention of you!
Re: Re: Re:14
I love that you think this is an example of piracy and that pointing out that browsers have a download button in a menu is supposedly advocating for piracy. It says so much about you and nothing about me.
Re: Re: Re:5
Technically there is another element required, before you can conclude that AI practices are illegal, but it’s just the normal:
1) AI companies forgot to obtain licenses for the material they used for the training.
Re: Re: Re:2
First, not sure why you responded to me. I was talking to Explorer09.
Second, we still don’t care about your failed software here.
Re: Re: Re:3
That doesn’t mean that my refusal to implement AI does not have any effect on overall AI usage in western societies.
You still haven’t understood how important my “failed software” is to the overall software ecosystem. It’s a Symbian phone area edge module, and as such plays a pivotal role in protecting software vendors from entering an area they cannot handle without the help of successful phone companies.
It is a well-known fact that Symbian failed because developing software in the area was too difficult for open source developers. Thus only a small number of 3rd party applications were developed for the phone platform and that has always been one element attributed to the failure.
Now that we built the edge modules designed to protect software vendors from entering the area which they cannot handle, your solution is to call it “failed software” and reject its purpose as an edge module.
We can easily call bullshit on your practices. There’s no reason to lower gameapi’s status from Symbian edge module to failed software like you tried to do.
Re: Re: Re:4
Me: “We don’t care about X…”
You: “Let me tell you more about X…”
I call it failed software because if it was successful you wouldn’t be spamming unrelated posts about it. I don’t care about the technical details or the philosophy you have about it or whatever narrative you’ve told yourself for why it’s important.
The fuck are you talking about? You’re hallucinating worse than an LLM trained solely on crypto bro 4chan posts.
Re: Re: Re:5
Common theme in your bullshit seems to be that you simply do not care.
I think that’s a significant problem. You really need to fix that problem. Sloppy practices where you simply do not care enough to arrive at work at the correct time or care enough to actually create software that works properly… lazy and sloppy practices are consequences of not caring…
Re: Re: Re:6
Yes! You’re finally catching on to the words that I am saying!
Hey, apparently I’m a software developer. Alright. Where’s the paycheck for that gig? Can I start putting that on my resume?
Re: Re: Re:7
This kinda explains why you’re against copyright. You have not created any copyrighted works worthy of copyright protection. Let me know if that’s wrong and I’ll upgrade your status from ignorant to something better.
Re: Re: Re:8
Being a software developer does not make a sufficient reason against copyright. In fact there are developers who are for copyright, too. I’m not saying this for myself, but for the plaintiffs in the Doe v. GitHub case.
In fact it was the open source developers who observed the copyright problems with generative AI (in particular GitHub Copilot, the AI code generator) and brought the lawsuit first! The key of making the open source communities thriving is copyright, that makes the “copyleft” clauses in open source licenses working, to protect the communities from corporate exploitation (e.g. incorporating open source to proprietary products without contributing back). @MrWilson seemed to have no idea about this.
The open source developers could have released everything to public domain, but they didn’t do it. Think of why.
Re: Re: Re:8
First, I’m not against copyright. I’m against copyright maximalism and greed and putting power and money into the hands of those who are already wealthy and powerful and who subvert or outright suppress human and civil rights. Copyright maximalists have used their vast wealth to undermine my democracy and weaken the influence of citizens in their own government.
If I had my way, copyright would only be owned by humans, not by corporations. It would only be licensable, not transferrable, so corporations wouldn’t be able to retain them after a creator dies. I’d also not have copyright last so long, definitely not after the death of the creator. Temporary copyright protection was meant to incentivize new works being created and dead people can’t create new works. Longer copyright made more sense when it took longer to distribute copyrighted works in a market.
Copyright isn’t granted on some subjective measure of worthiness. It’s granted on the nature of the work and the human authorship. This kind of statement proves, again, that you don’t understand US copyright law.
That said, I have created many works that are copyrighted. I’ve sold copyrighted works. I’ve released some copyrighted works under various licenses, including Creative Commons licenses. I’ve had my copyrights violated by large corporations and individual random people. I’ve had people sell my creations and made money off them. I’ve issued DMCA takedown notices for such violations. I’ve had my copyrighted works used in media and products.
You might even consider me a software developer, not in the proper sense of being a coder, but according to the copyright office, some of the works I develop are classified as software.
The estimation of a fool isn’t something I’m going to care about.
Re: Re: Re:9
And this is the ignorant part of you. When you argued about copyright and AI, you don’t really care about the creators whose works had been “stolen” (or scraped) by AI even less than a month after the works have been published. You don’t give a sh-t about “temporary copyright protection” when you mentioned it. Even if we have copyright law that is non transferrable, and have shorter lifespan (say, 10 to 20 years), you would still try to greenlight AIs that steal works faster than that.
In other words, we know you are lying.
Re: Re: Re:10
Oh hey, you’re responding to me again. I feel honored! /s
You don’t get to tell me what I care about. I am a creator whose work has been used to train AI without my consent. You’ve been told this. That you pretend otherwise shows your ignorance. You just can’t fathom that someone might be in a similar situation and yet have a different perspective. The limitations of your understanding are your problem, not a deficiency on my part.
I wouldn’t have said it if I didn’t care about it. You seem to be missing the dominant theme in my interests in copyright as a topic or my perspective on LLM training.
Except if copyright duration were much shorter (you said 10 – 20, I didn’t), there would likely be less accommodation or need for fair use because works would come into the public domain so much faster. But this is all speculation. Getting all self-righteous about a hypothetical situation that isn’t likely to happen at all is very melodramatic. Clutch those hypothetical pearls!
You can’t know I’m lying or not if you don’t understand what I’m saying, which you’ve demonstrated multiple times, including this time. Also, who is we? You can only speak for yourself, and incoherently at that.
Re: Re: Re:11
I’m not replying to you. I’m making the post to let other people see your self-contradictory beliefs. Either you are ignorant or you have fooled yourself. Pick one.
As this quote tells everyone you don’t give a sh-t about the issue. Saying that your work had been used to train AI without consent and you don’t give a f-ck about your rights. And you just try to stop other people (creators) from exercising their rights just because you don’t give a sh-t about them. How arrogant, and stupid as well.
There is no “different perspective”. You are just ignoring that it’s their right to not give consent on AI training. You are trying to enforce your thoughts on copyright to others. Who do you think you are when it comes to lawmaking? Copyright law exists, and IT’S THEIR RIGHTS, AND YOU SHOULD SHUT UP.
Copyright abolitionist. I get it. SO F-CK OFF AND GET OUT.
Re: Re: Re:12
Or I am a rightsholder and proved your claims to represent rightsholders are clearly incorrect. You have fooled yourself into believing that all authors, artists, designers, etc. will side with your preferred wealthy, abusive corporations who you admit are not good guys. You don’t have to pick one. You are ignorant and you have fooled yourself. And you’ve openly admitted to compromising to side with corporations over everyone else, to whom you have referred as freeloaders, even if you’ve not proven that they have ever committed copyright violations.
I’ve responded at length to a lot of your bullshit for someone who supposedly doesn’t give a shit about the issue. I can’t imagine another explanation for my motives.
I do care about my rights. They’ve been abused by the corporations you’re siding with. I’ve said this before. You’ve ignored it and pretended it doesn’t matter, as if maybe you don’t actually care about creators or their rights.
I’m not trying to stop anyone from doing anything. I offered a different perspective. You threw a bunch of assumptions on me and never understood what I was saying. You’re still making up new claims I haven’t stated or supported. You have a chronic issue with not reading what other people actually write.
Yes, you’ve just proven that “You just can’t fathom that someone might be in a similar situation and yet have a different perspective.”
That hasn’t been determined as a right in any court yet. That’s what this argument is partially about.
Last I checked, I don’t have the ability to control other people’s minds. I’m not trying to enforce anything. But that said, I am advocating, as I have already said, for the rights of poor people and researchers and students and others. You are trying to argue in favor of your copyright maximalism over their rights.
An American citizen with the right to vote for legislators who pass laws and officials who appoint judges who judge the laws. An American citizen with a right of redress from the US government. An American citizen whose rights (both copyrights and constitutional rights) have been violated by the corporations you have admitted to champion in this argument.
You are not an American as far as I can tell (and you’ve never denied my determination that you are not) and are trying to tell Americans how their own laws should and do work. You can opine all you want. I don’t mind you spewing your ignorance because I know I can prove you wrong, as I have.
No, it’s my rights too. It’s every American citizen’s rights that I’m advocating for. You’re apparently just advocating for copyright holders. But US law applies to us all. Legal precedents apply to us all (something I’ve had to explain to you multiple times).
The first amendment is also a law. Apparently the only US law you care about is copyright.
I didn’t say that at all. I literally provided a system by which copyright would still exist. But you ignore anything I say that doesn’t fit into your chosen straw man narrative.
No, you don’t, and that’s what’s so amusing. You seem to go to great lengths to not get it because, I’m guessing, your livelihood demands it.
You decided to come here to preach about your misguided views on copyright. If Mike wants me to leave, that’s his choice because it’s his site. That you’re pretending like you get to be the doorman for this website you don’t own or work for is also amusing and pathetic because you would be asserting control over Mike’s rights. Your assertion of caring about the rights of others is clearly bullshit.
Re: Re: Re:13
The right to reproduction and the right to derivative works. They are rights. All the arguments about AI training does not reproduce copies are bullshit, and USCO refuted on that one, too.
My position: (1) training data must be licensed, and (2) no copyright for AI generated works.
If I would support what you called “copyright maximalism”, I would let the big media companies copyright all AI generated things whatever they can. Why not? Why the copyright should be denied for AI generated works? This question is left for you to think. (Hint: Search for “Ban Corporate AI Profiteering” which was a campaign in 2023)
Re: Re: Re:14
The models don’t reproduce the training data.
The results aren’t derivative. They can’t be. If you put every famous physical artwork in a giant blender and made a collage out of random, pulped, minute little pieces (which isn’t even what an LLM does, but I’m being generous here with the metaphor), you couldn’t legally argue that the resulting collage was a derivative work of every single artwork that was pulped to make it.
Again, this demonstrates that you don’t understand how the technology works.
The LLM learned from several different texts that humans often say certain phrases. When it reproduces those phrases, it’s not copying any particular text but rather demonstrating what it learned about how humans frequently write text. If it randomly started quoting whole paragraphs from a text without being prompted to directly reference the source text, you might have a point, but that isn’t how they work.
They are indeed rights, just not rights that are legally assertable in this scenario.
Except it didn’t. You should probably read the article since it refutes this claim. The fact that you’re saying this proves, again, that you don’t actually read what you’re claiming to disagree with.
Except you have no case law that backs up this position, just wishful thinking based on your interest in maximizing profits for copyright holders, primarily the large media corporations you have admitted to siding with.
This is already law.
Not necessarily. Big media might actually like AI generated content to not be copyrighted because they wouldn’t have to license it to use it in their own copyrighted works. AI generated art not being able to be copyrighted means it can be integrated and everyone else would have to only extract the AI generated parts but couldn’t legally use the copyrighted parts that aren’t AI generated, which can be difficult in certain forms of media.
You seem to have a lot of assumptions that you haven’t thought through much.
Why did you start arguing for a point that wasn’t being argued? You just brought this up out of nowhere. Where had you or I discussed AI content being granted copyrighted status before? This is a literal non sequitur. It doesn’t follow from anything we’ve discussed up until now. It is further evidence that you are arguing with straw men.
Re: Re: Re:15
Unfortunately yes. It IS derivative. It is the derivative of “every single work pulped to make it” as you say. At least that’s what the copyright holders alleged.
You really don’t have the idea of how “derivative works” in copyright law works (no pun intended). That’s why DJs mashing many popular songs together would get legal trouble if they don’t seek bulk licenses first.
And this derivative work rule is also critical of how open source licenses like GPL can enforce their “copyleft”. Because large software projects like Linux Kernel is technically a large “collage” of many developers’ contributions, each being small commits until everything is merged together.
No no no. The Big Media can get away with AI copyright pretty easily by making their own AI models with their own IPs. They just cannot exploit other companies’ IPs with their AIs. And the motives for Big Media to use AI is not to “steal” (they hold big IPs already and don’t need other people’s IPs to profit). It’s to cut labor costs by firing minor creative workers in the process.
I don’t see any problem with this.
Because you are assuming I am a “copyright maximalist” and I am saying I’m not, and this is one of the reasons.
You are making a wrong assumption that Big Media would want AI generated works to be copyright denied. The reality is actually different, as I mentioned above.
Re: Re: Re:16
That’s not what the law or a court would determine though. Copyright holders aren’t legislators or judges, fortunately, despite their past successes in influencing laws for their benefit over the rights of citizens.
I do actually. That’s why I made that hypothetical example. If you can’t even tell what works the parts are from, it can’t be derivative because you must have identifiable copyrightable elements to have a derivative work.
Another example because you’re being obtuse:
If you bought a bunch of books and cut out singular words from each book and pasted them together into a completely different work that said completely different things than the books that the words came from, it would not be derivative because copyright doesn’t protect individual words but rather the expression that consists of more than just singular linguistic parts. If that resulting work qualified as a derivative work, then all written works would be derivative because people learn to write from reading other people’s work. We learn to speak by repeating what other people say. We repeat the phrases that our parents spoke when we were learning to speak.
You don’t understand derivative works or US copyright law in general.
Not all mashups are considered derivative works nor do they all require licensing. There’s nuance that you’re ignoring here.
Open source software is typically licensed such that derivative works are authorized with specific stipulations, such as the resulting work being similarly licensed. This doesn’t refute anything I’ve said or support any point you’re trying to make.
You’re not refuting what I said. You’re just reasserting your own preference.
These two sentences contradict each other. Big Media companies exist by exploiting the creative works of others. They do need other people’s IPs to profit, specifically the IPs they get assigned to them from the actual creators.
But you also don’t see why this would allow Big Media to benefit more than others.
Because you are assuming I am a “copyright maximalist” and I am saying I’m not, and this is one of the reasons.
I’m not assuming. You’ve claimed poor people who haven’t been proven to violate any copyright are freeloaders. You’ve actively advocated for copyright maximalist corporations to be able to exploit poor people. Whether you want to call it that because it sounds “bad” is your issue. You are a copyright maximalist based on the position you’ve taken and the side you’ve chosen. That’s like saying you’re not a fascist but you’ve sided with Nazis. What you call yourself doesn’t matter. Your choices determine what you are.
You’re just speculating randomly based on your own ignorance. That’s not reality. That’s your fantasy.
Re: Re: Re:17
That’s the difficulty of proof on the plaintiffs, but that doesn’t mean it’s not infringing. And that’s why in the UK the House of Lords is now making a bill that forces transparency on AI companies to disclose all copyrighted data during training.
And I would no longer reply with bullshits about AI is not infringement unless the copyright holders can prove it.
Bullshit. The words are not copyrightable but the specific arrangements of words form creative expressions that are copyrightable! And when the AI “learns” from those specific arrangements it copies the protected expression.
USCO report, pp. 47-48:
Refute this one please.
MrWilson
I disagree. That’s your claim, not mine. Even when Big Media companies do exploit, it’s irrelevant to the AI companies who are alleged to “steal” works.
Technically the IPs produced by their own employees, yes. But not “people from other companies”. Trying to argue the definition of “other people” wasn’t helpful.
F-ck you as I’ve said there is no “poor people”! This argument is moot and useless.
Re: Re: Re:18
You ignored what I said. If you can’t tell what the content is from, the plaintiffs wouldn’t even think to sue. You’re assuming omniscience on the part of the copyright holders.
We’re strictly discussing US copyright law here. We shrugged off British law a few hundred years ago. Do you need a citation for that?
You should no longer reply with “bullshits” either way.
Yes, exactly. And I said in the hypothetical example that only singular words were used. So you agree that my example wouldn’t be copyright infringement or a derivative work.
Except it doesn’t. It learns how to compose sentences based on weights and tokens, not the actual text itself. You are, again, demonstrating you don’t understand how the technology works.
Note that this wasn’t my argument that the Copyright Office is arguing against, so citing it as a refutation of what I said is irrelevant. I didn’t claim it was or wasn’t expressive. This is a non sequitur.
Note here that their argument is couched in the claim that the result is expressive, therefore the training must be expressive, but those are two different processes. Reading a book and writing a book are two significantly different processes.
I did, and with gusto!
It’s not irrelevant to the people who are exploited. Since you acknowledge that you don’t care about people being exploited by big corporations, your claimed motive to protect authors is bullshit. They are the people getting exploited. That you can compartmentalize your morality this way reveals your hypocrisy.
No, fuck no. You’re a fucking idiot. You are admitting here that you don’t understand US copyright law or the nature of publishing. Authors aren’t typically an employee of a publisher. If they are, it’s usually in a different job with a different role. Authors are typically freelancers who may, but not always, assign their copyrights to the publisher in a contract negotiation for publishing. An employee in US law is a specific classification whose work for an employer would usually fall under “work for hire,” but that is not always the nature of publishing. That you are ignorant of this nuance further indicates how useless all your arguments are. You don’t know what you’re talking about. You’re trying to tell Americans how their laws should or do work. And you have the gall to tell me to shut up and get out?
Apparently it wasn’t helpful because you’re an idiot. There’s a very important distinction there.
It’s not moot to point out that you are pretending the bulk of the US population doesn’t exist. It’s absurd that you can just declare this like it’s true. If it’s true, I don’t exist. Who are you even talking to?
Re: Re: Re:9
Yeah, but when you hear the real money amount that authors are getting, your response is to claim that the product wasn’t good enough. The time was already spent creating the product and you only have two responses: either we’re greedy or our products are not good enough for the marketplace.
Basically you can’t have it both ways. Either authors are getting the money that they deserve or they don’t get it, but you can’t go both ways.
Re: Re: Re:10
No? No, it’s not. Why are you trying to tell me what I think when I’ve uttered no such statement? You know you don’t need to post here if you’re just going to make up fake positions to fight with. You can do that on your own offline.
Copyright doesn’t hang on how much time is spent creating content.
No, this is a false dilemma. There are other responses. Not all creators are greedy. And some high quality creations don’t become successful for reasons other than their quality. Sometimes there’s a timing issue or a lack of marketing that comes into play. Sometimes a similar piece of content gets released slightly earlier that spoils the market for a particular work. That you know so little about the various factors that can affect the success of a work indicates you might not be the best person to opine about this topic, especially not so confidently.
Deserving is a moral argument, not a legal one. You’re conflating different scales here.
Nobody “deserves” anything. You create a work and offer it for sale or license it some other way and hope to benefit in various possible ways from its release and publication, not all of which are monetary in nature. And some creators get ripped off and some become successful and still get ripped off. And many large media corporations make a lot of money based on the efforts of creators.
You continue to demonstrate your lack of understanding of the topic at hand.
Re: Re: Re:11
It is a very common pattern with companies that they actually pay people based on the time spent, instead of how many copyrighted works were created. So a very common form of compensation for copyrighted works is based on time measurement.
Copyright is needed for a different purpose. It protects the company from ripoffs. While the software is being developed, there is a danger of unintended distribution of the source code, which allows the whole world to take the software and run with it without passing compensation to the authors. The company still needs to pay based on time spent, but copyright’s value is zero when the product has been ripped off by pirates.
How is the company getting money to pay for the time?
Re: Re: Re:12
We’ve had this discussion before. You invented something that doesn’t exist in US copyright law. The length of time spent on a work doesn’t affect whether it can be protected by copyright. You’re conflating issues relating to paying employees hourly wages with copyright protections. They are different parts of US law. They are unrelated.
Re: Re: Re:13
Yes. Copyright is very much to do with two different aspects:
1) compensation
2) control of where and how the product is used
Re: Re: Re:14
Copyright has nothing to do with compensation directly. Compensation is possible through what rights copyright law provides, but some copyrighted content is released for free under permissive licenses that doesn’t involve compensation at all. There’s nothing in copyright law that speaks to hourly wages or compensating creators at particular rates. That you think it does is just more proof of your ignorance. You didn’t even know the public domain existed until I pointed it out.
Re: Re: Re:15
You have a time machine to the 1980s? That’s when I learned about public domain/free software/open source/shareware/freeware/proprietary/commercial etc. licenses.
Re: Re: Re:16
If I had a time machine, I wouldn’t go back to the 1980s. I remember it well enough and there wasn’t much good there.
First, a lot has changed since the 1980s, including relating to case law and copyright legislation in the US, so if that’s the source of your information, it explains why your ideas are so wrong.
Second, you stated (incorrectly) on September 15th, 2024 that “everything created by mankind is covered by copyright.” This is factually incorrect. Public domain works are not covered by copyright. Claiming you know about the public domain yet not factoring it into your broad claims doesn’t prove you actually know or understand what the public domain is. It’s a giant amount of content. It’s the entirety of human creations in fixed mediums before the 20th century.
Re: Re: Re:17
You’re missing context here. Some idiot was claiming that all content items in the internet can be freely used without considering copyright at all, simply because internet publishes the material to everyone and their mother.
That simply wasn’t true, and the default operation for all copyrighted works is that you need to obtain a license to use them in any way. Public domain is such a small piece of the pie that it can be ignored completely, and no one would want to use the content from the 1930s anyway. They didn’t even have the jpg standard back then.
Re: Re: Re:18
You’re missing the context. You made that statement in reply to me and I wasn’t making any such claim. I was pointing out that LLMs can be trained on content that isn’t covered by copyright and you falsely claimed that all content is covered by copyright and then I pointed out that the public domain exists.
Why are you lying about the record that we can just go back and look at?
No, that’s not the case at all. There are some legal uses that don’t require obtaining a license or asking for permission. This has been explained to you. And when you’re pressed on the letter of US copyright law and related case law, you have admitted you don’t study it.
Plenty of people use works that are in the public domain every day. It’s not all that small. The internet archive, Google Books, libraries, and other websites are full of such works. One group actually trained an LLM on public domain works from the Library of Congress.
We’re mostly talking about text, but plenty of LLMs are also trained on public domain works as well as copyrighted works. You can make a digital image of an old photograph (and it doesn’t qualify for copyright if it’s just a reproduction). You’re just hand-waving away points that prove you wrong without actually refuting them. You’re also significantly wrong about content from the 1930s. Plenty of people are interested in it. Also, public domain is more than just works from the 1930s. There are some works from between the 1930s and 1978 that are also in the public domain since their owners failed to renew the copyrights or didn’t include copyright notices as previously required.
Re: Re: Re:19
That’s only because the USA didn’t sign the Berne Convention rules until 1988… under Berne, copyright is automatic and does not require registration…
Re: Re: Re:20
It’s still a fact that you weren’t aware of or at best disregarded in your false claims.
Re: Re: Re:21
Only a Sith deals in absolutes like that. It’s not always black and white, when there are levels of gray and all the vibrant colours too.
Re: Re: Re:22
Only an ignorant liar tries to cover up their ignorance with lies and lazy logic.
Also, you’re using the Star Wars quote out of context for your own benefit, which is something a Sith would do.
Re: Re: Re:23
You simply don’t understand it. I’m an expert at examining colours, since I was responsible for making sure phone screens had exactly correctly coloured pixels. Try counting how many pixels I’ve examined to arrive at correct software to display accurate pixel colours.
It’s even so bad that an advertisement that didn’t use the rgba colour palette caused significant problems/headache.
Re: Re: Re:24
“Good luck. You’re gonna need it.”
Re: Re: Re:25
Gambling is forbidden for all companies like Wikipedia, game developers with lootboxes etc., since only Veikkaus in Finland is able to set up gambling.
Obviously Wikipedia’s Jimmy Wales begging for money in a money collection violated Finnish law about gambling.
Re: Re: Re:5
4chan is significantly better platform than whatever you’re using, at least they don’t put delays to messages for no good reason.
Re: Re: Re:6
If trolls like you are getting caught up in the spam filter, it seems to be working for a very good reason and quite effectively.
Re: Re: Re:7
Well, you’re the one waiting for reply when that happens.
Re: Re: Re:8
I don’t actually value what you say. I don’t care if I have to wait my entire lifetime for a nonsensical response.
Re: Re: Re:9
So this went again to “let’s look at who wrote the message instead of analyzing the message content”.
Re: Re: Re:10
If this were the first time you commented, you might have some benefit of the doubt left, but it doesn’t take a lot of pattern recognition from all your comments to guess that everything you say next will also be uneducated bullshit.
Re: Re: Re:11
You forgot to factor the possibility that all the bullshit you read is actually accurate and important copyright messages in our part of the world.
I know it is difficult in usa to understand how the world works, when you’re used to looking at your flagpole and singing national anthems, but maybe next time you will actually read the bullshit.
Re: Re: Re:12
The article is about US copyright law. We have only ever been talking about US copyright law. The article was written to address the US Copyright Office commenting on US copyright law. This has never been about copyright in any other part of the world. You came here and chose to freely engage in a discussion about US copyright law. You don’t get to pretend we’ve been talking about anything else.
I love this absurd generalization. I know it’s difficult for you to understand how people in other countries work, but we’re not the stereotypes you see in movies. I haven’t sung the national anthem in 30 years. But the greater point here is that you’re completely ignorant of what you’re talking about, again.
Re: Re: Re:13
I know. You’re actually artificial intelligence robots that are soon taking over the world, if we cannot stop the robots that you sent back in time via these devices that can cut through steel and it has nice rotating sphere that cuts its way through time.
Re: Re: Re:14
The most pathetic aspect of this fantasy is that you’re not important enough that any machine would want to come back to assassinate you. You’re Willy Loman, not JFK.
Re: Re: Re:15
I’m not important enough? I think you are gravely mistaken. Without my work, there wouldn’t exist over 150 million Symbian phones in the European market. And my 3d engine is used/downloaded by 650 lucky people.
Re: Re: Re:5
It’s not unrelated. I spent a significant amount of time (and time is money) developing a system for my software to avoid liability in the copyright area. Things like not allowing pirate material to ruin the products created with my software, or asking for license information for each asset before publishing it to the world…
Do you understand that you’re spreading bullshit about how pirate practises are an awesome way to create products for the world, while Napster or Limewire directly compete against my products with unfair and illegal business practices which violate copyright? You’re directly supporting criminals and a criminal lifestyle.
Re: Re: Re:6
It’s unrelated because the article doesn’t mention your software and no one here cares about your personal agenda.
Quote me where I’ve said this.
Are you still using AOL? Napster and Limewire are decades old news now.
[citation needed] You are assigning concepts to me that I have never uttered. You seem to think that because I point out that you’re ignorant about how US copyright law actually works, that I must agree with everything you oppose. That is very lazy, simplistic thinking. If any of this were true, you could easily quote me where I have said such things.
Re: Re: Re:7
They mention my software every time they run an article about copyright (and ruin it with propaganda about how it’s legal to do copyright infringement)
Re: Re: Re:8
WTF are you talking about? I’ve never mentioned your software. I have no idea what your software is or what it does.
Another lie. We have never said that it’s “legal to do copyright infringement.”
Are you having a stroke? Do you need help? Otherwise, why lie?
Re: Re: Re:9
Hey terop:
I see that you posted a comment replying to this, also caught in the spam filter, where you used me calling out your lie that we mentioned your software, and used it as an excuse to promote your software. That’s just fucking spam, man.
It’s not getting through the filter.
Also: grow the fuck up.
Re: Re: Re:10
There are users screaming in your site about me not providing citations for the facts I’m posting, and when I finally post the urls, it gets filtered in the spam filter.
These users never saw the urls, since they’re newer than where the urls have been available.
But guess those citations and facts are not important. If that’s your decision, I can’t do anything but tell these people that the citations are not allowed in the site.
Maybe next time they won’t request the ability to check the facts we post…
Re: Re: Re:11
Your own software isn’t proof of any of your claims. That you think being asked for citations of US law or case law is an opportunity to spam links to your own software is entirely on your self-interest and stupidity.
Re: Re: Re:12
You’re wrong. My software is like a town library. It contains the information gathered in the last decade, stores it for future use. The claims are fundamentally based on the stuff that has happened in last decade, and thus the software source code should have entries proving all the claims i make. We saw how big success sites like wikipedia has become, and our software source code is our version of the information storage.
It is a significant failure on your part that you still do not have access to our software module catalog. It’s like the rest of the world got youtube, but your internet is slow enough that you cannot access the videos. It’s our history and contributions that are contained in our software, and people who reject it will lose the next chapter in our magnificent technology development.
We are soon taking control of the world, and you will be left behind, poor and miserable, when you failed to innovate and embrace emerging technologies. You’ve seen a glimpse of our technological lead, but it seems to be so advanced that you instantly rejected it. This is why you’ll be late, finding out the world changed under you when no one knew such changes were possible.
so long, and thanks for all the fish in the sea…
Re: Re: Re:13
Except we’re discussing US copyright law. Your software doesn’t dictate US copyright law nor is it US case law relating to copyright. You can claim your software contains the secrets of Roswell, New Mexico, the JFK assassination, and the location of alien enclaves in the deep ocean, but that doesn’t make it a factual source.
This is a hilarious attempt at advertising. You’re trying to neg people to get them interested in your software that you admit people aren’t finding valuable enough to pay for.
First, who’s we? You’re pluralizing yourself again.
Second, you have no clue as to what technologies I embrace. It sounds like you’ve been left behind since you’re touting software that does what other websites have already been doing, such as displaying 3D models on a web page.
Third, this egomaniacal rant is in humorous juxtaposition to your earlier self-deprecation regarding the failure of your software.
It’s “So Long, and Thanks for All the Fish,” not “all the fish in the sea…” The dolphins didn’t eat all the fish in the sea before they left earth.
Re: Re: Re:14
US copyright law is based on the berne convention rules, so if I solely rely on that ruleset, the same stuff should work also in US soil. If not, then USA legal system is violating their contractual responsibilities.
Re: Re: Re:15
No, no it’s not. US copyright law recognizes the Berne Convention rules, but US copyright law is not based on the Berne Convention rules. US copyright law is more nuanced than the Berne Convention and the Berne Convention doesn’t cover everything in any of the signatory countries’ copyright laws. It’s just an agreement to respect copyright from other countries. It doesn’t cover things like the DMCA or US copyright case law or the vast majority of US copyright law. Your ignorance on this point is just one more black mark on your already disintegrated credibility.
No, that’s not how it works at all.
No, US copyright law recognizes the Berne Convention, but goes beyond that with its own greater nuance. You don’t understand the Berne Convention. It’s not a superseding set of laws over every aspect of US copyright law. For one thing, the US would never agree to that if it was, nor would many of the other signatory countries. Berne only covers one minor aspect of copyright agreements between countries.
Re: Re: Re:16
Yeah, but you only have experience about berne rules since 1988, since usa was late in adopting the rules. We have experience with the rules for over 200 years, so our berne techniques are significantly more fine-tuned and we can find subtle details from them which are completely unheard in usa.
Re: Re: Re:17
Except we’re only talking about US copyright laws, not the Berne Convention. You can have more experience in underwater basket-weaving but if we’re not discussing that topic, it’s completely irrelevant and actually worse than useless when you pretend it substitutes for actual knowledge of the topic at hand.
Re: Re: Re:18
You should increase the scope of your rules to contain larger area of the world than your tiny usa pond. I’m thinking maybe berne rules are still not containing everything needed, when china, asia and africa rules should be included too. Staying in your local bubble is never too healthy.
Re: Re: Re:19
The laws of other countries don’t affect me in my own country. This article is about US copyright laws. It’s not at all relevant to address foreign laws here. If you don’t want to be limited to this particular topic, comment on a different article that is actually about the laws of other countries. And the US is rather large for being a “local bubble.” We have states that are bigger than some European countries. This is quite a desperate goalpost shifting.
Re: Re: Re:20
I don’t think there are any such articles available on techdirt. There’s so much american bullshit and pre-censorship of foreign articles that the site almost never posts articles about foreign countries. So if I post something, it needs to be about copyright issues, since copyright rules are the same for everyone in the western world.
Re: Re: Re:21
Sounds like you should find a different website that caters to your interests.
Except they aren’t.
Re: Re: Re:22
The other websites don’t have similar kind of copyright problems than what techdirt has. So I wouldn’t fit well to their reader groups.
It is important that my software had significant problems in the copyright area. First it was attempting to clone youtube’s main content catalog format, but failing to gain traction among users. Then the software allowed users to download content from the internet, even displaying “Downloading..” as a progress indication, indicating to older people that something illegal is happening. Then it built its 3d model catalog as a combination of its own software code and 3d models from sketchfab, without displaying copyright or authorship information next to the model. Once that was fixed, it used artificial intelligence, which is known to have copyright problems.
Stuff like this is breaking every software project on the planet. Including mine. I’ve been researching for solutions to most of the problems mentioned above, but not all the solutions found have been taken into usage, given that the solutions are regularly causing issues that end users are unable to solve.
Re: Re: Re:23
You don’t fit here at all.
Nope. Nobody cares about your software.
Re: Re: Re:24
This is where you’re going against established practices and common sense. Software is the magic enabler that solves every problem in today’s society, and I just happen to have the newest and most bleeding edge software available on the planet. But given that you did not even try to analyze what the software is doing, your analysis cannot be trusted and you’re always painting yourself to a corner.
Re: Re: Re:25
@terop, please stop advertising your own software as anything better than others’.
Re: Re: Re:26
Nope. Then I would be lying.
Re: Re: Re:27
Stronger claims require stronger evidence. Since you didn’t provide any evidence for your claim, that would be misleading advertising.
Re: Re: Re:28
My stronger claim is that “I actually have working software” instead of pile of bits that crashes all the time. As you know from software vendor histories, this is better than what software market as a whole can implement.
Re: Re: Re:29
There are other software vendors that ship non-crashing software. So your basic claim like that isn’t helpful.
Re: Re: Re:30
rust developers are claiming on the internet that status of the software industry is not handling security issues carefully enough and memory corruptions etc. So there are people on the internet that are claiming that your bullshit explodes to your face.
Re: Re: Re:4
I suspect you are referring to potential copyright infringement lawsuits. Well, IANAL (I am not a lawyer), but it is indeed a legal danger zone if companies use AI code generators without lawyers’ reviews. When the AI “regurgitates” any known copyrighted code (which can draw an accusation of copyright infringement), you won’t be able to know it right away.
Re: Re: Re:5
today’s news tidbit kinda explains why AI is dangerous.
https://www.axios.com/2025/06/11/disney-nbcu-midjourney-copyright
disney is suing AI company because copyrighted characters appear in the midjourney’s output…
Re: Re: Re:6
https://media.npr.org/assets/artslife/movies/misc/midjourney.pdf
If disney is going to attack midjourney with this level of big guns, midjourney has no fucking chance. No amount of fair use is going to save them. This is more than billion dollar lawsuit, and midjourney’s technology is going to be so fucking dead that we’re going to be digging bones in graveyard if we want to see the technology in next 50 years.
Re: Re: Re:7
now we got what real lawyers think of disney vs midjourney lawsuit:
https://www.youtube.com/watch?v=zpcWv1lHU6I
Re: Re: Re:8
Now claude is in big trouble after pirating the training data:
https://aifray.com/claude-ai-maker-anthropic-bags-key-fair-use-win-for-ai-platforms-but-faces-trial-over-damages-for-millions-of-pirated-works/
Re: Re: Re:9
@terop
Regarding the Bartz v. Anthropic summary judgements, the opinions I saw are mixed. In particular, creators are not happy.
The only good side of this judgement is that piracy is likely game over for AI companies now. (I’m talking about Meta and OpenAI, too.)
I have >50% confidence that the fair use judgement in the case will be appealed. Because this “training is fair use so long as you legally acquired a copy” would mean a green light for OpenAI and Google scraping billions of web pages simply because they’re available gratis (for free). This is a terrible precedent for e.g. news companies that publish content mostly on the internet, and independent bloggers and writers.
Re: Re: Re:10
It seems that the “fair use” solution that the companies trusted to fix the copyright issue is not actually helping when the use relies on pirated source material. Basically, pirated material can have issues like DRM getting removed or being format-shifted from proprietary file formats to the more commonly pirated formats like mp3, mp4, png/jpeg files. This makes pirated versions more convenient for the AI companies, but the legal paperwork was very clear that the convenience of the company is not an acceptable reason to use pirated source materials.
Re: Re: Re:10
Or maybe they never had a right to demand a license for training and they’re not losing anything. Why is training magically not fair use when other transformative uses are? If the material is obtained legally, why isn’t the use legal?
“You can read this legal freely available material, but a machine can’t…except a web browser…and a search algorithm…and the internet archive…and a web scraper…and…”
Re: Re: Re:11
fair use should be limited to sentences of size 6 words or smaller. Currently they’re asking fair use to apply to terabytes of data, and they’re not considering the work amounts that went into collecting those databases(much less creating the material from scratch). If companies paid proper money amounts for the data, the AI databases would cost significant amount of money, millions of dollars.
I think it’s about the sheer amount of data used. Large collections of data have generally been illegal, given that no one is able to obtain licenses for all the data in the collection, since the mere negotiation process for millions of content items is too burdensome. But copyright law has generally solved it by insisting that the amount of data is reduced to a small enough amount that proper license acquisition is possible. License acquisition is still a significantly easier process than creating the same material from scratch. Why should your company get access to a huge database of data, when the same data is unavailable for use by everyone else who follows copyright law?
It’s only because computers allow large data collections to exist that it has been a significant problem regarding copyrights. When books were manually copied with a printing press or ink-based pens, getting a license was a minor issue compared to the overall amount of work involved.
Basically none of the AI companies executed the proper process of dividing the data to small pieces and obtaining separate license for each piece from its author. They think it’s too burdensome, but copyright law thinks that they should not use that much data, since creating it from scratch is also burdensome.
When we learned copyright law, the conclusion was that everything on the internet is illegal to use in your own product. There simply weren’t licenses available for the data. Author names were missing and contacting authors via email turned out to be impossible since everyone is trying to avoid spam. I.e. the internet had huge collections of data available, but all of it was inaccessible when you wanted to follow the proper copyright process.
Now AI companies are trying to solve it the same way as how pirates collected their movie/song/software/game collections. This is the wrong way. They should develop technologies that use less data. Make their AI algorithms work with smaller datasets.
This is what I’m doing with my software. I only rely on small amount of data for developing my 3d model technologies. Large amount of data is copyright-dangerous, and it also requires more compute-time to analyze and utilize. Even normal 3d models are large enough that GPU cards struggle rendering all the data passed to the hardware. If handling the data takes long time, there’s no reason to collect such large databases.
Re: Re: Re:12
@terop
In the U.S., “fair use” in copyright law is decided by the courts on a case-by-case basis. Rather than listing which particular cases are fair use, the statute mandates four factors to consider (17 U.S. Code § 107). The judges will evaluate the four factors of fair use separately and then weigh them together for the overall conclusion. The judges will also reference precedents so that similar cases evaluate fair use in a similar way.
The sad fact is there was a case nicknamed “Google Books” (Authors Guild v. Google) that had ruled fair use even when Google scraped terabytes of data. It’s a book search and indexing engine, and the courts gave that fair use. So it isn’t about the amount of data scraped. Even terabytes can be fair for a search engine.
And yes this is why the AI companies try to lobby and try to gain fair use for everything they scraped. (They had fair use for search engines and are trying to push that for generative AI.)
Good point. And this is why, in the recent Anthropic case, the judge denied fair use for pirated books. (I totally agree with this part, even though the rest of the rulings are significantly flawed.)
Note: in the case of book search engines, creating data from scratch won’t make sense. There is also another case (sorry, I can’t find the case law for this) involving a plagiarism detector, where the machine needs to keep a full copy of the books so that it can be used to find plagiarism in users’ inputs.
That is partly true. Most contents published on the internet are not allowed for commercial reuse. But there is a subset of data that comes with explicit licenses such as Creative Commons that would permit you to use it without contacting the author. (I would argue that, with proper attributions, AI can be trained with Creative Commons licensed works. It’s just that we didn’t see AI companies attribute the sources when they train AIs.)
Or in the alternative, obtain licenses for all the datasets. This is how large, open-source software (such as Linux) thrives.
Re: Re: Re:13
I think what separates google books from artificial intelligence is that google books only wanted to utilize the “captions” from the data, not the full text of the book. They published search snippets, which were intentionally restricted to a small piece of the text, and could never contain a large section of text from the book. The full power only came from the fact that it could index multiple books.
Artificial intelligence is different. They use the full text of the book, for its core creative aspect. AI’s trick is to try to “obfuscate” the source of the material, and thus they’re unable to collect a list of works the end result has been created from. AI is not creating direct quotes of the text, but they run the data through an obfuscation service. It’s similar to how criminals hide their crypto money track by running the money through coin mixers.
The coin mixers are clearly declared illegal on money area for helping criminals do money laundering, so if we consider copyrighted subject matter as a form of money, we must consider AI practices illegal too.
Re: Re: Re:14
@terop You didn’t read the case of Google Books and made the wrong assumption. Google did index the full content of the books.
And as Judge Chhabria ruled, you need to point to evidence that generative AI “obfuscated” the sources before your infringement claim works. Note that it’s not that I like AI; it’s that the infringement claim needs stronger evidence in order to work. And hell, I know data laundering is a serious moral issue, but that doesn’t lead to your conclusion.
Re: Re: Re:15
The recent paperwork claimed that meta AI output could only reproduce less than 50 words from each individual book, even if you carefully craft the prompt to look for info from that book.
And this fact was used to claim that google book scanning case applies to the situation..
=> so the small amount of infringing data in output is essential part of their case…
Re: Re: Re:16
People often quote the content of a book to express opinions about the book itself. Such “quoting for commentary” uses are definitely fair within copyright law, unless you quote so much that your commentary effectively substitutes for the book’s sales.
There is another weakness in the genAI fair use claim: there is a possibility that the regurgitated portion ends up in another work for sale that serves the same purpose as the original author’s (e.g. a quote from a novel ends up in another commercial novel, or a quote from a news article is used to publish another news article without crediting the original source). That could defeat the fair use. Judge Chhabria might have had this “unfair use” in mind, yet the plaintiffs didn’t argue it. And so he had to rule Meta’s use marginally fair, yet with a lot of warnings.
Re: Re: Re:17
@terop
Mind you, I don’t like the Google Books precedent at all. Even though regurgitating 50 words is not much, a malicious user could eventually extract the whole book from the AI by repeatedly trying prompts and piecing many 50-word outputs together into a full, infringing copy of the book.
The Google Books case is a Second Circuit ruling. Theoretically it can be overturned by the Supreme Court, but the aforementioned malicious use has not been seen and the plaintiffs didn’t cite any evidence of it. It isn’t worth appealing this case; it’s better to file suit again with different authors.
Re: Re: Re:18
This is why many plaintiffs are trying to move from detecting infringement in the output of the AI system to detecting it in the input. On the input side, the infringement is clear: blatant copying of the full text of the books.
It’s no longer just selecting snippets from the books; instead, the input side clones the full expressive content of the book. But then it hits the problem of linking the infringement to the copyright owner’s exclusive operations: PERFORM, DISTRIBUTE, DERIVATIVE WORKS and DISPLAY. An AI system isn’t exactly publicly displaying anything other than the snippets. The distribute bit fails for similar reasons. Perform doesn’t work either. But the key thing that AI clearly infringes is the DERIVATIVE WORKS section. If plaintiffs focused on derivative works, they could win every AI-related lawsuit.
AI-based products on the internet are clearly infringing on the derivative works exclusive operation.
Re: Re: Re:19
Fair use in U.S. law does shield users from liability for infringing the derivative work right. So your proposed focus would not work. (That’s the Campbell v. Acuff-Rose Music case law.)
Re: Re: Re:20
The courts and judges just warned in the paperwork against relying on the fair use defense, given that it was only the plaintiffs’ blunders that gave Meta its fair use decision, and had the plaintiffs actually run the case properly, they would have won on fair use. They’re clearly recommending that every AI company take a license for the training material before content owners find out about the operations.
Re: Re: Re:11
A better analogy is that when you buy a CD from a music store, it grants you a license for personal (and home) enjoyment of the music, but you are not allowed to play that music at your workplace or store.
Purchasing the CD does not imply a license for commercial use of that music.
Just to mention, I strongly expect this case will be appealed. Judge Alsup’s reasoning is deeply flawed, and it focused so much on “transformative”-ness that this engulfed the other considerations of fair use. It also erroneously equated machine learning to human learning (I’ve suggested this equation shouldn’t hold because there is no legal personhood for machines; it is not founded in the constitution of any country).
Re: Re: Re:12
But purchasing a CD, listening to the music, learning to play guitar and understand chord progressions and then writing your own music using the skills you learned is perfectly legal. Adding “with a computer” shouldn’t magically make that different.
But purchasing the CD does not restrict you from learning to make music and make money from that skill you developed. But also, not everyone training an LLM is seeking to or is making money.
Let’s take the machine out of the process. You sit a chimp in front of Bob Ross episodes and the chimp learns to paint. The chimp paints a painting that isn’t a copy of a Bob Ross painting. Is the chimp violating copyrights? No, of course not. If you sell the chimp’s painting, you’re also not violating copyrights just because the chimp learned from watching Bob Ross. The chimp being supplanted by a machine doesn’t change the legal foundations of the process. That would be like saying you’re legally allowed to look at a painting, but no one can look at it using glasses to improve their vision. Glasses are just a tool. A computer is just a tool.
Of course it will. There’s a lot of money to be made in licensing deals if media companies can force everyone to license training data. That doesn’t mean it will be overturned on appeal. It could be. We’ll see.
The thing about the four factors is that they aren’t applied equally. You could have three factors go against you and a court might rule that the remaining factor goes in your favor strongly enough to override the others, or vice versa. A poem consisting of four lines is so short that 100% of the work is used, but that doesn’t make it infringing if other factors weigh heavily towards a fair use.
Transformative is a powerful argument. Commentary and parody are transformative by nature and they get special carve outs.
The lack of legal personhood shouldn’t be an impediment. Evolving algorithms are already legal. Humans are just using the machines as tools to do something. That’s literally something humans have done for millions of years.
The personhood issue only speaks to whether the output can be copyrighted. It can’t because the authorship is by a machine, not a person (though there is some debate as to whether the human prompt is equivalent to other human input that does allow for copyrightability, such as a human clicking a camera button to take a picture, even though it’s the camera that is actually capturing the image – that’s a debate for another day).
Re: Re: Re:13
This argument is fine for music-maker applications, but probably not for music-generating AIs (Suno & Udio). AI cannot be equated with human learning, because there are no so-called “skills”. Rather, it is more about “samples” and the quality of those samples.
I would say this is a good analogy, MrWilson, but details can matter.
Considering that Judge Chhabria has also ruled on the case now, I won’t debate this part further. Judge Chhabria’s arguments are much better than Judge Alsup’s.
I don’t think a debate is needed on this. The key is the amount of human creative control, which determines copyrightability. When the AI generates a significant part of the image/music/content that the human has no control over (as most of the internal decisions are a black box), those parts would be uncopyrightable. And the USCO has recognised certain works made with AI and registered copyrights for them, albeit each of them carries a waiver (on which parts are uncopyrightable). (I would call these cases “partial copyright” protection, in contrast to full copyright.)
Re: Re: Re:9
Here’s a decision that Meta’s use of pirated books is fair use, simply because the authors couldn’t connect the dots between slowing book sales and the introduction of AI in the marketplace:
https://news.bloomberglaw.com/legal-ops-and-tech/meta-beats-copyright-suit-from-authors-over-ai-training-on-books
Attribution next to the generated output, you idiot!
Say “this AI-generated image incorporates materials from [author-name] [image-url], which is licensed under CC-BY 4.0”
As I’ve said, you slipped it out of your mouth. You support piracy, period.
You assume every image that the AI model has been “trained” with was legally obtained for free (which it was not; OpenAI, Meta and Anthropic all use pirated materials), otherwise you would not ask that question at all.
A person who respects copyright would say: buy the license for AI training. Even in a fair use scenario such as Google Books, the Google Books result page shows the user where to buy the book (e.g. buy from Amazon). You didn’t consider that, because you support pirates. And I should stop making you smarter by recognizing your mistakes.
Re:
Except the training dataset is not in the generated output, you idiot! That’s the whole fucking point! Again, again, again, you don’t understand how the technology works. The result from the model is not a derivative work. It doesn’t utilize a single image. It utilizes its understanding of denoising using tokens and weights. There’s no attribution that can be made, any more than pulping all physical paintings and using the pulp to make new art would allow a collage artist to identify the original source of any random element.
Yes, legally for free piracy. That makes total sense.
As I’ve said, you’re pretending the internet doesn’t exist and you’re pretending that it’s illegal to view images on the internet and you’re pretending that LLMs contain all images on the internet and you’re pretending that LLMs can just reproduce all images on the internet. Your assertions are non-sensical and completely impractical. You’re predicting internet users will forget the internet exists and just try to recreate all content with an LLM. This is such a weird fever dream and moral panic. Do you think Dungeons and Dragons actually teaches children real satanic spells and rituals? Do you think Halloween candy has razorblades and poison in it? Do you think playing popular music backwards lets you talk to demons?
Your sanity is in question at this point.
I’m not assuming anything. You’re back to conflating training and use as if they’re the same thing. You’re also assuming that LLMs can perfectly replicate trained data that isn’t in the model.
There isn’t a license for AI training in most cases. This hasn’t been a thing until now. There also isn’t an established legal right to demand licenses for training, which is what the court cases and this article are about. You’re pretending it’s already settled.
That’s not a mandatory action by Google. That’s not legally required. You’re inventing your own standard and pretending it’s legally binding.
That’s just a useless ad hominem at this point. You claim poor people both don’t exist but also do exist and are freeloaders and pirates. If that’s your definition, the criticism isn’t bad. It’s telling that you think it’s an insult to support the rights and interests of poor people and you feel compelled to denigrate them.
That’s a weird claim coming from a person who has been proven wrong so many times. I’ve been giving you a free education despite the fact that you claim “You should pay tuitions to your teacher. Everyone pays someone when they learn.” I can give you a paypal or venmo link if you’d like to pay me “tuitions.”
Re: Re:
Then why can the AI generate output that is similar to copyrighted works? By magic?
That’s common sense. (Remember the example of reproducing the photo of Anne Graham Lotz that USCO has cited.)
Another slip of your mouth. You support piracy, period.
Before there is even a need to argue whether it’s practical to recreate copyrighted content with LLMs, you simply suggest that users download the content from the internet, which implies you pirate.
(Otherwise you would say to buy it instead. Some creative works are exclusive to physical media; some others allow downloading but are behind a paywall. You are worse than the AI companies that defend their training data as “publicly available” content, because you suggested pirate ways.)
Say what? https://authorsguild.org/advocacy/artificial-intelligence/ai-licensing-what-authors-should-know/
What the fuck is “the right to demand license”?
Re: Re: Re:
First, this question indicates your admission that you don’t understand how LLMs are trained, as I have already claimed multiple times. So at least you’re being honest here about your ignorance.
Second, it’s because it learned to generate based on its analysis of the training data. Text LLMs predict the next word based on weights and tokens. Image generation works through a similar process with denoising. This information is widely available on the internet. You shouldn’t be this ignorant, especially when claiming to know more than other people.
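To make "predict the next word" concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: it uses a simple word-pair count table, whereas a real LLM uses learned neural-network weights over tokens, and the three-sentence corpus and the function names are made up for this example. It says nothing about how any particular commercial model behaves.

```python
from collections import Counter, defaultdict

def train_bigram_counts(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus, invented for this example.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
counts = train_bigram_counts(corpus)
print(predict_next(counts, "sat"))  # most common word seen after "sat"
print(predict_next(counts, "the"))  # most common word seen after "the"
```

The prediction comes from statistics gathered across all the sentences rather than from looking up one sentence, which is the general idea being described; how far that analogy carries over to large models is exactly what the thread is arguing about.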
I am surprised you brought this example of your utter ignorance up again.
The study intentionally sought that image using a poorly trained model that contained a bunch of duplicates of that image, and they had to try millions of times to get something that looked “similar” but was not useful as a substitute or a competing work. And that’s the only example you’ve ever mentioned, despite using this one example to assert that any LLM could do the exact same thing with virtually any copyrighted work.
Except we’re talking about a legally free image. It wasn’t illegally distributed.
No, LLMs don’t contain any images. That’s the whole point. The training dataset is not in the model. It can’t be. There isn’t a compression standard good enough to squeeze all that data into a few gigabytes. You’re showing off your ignorance again here.
I haven’t admitted it’s technically possible. I’ve pointed out the flaws in the conclusion that the study made that the US Copyright Office parroted and that you lazily trusted.
Actually, you said one in a million. I said one in millions. And I pointed out the context that makes that one single example difficult to reproduce. And what it did render was not a copy, but a fuzzy reference at best. And, as I already pointed out, the image was already LEGALLY free and available on the internet, so no one would need to try so hard to get a bad imitation of it from an LLM.
Provide another example that is actually a passable copy of a work used in the training data. This one example is not your proof.
Try it. I’d love to see the results. You claim it’s possible. Go for it. Until you provide this proof, we’ll consider it impossible.
There’s no slip. This is you admitting that you think people viewing stuff on the internet means they’re violating copyrights. I’m referring, as I have stated, to simply going to any random website that isn’t locked behind a paywall or a login and your browser just loads images. Or do you think you’re “pirating” and violating copyrights when you come to Techdirt and the logo loads in your browser? Why are you accusing yourself of being a pirate?
No, you dense motherfucker. I literally linked to the webpage where your one weakest example is legally available for free. Which implies you think browsing the internet is a copyright violation. Or at least that you’re just an idiot who doesn’t know what they’re talking about.
You literally can’t buy a lot of content that is legally available for free. There’s no purchase button on a Wikimedia page offering a free license on a photograph that the copyright holder has released with a permissive license.
That your mind jumps immediately to copyright violations indicates your bias. There’s plenty of legally free content on the internet that nobody can or needs to pay for. Are you like Terop and think the public domain doesn’t exist?
Then they aren’t getting trained in LLMs.
Then they aren’t getting trained in LLMs.
I didn’t suggest “pirate ways” at all. You claimed legally free content required a purchase.
You keep doing this where you just search for something you imagine supports your argument but you don’t actually examine what you link to. For dog sakes, you literally quoted a reddit page previously. That linked page is one organization talking about their licensing recommendations. But they don’t cover every creator whose work might be used in training an LLM. Not all creators are easily contacted. There are works still covered under copyright whose current owner isn’t easily identified. You’re pretending it’s as easy as just clicking a button on a website and paying for a license, as if it’s openly offered in easily identified and verified manners.
I’m an author. I’m not a part of the authors guild. My work isn’t available for licensing for LLM training. It’s not like I have a button somewhere on the Amazon pages for my work.
There is no law or court case that has ruled definitively that all LLM training on copyrighted content requires a license, ergo, copyright owners do not possess such a recognized right.
Re: Re: Re:2
Weight prediction = expression “fixed in a tangible medium” -> copyright issue.
Denoising = a very aggressive compression technique that does not affect the analysis of copyright.
The judge in Andersen v. Stability AI had already disagreed with you.
“That these works may be contained in Stable Diffusion as algorithmic or mathematical representations – and are therefore fixed in a different medium than they may have originally been produced in – is not an impediment to the claim at this juncture.” (Judge Orrick) https://admin.bakerlaw.com/wp-content/uploads/2024/08/ECF-223-Order-Granting-in-Part-and-Denying-in-Part-Defendants-Motions-to-Dismiss.pdf
Because it exists not in the form of images but in the form of mathematical expression! See the quote from the judge again.
Straw man.
No, my question is, why the fuck can’t the AI models be trained with only public domain materials.
When you suggested many things on the internet can be “legally downloaded for free”, you ignored the reality that AI companies train with pirated materials, which means copyright infringement! Suggesting “legally downloaded for free” content won’t help, because the AI companies are not accused over that part!
Your replies suggest to me again and again that training the AI with only public domain content is the only non-infringing way to go! So fuck!
The right to reproduction and the right to derivatives cover AI training. And you dodged my original question of what the fuck is “the right to demand license”.
Re: Re: Re:3
Weight prediction isn’t a copied expression fixed in a tangible medium. It’s an original understanding of how to compose language. For this to be a copyright issue, children would likewise have to get copyright clearances to learn how to write. That is not how US copyright law works. You’re inventing a right that doesn’t exist.
Denoising isn’t a compression technique at all. Again, you don’t understand how the technology works.
As the EFF has said:
“The complaint against Stable Diffusion characterizes this as “compressing” (and thus storing) the training images, but that’s just wrong. With few exceptions, there is no way to recreate the images used in the model based on the facts about them that are stored. Even the tiniest image file contains many thousands of bytes; most will include millions. Mathematically speaking, Stable Diffusion cannot be storing copies of all of its training images (for now, let’s put a pin in the question of whether it stores a copy of any of them).”
https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0
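The storage argument in the quote above can be put as back-of-the-envelope arithmetic. The sketch below is illustrative only: the model size, the number of training images and the "small image" size are hypothetical round numbers chosen for the example, not figures taken from the EFF post or from this thread.

```python
def bytes_per_training_image(model_size_bytes: int, num_training_images: int) -> float:
    """Average number of model bytes available per training image."""
    return model_size_bytes / num_training_images

# All three values are assumptions for illustration.
MODEL_SIZE_BYTES = 4 * 1024**3        # hypothetical model of "a few gigabytes"
NUM_TRAINING_IMAGES = 2_000_000_000   # hypothetical training set size
SMALL_IMAGE_BYTES = 10_000            # hypothetical "many thousands of bytes" image

budget = bytes_per_training_image(MODEL_SIZE_BYTES, NUM_TRAINING_IMAGES)
print(f"Average budget: {budget:.2f} bytes per training image")
print(f"A small image file is about {SMALL_IMAGE_BYTES / budget:,.0f} times larger")
```

Under these assumed numbers the average budget is a couple of bytes per image, far below the size of even a small image file. That supports the "cannot store all of them" point, but an average says nothing about whether a particular heavily duplicated image could be memorized, which is the question the quote sets aside.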
The judge doesn’t disagree with me. The word “may” in that sentence means the judge is saying no determination has been made at this stage. That was, as you can see, simply to address a motion to dismiss. The judge is only saying it’ll go to court to be determined. Again, you gleefully try to find anything you think will support your argument but you don’t understand the nuance so you end up posting things that at best don’t support your argument and at worst prove you wrong.
Also, I wouldn’t trust a judge over a technologist on a matter of technology.
Except that’s not actually true. What exists in mathematical expression is the original understanding that the LLM has about how to compose sentences and predict the next word and not predict the next word from a single work but from the entirety of every piece of data it has been trained on but doesn’t retain. It is not compressing the work at all, it is remembering what it learned. It’s like reading a bunch of books and learning how English syntax works and developing a vocabulary you can use to compose sentences from them. That is not copyright violation or else, again, human learning would violate copyright.
The judge also isn’t making a legal determination of fact. It was a motion to dismiss that they were ruling on, which involves factoring in the unproven arguments of both sides. You also don’t understand how the US court system works.
No, not at all. It’s literally your theory of copyright in a real world example. It contradicts your own assertions. If you don’t think you’re violating copyright by visiting this site, the same principle applies to going to the Wikimedia page for a photograph. But you’re referring to that as piracy and a copyright violation. You don’t like it being pointed out that the only example you can parrot from third parties doesn’t even support your argument because the image is available legally for free.
Someone tried. It was really hard and the result was less useful than one trained on larger datasets. But that’s not a technological or legal question. You’ll want to ask the people doing the training.
This is you again pretending I’ve ever supported this. This is a straw man. I never said it was okay to use “pirated” materials.
It’s also a moved goalpost because on this particular topic, we weren’t discussing the conduct of the LLM trainers, but rather the efforts of users who you fantastically claimed would try to use an LLM to produce a flawed reproduction of a work that is legally available for free in a perfect reproduction on a website, again legally for free!!! Why would anyone go to great lengths to generate millions of results, wasting vast amounts of time and electricity, to get a failed reproduction of something you can just get legally for free on the internet? The scenario doesn’t even make sense because you would have to have already visited the Wikimedia page to have seen the legally free image in order to know what to try to reproduce using the LLM. The study scenario is contrived and your attempts to use it to assert people will pursue this as an avenue of copyright violation are absurd. This only says anything about your paranoia and ignorance.
Again, we’re not talking about the AI companies in this respect. We’re talking about users that you accused of copyright violation in using an LLM to reproduce a failed attempt at a copy of a legally free image already available on the internet. If you can’t keep track of what you’ve argued, you should look back through the thread. Keep up, dude.
Feel free to train your LLMs with only public domain content. That’s your right, I assume, though I don’t know the laws in your country. In the US, that would be perfectly legal. Have fun with that.
Sofa king great!
Again, “There is no law or court case that has ruled definitively that all LLM training on copyrighted content requires a license, ergo, copyright owners do not possess such a recognized right.” You responded to the answer.
The right to demand licenses is the right of the copyright holder to demand that people obtain a license for uses of their copyrighted content that are required by law. And again, again, again, no legislation or case law has yet determined that copyright holders have a right to demand a license or licenses for LLM training.
Re: Re: Re:4
Children don’t learn how to write by reading a copyrighted novel. It is already bullshit to analogize machine learning with human learning (this was also debunked by the USCO); now it’s more bullshit to suggest children learn writing by reading copyrighted books.
If you had made a more reasonable claim, such as learning with dictionaries and textbooks, it could have made more sense, but no, you didn’t.
Say that to judges! Say that to Disney and Universal! It’s bullshit to suggest images can come out of thin air. And EFF’s reasoning is fundamentally flawed.
Because Stable Diffusion doesn’t need to store “all” copies! Rather, it stores aggregated and “diffused” representations of all training data, and lets the model interpolate the images it didn’t store.
The EFF was certainly confused about the “compression” claim. They thought the “compression” only happens on a single image, not on the aggregated data! When there are lots of redundancies in the aggregated image representations, compression can dictate that only one representation needs to be stored, even when the representations of objects are visually different among training images.
So what does that mean? There are still copies. But they are highly aggregated and “fused” with other images of the training data. Under copyright analysis, the model itself becomes a derivative work of the training data. No magic here.
As if claiming you cannot make a “useful” AI without being unethical.
There is no “legally for free” zone here. Only “illegal for free” zone.
You are questioning why people would reproduce “legally for free” images through an LLM, but that was never my question. What I said is that people reproduce “illegally, copyrighted” images through an LLM, not the “legally for free” bullshit.
There is no need to “demand license”, because people other than the copyright holder cannot legally use copyrighted content for any purpose. Rather than copyright holders demanding things of other people, it’s people who plead with copyright holders for licenses to use content.
Re: Re: Re:5
I did. Not just one “novel,” but many different books. My classmates did too. Your ignorance of the US education system is revealed again.
Except they actually do. This is the weirdest claim. Are you a writer? Have you never learned about composition from reading written material? How did you learn to write English? You definitely read copyrighted content while learning to read English. You’re getting into Terop-level of absurd claims here.
Dictionaries and textbooks are copyrighted content! Holy shit, you’re an idiot! Your ignorance is wild. It’s all over the place.
Lawyers for the defense already will, as well as specialists they call.
There’s no reason to try to explain something to people whose livelihood depends on them not understanding it.
Nobody suggested images come out of thin air. Again, you don’t understand the technology and you keep demonstrating that fact.
So you admit it doesn’t store the images and therefore cannot reproduce the images. You just destroyed your own arguments.
Except it doesn’t at all. It stores what it learned through noising and denoising processes. The training data, to be specific, we’re talking about the images it’s trained on, are not represented in the model. A process is what the model actually contains. A process is not a copy of the training data.
That’s not what interpolate means. You can’t interpolate an image you aren’t able to insert and it doesn’t have the original image available after training.
That’s not their stance. Also, “compressing aggregated data” isn’t the same as having compressed copies of all the trained content. Again, again, again, you don’t understand how the technology works. Your claims are useless.
It seems like you’re quoting something here but I’m not seeing a source.
There are no copies! This can’t be said enough. The model does not contain the training data.
The training data is not in the model, therefore this cannot be true.
It isn’t. It can’t be. It doesn’t contain the training data. The model contains a process, not an object.
That’s a subjective statement. I’m claiming you’re arguing in favor of unethical media companies who can’t be as wealthy as they are without exploiting poor workers and creators. You don’t seem to have a problem with that, so your self-righteous assertions of ethics are hypocrisy. You’ve just chosen, as you have already admitted, which set of unethical corporations you want to side with, despite it being pointed out that you don’t have to side with either.
I literally linked to the image source where you can indeed find the image “legally for free.” You don’t get to declare other people’s chosen licenses to be illegal. And again, if your assertions are true, you are violating copyright by visiting this site. Of course that’s absurd, so your argument is absurd. Browsing the internet and seeing content that copyright holders have intentionally posted to the internet, served up by their servers and the servers of companies they have chosen to host their content on, is not illegal. You should really stop using the internet if you think everything is a copyright violation, otherwise you’re complicit and morally compromised.
But the only example you can provide is one that’s legally available for free! And it’s a terrible example because it’s a blurry failed reproduction from a model nobody actually uses after the researchers wasted millions of attempts and electricity and time to find something vaguely resembling the targeted image just to make the argument that it’s sorta maybe might could be possible.
You’re admitting that you’re using one really bad example to pretend your absurd, paranoid, unrealistic hypothetical scenario is plausible or desired.
You must have a right to license a particular use in order to demand, ask for, request, or expect someone to license your copyrighted material.
There is no right under copyright law for a copyright holder to demand a license for exempted acts, including fair use, but also those that fall under the TEACH Act, which I’m quite certain you’re completely ignorant of, as well as those listed under 17 U.S. Code § 110.
For example, “…the following are not infringements of copyright:…performance of a nondramatic literary or musical work or of a dramatico-musical work of a religious nature, or display of a work, in the course of services at a place of worship or other religious assembly;”
So as a copyright holder, you can’t demand a church license your copyrighted content for them to be able to legally perform or display the work during a church service.
They can legally use copyrighted content for many purposes that don’t require licenses or asking for permission.
You don’t have to plead for a license if your use doesn’t require one. That you are ignorant of this fact means, again, for the thousandth time, you don’t know what you’re talking about. You don’t understand US copyright law.
Three pieces of bullshit together.
Firstly, there are public domain books. Secondly, AI “reads” “millions” of books while no human could have that time to read that many. Thirdly, you cannot legally justify that AI has the right to “learn” even when it can learn like humans.
Emphasis added. Since you claim “what it learned” can be “stored”, “what it learned” is fixed in a tangible medium and as such is subject to copyright! You can’t win.
Stored process. Since all computers in our world are stored-program computers, the “process” itself you are referring to is subject to copyright.
Dismissed again because you failed to list any example of “poor worker or creator”.
Well, since there is news that Disney and Universal sued Midjourney just a day ago, why not read their complaint for such illegal examples?
Here you are: https://www.courthousenews.com/wp-content/uploads/2025/06/disney-ai-lawsuit.pdf
First, your cited section 110 doesn’t cover anything about AI. Second, even your cited example (§110(3)) is flawed, because the limit is on “nondramatic” literary or musical works, or dramatico-musical work “of a religious nature” only. I still have exclusive rights if my works are “drama” and “non-religious”.
Are you intentionally misinterpreting my word “any” here?
Re:
There are indeed. That doesn’t mean I didn’t read copyrighted books to learn to write. This doesn’t refute anything. The fact that I did in fact read copyrighted books to learn to write disproves your claim entirely. I didn’t get permission or obtain a license to learn to write from reading the books I read that were subject to copyright, because I didn’t have to. US copyright law doesn’t require that!
Yes, and this lack of a limitation changes nothing about the legality. If a human could actually read that many that fast, it wouldn’t magically become illegal just because one person was exceptional. The speed at which someone or something can read is not a legal basis for anything. Otherwise, you’d be citing the law that refuted this.
LLMs have no rights. They’re not people. But humans have the right to train LLMs.
What it learned is a process of its own actions, not a copyrighted work.
No, it isn’t subject to copyright. The learned process is a machine process not of human authorship. It doesn’t qualify for copyright. Fixing something in a fixed medium doesn’t make it subject to copyright. That’s a requirement for copyright, not a proof that something is copyrighted. Machine-generated code is not subject to copyright. The US Copyright Office that you keep citing will tell you this.
First, this is wrong. Second, it’s terrible logic. Computers storing programs isn’t a basis for them being subject to copyright. You can’t cite a law or case law that says that anything stored on a computer is subject to copyright by that fact. That would wipe out the public domain status for any public domain work that gets digitized or originally existed in a digital format.
I already listed an example. If you were an American citizen, I’d list you. I’ll list myself since I’m not a millionaire. You want someone else? Len Kaminski.
Now, what is the great response you were hoarding while waiting for a name? What brilliant reasoning will refute this claim because you can’t fathom the existence of poor people in the US?
Nope. You’re goalpost shifting. You said that one example was proof. I proved it wasn’t. Either you admit you were wrong or you prove that example was actually a violation of copyright and explain why it would lead to the scenario you claimed. Otherwise, your claim is completely dismissed, as you are so wont to do with far less logic.
We’re discussing copyright. Holy fuck! We’re discussing US copyright law, you dumb motherfucker!!! How dense can you be? The use being for training an LLM doesn’t magically change copyright law. You’ve been arguing about US copyright law. You’re goalpost shifting again!
That’s not refuting the example. The example is to show that there are exemptions to copyright. And that’s just one example. It’s not claiming all uses are exemptions of copyright law.
No, I’m intentionally pointing out that there are some, in fact, there are many uses that don’t require licenses or asking for permission. You’re claiming that all uses require permission. Copyright law itself says that claim is false. You don’t understand US copyright law and as you’ve revealed in this post, you don’t even understand we’re talking about US copyright law!
Which section of the U.S. copyright law?
The debunk is about AI reading the same as humans, not about the legality of “reading”. It’s that anthropomorphizing machines won’t work.
Which law? Especially on the copyright of that. Cite a clause in the U.S. copyright law.
You are leaving parts out when you make this statement. First, whether a work is subject to copyright is independent of whether a machine or a human made it. Machine-generated stuff (either the model or the LLM output) can simultaneously be uncopyrightable and infringe someone else’s copyright. That’s the key point of many copyright lawsuits about AI.
I will reply to these two comments together, because while you suggest there are “some” uses that don’t require licenses blah-blah-blah, you then made the claim that “you can’t cite a law or case law that says that anything stored on a computer is subject to copyright”, misusing the word “any” here. What a hypocrite you are.
If I replace your “any” with “every”, then the claim is true, but with “any”, you instead suggest software code is not copyrightable, and then what the fuck would the many lawsuits about software copyright be about?
I’m not challenging your ability to name any “poor person” according to your definition, I’m challenging your ability to represent the “poor people” class you defined.
You cite Len Kaminski, and so I am fucking serious in asking this question: did he support piracy like you do? (I mean Len Kaminski, the comic book writer, notable for some Iron Man works. https://marvel.fandom.com/wiki/Len_Kaminski )
Re:
The doctrine of first sale, fair use (107), and since 2002, the TEACH Act. And specifically, 106 doesn’t give copyright holders the right to restrict learning from copyrighted material. You seem to assume that US copyright law gives copyright owners infinite rights over their material. It doesn’t. You should avail yourself of the vast resources on the internet that are available legally for free to educate yourself on the topic better before trying to argue with others about it.
Nobody is anthropomorphizing machines. Not being human doesn’t stop machines from learning. The learning process is not the same but is analogous. The analogies are made because you don’t understand the process and it tends to make it easier to understand if you compare it to something similar. But you’re intentionally being obtuse or at best, you’re just dense. Neither qualifies your perspective as useful, much less authoritative on the subject matter.
The 10th Amendment of the Constitution, which supersedes US copyright law.
Fuck no. You are absolutely wrong. And this point should have been a gimme. How did you not know this? You again prove you don’t know what you’re talking about, yet you have the gall to pretend you can educate Americans on their own laws.
https://www.reuters.com/world/us/us-appeals-court-rejects-copyrights-ai-generated-art-lacking-human-creator-2025-03-18/
“The Human Authorship Requirement
The U.S. Copyright Office will register an original work of authorship, provided that the work was created by a human being. The copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the mind.” Trade-Mark Cases, 100 U.S. 82, 94 (1879). Because copyright law is limited to “original intellectual conceptions of the author,” the Office will refuse to register a claim if it determines that a human being did not create the work. Burrow-Giles Lithographic Co. v. Sarony, 111 U.S. 53, 58 (1884).”
https://www.copyright.gov/comp3/chap300/ch300-copyrightable-authorship.pdf
Your ignorance on this simple and essential aspect of US copyright law should, again, say it all. You don’t have a fucking clue what you’re talking about.
False. The model can’t be because it doesn’t include any of the trained data. You’re conflating the two.
That’s a key argument that hasn’t been determined by a court of law definitively in a precedent that influences all of US copyright law.
Some uses don’t. That’s a factual statement.
You’ve omitted the key qualifying phrase in that statement, which makes your take entirely dishonest. What I said was “You can’t cite a law or case law that says that anything stored on a computer is subject to copyright by that fact.” You left out “by that fact,” meaning that the fact that something is stored on a computer doesn’t make it subject to copyright protection. That was the meaning that you’re either too stupid to comprehend or, at worst, intentionally trying to twist. Indeed, what a hypocrite you are.
No, it isn’t.
No, I didn’t. You’re just revealing you didn’t understand what I said, yet again. I keep telling you that you’re not understanding what I’m saying and you just keep responding to things I didn’t say. How many times have I said this in this comment section? Of course software code is copyrightable. Quote me where I specifically said “software code is not copyrightable.” You can’t. I said that just because something is stored on a computer doesn’t mean that it is copyrightable, which is what you claimed. There are other requirements for things to be copyrightable beyond being stored in a fixed medium. One of which is human authorship, which I have already pointed out. Your ignorance of copyright law is astounding in direct correlation to how much you try to expound upon it.
I never purported to represent them. Quote me where I said I represent them! Otherwise, your argument is dismissed, as you are so wont to do.
I haven’t supported piracy, so that’s a straw man unto itself. Len can represent himself, but I don’t recommend trying to harass him over your issues. He hasn’t been in good health recently. But he is a poor US citizen and a creator who has been fucked over by the big media companies you’re championing, which is another reason you shouldn’t bother him. You’re arguing in favor of the people who are responsible for his current state.
I love that you’re going down these rabbit holes based entirely on your own straw men.
I never asserted I represent others, meanwhile you asserted that you represented my interests. I never supported piracy, yet you’ve asserted that legally accessing freely available, permissively licensed content is piracy and a copyright violation. Your entire argument is a giant bundle of arrogant ignorance and straw men.
Let’s review: I’m a rightsholder. I disagree with your siding with Big Media corporations. Poor US citizens and creators exist. You don’t represent me. You don’t represent any of them. You don’t understand US copyright law. You don’t understand the US Constitution. You don’t understand LLM technology. You have proven nothing that you’ve claimed.
Re: Re:
How do the four factors of 17 U.S. Code § 107 grant you permission on that?
Which section and clause?
It’s not restricting AI “learning”, it’s restricting AI reproduction of copyrighted work under the guise of “learning”. § 106(1) and § 106(2) grant the exclusive rights to reproduction and derivative works.
In case you didn’t get it, here’s a hypothetical example:
A group of students learn to write horror fiction by reading novels by Stephen King, and through learning, the students are asked by their teacher to write book reports or “notes” documenting how a novel in King’s style can be made (so they’re not just reviews or commentaries that are “safe harbors” in fair use judgements), and then these “notes for creating works in Stephen King’s style” later get published and sold commercially. How do people claim that these students’ “learning” is fair use?
Here, the claim that AI doesn’t store any piece of the work won’t help. (1) The “notes” are “fixed in tangible media” and thus subject to copyright. (2) Even though the works are not present in the final use, the “notes” here fall under the derivative work category, and “learning” is not an excuse for this one being fair. (3) The mention of a specific author’s name in these “notes” means that the author can allege trademark infringement in addition to copyright infringement. For now in court, the “styles can’t be copyrighted” defense cannot apply to this (Andersen v. Stability AI). So, how do you answer?
I’m not saying I would harass anyone. I was saying if you cannot bring any person here as a direct witness, then your claim of me hurting “poor people” is unfounded and I dismiss it. You carry the burden of proof of your claim, not me.
You support piracy by AI companies under the guise of “machine learning”!!! When I asked why the fuck AI companies can’t train with only public domain content, you dodged the question and then simply insisted that machine learning is legal (even when the source of training data is pirated)! That’s your stance, damnit. It cannot be any clearer.
If you really focus on “permissively licensed content” for training, then you should condemn the AI companies doing illegal actions. Otherwise the stance you claim is not the same as how you behave.
Dismissed. (Even when poor citizens and creators exist, you can’t represent them. No witness showing up, so this argument is moot even when what you say is a fact.)
Re: Re: Re:
First, let’s cover the fact that you’re actively questioning the legality of humans learning to write under copyright law you don’t understand. This is absurd. Are you going to claim that eating food is a trademark violation next? This ignorance is appalling. If you need this explained to you, you are admitting you don’t understand US copyright law whole cloth.
Second, fair use would cover it, but 106 not reserving that right to copyright holders is all the justification you’d need.
But a four factor analysis would weigh easily in favor of reading to learn to write because the use is transformative (turning the reading of the content into pattern recognition of language composition in an intangible, non-fixed human brain medium) and the fact that the use wouldn’t affect the market for the original. Nobody refuses to buy a book from an author because a kid somewhere has read it before.
Teachers can make copies of works for educational purposes. Section 110(2) of the U.S. Copyright Act.
Quote where 106 says rights holders have the right to demand licenses for LLM training. You can’t, because, as we’ve already said, this hasn’t actually been determined by law or case law.
This is perfectly legal. The notes are composed of the students’ pattern recognition, learning, analysis, reviews, and commentaries about Stephen King’s writing style. And here’s the goddamn kicker: Stephen King wrote a whole fucking book called “On Writing: A Memoir of the Craft” in order to teach writers how to write like him! Of all the writers you could choose to make an example out of, you chose someone who literally teaches others to write. I couldn’t have planned this so well if I wanted to. You created your own trap for yourself. But even if Stephen King hadn’t expressly put effort into teaching others to write, it would still be fair use to write notes about how he composes. US copyright law notably only covers a specific expression, not a writing style. Writing style falls under ideas, procedures, methods, systems, processes, concept, etc. which aren’t covered.
Copyright does not protect
• Ideas, procedures, methods, systems, processes, concepts, principles, or discoveries
• Works that are not fixed in a tangible form (such as a choreographic work that has not been notated or recorded or an improvisational speech that has not been written down)
• Titles, names, short phrases, and slogans
• Familiar symbols or designs
• Mere variations of typographic ornamentation, lettering, or coloring
• Mere listings of ingredients or contents
https://www.copyright.gov/circs/circ01.pdf
https://www.copyright.gov/circs/circ33.pdf
The notes are the creation of the reader, not the writer. If a human wrote the notes, that human owns the copyright, not the writer whose works are being commented on. And notes about how a writer writes do definitely fall under commentaries. And again, you’re confused about the fixed medium requirement. Being fixed in a tangible medium is a requirement if you want something to be copyrighted. It does not at all in any way mean that everything fixed in tangible media is subject to copyright. Public domain works are fixed in a tangible medium. Government works that don’t qualify for copyright are fixed in a tangible medium. Titles, names, short phrases, and slogans are fixed in a tangible medium and do not qualify for copyright. This has already been explained to you. You’re not learning a goddamn thing from this discussion.
The notes aren’t a derivative work because they don’t contain the original work and if they contain any part of the original work, that would be a de minimis use and the rest of the work that contains the commenter’s own thoughts would still qualify for its own copyright. They just wouldn’t own the copyright on any quoted parts written by the original writer.
As long as the writer of the notes doesn’t purport to have the participation of the original writer, this wouldn’t be a trademark infringement. You could accuse all you like.
To be clear, again, you are ignorant of the US education system. If your theories of copyright were correct, almost all learning in public and private educational institutions would be illegal. Students would be violating copyright to write a book report or an essay on a work. This isn’t the case and you can’t point to any court cases in which this has been adjudicated. If you were right, you could. CITE YOUR PROOF OF THIS CLAIM.
It absolutely can. Andersen v. Stability AI hasn’t been decided at all. It’s not even set to begin for another year. You can’t claim a precedent from a case that hasn’t even started. And even when it’s decided, it won’t necessarily set a precedent in all US jurisdictions. You don’t understand how US law or US courts work either.
You seem to be confused about the claim. You do realize people can have their rights violated and not even know it, correct? For example, I could bribe politicians to pass a law to take away a right that US citizens currently have and they may not be aware that I’ve taken it from them. Laws are complex and often hundreds of pages long. Many people (such as you) don’t take the time to learn what all is included in them. You’re demanding a subjective perspective from a person who may not know anything about the topic. This is such a useless counter-argument. At best you could say some of the people wouldn’t care about losing a right they weren’t planning on using, but that’s not the same thing.
Large media corporations have already hurt US citizens by bribing politicians and expanding copyright far beyond its original intent and purpose. They have already deprived US citizens of their rights. They have already subverted the democratic process and disenfranchised voters’ rights.
And you’re defending them. You are actually cheerleading for the further harm to US citizens.
It’s simple and I’ve already said it multiple times. “If corporations lose cases and the result is a legal precedent that all training requires financial compensation, poor people will not be able to afford to train LLMs and therefore only wealthy corporations will be able to.” The large AI corporations can pay for licenses if they lose. Poor people won’t be able to. So we’ll only have big, expensive, powerful AI companies licensing their AI to the government, to educational institutions. The future of our society will be controlled by the LLMs trained by the wealthy. You’re functionally just saying you want your bribe before they make my society worse.
LLM training isn’t itself piracy or a copyright violation. CITE A COURT CASE OR A LAW THAT SAID EXACTLY THAT.
I didn’t. I pointed out that someone has done so and the results were mediocre. It’s also not a question you can demand I answer. I’m not saying they can or can’t. Take it up with them.
You can’t point to a law or case law that says it’s illegal. If it were illegal, then software dating back to at least the 70s or earlier would have been illegal. Algorithms would be copyright violations. Search engines would be copyright violations.
I didn’t say this at all. I will say again QUOTE ME WHERE I SAID THAT. Your continuing claims while failing to actually quote anything I’m saying that actually does what you claim proves you’re wrong. You’re arguing with straw men.
That’s not my stance. You keep making up fake things I haven’t said instead of just quoting me. You say things are clear while you demonstrate multiple times that you don’t understand much of anything you’re saying.
You will note at the very beginning of this that I am not siding with the AI companies. I literally said you don’t have to side with or favor any large corporations and you insisted that you must. I haven’t and don’t support them. You have asserted I do based on your own false dilemma that a side must be chosen.
I don’t “behave” at all. The stance I claim is all there is here. I don’t work for a large AI company. I’m not training their LLMs or downloading copyrighted content for their training purposes. You seem to assume I’m doing more than discussing the topic. That’s a really weird assertion. It’s almost as if you’ve propped me up as a straw man for arguments you wish to have with people who aren’t here.
I have specifically not purported to represent them, though you have purported to represent my interests, which makes you a hypocrite.
You didn’t ask for a witness. You asked for a name, which is not the same. And asking for a name or a witness is irrelevant. This isn’t a court of law, dumb shit. And you’re not a lawyer.
Re: Re: Re:2
I intentionally question this so that you can explain your theory about how human learning fits the four fair use factors. Not because I believe that human learning is infringing, but because your “fair use” analysis of this simple case might not be backed by any court ruling. (And if so, your claim about AI training being fair use would become a flawed theory as well.)
I would say this answer of yours might be wrong. But I am not certain that mine is right (you should ask a lawyer to confirm my answer). Anyway:
According to 17 U.S. Code § 101, the definition of a “copy” requires the work to be “fixed” in a “material object”. The memories in human brain cells are not “fixed” in a strict sense, and so humans memorising stuff is not considered “reproduction” in copyright law, and as a result, there is no need to consider fair use factors.
Hold on, there are people who do choose not to buy a book if they have read it sometime before. E.g. they may have borrowed the book from someone (assuming that book is a legal copy). This can affect the fourth fair use factor, i.e. book sales. If I didn’t argue that memorization in brain cells is “not fixed” and thus “not a copy”…
§ 110 (2)(A), this exemption applies to “a governmental body or an accredited nonprofit educational institution” only. Nothing to do with AI training, and if you are a for-profit corporation, this section never applies.
Again, it’s to show that many copyright exemptions for human learning don’t apply to AI training, and the analogy between the two is fundamentally flawed in the legal sense. In layman’s terms: unlike humans, AIs don’t have the right to learn.
Except that the cited Stephen King instructional book is not part of this example! It’s about how students pretend to be Stephen King and write a book about how to write like Stephen King without King’s permission!
If the students write generic instructional books about how to write without any mentions of King (or merely comments about how King inspired them), then the case would be much easier because then the students’ works would have no direct extraction of King’s works (or too minimal to bring copyright concerns). Even if there is reference or extraction of King’s works, the results would be “transformative” enough to not look like Stephen King’s works or their derivatives at all.
Emphasis added. Because I didn’t assume in this hypothetical scenario that the quotes are de minimis use. And for substantial quotes the question comes in: the original writer didn’t give you a license to quote substantial portions of his book. So how would you justify fair use on that? Especially when the students’ books are later sold commercially.
No, this is not my claim. The claim is about students selling books telling readers how to write “in Stephen King’s style” without permission from Stephen King. And the students also quote substantial content from Stephen King’s work without permission.
Note that, as you cited, Stephen King also wrote an instructional book, so this means direct market competition with King’s instructional book, which could affect the factor four analysis in “fair use”.
I didn’t deny the legality of commentaries.
What purpose?
Which rights are specifically deprived?
Before I request proof of how that could happen, I would ask another thing: why do you have to rely on the LLM for everything, as if people cannot work without LLMs in all industries? There’s one thing I need to remind you of: you only need a pen, not an LLM, to write, and a brush and ink, not an LLM, to draw. Why the heck does the entire world depend on a technology that is more harmful to the environment than before (in terms of energy waste and air pollution)?
That’s part of the reason I don’t buy your argument about “[copyright/creative monopoly] making the society worse”: AI is already worse for the environment.
By the way, I dismiss your “poor people” claim here again.
Re: Re: Re:3
For the environmental impact of AI, see also these:
https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about
https://www.caltech.edu/about/news/air-pollution-and-the-public-health-costs-of-ai
Re: Re: Re:3
Again, it doesn’t need to be fair use. There is nothing in 106 that gives copyright holders the right to demand a license for people to learn to write (or draw or sing or play guitar) from their copyrighted works. There are no cases where this is disputed. It’s hard to find anything on the topic because it’s so legal that nobody is debating it, because nobody with the least amount of understanding considers it a violation of copyright. Which is why it’s entirely absurd for you to even entertain the idea that it might not be legal.
You’re ignoring the evidence that elementary schools are not daily being sued for copyright infringement for teaching American children how to write. I would say your answer is absolutely wrong because there’s no proof whatsoever and there would be plenty if you were right.
That’s what I said.
You changed what I said. I said nobody refuses to buy a book because a kid somewhere has read it before. I didn’t say the kid who had read it already is the person who refuses to buy it. A random person in Seattle won’t refuse to buy a book in a bookstore in Seattle because some random kid (completely different person) read the same book in elementary school in Florida. A child reading a book and learning to write from it doesn’t affect the market for the book.
We were discussing children learning to write from reading. Now you’re goalpost shifting again. But, as it stands, I have already pointed out that I’m concerned about the ability of researchers (including employees and students of accredited nonprofit institutions) to train LLMs for research purposes. So it definitely applies. And you have demonstrated your ignorance of how education works in the US, so your assertions on the matter are useless.
Except you’ve demonstrated you didn’t understand US copyright law even as much as it pertains to humans learning, so your analysis is fundamentally flawed in the legal sense.
But humans have the right to train machines.
You don’t get to pretend that students won’t have access to or knowledge of the fact that Stephen King wrote such a book. And since your scenario is that a teacher is asking students to study Stephen King’s writing, the teacher would most definitely avail the students of the knowledge King specifically offers regarding his writing in his book on writing. You can’t just exclude it because it completely destroys the premise and conclusion of your argument. That’s intellectually dishonest. You’re handling the fact that you fucked up by using Stephen King as an example pretty poorly.
You’re changing the scenario again. Pretending to be Stephen King would be fraud.
Stephen King already wrote that book. Why would the students try to write it? That doesn’t make any sense. However, it wouldn’t be illegal since writing style isn’t copyrightable in the US. You might have to write a disclaimer stating that your book has not been approved or endorsed by Stephen King, the way many biographers do when writing an unauthorized biography about a famous person.
You’re retreating here. This directly contradicts your previous assertions that this conduct was a copyright violation.
So you assumed copyright violations from the start, so the entire scenario is tainted by your negative perception of anyone who might be pursuing a legal approach to the topic.
You don’t need a license to quote authors if it isn’t substantial. That’s literally the commentary exception. You’re just magically assuming it would be in quantity enough to violate copyright. You’ve decided to cripple this hypothetical author by insisting that they do something that is likely a copyright violation so that you can claim it is in fact a copyright violation. You’re using circular logic.
They wouldn’t use large portions of King’s work. If they didn’t self-publish, their editor and publisher wouldn’t let them include such large portions.
But this is all a moot point because you’re pretending this is an analogy about how LLM training works, except it’s not because there is less than de minimis use of the copyrighted works in the model itself. There is no copy of the works in the model. The correct analogy for LLM training is that a person read many copyrighted works, their brain used their analysis skills and pattern recognition to retain the process of composing phrases and sentences and paragraphs and chapters, and they then use those patterns they recognized to compose completely new sentences, paragraphs, chapters, and whole works. This is literally how I learned to write and how I have refined my writing skills over decades. I don’t include the sentences of the writers from whom I learned to write in my writing. But I have used pattern recognition to refine processes for composing good sentences and paragraphs, to compose effective prose and storytelling content.
Re: Re: Re:3
But this is the natural conclusion of your claim. You’re saying that learning is a copyright violation if that learning is recorded in a fixed medium. Except the learning is neither the original work nor a derivative work.
Here’s an example:
Author Joe Smith writes this paragraph.
The woman told him that he would encounter death if he chose to walk into the garden, but he walked in anyway. In the garden, he encountered a cloaked figure who spoke with a supernatural resonance in his voice and who knew the man’s fate. The man realized that this was Death whom he had encountered in the garden, just as the woman said.
And a student reading this passage might write notes like: “Play on words – ‘encounter death’ can mean to die, but it could also mean ‘encounter the personification of death.’ Use play on words in the future to seem clever in your writing.”
The student references only the phrase “encounter death” from the original work. It’s de minimis and the phrase itself doesn’t qualify for copyright protection itself since it’s so generic and common. It would only be copyrighted in the context of the greater work. The student learned a process about writing, specifically in this scenario about using word play to surprise the reader. This is not at all in any fashion a copyright violation.
This isn’t itself illegal. Styles aren’t subject to copyright protection.
This is different. You’ve included this to make it illegal. But it’s not even analogous to LLM learning. LLMs do not quote or contain the original works they’re trained on.
A random student who isn’t a bestseller author publishing a book about Stephen King’s writing style will not compete with Stephen King’s own book on his writing style.
But you denied that writing a book report or essay on Stephen King’s works was commentary. Even publishing a book on Stephen King’s writing style would be considered commentary.
Article I, Section 8, Clause 8: To promote the Progress of Science and useful Arts
They expanded copyright beyond its original term such that it is now the life of the author plus 70 years, which has deprived the public of the use of the public domain works within their own lifetime. If I die before 70, I will be dead before a work written during my lifetime is available in the public domain. But the lobbying and bribery by media companies has also bought corrupt politicians and deprived citizens of their right to representation and the right of redress of their government. When the legislature writes laws for the wealthy, they are not representing the people or upholding the Constitution. The normalization of this corruption and bribery opens the door for more big corporations and “campaign donors” to further subvert democracy and representation.
This deprives US citizens of some of the most fundamental constitutional rights since it affects free speech, free press, the right to petition the government, and the right to representation within Congress, among others. And this corruption affects other rights since wealthy corporations can support candidates who support creating more rights for wealthy corporations and those candidates may also violate other civil and human rights. Some media companies have donated to Donald Trump and he’s violated uncountable human and civil rights, so they are at best indirectly supporting those violations.
I’m not relying on LLMs. I’m recognizing the fact that corporate leaders are only too eager to embrace “AI” for almost anything. Some educational institutions are embracing its use. Many corporations are hiring “prompt engineers.” Customer service representatives are being replaced by LLM chatbots. You have to be blind not to see the inevitability that the wealthy and the corporate leaders are going to force more and more AI on everyone. The Trump administration is already endorsing its use in the analysis of government systems in the disingenuous search for programs to cut funding for. Demanding licenses for training only means the AI companies just pay the Big Media companies an amount, but they’re going to continue to be used. These court battles won’t kill AI even if the outcomes are negative for the AI companies. If the outcomes are negative, deals will be struck. It’s just a matter of how much money gets spread around to already wealthy people. But even if Big Media wins these lawsuits, the creators won’t see a significant windfall at all. The Big Media companies already use contracts to minimize how much they have to share of their profits with the creators who actually create the works they profit from.
This statement reveals your fundamental misunderstanding of my position. I don’t use LLMs to write. I don’t use LLMs to create graphic designs. I don’t use LLMs to draw. I publish works that are my own efforts. I’m just not an idiot who can’t see that it’s inevitable that these LLMs will be pushed on the American public more and more and I’d prefer the LLMs that get the contracts for government and public services were trained by ethical researchers instead of major for-profit corporations who just happen to be the only ones wealthy enough to license enough content to train an effective LLM. When I’m old and needing greater amounts of health care, I want the LLM that gets assigned to my case not be programmed to maximize profits but rather to take my health into greatest consideration.
Because the wealthy don’t care about the environment. Big media companies rely on giant environmentally-unfriendly server farms to stream their licensed content over and over to the same people because they don’t want their precious IP cached and replayable. You’re pretending AI companies are the only destructive corporations. The wealthy own stock in a wide variety of environment-destroying and exploiting operations.
Or, as I already said, wealthy corporations are all the same, regardless of which side they fall on for this particular topic and you don’t have to side with either of them as you will never be one of them. You’re at best a useful idiot for the big media companies. But…and here’s the kicker…the stock of the big media companies and the stock of the big AI companies can be owned by the same people. You’re not actually picking a side when the wealthy can own parts of everything.
Yeah, you don’t give a damn about the poor people who will have your preferred licensed AI companies’ LLMs forced upon them. We’ve already established that for any possible reference you make to an ethical position or concern about the environment, you’re willing to throw that out the window in favor of big media corporations and their profits, which exist in the first place at the expense of the creators and poor people.
Re: Re: Re:4
No. It’s not students writing “encounter death” that constitutes copyright infringement. It’s students writing “The woman told him that he would encounter death if he chose to walk into the garden … In the garden, he encountered a cloaked figure who spoke with a supernatural resonance in his voice and who knew the man’s fate. The man realized that this was Death” that constitutes copyright infringement.
The infringement happens on the bigger context and word & sentence arrangements, not the de minimis cases of individual words or phrases. Not about a word play either.
You are oversimplifying things. (1) Styles of a specific artist are protected under TRADEMARK LAW. (2) Style mimicry while replicating a significant portion of the original author’s work is still copyright infringement. You can’t escape liability by just saying you copy someone’s “style”, because the devil is in the details.
Have you read any lawsuit complaint before you make this BLATANTLY FALSE claim? Here is one, New York Times v. OpenAI, read pages 30 to 32:
https://admin.bakerlaw.com/wp-content/uploads/2024/01/ECF-1-Complaint-1-1.pdf
What if the book becomes bestselling later on? This defense is fucking stupid when it comes to factor four on fair use.
Yes I deny it. Why? Educating other people on Stephen King’s style with significant quotes from Stephen King’s copyrighted works goes beyond the boundary of being a commentary. I didn’t even mention AI regurgitating.
(Emphasis added)
What bullshit are you talking about? There is no such right as “using the work within people’s lifetime”. Quote the U.S. Constitution for such a right. The latter things about “right to representation” or “free speech, free press” have nothing to do with copyright. These are bullshit debunked many times. (I know this position is from EFF, but after the Warhol v. Goldsmith case, the EFF didn’t ever learn. See also: https://copyrightalliance.org/warhol-foundations-flawed-transformative-use-theory/)
Then the solution is not to let AI win but to fucking address the AI problems and not adopt AI blindly!
Copyright theft is just one of the problems with AIs. Misinformation, deepfakes, fraud, energy waste, dehumanizing creativity, are all there.
Then your position still exploits human workers that put effort into your health care stuff. You don’t bother to pay any human that would help in your later life. So you’re selfish. But that’s fine, because every worker is selfish when they want to make a living and get paid and not get exploited.
Dismiss “poor people”, you can only represent yourself.
Re: Re: Re:5
First, even quoting that paragraph likely wouldn’t be enough to be infringing if it were part of a longer novel or even a short story. Second, you’re choosing a scenario that isn’t realistic. You’re starting with the assumption that they’re infringing copyrights and therefore learning is infringing, but it’s perfectly possible and perfectly legal to learn from copyrighted material. And this doesn’t serve as an analogy for LLM training because nothing of the original content is quoted. And more importantly, you’re ignoring that this is how human beings learn to write. You read the works of others. You compose sentences by seeing how other people compose sentences and some of those compositions you observe are still protected under copyright.
First, we’re not discussing trademark law. You can’t claim a violation of a copyrighted work as a trademark violation because those are different sets of laws. Second, you’re going to have to provide a citation that writing styles are protected under trademark. I’m not seeing anything in trademark law that would cover this. I’m starting to suspect you don’t understand what a writing style is.
If you’re mimicking the style, you’re not using any portion of the original author’s work. If you’re using their work, you’re not mimicking them, you’re just copying them. You keep saying, “they’re violating copyright, therefore they’re violating copyright.” Your circular logic doesn’t work. I thought maybe you didn’t understand how the US education system works, but it seems like you just don’t understand human beings.
You absolutely can avoid liability as long as you aren’t copying their actual work in portions large enough to trigger a copyright claim.
You keep saying this. That wouldn’t fit the instructor’s assignment.
Holy fuck. I love it when people provide citations that prove they didn’t read or at least understand their own citations.
Here’s the pertinent part of the claim: “In May 2023, Microsoft and OpenAI unveiled “Browse with Bing,” a plugin to ChatGPT that enabled it to access the latest content on the internet through the Microsoft Bing search engine.”
We’re discussing LLM model training, not live search features. The LLM model itself doesn’t contain the quoted text. It literally has to be connected to a live search add-on to have that functionality. This literally proves what I’ve been saying.
But again, you keep using the lawsuits as if I’ve been saying all the lawsuits are right or wrong. I’m not defending the practices of the large AI companies. I’m defending the singular principle that training an LLM isn’t itself necessarily copyright infringement. You keep using me as a proxy for the AI companies despite my disavowing them from the beginning. I’m not your AI company straw man.
It’s still fair use and commentary as long as it doesn’t quote too much of his original work. You can describe a writer’s style without quoting it. You can describe plots and characters and sentence structure without quoting the author’s words. You’ve apparently never read a book report.
You keep adding the “with significant quotes from Stephen King’s copyrighted works” part. That’s not what does happen when humans learn to write from reading someone else’s work. That’s not what LLMs are doing when they’re responding to prompts. You’re making up a fake scenario that doesn’t apply to anything.
There was originally. https://copyright.gov/about/1790-copyright-act.html
Amendment 10: “The powers not delegated to the United States by the Constitution, nor prohibited by it to the States, are reserved to the States respectively, or to the people.”
They are more important than copyright and they have been weakened and subverted by the big media companies you have chosen to side with.
These what are bullshit? You didn’t address anything. What claims are you saying are debunked?
Nope. I’m not touching a link to an industry-backed organization. You’re saying, “these are the good guys, just read their propaganda and they’ll tell you.” Cite a legal research document with case law and legislative sources. You keep citing the Copyright Alliance as if they’re not a biased source. This is why you don’t understand the topic. You’ve been getting all your information from biased claims, not neutral or factual sources. Their job is to empower themselves, not to speak factually about the law. They’re arguing for stricter protection for their profits.
How do you propose doing that? You don’t have influence over all those companies and institutions. They have free will. They’re going to do it whether you or I approve. Setting a legal precedent that all LLM training requires licensing won’t stop that.
Those are issues that have existed before LLMs were widely available. They’re also misuses of various technologies by human beings. You’re complaining that technology can be exploited for unethical purposes. Of course it can. It always has been. It will continue to be in the future. It’s not effective to outlaw technology. You have to regulate human actions.
No, you dumb fuck. The scenario is that my health insurance and health care provider will be using AI to handle my case. You’re pretending like I would choose that. The whole fucking point is that this will be pushed on people against their preference. I do pay human beings right now for my health care. And since I’m an American, I likely pay a lot more than you pay and for worse health care than you probably get because the wealthy assholes who run the US won’t blink at an opportunity to monetize every human need. And what’s worse, we have non-Americans like you cheering it on self-righteously.
I exist. I’m not wealthy. I’m also a copyright holder and a creator. You don’t represent me.
But you also don’t represent anyone else, which makes your entire stance useless unless you’re purporting to be a creator whose works are published in the US.
Re: Re: Re:6
Yes, I am using the scenario that “learning” is infringing. And by the way that “learning” word is in scare quotes because in this scenario it’s commercial reproduction of copyrighted works under the guise of “learning”.
If the “students” in this scenario publish nothing and sell nothing, there is no infringement! But heck that is not the example!
No. Even when the LLM quotes nothing it can still be infringing. The case can be when it takes the sentences or paragraphs and translates word by word into another language, so that it technically quotes “nothing of the original”, but the creative expression through careful arrangements of words is still copied. This is technically a derivative, but still an exclusive right to the copyright holder.
I can write without reading any copyrighted work! This analogy is fucking nonsense. It’s nonsense because you assume there is no public domain literature. And you are a hypocrite when you argue about public domain in your examples, while denying public domain in examples other people mentioned.
Irrelevant. RAG (retrieval augmented generation) is likely not fair use either, and it also “reproduces” copies, which is copyright infringement.
You are effectively defending AI companies! Not a straw man. Because none of the research-only AI models are being sued. What are being sued are commercial AI models, even though they have secondary uses that are non-commercial.
And keep in mind the bottom line is: There is no blanket fair use for generative AI. Whether the AI model training is fair use depends on the ultimate use of the AI by its users.
You have no fucking idea how court judges rule on fair use. In the U.S., the law doesn’t state “book report is always fair use” nor “commentary is always fair use”, because there could always be exception cases where a “report” reproduces a significant portion of the original work, so that the “report” becomes a market substitute of the original. The court judges have to always evaluate the four factors to determine fair use. No skipping steps.
Factor one: Purpose is commercial selling of the book (against fair use). Factor two: Nature of copyright work is fiction books, but likely already published (neutral or slight for fair use). Factor three: The amount taken is likely not minimal (neutral, but let’s consider this leans for fair use for the sake of argument). Factor four: It creates a market substitute for Stephen King’s own guidebooks on how to write fictions (thus against fair use in this most important factor).
This is how in the U.S. the fair use is evaluated. Oh wait, this analysis is quite similar to the Thomson Reuters v. Ross Intelligence ruling in the district court.
The analogy does not have to be “realistic” as to how humans would do it. It’s AI. That is one difference between human learning and AI “learning”, mind you. The AI training process copies a significant amount of works during the data gathering phase, which is even before the data is transformed into neural network weights.
Quote the exact section and sub-section of the law.
Are you suggesting the U.S. Copyright law is unconstitutional? Because no case law ruled that, and the quoting of your Amendment 10 is vague because not even a State law implies a right of “using a copyrighted work within a person’s lifetime”.
Unless you can sue to the Supreme Court about the constitutionality of the modern copyright law, I would disregard this one as being unfounded.
Irrelevant. Because AI is not a person and has no free speech.
The Supreme Court ruling in Warhol is not propaganda, and you can try ignoring all arguments of the opposite side, but by doing so I’ll ignore your argument (and those from EFF, too). It’s good to live in an echo chamber, until you see the fact that the Supreme Court wasn’t on your side.
I don’t have to “stop that”, but I can get paid for my work and use it on environmental protection campaigns that could eventually stop that. I’m not the good guy as I have said before. I just want bad guys to stop exploiting the creative labor of humans.
And the copyright law fits exactly one of the purposes of regulating AI, especially on the no-exploitation-of-human-labor part.
Oh yeah and I would stop your damn health care provider from using AI for your case because that’s exploitation and should be stopped.
And the whole fucking point is creative workers are exploited by AI companies without their “preference” either.
Why should I care about your health care if there is a chance that I may live with a shorter lifetime than you? This is off-topic, but the point is that it’s not necessary for your healthcare to depend on AI, and even when it does, it’s not an excuse for exploiting creative works (why should healthcare AI be trained with novels anyway?)
True. And I am stripped of that market opportunity because of AI.
Re: Re: Re:7
Except it’s not. You claimed children learning to write by reading copyrighted works was a copyright infringement. It is not.
Students don’t publish their book reports usually.
It would have to have the sentences and paragraphs to be able to translate it. You can’t translate if you don’t have the original text. The model doesn’t have the original text!
You made up the translation scenario. That’s not realistic or relevant.
That’s not expression. That’s an idea.
It’s not translating at all. You keep making bad analogies that have nothing to do with the topic. It just reveals, again, again, that you don’t understand how the technology works and you don’t understand how US copyright law works.
And you can literally legally learn to write from copyrighted content. This isn’t an analogy. This is literally my lived experience. If I remember how to write based on what I’ve read, it’s legal. I often don’t remember the specific sentences I’ve read but the method of writing is retained, such that the “data” that I’ve trained myself on isn’t even present in my head. I can’t quote word for word much of some writer’s expression despite remembering plots, characters, and writing style.
This isn’t an analogy. This is literally how I learned to write.
Except I haven’t assumed there’s no public domain. The existence of the public domain just doesn’t magically make it illegal to learn to write from copyrighted content. It’s not a necessary component of the argument because you don’t have to only rely on the public domain.
It’s essential. The whole point is that you’ve been claiming the models can quote content, and now you admit they can’t quote without an added feature. The models don’t contain the original trained data. The added feature isn’t a de facto part of all LLMs. You’re trying to find anything that can be infringing and you keep ignoring the entire reason we’re arguing here. You think I’m saying nothing is infringing. I haven’t said that.
I am not. Again, again, again, you need to reread my first posts here.
I’m not talking about these lawsuits you keep bringing up. You’ve brought them up, not me. You keep pretending I’ve been defending the actions taken by the AI companies. You really, seriously, definitely need to read what I first said at the start of this whole thing. It would save you so much time. Here, I’ll do it for you, again:
“My frustration with the arguments of people claiming it’s not fair use and that all training must be licensed is that many people seem to think they’re championing the little guy when they’re inadvertently advocating for the benefit of the wealthy and corporations.”
If you don’t disagree with every part of what I said there, you should stop responding.
Not necessarily. The uses can be infringing without the model being infringing, the same way a VCR can record content off TV and the person making the recording can violate copyright by selling it without authorization or they can use it for time-shifting their fair use viewing. If your assertion were true, then any technology that can be used for an infringing use would be illegal. You wouldn’t be able to read this message because your Internet connected device would be illegal. Again, you don’t understand US copyright law.
What if the book becomes bestselling later on?
I do. I’ve studied many cases and decisions. You should do the same, and not just some corporate propaganda version.
You’re changing the scenario. A book report doesn’t mean it gets published. You could reproduce a significant portion of the original work for a classroom assignment and it won’t be a market substitute at all. At most, the teacher would just tell you that it’s not necessary for the assignment to quote quite so much. It’s unlikely more than the student and the teacher would ever read the report.
That’s not the purpose of a book report. It’s education.
Some people write book reports on non-fiction. Stephen King writes non-fiction books, like his On Writing book.
You’re saying they use a lot when they likely wouldn’t. This is like saying, “if someone owns a gun they’re a murderer because I made up a scenario where they murdered someone with their gun.” You are starting from the conclusion. That’s intellectually dishonest.
It doesn’t create a market substitute. You’ve only claimed that a teacher would ask a student to publish their commentaries about Stephen King’s work, which isn’t likely. What publisher is going to print that? Are they self-publishing? Who’s doing the formatting? Is this for an assignment? Does the curriculum for the class cover this purpose? Your scenario doesn’t make any sense.
It wasn’t just an analogy. You said students learning to write from reading was a copyright infringement.
Wait, the data is transformed? Would you say the process of transforming something is… transformative?
“…the author…shall have the sole right and liberty of printing, reprinting, publishing and vending such map, chart, book or books, for the term of fourteen years from the recording the title thereof in the clerk’s office… And if, at the expiration of the said term…the same exclusive right shall be continued to him or them, his or their executors, administrators or assigns, for the further term of fourteen years;”
No, quote me where I said that. I said that the public was deprived of the use of the public domain works within their own lifetime. And I provided the source of that.
And I didn’t claim that.
I didn’t say it expressly said that. And you completely missed the pertinent clause in the 10th Amendment that I was referring to.
Unless you can understand what I’m arguing, you’re dismissing your own ignorance here.
I am a person and I have free speech. And my free speech, representation, redress, and other rights have been subverted by these large corporations you’re siding with. And what’s worse is that you’re siding with them over profits. You’re saying you care about hypothetical sales rather than the danger of my country turning into an authoritarian hell-hole.
Then link to the ruling and not to a propaganda organization.
That’s not the opposite side. There aren’t only two sides. I’ve been pointing out that you’ve been engaging in a false dilemma from the very beginning but you keep applying that myopic perspective.
You have already been ignoring everything I have said since the beginning. I’ve pointed out a field full of straw men. There are no crows for miles.
I don’t live in an echo chamber. I’m aware of the bad, selfish, greed-driven perspectives of unabashed exploiters. I just won’t accept them as a valid citation when we’re talking about truth, not propaganda.
The Supreme Court has been making some unconstitutional decisions. Many justices are openly corrupt now. Many of their appointments were the result of corruption and unlawful activity. I won’t be surprised if any particular SCOTUS decision goes a way I’d disagree with.
How do you propose doing that? You don’t have influence over all those companies and institutions. They have free will. They’re going to do it whether you or I approve. Setting a legal precedent that all LLM training requires licensing won’t stop that.
You are siding with bad guys who are exploiting the creative labor of humans. That’s literally what I’m railing against! I am the exploited. You’re cheering on the people who have exploited me! I know you aren’t the good guy. You are a sycophant to the wealthy and powerful.
The entire system of capitalism is built on the exploitation of human labor. That ship has sailed. Private owners of the means of production make money from owning things. They’re not doing the work themselves. A worker can sometimes generate millions of dollars worth of profits for a company and only pull down a barely living wage. You’re pretending like AI companies are the only exploiters out there. It’s the whole damn system!
But you won’t be able to. Again, even if LLM training data must be licensed, wealthy LLM companies will still be able to afford to license them (and small creators will get virtually nothing), and so the LLMs will get trained and used and pushed on us without our consent. You might as well say it’s okay because unicorns are going to emerge and give everyone a million dollars. Having unrealistic expectations of how things are going to go doesn’t justify bad decisions today.
The whole fucking point is that workers are exploited by companies without their preference. It’s not just creatives. It’s not just AI companies. It’s all capitalist companies that aren’t employee owned.
I could get hit by a car tomorrow and die. What does the hypothetical that you or I would outlive each other have anything to do with the moral argument that health care should be humane and affordable? That you pose it as a zero sum scenario is weird.
We’re not talking about LLMs being trained on fiction works. As you should recall, had you ever actually understood my premise, stated several times now, I am saying that you can’t just assert that all LLM training must require licensing, because it’s not established in law or case law at this time. And I am not talking about the big AI companies getting sued right now. I’m talking about anyone who might ever want to train an LLM in the future in the US, you know, including those poor people and students and researchers you don’t apparently think exist.
This doesn’t seem like a loss for us. I can’t imagine your works would be appreciated in the US. You can’t understand things that have been taught to you multiple times.
Re: Re: Re:8
There is no blanket fair use for children’s learning, mind you.
The American Geophysical Union v. Texaco case shows that intermediate copies could constitute infringement. In that particular case, it was employees of a for-profit corporation doing the learning, and it’s a human learning case that was ruled not fair use.
Because in the AI “pre-training” phase the text has been translated to model weights! It does not need to be text to constitute “copies”.
Why the hell should I care about your learning experience? Are you AI and not human?
You keep presenting this as a fact while I have disputed this many times! The model contains data in a different form than what the copyright content was originally “fixed” in. And for the purpose of copyright, the model itself is a derivative work!
And because it’s a derivative work, it needs a copyright license to distribute, period.
You cited not a single case where AI training is not infringing or “fair use” because there is none (yet). Stop making the claim that you can train AI without a copyright license!
Only if the model is not infringing (which is a false premise already, so I don’t need to argue about the further what-if scenario).
(And I wish there are AI models with data fully licensed, damn it! But what we’ve seen are AI companies trying to defend their scraping is fair use while their arguments don’t hold. Thomson Reuters v. Ross is an example case.)
No. You’ve confused the exemption for schools (§ 110(1) and (2)) with the fair use exemption (§ 107). I’m not talking about the § 110 case. Copying for school classroom use is already exempt so I don’t need to address the fair use four factors.
Factor one must be evaluated with the ultimate purpose of the use. So it’s not “education” in the intermediate step, but the commercial publishing of the notes/book that matters. Texaco case law.
Except that you are refuting a straw man. Even when what you say is true here, Factor Three would still rule neutrally here.
Did I say the hypothetical scenario has to make sense? This is actually what generative AI has been doing as a metaphor. So no need to question whether there is a “publisher” who would print that because there just is.
USCO even said that generative AI outputs are transformative, so what’s the issue here?
You mistakenly believing “transformative = fair use” is the issue.
There is no such right of “using works within their lifetime”! Even your quote from the U.S. Copyright Act of 1790 doesn’t say there is such a right.
Your quoted act only speaks of a copyright term of 14 years, but then, there is nothing unconstitutional about extending that term to “life + 50 years” or “life + 70 years”. When you insist on a right that doesn’t exist in statute, there is nothing to be “deprived” of.
https://www.supremecourt.gov/opinions/22pdf/21-869_87ad.pdf
So you have more authority than the Supreme Court? Got it.
I would go for setting a legal precedent for that whether you like it or not. Call me a bad guy whenever you want, because you even disregard the Supreme Court.
At least I can get paid for my works! I don’t care whether you are exploited! I even dismiss your claim about “poor people” as without evidence. Are you happy now?
And why the heck should I let AI companies keep exploiting anyway? I have no obligation to fight what you called the “whole damn system”.
Why do the small creators need LLM to do anything? You didn’t answer this question when I asked before, so the rest of the outcome you suggest is nonsense!
“LLM gets pushed on [me] without [my] consent”? What the fuck?
Because an LLM is a luxury and not everyone can get the luxury however you try. The LLM requires a server farm, which means smaller companies won’t have that luxury to deploy an LLM unless they can rent the servers from someone else. This is unchangeable. A better world would be every personal computer being able to run SLMs (small language models) instead of relying on LLMs for most computing tasks that need AI. Assuming the SLMs have training materials that are all licensed, by the way.
Re: Re: Re:9
You don’t need fair use. It’s a non-infringing use. There is no legal precedent that human learning from copyrighted content is a copyright infringement (and to be clear, we’re just talking about learning, not some unrealistic impractical convoluted scenario you’ve invented where infringement is assumed). If there was such a precedent, you could cite it. You can’t.
That has nothing to do with human learning. That case was about intermediate copies, not non-fixed human brain learning.
The model weights aren’t the training data. It is the process of the model learning, not a translation or encryption or compression of the training data. You can’t take the weights and faithfully reproduce the training data.
You said human learning was a copyright violation. If you can’t accept the legality of that, you’re admitting you’re just a myopic copyright maximalist trying to invent new rights that don’t exist.
You can claim it’s not true all you like. That doesn’t mean you’re correct. You’ve demonstrated that you don’t understand how the technology works.
The data the model contains is not a different form of the training data. It is different data. It is data created by the model.
The model doesn’t contain the original data. There’s nothing to be derivative of in the model.
You can’t cite case law or law that says this.
You just contradicted yourself. If no case law or law has yet found LLM training to always require licensing, then it is legal until such case or law becomes a precedent. You’re admitting here both that your claim that it’s illegal is just opinion, not fact-based, and that you don’t understand US laws and legal precedents. That understanding you prove you don’t have is a prerequisite for having a useful perspective on this matter. You’re getting to be as nonsensical as Terop.
You just admitted that the model hasn’t been determined to be infringing yet. Stop contradicting yourself.
There will be eventually, but you’re missing that creators will still be exploited. The big media corporations will get the licensing funds and very few creators will get much of anything. Creators will be pressured to license their work for almost nothing when signing new contracts. The future you think you’re railing against and that you think can be prevented by successful lawsuits will not be stopped by these methods.
No, you’re confused between the lack of a 106 right being established that covers human learning and the need for a justification that something is fair use. If there’s no right, then there’s no need to argue fair use.
Your entire fair use analysis is unnecessary.
Reality dictates that it must. You don’t legislate precedents based on absurd scenarios that will never happen.
Generative AI output isn’t by default published to a market. That’s a false claim.
It’s not always fair use, but it’s a strong argument in favor of fair use. It frequently is.
I didn’t claim the 1790 law was where the right was derived. I said the 10th Amendment was the source of the fact that it was legal before the expansion of the length of copyright.
I didn’t claim it was unconstitutional. You used that term. I said US citizens were deprived of the ability. I didn’t even call it a right. You did.
Authority? No, quote me where I uttered that straw man.
So you accept that you’re rabidly arguing for a futile, useless scenario.
You called yourself one.
SCOTUS hasn’t yet ruled on this issue, but I do condemn corrupt SCOTUS decisions. They have made unconstitutional decisions at times. Would you argue that a SCOTUS justice isn’t capable of being corrupt or making a biased decision that contradicts the Constitution? If so, what are you smoking that makes you so trusting of the wealthy and powerful?
These lawsuits won’t get you a lot more money. The big media companies will just get a payout from the AI companies, some of whose stock is owned by the same people, and you’ll get pressured to accept a contract that includes the licensing for nothing or almost nothing. You’re literally fighting for someone else’s profit at the expense of poor people.
Yes. I’ve been saying this, but I appreciate you admitting it. You are saying you have no moral stance here. You’re just out for your own. You don’t care about others. That means no one is going to care about you or your pet issues. This should be the end of anything you say on the topic. Everything else you say is derived from this bias.
You have no power to fight the whole damn system or to allow or stop the AI companies. You wouldn’t be impotently arguing with straw men in this comment section if you had any power or influence. You’re just strutting around here bragging about how you only care about yourself. No one needs to feel any empathy for you. Your position is so futile and contradictory and ill-informed and full of propaganda that I don’t think you’re a creator. You seem to be a shill for the Copyright Alliance or a similar organization.
Ask them.
You assume I’m speaking for them even though I’ve said and you’ve agreed that I represent myself. But you’re also ignoring that it’s not just about creators. It’s about all US citizens, including the majority whose existence you dispute.
You’re ignoring local LLMs. You’re ignoring smaller purpose LLMs. You seem to assume every LLM present or future will be some big generalized LLM like ChatGPT or Gemini. You’re admitting your ignorance again.
It’s not only changeable, but there are also contrary examples that prove your claims wrong.
Why do you think this won’t ever be reality? It’s already happening. This is another demonstration of your ignorance.
Re: Re: Re:10
Define “learning”, because it looks like we define the term differently.
Again, define “learning” before I can go on with this debate.
Again, define “learning” when it comes to machines as well.
Created through what?
Then the model comes from thin air?
You didn’t read any recent lawsuit about AI training, did you? There is one case on point, Thomson Reuters.
You could only say this before March 2025.
And I can wait until summary judgment is issued in another case, which will happen no later than this year.
It’s marginally better for a creator to be “forced” to sign a licensing contract than to have their works taken without permission!
For two reasons: (1) the creator can get paid under the contract; (2) if the contract turns out to be an unfair labor practice or anything like that, they can sue with sufficient evidence.
“By default”? What the fuck again? How can the companies be charged with secondary liability for copyright infringement only when the output is published “by default”?
By the way, there’s a recent complaint. Midjourney keeps users’ image generations available for public viewing on its “Explore” pages, and Disney (and Universal) cited the Explore pages as proof of infringement. Tell Disney that it’s not “published”.
Which argument?
And yet the expansion of copyright through Congress wasn’t illegal. End of story.
‘Oh! I am “deprived” of the ability to kill! I am “deprived” of the ability to commit adultery! I am “deprived” of the ability to steal!’
Seriously, what the fuck?
That is a political question. I won’t reply to this because it’s out of scope. Go persuade the politicians, because this has nothing to do with me.
It’s already happening even without the suit! Google admits that it uses YouTube videos to train its video-generating AI, Veo 3 (https://www.cnbc.com/2025/06/19/google-youtube-ai-training-veo-3.html)
And there’s no way to opt out!
YouTube Terms of Service
“By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use that Content (including to reproduce, distribute, prepare derivative works, display and perform it) in connection with the Service and YouTube’s (and its successors’ and Affiliates’) business, including for the purpose of promoting and redistributing part or all of the Service.” (Emphasis added)
I’m fighting for my profit. And I dismiss “poor people” again. (This “poor people” talk is spammed bullshit, as not a single witness has shown up. Why the fuck should I assume “poor people” exist, or why should I care about them?)
My moral stance is to respect copyright, with no copyright exemption for AI training. There are no “others” of the kind you mentioned that I should care about. Bring a witness here, or else I dismiss.
I’m asking you, because you brought up this bullshit argument that “poor people” need access to LLMs.
I dispute even your use of the word “majority” here.
I would rather have polls like this before you claim your opinion is the majority:
https://theaipi.org/poll-biden-ai-executive-order-10-30-5/
Re: Re: Re:11
We’ll go with the verb form:
Verb
learn (third-person singular simple present learns, present participle learning, simple past and past participle learned or learnt)
To acquire, or attempt to acquire knowledge or an ability to do something.
In the context of English-speaking American children learning to write, it means reading and using pattern recognition to understand how nouns and verbs and other parts of speech are used in the composition of complete sentences. It also involves remembering vocabulary words so you can write more concepts. With that learning, you can compose a large variety of sentences. None of this process requires memorizing specific sentences composed by other people, copyrighted or not. It can be learned from reading copyrighted or public domain works. It is not illegal in the United States and is in fact referenced in Article I Section 8 Clause 8 of the Constitution as the original purpose of copyright: “To promote the progress of science and useful arts…” And in that historic context, science meant knowledge/learning.
I doubt my definition will enable you to argue anything coherent since you seem to intentionally take the worst stances and also willfully misinterpret what I say, even after correction.
Research how LLMs are trained.
A process.
It comes from the process.
We’ve literally already talked about this. You don’t seem to remember anything we’ve said. Are you an LLM that can’t track a conversation over a long period of time? The appeal of Thomson Reuters v ROSS is not decided yet. It doesn’t mean anything yet. There is no precedent. And it doesn’t even cover the scope of what we’re talking about, so it won’t set the precedent you think it will.
Again, again, again, you don’t understand US laws or how precedents are set. A ruling in a case that A) is on appeal B) doesn’t cover the entire scope of your argument and C) is not a precedent is not a good thing to rest your entire claim on. The details of Thomson Reuters v Ross aren’t the same as other scenarios. It’s not a blanket decision that all LLM training requires licensing.
So wait and stop making claims that you acknowledge aren’t based in current law or case law.
I’m a creator. It’s functionally the same to me. Being coerced is the same as not giving permission.
By whom? The AI company? It’s not going to pay small creators shit.
Yes, with their big bags of money to hire lawyers, small creators will sue and definitely beat large corporations with top tier law firms on speed dial.
You really have no fucking idea how anything works.
I didn’t say anything about whether they could be charged with secondary liability. You pretended in your unrealistic analogy that the output of an LLM is typically published publicly. It more often isn’t. People use a lot of LLMs privately.
I have already said I’m not defending any particular AI company in any lawsuits. And you continue to bring them up as if I represent a person who supports them despite multiple iterations of me saying they’re not the people I’m concerned about. Big corporations can get fucked, regardless of what product or service they profit off of.
That a use is transformative.
First, I didn’t claim it was illegal. Second, the Nuremberg Race Laws in Nazi Germany were legal. Legal is not the same as moral or ethical or right. Corrupt legislators can make immoral laws. Corrupt justices can make immoral case law. Are you willing here to say that you think all laws are morally correct? That would be a bold and stupid admission.
That you equate the use of a public domain work with the ability to kill or steal is insane! Holy shit!
Also, committing adultery is legal in the US, so the inclusion of that one is another admission on your part that you don’t understand what you’re talking about. Full stop.
That’s what I’m saying! What the fuck is up with your moral equivalences?!? Do you think people who use public domain works should be imprisoned or executed? That’s fucked up.
You’re arguing for a change in laws and rights. That’s political. And it’s infinitely more important than the petty 50 cents you think you might get out of licensing works for LLM training. And your admission that you don’t care about it makes your myopic obsession with a different country’s laws irrelevant.
This supports my argument. I appreciate you pointing out that I’m right. Feel free to continue doing so. You’ve already done this a lot.
You won’t get shit. You’re fighting for nothing.
I don’t know if you understand how arguments work, but it’s absurd to insist that I abduct someone and drag them into an argument they may not already be interested in. This is the weirdest demand for proof. Poor people exist. American citizens exist. Their rights are affected by case law and legal precedents. None of this should be in dispute. That you dispute it is intellectual dishonesty.
You shouldn’t assume. You should know. Have you never met a poor person? Are you not able to simply search the internet for the portion of the US population living under the poverty level?
If you don’t already understand why you should experience empathy for other human beings, I don’t know that I can answer this for you. If you admit you’re a sociopath incapable of empathy, then there’s no point in discussing anything anymore. You’re just a selfish toddler throwing a tantrum.
That’s not a moral stance. You’ve admitted you’re only interested in your own profit. Unless you’re arguing your benefit is the only basis for morality, in which case you deserve no empathy from anyone else.
You can “dismiss” all you want. It doesn’t change the existence of other people you should care about. If you can’t even fathom the existence of such a person without having one of them shoved in your face, you’re at best a solipsist and at worst a narcissist and neither means your opinion carries any value.
I didn’t say they need access to an LLM. I said they currently have a right to train an LLM if they want to because no law or legal precedent yet says otherwise. There are plenty of people training their own LLMs. Look at Hugging Face and GitHub. You’ll find lots. Ask them.
You can’t dispute a fact without evidence. You have no evidence. Not everyone in the US is wealthy. Prove otherwise.
I didn’t claim my perspective was in the majority. The majority of people don’t care about this topic.
You should read that poll more closely. It doesn’t say what you think it says. And it doesn’t speak to the topic we’re actually discussing. Where was the question about licensing for LLM training?
Again, again, again, (how many times do I have to point this out?) you are not arguing with me. You’re arguing with a straw man. I’m not fighting for AI companies. I’m not saying AI should be used in drones in warfare. If I have ever said such a thing, you could fucking quote me and you never quote me whenever I ask you to quote me on something like this. Your failure to quote me is your tacit admission you’re full of shit.
Well then. The ruling on Meta’s fair use is out.
https://storage.courtlistener.com/recap/gov.uscourts.cand.415175/gov.uscourts.cand.415175.598.0_1.pdf
remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.”