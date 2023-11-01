Judge Dismisses Most Of The First Of The Many ‘How Dare AI Train On My Material’ Lawsuits
There are now a bunch of these lawsuits accusing AI companies of some sort of copyright infringement for training their models on works of plaintiffs. However, the first high-profile one was the case brought by Sarah Andersen, Kelly McKernan, and Karla Ortiz against Stability AI, MidJourney, and (bizarrely) DeviantArt. We covered the case back in April when the companies moved to have the case dismissed.
As we noted at the time, somewhat incredibly, the artists hadn’t even registered the copyright in their works before suing, which is the kind of error that dooms cases. It’s also embarrassing for their lawyers, Joseph Saveri and Matthew Butterick, who have been behind a bunch of these lawsuits, and present themselves as knowledgeable lawyers, yet couldn’t even think to make sure the works they sued over were registered? (For the non-copyright experts among you, you don’t need to register your works to have a copyright, but you do need to register them if you intend to sue).
Now the judge, William Orrick, has mostly dismissed the case, as we expected after the oral arguments made it pretty clear how weak this case was.
The first issue was the lack of registrations:
Each defendant argues that McKernan and Ortiz’s copyright claims must be dismissed because neither of them has registered their images with the Copyright Office. They also move to “limit” Anderson’s copyright claim to infringement based only on the 16 collections of works that she has registered….
In opposition, plaintiffs do not address, much less contest, McKernan or Ortiz’s asserted inability to pursue Copyright Act claims. At oral argument, plaintiffs’ counsel clarified that they are not asserting copyright claims on behalf of these two plaintiffs. July 19, 2023 Transcript (Tr.), pg. 17:1-5. As such, McKernan and Ortiz’s copyright act claims are DISMISSED WITH PREJUDICE.
Likewise, plaintiffs do not address or dispute that Anderson’s copyright claims should be limited to the collections Anderson has registered. The scope of Anderson’s Copyright Act claims are limited to the collections which she has registered.
Even in the limited case of Andersen, who had registered some works, the defendant companies pointed out that no actual arguments were made regarding which actual works the AI systems were trained on, but the judge says that part can at least move forward with discovery.
As we had noted earlier, the inclusion of DeviantArt never quite made any sense at all in this lawsuit, and again seemed to show that Saveri and Butterick had no clue what they were doing. DeviantArt did not create an AI system, but instead, the argument was that Stability’s AI was trained on DeviantArt works. As the court notes, that doesn’t make DeviantArt liable for any direct infringement:
Plaintiffs fail to allege specific plausible facts that DeviantArt played any affirmative role in the scraping and using of Anderson’s and other’s registered works to create the Training Images. The Complaint, instead, admits that the scraping and creation of Training Images was done by LAION at the direction of Stability and that Stability used the Training Images to train Stable Diffusion…. What DeviantArt is specifically alleged to have done is be a primary “source” for the “LAION-Aesthetic dataset” created to train Stable Diffusion…. That, however, does not support a claim of direct copyright infringement by DeviantArt itself.
The court does allow the plaintiffs to amend their complaint to argue that “compressed images” of the infringing works are somehow included in the AI training database, but seems skeptical.
Another theory was that because DeviantArt users can use the DreamUp tool, which is based on Stability’s Stable Diffusion, DeviantArt is creating infringing works, but that’s not how any of this works. DeviantArt notes that no matter what prompt you put into the system, you’re not going to get “substantially similar” works out of the system, and they need to be substantially similar to infringe.
The plaintiffs argue that you don’t need substantial similarity because the new works are inherently derivative of the works it was trained on, but the court says that’s wrong:
Plaintiffs rely on that line of cases and point to their allegation that all elements of plaintiff Anderson’s copyrighted works (and the copyrighted works of all others in the purported class) were copied wholesale as Training Images and therefore the Output Images are necessarily derivative….
A problem for plaintiffs is that unlike in Range Road – observed wholesale copying and performing – the theory regarding compressed copies and DeviantArt’s copying need to be clarified and adequately supported by plausible facts. See supra. The other problem for plaintiffs is that it is simply not plausible that every Training Image used to train Stable Diffusion was copyrighted (as opposed to copyrightable), or that all DeviantArt users’ Output Images rely upon (theoretically) copyrighted Training Images, and therefore all Output images are derivative images.
Even if that clarity is provided and even if plaintiffs narrow their allegations to limit them to Output Images that draw upon Training Images based upon copyrighted images, I am not convinced that copyright claims based a derivative theory can survive absent “substantial similarity” type allegations. The cases plaintiffs rely on appear to recognize that the alleged infringer’s derivative work must still bear some similarity to the original work or contain the protected elements of the original work. See, e.g., Jarvis v. K2 Inc., 486 F.3d 526, 532 (9th Cir. 2007) (finding works were derivative where plaintiff “delivered the images to K2 in one form, and they were subsequently used in the collage ads in a quite different (though still recognizable) form. The ads did not simply compile or collect Jarvis’ images but rather altered them in various ways and fused them with other images and artistic elements into new works that were based on— i.e., derivative of—Jarvis’ original images.”) (emphasis added); ITC Textile Ltd. v. Wal-Mart Stores Inc., No. CV122650JFWAJWX, 2015 WL 12712311, at *5 (C.D. Cal. Dec. 16, 2015) (“Accordingly, even if Defendants did modify them slightly, such modifications are not sufficient to avoid infringement in a direct copying case. . . . Thus, the law is clear that in cases of direct copying, the fact that the final result of defendant’s work differs from plaintiff’s work is not exonerating.”) (emphasis added); see also Litchfield v. Spielberg, 736 F.2d 1352, 1357 (9th Cir. 1984) (“a work is not derivative unless it has been substantially copied from the prior work”); Authors Guild v. Google, Inc., 804 F.3d 202, 225 (2d Cir. 2015) (“derivative works over which the author of the original enjoys exclusive rights ordinarily are those that re-present the protected aspects of the original work, i.e., its expressive content”).
Then there’s MidJourney. The complaint argues that MidJourney uses Stable Diffusion, but MidJourney hasn’t actually said that anywhere, so the judge asks for clarification:
Plaintiffs need to clarify their theory against Midjourney–is it based on Midjourney’s use of Stable Diffusion, on Midjourney’s own independent use of Training Images to train the Midjourney product, or both?
The vicarious infringement claims are all dismissed as well, with leave to amend, specifically regarding the claims that Stable Diffusion supposedly has “compressed copies” of the images it was trained on:
Plaintiffs have been given leave to amend to clarify their theory and add plausible facts regarding “compressed copies” in Stable Diffusion and how those copies are present (in a manner that violates the rights protected by the Copyright Act) in or invoked by the DreamStudio, DreamUp, and Midjourney products offered to third parties. That same clarity and plausible allegations must be offered to potentially hold Stability vicariously liable for the use of its product, DreamStudio, by third parties.
There’s also a claim over DMCA 1202(b), alleging that the defendants removed copyright information from the images, but as the defendants note, there are no allegations of any actual removal.
In response, plaintiffs point to paragraphs 180 and 191 of their Complaint, where they allege generally that plaintiffs and “others” in the putative class included various categories of CMI in their works and the “removal or alteration” of that CMI by defendants, including “the creator’s name” and “the form of artist’s signatures.” These allegations are wholly conclusory. In order to state this claim, each plaintiff must identify the exact type of CMI included in their online works that were online and that they have a good faith belief were scraped into the LAION datasets or other datasets used to train Stable Diffusion. At the hearing, plaintiffs argued that it is key for the development of generative AI models to capture not only images but any accompanying text because that accompanying text is necessary to the models’ ability to “train” on key words associated with those images. Tr. at 9:13-24. But there is nothing in the Complaint about text CMI present in the images the named plaintiffs included with their online images that they contend was stripped or altered in violation of the DMCA during the training of Stable Diffusion or the use of the end-products. Plaintiffs must, on amendment, identify the particular types of their CMI from their works that they believe were removed or altered.
The publicity rights claims also go nowhere:
The problem for plaintiffs is that nowhere in the Complaint have they provided any facts specific to the three named plaintiffs to plausibly allege that any defendant has used a named plaintiff’s name to advertise, sell, or solicit purchase of DreamStudio, DreamUp or the Midjourney product. Nor are there any allegations regarding how use of these plaintiffs’ names in the products’ text prompts would produce an “AI-generated image similar enough that people familiar with Plaintiffs’ artistic style could believe that Plaintiffs created the image,” and result in plausible harm to their goodwill associated with their names, in light of the arguably contradictory allegation that none of the Output Images are likely to be a “close match” for any of the Training Images… Plaintiffs need to clarify their right of publicity theories as well as allege plausible facts in support regarding each defendants’ use of each plaintiffs’ name in connection with advertising specifically and any other commercial interests of defendants.
There were also unfair competition claims, but those are pre-empted by the Copyright Act (basically, the UCL claims are really an attempt to argue infringement under a different law, and you can’t do that).
DeviantArt also brought an anti-SLAPP motion, which the judge says he’ll take up after the plaintiffs present an amended version.
So, the end result is that only the claim by Andersen (misspelled as Anderson throughout the ruling) of direct infringement against Stability AI can move forward for its apparent collection of images for its training database. Even that may still end up on the losing end after discovery. But this at least allows for discovery to happen first. Basically all the other claims are dismissed, though the plaintiffs can still amend those, but they’re going to have to overcome some pretty massive inherent weaknesses.
The lawyers for the plaintiffs are claiming victory because of the one claim being allowed to go to discovery, but this is a pretty embarrassing outcome for them overall.
Filed Under: ai, copyright, copyright registration, direct infringement, joseph saveri, karla ortiz, kelly mckernan, laion database, matthew butterick, sarah andersen, training, william orrick
Companies: deviantart, midjourney, stability ai
On the contrary, this is a major victory for the lawyers. There is at least one claim standing, so defendants didn’t manage to have the whole case dismissed out of hand. The remaining defendants will now be subject to the ruinous expense of discovery and motion practice, with the prospect of statutory damages looming over their heads if they blink (because any stumble in the motion practice can result in a default judgment). Plaintiffs are now in a great position to extort, er, negotiate a settlement.
Extort is probably right, as it looks like the plaintiffs saw an opportunity to make money from their art which was not forthcoming by the normal means, hence the lack of registration.
I do expect the defendants to move forward with this lawsuit, to dissuade other lawyers (and their authors) from filing similar lawsuits. Otherwise they will be bled dry by an endless file of marginally competent lawyers (and their unsuccessful authors).
what is infringement?
It will be interesting to see whether the mere act of training is infringement even before a final product is produced.
Because if the training isn’t infringement, then you will have to show the final product is LIKE the copyrighted work. A side-by-side comparison.
Which is the standard for every other type of copyrighted work, with one caveat. For other types of work there’s a defense of “I didn’t have access to the work I’m accused of copying.” That’s why editors and authors don’t read unsolicited manuscripts or fanfic. For an AI trained on data from the Internet, it’ll be all but impossible for anyone who used it to generate a work to show that the AI never had access to the work they’re accused of infringing. That’ll make it considerably easier for plaintiffs to win infringement suits, since all they have to prove is sufficient similarity, and then the burden is on the defendant to prove the AI didn’t copy the similar elements from the infringed-upon work.
Re: If training AI is infringement, then so is all creativity.
Suppose you read a copyrighted book. You like it so much that it inspires you to create your own original story in the same genre/style. This work is NOT fanfiction or derivative in any way. It’s completely your own original work (at least within the meaning of copyright law). You get that story published, and during an interview, you mention the one book that gave you inspiration to create your story. Upon hearing this, the author sues you for copyright infringement, despite the two works bearing no substantial similarity. The author posits that since you wouldn’t have gotten that inspiration but for reading the author’s work, this somehow amounts to unauthorized copying of the work. Any judge would throw out the whole suit on a motion to dismiss.
Now suppose that instead of a human reading a book and creating a new work in a similar genre/style without copying anything, it’s a computer program. The computer program in question is some form of generative AI (like ChatGPT if we keep the book example). The AI was trained on the book, which doesn’t copy the book into its data set, but rather ingests its content in a neural network, and puts what it learns in its data set. Then, someone uses that training data and gives it a prompt to write a story in a similar genre as the book it was trained on, and it spits out something that is NOT substantially similar to the original work it was trained on. If the author sues the AI company (and maybe the user who published the AI’s output) for copyright infringement, then how is it any different from what the human creator did in the first example?
But because it’s a computer doing what a human would normally do, somehow this changes the analysis? That is what all these AI suits are arguing, likely from a misunderstanding as to how the technology works (I guarantee that the data set of images in the suit mentioned in this story is NOT simply compressed copies of the images that are at issue). If a judge rules that this is infringement, then copyright law will become distorted so much that it reflects exactly what we critics of copyright have been saying for a long time: There is no such thing as a 100% original work. Every work is based on something that came before, whether it be an inspiration-type example like my first example above, or something that is more emblematic of remix culture. Like Kirby Ferguson said, everything is a remix. This debate about copyright and training AI is only putting this fact more at the forefront.
It’s worse.
It also means learning is now banned.
For both humans and LLMs.
Can’t read a book unless you pay your fees, and those fees will bankrupt anyone who isn’t the 1%.
If anyone can’t see where this is going…
No, the author of the book that inspired you would point to the elements that are the same between his work and yours and assert that you copied them from his work. The burden is now on you as the defendant to show that either those elements aren’t subject to copyright (eg. Scènes à faire) or that you didn’t copy them (which is almost impossible to do once you’ve admitted you’d read his work). This isn’t even new law, it’s why editors and authors don’t read unsolicited works. Editors in fact have a secretary or assistant whose sole job is to pick any unsolicited manuscripts out of the incoming mail and return them to sender unopened with the editor never even knowing they arrived. Showing that you never had access to a work is much easier than proving that you didn’t copy from it.
You don't have to register to have a copyright
Anything you write is copyrighted when you write it. But if you’re going to go to court, you generally have to prove when you wrote it. Registration is a convenient way to do that. But there is also the “poor man’s registration” — put your document in a sealed envelope and mail it to yourself. The postmark then has a date on it.
Not true in the US. See Title 17 U.S. Code, Section 411 and the SCOTUS ruling in Fourth Estate Public Benefit Corp. v. Wall-Street.com.
The plaintiffs’ lawyers asserted they were not pressing Copyright Act claims (for the unregistered works), and I am not a lawyer, so I am unclear:
The claims were dismissed with prejudice.
… but if they were not asserting Copyright Act claims, can the plaintiffs get their copyrights registered, and THEN pursue copyright claims (Fourth Estate mentions “pursuing claims for infringement prior to registration”)?
The lawyers said at oral argument that they were not pursuing copyright for those (dismissed with prejudice) claims. Are they barred by res judicata for having been raised and dismissed here? Or do the lawyers get to weasel out (“Oh, no, we weren’t pursuing copyright claims for those guys THEN, but we are NOW”)?
A judge who isn’t completely insane, phew. I hope the other cases are as lucky.