Dister's Techdirt Profile

Dister

About Dister

Dister's Comments

  • Apr 06, 2026 @ 08:34am

    I submit that the appropriate way to handle this is to test for two things. Is harm from social media at least partially the cause of a result? Does the platform use algorithms to actively push more “harming content” to the user? The latter can be readily tested using the history of content and its position in a feed. If the content, e.g., content related to suicide, is amplified by the algorithm, then the platform is guilty of “pushing,” and as with dealers pushing dangerous substances, punishment needs to be meted out to the guilty.
    Exactly. This is in fact what the design defect case was doing. Identify a harm, identify an act (or set of acts), show that the act(s) are the proximate cause of the harm. Specifically, the plaintiff suffered mental health harms, the defendants (Meta and YouTube) made certain decisions about how their platforms are designed, and the plaintiff was able to show, with sufficient proof to satisfy a jury, that those acts caused the harms.
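    As an aside, the "pushing" test described in the quoted comment is straightforward to operationalize. Below is a minimal sketch in Python (purely illustrative; the record format, the labels, and the function name are my own assumptions, not any platform's actual API). It compares a category's share of the algorithmically ranked feed against its share of the user's raw chronological history:

        # Hypothetical sketch of the "pushing" test: does the ranking algorithm
        # over-represent a category relative to the user's raw history?
        from collections import Counter

        def amplification(history, feed, category):
            """history/feed: lists of per-item category labels; feed is the ranked output."""
            base = Counter(history)[category] / len(history)   # organic share in raw history
            served = Counter(feed)[category] / len(feed)       # share in the ranked feed
            return served / base if base else float("inf")     # ratio > 1 means the feed pushed it

        # e.g., amplification(history_labels, feed_labels, "self-harm") == 3.0 would
        # mean the feed served three times the category's organic share.

    In litigation, discovery over logged feed data would allow exactly this kind of comparison to support the causation showing.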

  • Apr 06, 2026 @ 07:58am

    reshape the entire internet through lawsuits built on a scientific premise that the actual scientists keep telling us is wrong
    This is a strawman. It seems to suggest that a particular lawsuit creates a regulatory framework akin to actual policymaking. It does not. That is not how product liability, or even civil liability more generally, works. The recent design defect case against Meta and YouTube stands for liability for a particular set of acts that were found to have a causal relationship with the harms experienced by a particular plaintiff, harms that a jury found the defendants should reasonably have foreseen. While this may show a way for other plaintiffs to succeed against social media companies, it is not the same as actual policymaking. As such, the plaintiff could win that lawsuit while the following remains true:
    If you know your teen is vulnerable, perhaps due to existing mental health challenges or social struggles, you may want to be extra careful. If your teen is using social media in moderation, and it does not seem to be affecting them negatively, it probably isn’t. ... Social media may be one piece of the puzzle, but it’s certainly not the whole thing.
    Indeed, the passages Mike cites include an acknowledgement that certain people may be harmed. The lawsuit found that the plaintiff was one such person. That is enough. Whether the harm is widespread is immaterial and, quite frankly, misleading. All that needs to be shown is "proximate cause" between a negligent or reckless act and the harm experienced by the plaintiff.
    I get that Mike doesn't want social media companies to bear responsibility, but this is how civil liability works for everyone else, and even if one disagrees with the policy implications, it is not for the judge or jury in any particular case to set that policy. And we wouldn't want them to either. The judicial system is there to apply the law, not make it, and it would be unworkable to have every jury and every judge in the nation arguing with each other over whether a harm is widespread enough to decide in favor of an otherwise meritorious case. The fact that Professor Nesi explicitly states that there are at-risk users and that social media is a piece of the puzzle is an acknowledgement that such meritorious cases do in fact exist.
    Again, Mike may disagree on a policy level that a social media company should be responsible for that, but the correct avenue to create that immunity is legislation that would remove this cause of action. And I am sorry, Section 230 is not that legislation, because this lawsuit was not decided on the grounds of any particular content.
    In the same way that calling heavy usage of social media "addiction" is misleading and makes solving the problem of actual addiction more difficult, conflating a particular lawsuit with a "blanket ban" founded on inherent and widespread harm makes solving the challenge of liability, and balancing the interests of the companies and those potentially harmed by them, much more difficult. A lawsuit is not the same as a "blanket ban." A particular finding of harm caused by a particular act under particular facts is not commentary on how "widespread" or "inherent" that harm is.
    I think Mike makes a great point that community-based solutions could be a better way to address these issues at the societal and policy level. But lawsuits are not the policy level, and different considerations are at play in every case. No one is served by conflating the two. If Mike has a problem with the way the jury decided, he should address the facts the jury was exposed to, not some amorphous strawman of "widespread" or "inherent" rationales for supposed legislative bans.

  • Mar 26, 2026 @ 09:22pm

    Very well stated.

  • Mar 26, 2026 @ 09:10pm

    From my perspective, people seem to be conflating necessary and sufficient conditions. My understanding of this case is not that "infinite scroll is addictive and causes anxiety." It is that infinite scroll is one of many features that contributed to the platforms' addictive qualities, that were known to contribute to those qualities, and that were implemented to drive engagement (i.e., to take advantage of those qualities). It alone would not be sufficient, but it is, or at least can be, necessary in combination with other features. Whether you agree or disagree with the outcome, I think it is important to understand what the case actually stands for, which is not that infinite scroll, on its own, is evil.

  • Mar 26, 2026 @ 08:59pm

    I think there will be a lot of cases against Meta and YouTube in the near future (and probably some others, like X). But my general feeling about all of this is three-fold:
    1) I think there is a world where a platform could deliberately or negligently use its product design to cause harm. An example would be if Elon Musk decided to adjust the X algorithm to inundate certain individuals or groups with harmful and egregious posts. This feels to me like the kind of example where there is a bad act likely to cause harm that is not dependent on any particular item of content, and I think it is reasonable to say this kind of action should be subject to legal accountability.
    2) Drawing a line from an act to a harm is the entire purpose of civil law. Indeed, courts, at a fundamental level, exist for primarily two reasons: criminal accountability, and resolving civil disputes over whether an act caused an injury. There are many other circumstances where showing causation is also hard, and yet we still endeavor to find the truth of it. "Hard," I think, is a bad excuse to avoid answering the question.
    3) Litigation against platforms is not inherently bad. It isn't inherently good either, but the tone seems to be "heavens to betsy, however will we survive if a company is sued." I said this above in another response, but tons of industries are heavily regulated and thus subject to lawsuits. They adjust. Everyone survives. In some cases, it even creates a healthier market, because users can trust the product, know that if something goes wrong they aren't hung out to dry, and benefit from the overall increase in safety. You could make the same argument about the FDA and pharmaceuticals. FDA clearance takes years and costs millions of dollars. Biotech and pharma seem to make it work, and we can all be assured (at least in theory) that there is some degree of safety in an FDA-cleared product.
    Overall, my point isn't really that this particular case was correct, or that some of what Mike says won't be true. I think it will be tough for online platforms, at least for a while. But does that mean “no company is going to allow anyone to raise concerns ever again” and “You’ll get overly lawyered-up systems that prevent you from doing useful things online, and eventually the end of the open internet”? Maybe the facts of this case were bad and the jury should have gone the other way, or maybe the judge needed to better delineate between design and content such that this case should have been dismissed. Questions I guess we will need to wait for the appeal to answer. But the idea that the only possible result is no less than the death of the open internet seems like a bit much. Algorithmic feeds exist in other countries that don't have Section 230. Why are we special?

  • Mar 26, 2026 @ 08:37pm

    Yeah, as I mentioned above, I was probably being a bit silly in being pedantic on this point. I get the idea. I just think it's important to remember that the same motion to dismiss exists for all civil causes of action. Section 230 provides a relatively bright-line rule that makes it easier to get a case thrown out at that stage. But the great thing about common law is that the courts develop tests and standards that can help distinguish a good case from a bad one.

  • Mar 26, 2026 @ 08:31pm

    "Sure, not every decision made will result in a lawsuit, but if discussion on weighing the risks and benefits of making one decision over another can be recontextualized as, “See? They knew there could be a risk, and they did it anyway,” that’s going to chill any future discussion." Not super convinced by this. Most companies, even small ones, have legal counsel and some understanding of how to document their design decisions to reduce liability. And even with a case like this, the outcome was not simply that someone raised a comment that the platform might be unsafe or that safety could be better, but rather, from my understanding, was that the way it was discussed and the decisions were made showed that the issues were ignored or even preferred in order to serve business interests. Again, there are ways to make these decisions and to document them to show reasonable efforts to mitigate safety issues. "As Mike pointed out, the algorithm, design features, etc., are dependent on both the content presented and how the person seeing it interacts with it." From my view, it is yes and no. Mike seems to be saying that the design features (infinite scroll, notifications, algorithm, etc.) alone cannot be blamed for harm unless it is serving content, in which case the content is to blame. While it is true that content sufficient to cause the harm is needed in order for the design features to play a role in causing the harm. However, I would say that it is a bit silly to say that the design features are implemented in a content agnostic way, and it is solely via the content that the features become harmful. We need to live in the real world where, without blaming any particular piece of content (remember, the judge said all claims about the content itself could not proceed in the case), harmful content does exist. It just does. On instragram. On facebook. On youtube. On X. If the design features are constructed in a way to "weaponize" that harmful content, then I think you would be reasonable in saying that those design features are bad, regardless of any particular item of content. I dunno, this argument just feels like saying "well the conspirator didn't rob the bank, he just sent a volunteer to do it for him, so he can't be guilty of any crime." "So, whether you categorize it as just procedural or a legal defense, the difference is the cost involved. For tech giants like YouTube and Meta, it’s pocket change. For new startups and small operations, the difference is whether they can afford to continue operating after the litigation has concluded, assuming they’re victorious in court." True. I was being unnecessarily pedantic. But I will say, people always seem to look at only one side of this equation. Yes litigation is costly for defendants. But it is also costly for plaintiffs. It takes up a ton of time and a ton of money to bring a suit. Sure you could bring a frivolous case, and maybe it survives a motion to dismiss, but each submission to the court will be tens of thousands of dollars, even for the plaintiff. Discover is also expensive for the plaintiff. You could say that maybe these plaintiffs will get lawyers on contingency fees so they only pay if they win, but the lawyers will then only take the case if they are likely to succeed, otherwise they burn a bunch of money. This why I say the catastrophizing of this is a bit over the top. Is there increased exposure? Yes. Is every tom, dick and harry going to come out of the woodwork to file a lawsuit? 
Probably only the ones that have some reasonable expectation of success. Last point is that Meta and Youtube will probably face a deluge of these of these cases now that they have already been found liable so making the case is much easier. But that does not extend to other platforms like Bluesky or Mastodon or Reddit or whoever else.

  • Mar 26, 2026 @ 10:24am

    "But that means no company is going to allow anyone to raise concerns ever again." The catastrophizing of this is a bit much. I get that, in some sense, this makes a platform liable for the content it hosts, but this is not suddenly a world where every single design decision will result in a lawsuit. That was not true for car design after Ford Pintos started blowing up, and it won't be true here. Even with this decision, a platform is not liable for a person positing something defamatory or otherwise illegal. What they are liable for is, given the existence of unsafe material, if they design their product in a way that causes harm, then they are potentially liable. That is a very different legal assertion. I also am unconvinced by the "procedural safeguard" nonsense. A lawsuit is a lawsuit, regardless of the grounds and the defenses. Bringing a suit against a corporation that loses because of 230 is STILL BRINGING A LAWSUIT. Having an additional defense is not "procedural" it is legal. You still need to prosecute the suit. There are actual procedural mechanisms to end a suit early and throw out frivolous cases (Rule 12(b)(6) motion to dismiss and state equivalents, and summary judgement, to name two primary ones). But a LEGAL defense is not PROCEDURAL device, the legal defense still needs to be asserted and litigated. I dunno man, I am sympathetic to the "23 words that built the internet" thing, but these absolutes about how 230 is now effectively dead and no one will ever moderate ever again, and reddit and bluesky and mastodon and snap and tik tok are now going to die because they can't moderate is a little dramatic. I think the effects here are much narrower than you are protraying them.

  • Aug 28, 2025 @ 10:28am

    $150k for a company knowingly breaking the law is pretty reasonable. We often complain about laws lacking teeth, and companies treating fines as a cost of doing business. A company deliberately building a ‘pirate library’ should face a company-ending threat. Especially when your CEO is running around saying he preferred to steal the works to avoid “legal/practice/business slog,” as cofounder and chief executive officer Dario Amodei put it.
    One hundred percent. I get that Anthropic does a lot of R&D developing the models themselves and the framework for implementing them, but at the end of the day, the differentiator between AI solutions is in very large part (not all, but certainly a very significant part) the training data. So while Mike wants to spin this as "turning a narrow legal loss into a company-ending threat," the offense is a significant driver of the company's value in the first place. A ruling can be narrow in its legal applicability, as it was here, but extremely significant in its market applicability. A pirate library of unlicensed works is indeed extremely significant in the market. Training data is very expensive and very valuable. Access to quality data is almost the entirety of Google's business model. So stealing that data from copyrighted works is not a "narrow loss." It is fundamental, both to the copyright holder who derives value from the integrity of the copyrighted work, and to the infringing entity that builds an entire business on the back of that data.
    The works themselves, both in the aggregate and individually, are what give these models their value. Claude would be useless without training, and would be barely better than useless unless trained on quality data. And these companies absolutely pick works of authorship that contribute to the quality of the training set. So even if one book is of minor value to the training set, it still has at least some value. We cannot just run around saying that, because some particular author would not get what we would consider enough money in return for their work, there should be no guardrails at all on taking their work and its value. Especially when that value is a significant driver of the value for the infringer, as it is here. In fact, to say otherwise presents this somewhat bizarre scenario where a potential infringer should infringe a lot to decrease the incremental value of each infringement, thereby escaping responsibility for any infringement. I am not sure we want to be incentivizing large-scale pirating with such a framework.

  • Jul 01, 2025 @ 11:14am

    Human Craft vs Mass-Produced Commodity

    I dunno, I think what Chhabria is getting at is that certain types of works (e.g., fact-based books) could be relatively easily and cheaply produced by an LLM "factory" that a human couldn't really compete with. The analogy your comment makes me think of is handmade furniture vs. factory-produced furniture. Some people will pay a premium for the handmade stuff because of the craft and knowing a human made it, but that is a pretty small market. By and large, in an industry that was once reliant on human production, people buy mass-produced furniture made by fairly automated processes that leave little room for the human artisan. My main point is that a sufficiently accurate AI-produced text on something may, and probably will, be cheaper and faster to produce, while a human-created version would be slow, laborious, and expensive, and would not differ greatly for much of the market. Whether you view that as relevant to copyright and fair use is, I think, another question, and kind of depends on whether you think copyright is for the purpose of incentivizing human authorship, or simply authorship.

  • Jul 01, 2025 @ 11:02am

    I actually find this discrepancy super interesting. What I infer from this split is a philosophical divide over whether non-human competition with human works of authorship is the type of "market harm" that should affect a finding of fair use. Alsup brushes off the market harm analysis as just being competition, while Chhabria takes the perspective that AI-originated competition with human authorship is more than simple competition. I don't know what the correct view should be, but I do think there is something different about automated, mass-produced content as compared to other people creating content. People should be expected to compete with people, but can we expect people to compete with AI systems? And is that even a question relevant to copyright? Alsup seems to say no to the latter while Chhabria says yes. The Constitution says "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries," and this question seems to turn on whether one would seek to protect human-originated progress, or just progress regardless of origin.

  • Aug 30, 2024 @ 08:08am

    My interpretation of what the court said is not that "first amendment applies == section 230 cannot apply"; rather, it is about to whom the expression is attributed. The obvious examples are that TikTok is not liable for defamation if it hosts a post that lies about a person, but TikTok would be liable for defamation if TikTok itself posted a lie about the person from TikTok's own account. TikTok is obviously liable in the latter case because it is TikTok's expression that formed the lie. Now, if TikTok can tune its algorithms such that the algorithm is expressive of some view, then that view could also (potentially) be attributable to TikTok. There are plenty of cases that say no, but even in many of those, including Force v. Facebook and others, there is often this concept of the algorithm being "neutral." So even without the NetChoice First Amendment view, there is the potential for an algorithm to be non-neutral such that 230 does not apply. All that being said, I think the courts have actually already developed a test for this, under which the court analyzes whether the platform "materially contributed" to the unlawfulness of the content. That is part of the reason why I kind of think this opinion is fishing for Supreme Court intervention: to force the issue and create a test or standard for how to understand the expressive relevance of the algorithm itself.

  • Aug 29, 2024 @ 07:46pm

    I mean, it literally is what the court ruled, and it is what Mike quoted and then called "wrong": "Anderson asserts that TikTok’s algorithm “amalgamat[es] [] third-party videos,” which results in “an expressive product” that “communicates to users . . . that the curated stream of videos will be interesting to them[.]” ECF No. 50 at 5. The Supreme Court’s recent discussion about algorithms, albeit in the First Amendment context, supports this view. In Moody v. NetChoice, LLC, the Court considered whether state laws that “restrict the ability of social media platforms to control whether and how third-party posts are presented to other users” run afoul of the First Amendment. 144 S. Ct. 2383, 2393 (2024). The Court held that a platform’s algorithm that reflects “editorial judgments” about “compiling the third-party speech it wants in the way it wants” is the platform’s own “expressive product” and is therefore protected by the First Amendment…. Given the Supreme Court’s observations that platforms engage in protected first-party speech under the First Amendment when they curate compilations of others’ content via their expressive algorithms, id. at 2409, it follows that doing so amounts to first-party speech under § 230, too…." This whole opinion seems to circle around this concept of "if the algorithm is the expression of the platform, then it is not the expression of another, and therefore is not covered by 230". So ... very relevant.

  • Aug 29, 2024 @ 12:20pm

    This decision is really trying to resolve a contradiction created in light of NetChoice (in a way that may be fishing for Supreme Court intervention). NetChoice found the recommendation algorithms of social media platforms to be expressive within the meaning of the First Amendment (i.e., the act of recommending content is, in some form, content in itself, like how a mosaic can be a work of art unto itself even though it is created from other works of art, including where the other works of art are from other artists). Meanwhile, Section 230 says the content of another (the underlying art within the mosaic) is not the responsibility of the social media platform. It would be contradictory to say that the recommendation of content is the platform's speech under the First Amendment while simultaneously being the speech of another under 230. At some point you have to draw a line to distinguish between the recommendations as speech of the platform versus the underlying content as speech of the content creators. Put simply, when does the arrangement and presentation of the content of others become content unto itself? I get that mere "publishing" should be protected (though I think this is not the word you want to use, because 230 says the opposite: "NO PROVIDER or user of an interactive computer service shall be TREATED AS THE PUBLISHER..."), but even publishers can be held responsible for the content of their publications (e.g., defamation, incitements to violence, etc.). But again, the fact that the recommendations can be expressive implies that there is more than simple distribution of content, and Section 230 only insulates the platform from the content of others, NOT from its own expressions.

  • Jan 04, 2024 @ 12:40pm

    Downloading content for studying and professional development falls within fair use. That doesn't make downloading for any personal use in the US ok; it makes downloading for educational uses ok. The point I keep trying to make is that there is a difference between the rule and the exception. Fair use is the exception. Don't treat it as the rule. Rather, fair use defines a small realm of situations that are excepted from the rule, so you cannot extend it to all situations. For example, being allowed to copy copyrighted content for educational purposes does not mean copying to train a for-sale AI service is obviously ok. Indeed, the LLM is not a person, in law or in fact. It is not helpful to keep equating the computer to a human. They are not the same. Moreover, the LLM is a product provided to users in exchange for value, and is thus a commercial use of the content in the training set. This potentially, though I am not sure if it actually would, removes this type of use from the fair use exception, because use for commercial activities typically weighs pretty heavily against fair use.
    Regarding torrents or downloading YouTube videos: you absolutely can get sued, because it absolutely is copyright infringement. You probably won't, though, because it's not worth the effort for anyone to start going after individual users. It would cost a lot to find out who the people doing the downloading are, and a lawsuit would cost more than they could collect in damages. Instead, they go after the makers of the tools (e.g., Napster) as contributory infringers, to both go after the root of the problem and go after the people with the money. But make no mistake, downloading copyrighted material without permission is copyright infringement (unless it falls within fair use, which, again, is a particular exception that does not cover all personal uses of copyrighted works).

  • Jan 03, 2024 @ 07:50am

    I understand that the model is not the dataset, but to train the model you nevertheless need the dataset, which means creating your own copy of the dataset in your own database (typically). You literally need the text of that article imported into your own system, which is a copying of another's work, and thus potentially an infringement of copyright. I am not saying that the act of training the model, or the model itself, is the copying. The copying of the dataset is the copying, which is then used for training. And we know OpenAI does this because they published a paper. See Section 2.2 of https://arxiv.org/pdf/2005.14165.pdf, where they say things like: "(1) we downloaded and filtered a version of CommonCrawl..." and "...including an expanded version of the WebText dataset [RWC+19], collected by scraping links over a longer period of time...". "Downloading" and "scraping" are instances of copying content from another source, such as, it seems, NYT articles. Again, how fair use ends up applying to this, I am not sure, but it seems to me that taking the text of articles from websites and saving them for training an ML model (or any other purpose) fits within the language of the Copyright Act that forbids anyone but the owner "to reproduce the copyrighted work in copies".
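    To make the mechanics concrete, here is a minimal sketch of what a "download and scrape" step in a dataset pipeline looks like (purely illustrative; the URL, paths, and function are my own assumptions, not OpenAI's actual pipeline). Whatever filtering happens later, a pipeline of this shape necessarily writes a verbatim copy of the page text to its own storage first:

        # Hypothetical sketch of a scraping step: the page text is copied
        # into local storage before any filtering or training happens.
        import pathlib
        import requests
        from bs4 import BeautifulSoup

        def collect(url: str, corpus_dir: str = "corpus") -> pathlib.Path:
            html = requests.get(url, timeout=30).text             # first copy, in memory
            text = BeautifulSoup(html, "html.parser").get_text()  # strip markup, keep the prose
            out = pathlib.Path(corpus_dir)
            out.mkdir(exist_ok=True)
            path = out / "page.txt"  # real pipelines shard and dedupe; irrelevant to the copying point
            path.write_text(text)                                 # second, durable copy
            return path

        # collect("https://example.com/some-article")

    The point is that the reproduction happens at this collection step, entirely independent of what the model later learns or outputs.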

  • Jan 02, 2024 @ 12:03pm

    They do indeed apply a number of analyses and transformations, but they are not doing that on someone else's servers. You need look no further than WebText and WebText2 (feel free to google them), OpenAI-produced datasets of text scraped from URL links identified on Reddit. You can even download this dataset, which includes the text of the webpages at those URLs (https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai/ even states that WebText2 includes "the text of web pages from all outbound Reddit links from posts with 3+ upvotes"). This is a literal copy, regardless of what they do to process it afterwards.

  • Jan 02, 2024 @ 07:32am

    Just to quickly align our frameworks: in order to have "fair use" of a copyrighted work, you must first perform an otherwise impermissible copying. Fair use is an exception to the rule; copyright is the rule. So to help clarify things, I am going to call an act of copying a work a "candidate infringement," and once a candidate infringement is discovered, it must be determined whether fair use applies in order to determine whether the candidate infringement is indeed infringement or not. I say this because my first point is all about discovering that candidate infringement.
    You are definitely right that the conditions of copying a work are important. But the example you describe is a fair use question (educational purpose is a recognized fair use exception to copyright). My first point was more that the technology you use to copy something is not important to the analysis. If I published and sold a book of someone else's poems without permission, that would be copyright infringement regardless of whether I photocopied each poem, scanned each poem, dictated, transcribed, or reproduced it from memory. Similarly with your 1B: it is a fair use argument and does not go to the question of whether there is even a candidate infringement to which the fair use exception needs to be applied.
    Nevertheless, you might be right. I am not sure if this could be considered fair use or not. On the one hand, OpenAI is making copies of works without permission in order to enrich the value of its commercial activities, which does not seem like it would weigh in its favor. But on the other hand, like you say, the copying it is doing is not really to reproduce the work for consumption by an end user. But I think that is the conversation that needs to be had, and "the LLM just reads it" is neither technically nor legally accurate, I don't think.
    Finally, I don't think it matters whether OpenAI "pirated" the articles or acquired them from a legitimate source. Indeed, all of the sources listed in the complaint appear "legitimate," in that the NYT is not arguing that those services themselves committed any copyright infringement. And that makes sense because, back to the book of poems example, it shouldn't matter whether I got the poems from the Pirate Bay or from a local library of which I am a member; copying and selling another's work is not permissible in either case. Same with whether people can "get around the paywall" in other ways. Just because there are other ways to access the work does not suddenly make copyright infringement ok. Just because someone can get those poems from the library or from the Pirate Bay on their own doesn't make it ok for me to infringe those copyrights.
    At the end of the day, the Copyright Act explicitly forbids making copies of a work (including an article). So it seems to me that the threshold question of whether OpenAI's activities are candidates for copyright infringement is pretty clearly settled. We have at least two instances of making a copy without authorization of the author. So the discussion really comes down to, in the LLM training instance, whether it is fair use like for a search engine, and in the prompted reproduction instance, whether that is fair use or even OpenAI's responsibility, since they are not the ones doing the prompting (i.e., who is the actual copier in this instance, OpenAI or the prompter?). There are policy arguments that can go either way, but how LLMs feature in copyright infringement, and what that means for the copying itself, seems like a pretty new question.

  • Dec 29, 2023 @ 01:30pm

    Interesting article, and I always like Mike's take. But I think a couple of important factors are glossed over here.
    First and foremost, copyright is the right to not have others copy your work (to oversimplify). This applies not just to literal copying of physical texts, but also to copying data (software, music files, and, yes, written works). The NYT here is not simply saying "this is a mechanism to get around our paywall"; the NYT in its complaint is saying that the output of a significant portion of an article is a reproduction of copyrighted work. Again, copyright protects against the copying of works, and the NYT shows that ChatGPT can and will copy NYT works by outputting near-verbatim portions of their articles. Regardless of how you trigger that reproduction, it is nevertheless a reproduction of NYT works (at least that is the NYT's theory). Under that theory, the prompt is immaterial. As far as I know, copyright law does not include any conditions on how reproduction is triggered, so the prompt is irrelevant to the analysis.
    Moreover, even before we get to the outputting of a portion of an NYT article to the user, the NYT is saying that OpenAI makes copies of their articles to build the training dataset. Again, this is copying through and through. It does not matter that it is in a back-end database or that it is taken from Common Crawl (which may be fair use itself, but I doubt that fair use transfers to an ultimate beneficiary; for example, I cannot take a Techdirt article from the Internet Archive and publish it on my own webpage as my own). So there are two alleged instances of reproduction here, a legal right reserved only to the owner of the work and their licensees. Thus, all this discussion about prompting and "reading" is, again, irrelevant, because copyright pertains to the copying and reproduction, not to the methods of reproduction nor the purpose that the reproduction serves (except, as I will discuss next, in limited exceptions where it is deemed fair use).
    This brings me to point two: fair use. This is a trickier subject here, but fair use typically applies to commentary, search engines, criticism, parody, news reporting, research, and scholarship. I am not sure any of those apply to building a database of training data or to reproducing portions of articles to users. Nevertheless, the factors for determining fair use are: the purpose and character of the use; the nature of the copyrighted work; the amount and substantiality of the portion used; and the effect of the use upon the potential market for or value of the copyrighted work. I will not analyze each of these here, but will just point out that this is why the NYT goes into how valuable the NYT is for training the LLM and, in turn, how valuable the LLM is when trained on NYT works. It is also why they go to such lengths to show that significant portions of the articles can be reproduced, and that their paywall can be circumvented by cleverly prompting the model.
    I have no idea how a court would come down on this, but it is more than "the NYT doesn't understand LLMs." In fact, I completely expect that people will use ChatGPT to try to read articles from the NYT and other paywalled sources without paying; people do that stuff all the time and will use whatever tools are available. We may not agree with the potential effects of this lawsuit, but there is more here than "the NYT is greedy" (though that may be true as well).