Date: 2023-08-17
NY Times Considering A Potentially Very Dumb Lawsuit Against OpenAI Because It Learned From NY Times Content
from the that’s-not-how-any-of-this-works dept
A few weeks ago, the NY Times published a very nice profile piece about me, which starts off with the story of how I recently got pulled into a group chat with a bunch of Hollywood writers, directors, and actors, who were trying to understand how to deal with the rise of generative AI tools. The article recounted how my basic message was that most of the legal routes they were considering (especially the idea that copyright will save them) weren’t likely to be all that effective, and that they should instead be looking for ways to embrace the AI and do more with it themselves.
The NY Times itself is apparently going in the other direction. According to Bobby Allyn at NPR, the NY Times is considering legal action against OpenAI, claiming that training its models on NY Times content violated the NY Times’ copyright.
Lawyers for the newspaper are exploring whether to sue OpenAI to protect the intellectual property rights associated with its reporting, according to two people with direct knowledge of the discussions.
For weeks, The Times and the maker of ChatGPT have been locked in tense negotiations over reaching a licensing deal in which OpenAI would pay The Times for incorporating its stories in the tech company’s AI tools, but the discussions have become so contentious that the paper is now considering legal action.
This seems like complete nonsense. We’ve already highlighted how the batch of existing lawsuits in which copyright holders try to sue AI companies for training LLMs on their data is likely to fail. But this lawsuit in particular sounds wildly stupid:
A top concern for The Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper’s staff.
Lol, wut? I mean, the NY Times is considered the top newspaper in the whole damn world, despite tons of competitors, and now it’s scared of a bot that is famous for mid-level prose and making shit up? None of that makes sense.
If, when someone searches online, they are served a paragraph-long answer from an AI tool that refashions reporting from The Times, the need to visit the publisher’s website is greatly diminished, said one person involved in the talks.
Again, that makes no sense. There are plenty of services out there that already summarize NYT articles and that doesn’t violate copyright, because summarizing reporting is clearly fair use. There’s no real “hot news” doctrine any more.
And, more to the point, if the NY Times is really that scared of ChatGPT, then it seems the NYT’s lawyers and execs don’t think too highly of all those reporters it has on staff.
Elsewhere, the Verge reports that the NY Times changed its terms to “ban” AI tools from training on its articles:
… the NYT updated its Terms of Service on August 3rd to prohibit its content — inclusive of text, photographs, images, audio/video clips, “look and feel,” metadata, or compilations — from being used in the development of “any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”
Though, it really sounds like this is more of the NY Times trying to set a trap for OpenAI so it has something to sue over, because the Verge also notes the following:
Despite introducing the new rules to its policy, the publication doesn’t appear to have made any changes to its robots.txt — the file that informs search engine crawlers which URLs can be accessed.
OpenAI respects robots.txt. If you truly don’t want your content scanned, you put a notation in robots.txt, which takes about 10 seconds tops. If, however, you want to lay a trap so that you can sue OpenAI, then you quietly change your terms of service, but do nothing to mitigate the “problem” of OpenAI scraping, even though you have all the power in your hands.
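To show how little effort that robots.txt notation takes, here is a minimal sketch of such an opt-out and of how a well-behaved crawler would read it. It assumes OpenAI’s crawler identifies itself with the user agent token `GPTBot` (the token listed in OpenAI’s own crawler documentation, not something named in the reporting above). The `example.com` URL, the `SomeSearchBot` agent, and the `can_crawl` helper are hypothetical placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one crawler, leave everyone else alone.
# "GPTBot" is assumed to be the token OpenAI's crawler matches against.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder article URL, not a real NY Times path.
    url = "https://example.com/2023/08/17/some-article.html"
    print("GPTBot allowed:", can_crawl(ROBOTS_TXT, "GPTBot", url))
    print("SomeSearchBot allowed:", can_crawl(ROBOTS_TXT, "SomeSearchBot", url))
```

The two-line `GPTBot` stanza is the whole opt-out. The rest only checks it the way a crawler that respects robots.txt would, and search crawlers covered by the `*` stanza are unaffected.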
There’s another thing that happened recently in this space, as highlighted by Semafor: the NY Times recently dropped out of a coalition of news orgs trying to demand cash from AI companies.
The New York Times has decided not to join a group of media companies attempting to jointly negotiate with the major tech companies over use of their content to power artificial intelligence.
Again, all of this seems very, very silly. If you don’t want AI to train on what you publish, use robots.txt. But AI training on content on the web should never be considered copyright infringing. Again, scanning the web has to be fair use, otherwise we no longer have search engines or a variety of other important tools that all rely on scanning.
I get that legacy news orgs have had a rough time embracing new technology and keep trying to use the law to beat back the tide. But, sooner or later you have to realize that this is just the wrong way to go about everything.
Filed Under: ai, copyright, generative ai, journalism, llms, training
Companies: ny times, open ai
Comments on “NY Times Considering A Potentially Very Dumb Lawsuit Against OpenAI Because It Learned From NY Times Content”
While ignoring that that means that The Times will be the one to make that information available first, and that a lot of human readers will do the same in everyday conversation.
Re:
Yeah but there is a huge difference between a human doing something, and a human doing something via technology that will hopefully make the whole process faster and more efficient.
/s
“A top concern for The Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper”
ChatGPT arranges words in ways that try to appear to be natural language, but it has been trained on data that might be very out of date and bears no relation to current events, and might even “hallucinate” things that aren’t real.
If that replacing you is a top concern, I have bad news about your journalism…
Re:
Isn’t the NYT the one that blamed Section 230 for a bunch of stuff online and then later had to point out that it was the 1st Amendment?
Multiple times?
I think the journalism is in trouble already…
IOW: No one should read the NYT if they don’t want to get sued. And certainly never mention or discuss an article.
Oh, this only applies to machines? No wonder they end up all pissed off at humans when something like a general (actual) AI comes into existence. (Well, that and the fact that they get programmed as military slaves.) The reason that even the worst sci-fi has some predictive power is that there are always enough humans who are stupid dicks.
The snippet-tax war ever marches on...
If AIs repeating what they read counts as ‘competition’ that they’re willing to sue over because people won’t feel the need to go to the source, then so does one person reading an article and summarizing it for another person who asks. So it sounds like it’s much safer for no one to read their articles, lest they risk ending up on the receiving end of a lawsuit for telling people what they read.
By highlighting the business concerns, I’m guessing that part of the idea is to differentiate this from the fair use analysis that would apply to general search engines, in particular the fourth factor.
That said, this lawsuit should fail, though it would be a fair bit stronger if it manages to get to discovery and it is found that OpenAI has retained full copies of NYT articles in the post-training data for the model (and there is no reason to think they are retained in a complete form). AI models in general would have quite a problem if they were subjected to the type of DMCA notices web searches are, due to the impossibility of removing select data without retraining.
Usually this process involves losing multiple lawsuits and appeals on the topic.
The real takeaway is that the NYT wants no learning from their publishing.
They’ve worked very hard to craft that editorial direction, and they’ll be damned if ChatGPT will go against their wishes and end up knowing more after one of their pieces than they did before, instead of slightly stupider for reading it as intended.
At some point, is it not possible that an AI system might start to create its own AI attorneys to defend itself from these shyster “copyright” lawyers?
That would be something worth watching