Anonymous Coward

August 17, 2023 at 2:06 pm

A top concern for The Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper’s staff.

While ignoring that that means that The Times will be the one to make that information available first, and that a lot of human readers will do the same in everyday conversation.

Anonymous Coward

August 17, 2023 at 2:19 pm

Re:

While ignoring that that means that The Times will be the one to make that information available first, and that a lot of human readers will do the same in everyday conversation.

Yeah but there is a huge difference between a human doing something, and a human doing something via technology that will hopefully make the whole process faster and more efficient.

/s

PaulT (profile)

August 17, 2023 at 2:35 pm

“A top concern for The Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper”

ChatGPT arranges words in ways that try to appear to be natural language, but it has been trained on data that might be very out of date and bears no relation to current events, and might even “hallucinate” things that aren’t real.

If being replaced by that is a top concern, I have bad news about your journalism…

Anonymous Coward

August 17, 2023 at 3:27 pm

Re:

Isn’t the NYT the one that blamed Section 230 for a bunch of stuff online, then later had to point out that it was actually the First Amendment?

Multiple times?

I think the journalism is in trouble already…

Ben-L (profile)

August 19, 2023 at 5:06 am

Re:

NYT even ran a piece a couple months ago showing that the bot could “hallucinate” answers about the paper’s archives (apparently only GPT-4 would admit it didn’t really know).

At least it would be funny if NYT went ahead with the lawsuit and OpenAI used that article as proof it isn’t somehow “replacing” the newspaper.

Anonymous Coward

August 17, 2023 at 3:43 pm

IOW: No one should read the NYT if they don’t want to get sued. And certainly never mention or discuss an article.

Oh, this only applies to machines? No wonder they end up all pissed off at humans when something like a general (actual) AI comes into existence. (Well, that and the fact that they get programmed as military slaves.) Even the worst sci-fi has some predictive power because there are always enough humans who are stupid dicks.

That One Guy (profile)

August 17, 2023 at 3:49 pm

The snippet-tax war ever marches on...

If AIs repeating what they read counts as ‘competition’ that they’re willing to sue over because people won’t feel the need to go to the source, then so does one person reading an article and summarizing it for another person who asks. So it sounds like it’s much safer for no one to read their articles, lest they end up on the receiving end of a lawsuit for telling people what they read.

Anonymous Coward

August 17, 2023 at 4:05 pm

In highlighting the business concerns, I’m guessing part of the idea is to differentiate this from the fair use analysis that applies to general search engines, in particular the fourth factor:

the purpose and character of your use
the nature of the copyrighted work
the amount and substantiality of the portion taken, and
the effect of the use upon the potential market.

That said, this lawsuit should fail. It would be a fair bit stronger if it got to discovery and it turned out that OpenAI had retained full copies of NYT articles in the model’s post-training data (though there is no reason to think they are retained in complete form). AI models in general would have quite a problem, though, if they were subjected to the kind of DMCA notices web search engines get, because select data can’t be removed from a model without retraining it.

Anonymous Coward

August 17, 2023 at 4:08 pm

But, sooner or later you have to realize that this is just the wrong way to go about everything.

Usually this process involves losing multiple lawsuits and appeals on the topic.

K`Tetch (profile)

August 17, 2023 at 4:58 pm

The real takeaway is that the NYT wants no learning from their publishing.
They’ve worked very hard to craft that editorial direction, and they’ll be damned if ChatGPT will go against their wishes and end up knowing more after one of their pieces than they did before, instead of slightly stupider for reading it as intended.

Arioch (profile)

August 17, 2023 at 7:14 pm

At some point, isn’t it possible that an AI system might start to create its own AI attorneys to defend itself from these shyster “copyright” lawyers?
That would be something worth watching.

Paul Alan Levy (profile)

August 18, 2023 at 8:25 am

Don't be so sure the NYT will lose this one

The same folks who insisted that there is no way the Internet Archive could lose the copyright suit by book publishers over e-books have been saying there is no way that the various suits over the use of copyrighted matter for training AI systems could succeed.

Mike Masnick (profile)

August 20, 2023 at 12:38 am

Re:

The same folks who insisted that there is no way the Internet Archive could lose the copyright suit by book publishers over e-books have been saying there is no way that the various suits over the use of copyrighted matter for training AI systems could succeed.

Don’t know if directed at me, but I never thought that there was “no way” the Internet Archive could lose. I thought there were many reasons why it could lose, though I pointed out how problematic that would be for a variety of reasons.

But I would put way better odds on the NYT losing this case. If the NYT wins such a case a lot of things would be in trouble, including search engines.

Thad (profile)

August 18, 2023 at 10:53 am

Lol, wut? I mean, the NY Times is considered the top newspaper in the whole damn world, despite tons of competitors, and now it’s scared of a bot that is famous for mid-level prose and making shit up? None of that makes sense.

Come on, Mike, you’ve read the New York Times; you know it makes sense.

gglockner (profile)

August 19, 2023 at 11:12 am

NYT may have a point

I’m not a lawyer, but isn’t the fundamental issue whether this constitutes fair use of copyrighted material? Put another way, it’s OK to quote brief selections of an article from a copyrighted source, but if you lift large sections and claim it as your own.

Pass the popcorn, this will be interesting to watch.

gglockner (profile)

August 19, 2023 at 11:13 am

Re: Typo

but you can’t lift large sections and claim them as your own.

(Where’s the dang edit button?)

Rocky

August 19, 2023 at 5:25 pm

Re:

I’m not a lawyer, but isn’t the fundamental issue whether this constitutes fair use of copyrighted material?

Reading and learning from news is certainly fair use, regardless of the mechanism used.

Put another way, it’s OK to quote brief selections of an article from a copyrighted source, but if you lift large sections and claim it as your own.

Two points here. One, that’s not how an LLM works. Two, anything produced by an AI can’t be “owned” in this context, i.e. copyrighted; see the very recent decision in Thaler v. Perlmutter.

Jacob C.

August 19, 2023 at 11:56 am

The problem with the “robots.txt is easy” claim is the issue of each AI company having their own user-agent. It’s unreasonable to expect sites to play whack-a-mole and disallow scraping for LLM training corpuses one-by-one, when there will constantly be new LLM-feeding scrapers. It should be opt-in to begin with, otherwise you’re in effect unreasonably demanding sites do a wildcard “User-agent: ” rule, which would hurt sites, as they still need to allow indexing by *non-LLM crawlers like search engine crawlers.

Jacob C.

August 19, 2023 at 11:58 am

Re:

Ugh, markdown was accidentally enabled. That should say:

The problem with the “robots.txt is easy” claim is the issue of each AI company having their own user-agent. It’s unreasonable to expect sites to play whack-a-mole and disallow scraping for LLM training corpuses one-by-one, when there will constantly be new LLM-feeding scrapers. It should be opt-in to begin with, otherwise you’re in effect unreasonably demanding sites do a wildcard “User-agent: *” rule, which would hurt sites, as they still need to allow indexing by *non-LLM* crawlers like search engine crawlers.
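To make the wildcard trade-off concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The rules, the URL, and the crawler list are assumptions for illustration only: GPTBot and CCBot are names some AI crawlers have used (check each vendor’s documentation for current names), and SomeNewLLMBot is a hypothetical newcomer that no site has listed yet.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt that blocks AI crawlers one by one (whack-a-mole).
# Agent names are assumptions; vendors document their own crawler names.
PER_AGENT_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

# Illustrative robots.txt that blocks every crawler with a wildcard rule.
WILDCARD_RULES = """\
User-agent: *
Disallow: /
"""

# Hypothetical article URL used for every check.
ARTICLE_URL = "https://example.com/2023/08/17/some-article"


def allowed(robots_txt: str, agent: str, url: str = ARTICLE_URL) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # "SomeNewLLMBot" is hypothetical: a crawler the site hasn't listed yet.
    for agent in ["GPTBot", "CCBot", "SomeNewLLMBot", "Googlebot"]:
        print(agent,
              "per-agent:", allowed(PER_AGENT_RULES, agent),
              "wildcard:", allowed(WILDCARD_RULES, agent))
```

Under the per-agent rules, the unlisted SomeNewLLMBot and Googlebot are both still allowed; under the wildcard rule, all four are blocked, search crawler included. That is the bind described above: listing agents one at a time always lags behind new scrapers, and the catch-all also shuts out search engines.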

Rocky

August 19, 2023 at 5:29 pm

Re: Re:

That’s an entirely separate issue; not honoring robots.txt has nothing to do with copyright.

It’s possible that sites may have some recourse here if their robots.txt has been ignored, but that’s gonna be an uphill battle too.

Tanner Andrews (profile)

August 21, 2023 at 8:01 pm

not how it works

Back in the day, when students got down and tied up their dinosaurs to wait for the school day to finish, the students would be expected to read.

Indeed, reading was important. Students were supposed to be exposed to quality writing in the (possibly foolish) hope that it would influence them.

Even today, the NY Times editing is fairly good, and I would prefer to see students exposed to newspapers. The idea that they should learn by example to form coherent bodies of text is more encouraging than the idea that they should watch more television.

(disclosure: I write a column for a newspaper and write nothing whatever for television)

ke9tv (profile)

August 22, 2023 at 3:27 pm

scanning the web has to be fair use, otherwise we no longer have search engines

Which would suit a lot of people who are set on regulating the Internet. “What do you need a search engine for? Your betters will tell you what to read!”

NY Times Considering A Potentially Very Dumb Lawsuit Against OpenAI Because It Learned From NY Times Content

from the that’s-now-how-any-of-this-works dept

Comments on “NY Times Considering A Potentially Very Dumb Lawsuit Against OpenAI Because It Learned From NY Times Content”
