Duty to correct misinformation

I may be old enough to know a trivia fact unknown to many TechDirt readers. Once upon a time, when computers were first able to simulate fax machines, some of the European countries required vendors to register their systems before they could be allowed to send computer-generated faxes to the corresponding country. One of them–I think it was France–had a requirement that if the computer dialed what it thought was a fax number but a human being answered the phone, that number had to be stored so that the system could remember never to call it again.

During that same era, another country–might have been Switzerland–required that the sending machine “listen” for a special tone that indicated an emergency and then immediately disconnect. That would allow the authorities to send a notification warning the recipients to head for their bomb shelters.

Anonymous Coward

May 8, 2024 at 12:30 pm

noyb Dice

Can you imagine the lawsuits if noyb found a set of dice? I ask the dice my age and it gives me wrong information! Someone needs to remove the wrong information from this set of dice.

Anonymous Coward

May 8, 2024 at 12:52 pm

Re: noyb Dice

They’re really gonna have fun with my Magic 8 Ball.

Anonymous Coward

May 8, 2024 at 1:28 pm

Re: Re:

Don’t count on it.

Anonymous Coward

May 8, 2024 at 1:23 pm

Surely it would be a much greater invasion of privacy if ChatGPT did know everyone’s birthday?

Anonymous Coward

May 8, 2024 at 1:29 pm

Of course, if an LLM is (partially) fronting for a search engine, then your incorrect information lies a few levels down.

Anonymous Coward

May 8, 2024 at 2:23 pm

Here's a hypothetical

What happens if ChatGPT is asked a question about a private bit of personal data and happens to answer correctly? (Depending on the question and the data this may be improbable or likely.)

Obviously it didn’t pull the answer from a database: it constructed it from its internal language model. So the answer doesn’t actually exist, anywhere, per se, inside ChatGPT.

How does this play with the GDPR?

Samuel Abram (profile)

May 8, 2024 at 2:36 pm

Re:

Wouldn’t that count as a lucky guess, then?

Anonymous Coward

May 8, 2024 at 3:46 pm

Re: Re:

Basically every correct answer it gives is a lucky guess.

MrWilson (profile)

May 8, 2024 at 10:41 pm

Re: Re: Re:

Exactly. Literally everything an LLM that is not pulling from a database or search engine says is “made up” in the sense that it is just predicting the next words in a sentence rather than actually trying to be factually accurate. It’s like a parrot that can just repeat a lot more words and phrases and recognize more complicated patterns of how those words and phrases might be combined and ordered.

drew (profile)

May 9, 2024 at 9:58 am

Re:

ChatGPT doesn’t provide answers. It provides a string of text that looks like an answer might do.
GDPR provides a right to data visibility, deletion and correction but as no data is being stored here I don’t see how there’s a violation.

This comment has been flagged by the community. Click here to show it.

Benjamin Jay Barber

May 8, 2024 at 5:17 pm

Mike Masnick Malding Again

I don’t disagree with the policy positions of mike on this issue, but he again demonstrates his ignorance of both the law and how these systems work.

There is a privacy tort called “false light”, which is what is being violated when the LLM hallucinates facts.
The AI models DO store data, and they are in the form of a database, that is in fact what the “weights” of the neural network are.
A neural network does not NEED to compress data, and thereby perform hallucinations, these models can be “over-parametrized”, its just very much more expensive to increase the numbers of parameters, but the choice of compression level and the scope of the training data is at the discretion of the company.
A neural network can to some degree know when it might be hallucinating, when performing the “softmax” operation to predict the next tokens, analyzing the “perplexity” of the token candidates.

Anonymous Coward

May 8, 2024 at 5:37 pm

Re:

hey sex pest haven’t seen you post in awhile

MrWilson (profile)

May 8, 2024 at 10:58 pm

Re:

There is a privacy tort called “false light”, which is what is being violated when the LLM hallucinates facts.

False light is a privacy tort in the US. We’re talking about Europe. But also, false light typically requires the defendant to publish the information widely rather than just in a private chat, it requires the misinformation to be highly offensive to a reasonable person, and the defendant must be at fault. These requirements aren’t met unless you can prove the company intentionally programmed an LLM to specifically identify and defame individuals and did so to a large audience rather than just one person in a chat. And no reasonable person, understanding that a non-human LLM literally makes up everything it says by its very nature (barring a web search or a RAG), would be offended by it. So you’re wrong on top of being wrong on top of being wrong.

The irony is that your hallucinated “facts” are more offensive than ChatGPT’s.

bhull242 (profile)

May 9, 2024 at 8:15 pm

Re:

There is a privacy tort called “false light”, which is what is being violated when the LLM hallucinates facts.

That tort does not exist in Europe, where this lawsuit was filed, and so it is irrelevant. The relevant law here is the GDPR. That’s what the complaint references.

The AI models DO store data, and they are in the form of a database, that is in fact what the “weights” of the neural network are.

It doesn’t store data about people or facts. It stores data about language. Those are not the same thing, and only the former can support this complaint.

A neural network does not NEED to compress data

It doesn’t actually compress data at all; that was just figurative language.

and thereby perform hallucinations, these models can be “over-parametrized”, its just very much more expensive to increase the numbers of parameters, but the choice of compression level and the scope of the training data is at the discretion of the company.

Yes, and the article speaks on such models. ChatGPT just isn’t one of them. The existence of others that do so is irrelevant to whether the law in question actually applies to what ChatGPT does.

A neural network can to some degree know when it might be hallucinating, when performing the “softmax” operation to predict the next tokens, analyzing the “perplexity” of the token candidates.

But it can never be eliminated altogether, nor is there any legal duty for AI makers to do so. This, too, is missing the point.

Anonymous Coward

May 10, 2024 at 8:03 am

Re: Re:

It doesn’t store data about people or facts. It stores data about language. Those are not the same thing, and only the former can support this complaint.

Correct. The GDPR protects personal information (from my reading of it), which doesn’t protect words, etc. If it did, everything would have to shut down for violation of the GDPR, including schools, where words are taught.

Mamba (profile)

May 8, 2024 at 6:18 pm

ignorance of both the law and how these systems work.

Your assessment of his qualifications carries absolutely no authority considering you went to jail for six months based on your misunderstanding of the law. I’m fact, your criticism stands as a glowing endorsement.

I don’t even need to get into your misunderstanding of torts(not what’s under discussion), weights (this is not personal data, as it’s an aggregation of large data sets), or compression (as an analogy).

Anonymous Coward

May 8, 2024 at 6:50 pm

The article critiques a recent complaint by the European privacy activists, noyb, against OpenAI regarding ChatGPT’s compliance with the GDPR. It argues that noyb’s complaint misunderstands the nature of generative AI tools like ChatGPT, which generate content without storing or retrieving data. The analogy of asking a friend about someone’s birthday, where the friend provides incorrect information, is used to illustrate that ChatGPT operates similarly—it generates responses based on learned data rather than retrieving stored information. The article suggests that noyb’s complaint is misplaced and questions whether it’s a genuine concern or a tactic to highlight flaws in the GDPR’s applicability to modern technology. It concludes by emphasizing the fundamental difference between generative AI models and those with retrieval capabilities, suggesting that if noyb believes ChatGPT violates GDPR, it either misunderstands the technology or exposes the GDPR’s inadequacy in regulating it.

Hey noyb: let’s team up to navigate the intricacies of privacy in our tech-driven world and champion humanity’s best interests together. We’re all in this digital adventure, let’s make it a collaborative one! 🌟

Mamba (profile)

May 8, 2024 at 7:05 pm

Re:

Well, this is very meta….

Arianity

May 9, 2024 at 2:00 am

If you were to ask someone to state the birthday of someone else, and the person asked just made up a date, which was not the actual birthday, would you argue that the individual’s privacy had been violated?

Would that violate GPDR? Just leaving the AI part aside, would making up “data” fall under it? I know it has some provisions for incorrect information, so I wonder if that would fall under that.

And, yes, there are some cases where it seems closer to storing data, in that the nature of the training and the probabilistic engine is that it effectively has a very lossy compression algorithm that allows it to sometimes recreate data that closely approximates the original, but that’s still not the same thing as storing data in a database,

If you look at the definitions in GPDR, it doesn’t actually say anything about e.g. a database:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction;

Further, it says personal data should be:

accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay (‘accuracy’);

I think you could make the argument that something simply making something up isn’t accurate? It does mention in some places phrases like a “filing system”, though.

That said, going after training data seems like a way easier target, rather than outputs.

by filing a laughably silly complaint that exposes how poorly fit the GDPR is to the technology in our lives today.

For what it’s worth, this is is probably a bit moot in the long term, as the EU has an AI-specific bill coming up soon: https://artificialintelligenceact.eu/

Anonymous Coward

May 10, 2024 at 8:05 am

Re:

If you were to ask someone to state the birthday of someone else, and the person asked just made up a date, which was not the actual birthday, would you argue that the individual’s privacy had been violated?

Would that violate GPDR? Just leaving the AI part aside, would making up “data” fall under it?

Since I’m not a ‘data processor’ under the law, then no.

Anonymous Coward

May 9, 2024 at 3:16 am

I think the relevant phrase in GDPR is “structured filing system”? So a set of random sticky notes is not covered by GDPR, but the same information alphabetically indexed in a folder may be covered depending on the type of information.

Personally, I’d say an LLM is closer to ustructured than structured filing system.

Anonymous Coward

May 9, 2024 at 9:17 am

Re:

GDPR states ‘filing system’ means any structured set of personal data which are accessible according to specific criteria, whether centralised, decentralised or dispersed on a functional or geographical basis.

LLM pulls their answers from some data they store somewhere after having ingested the huge training dataset. It cannot work otherwise.
This is why LLM cannot yet be used on a smartphone for offline use, they require too much data storage (but still so much less that the training dataset).

Whether that stored data looks structured to the human eye is not the question. That data is necessarily structured in a way that allows the LLM to pull out answers that make some sense.

I understand that Material scope of GDPR (article 2) is fulfilled.

Ninjasaid (profile)

May 9, 2024 at 8:50 pm

Re: Re:

well actually small quantized language model can work on some modern smartphones. I remember some people running LLaMA-2 on android phones.

This comment has been flagged by the community. Click here to show it.

Anonymous Coward

May 9, 2024 at 3:46 am

Techdirt spectacularly miss the point

The product is sold as a search engine. That’s why this case makes sense.
If this stops llms being touted as search engine replacements then it’s a job well done.
This is a computer where you have to check the answers. There is no value in that beyond “oooh, doesn’t it sound like a human?”
Yes, it sounds like a human who hasn’t got a fucking clue….

Michael

May 9, 2024 at 11:29 am

Re:

Where has OpenAI been selling it’s service as a search engine? I have never seen them do such a thing and would be interested in a link if you have one.

MrWilson (profile)

May 9, 2024 at 5:02 pm

Re:

The product is sold as a search engine

No. It literally isn’t. It didn’t even have search integrated until about six months ago.

That’s why this case makes sense.

So since it isn’t sold as a search engine, you’re admitting the case doesn’t make sense. Agreed.

This is a computer where you have to check the answers.

ChatGPT isn’t a computer.

There is no value in that beyond “oooh, doesn’t it sound like a human?”

It’s a tool. It has uses you apparently haven’t conceived of. That doesn’t make it valueless.

Yes, it sounds like a human who hasn’t got a fucking clue….

Are you saying your comment was written by ChatGPT because “it sounds like a human who hasn’t got a fucking clue”?

bhull242 (profile)

May 9, 2024 at 8:25 pm

Re:

The product is sold as a search engine. That’s why this case makes sense.

Absolutely no one—least of all OpenAI—sells ChatGPT as a search engine. On the contrary, it includes disclaimers that explicitly say that it is not a search engine.

Whether or not some other company advertises some AI as a search engine is irrelevant because this isn’t a complaint filed against that other company or about that AI; it’s against OpenAI, who has never once claimed that ChatGPT was a search engine.

If this stops llms being touted as search engine replacements then it’s a job well done.

Since OpenAI does not tout ChatGPT as a search engine replacement and never has, the complaint was filed against the wrong party, so it makes no sense and will not accomplish that goal.

This is a computer where you have to check the answers.

You should always be doing that, anyways. Look at a calculator. It will sometimes give garbage answers due to rounding errors or due to bad inputs.

There is no value in that beyond “oooh, doesn’t it sound like a human?”

Value is in the eye of the beholder. Just because you don’t find any value in it doesn’t mean it doesn’t have value to someone.

At any rate, who cares if it has value beyond that? That doesn’t make the complaint any more valid since the law doesn’t require the product to have value.

Anonymous Coward

May 10, 2024 at 3:33 pm

Re: Re:

“Look at a calculator. It will sometimes give garbage answers due to rounding errors or due to bad inputs.”

Bad inputs cause wrong answers?

I would assume the answer was right, the inputs were wrong. Not the calculators fault.

8X12 = 96

If you meant to input 122 rather 12, doesn’t make it a garbage answer, just a garbage analogy.

Anonymous Coward

May 10, 2024 at 8:08 am

Re:

The product is sold as a search engine.

Right. Just like Microsoft Edge is. Your point?

Anonymous Coward

May 9, 2024 at 12:51 pm

An answer to a query is… an answer.
Whether the LLM is conscient that the sting of text output is the name of a person or not is irrelevant to GDPR.
If the string of text can help to identify a person, it is personal data for GDPR.
I would be interested to know how an LLM can work without storing data somewhere…

bhull242 (profile)

May 9, 2024 at 8:30 pm

Re:

If the string of text can help to identify a person, it is personal data for GDPR.

But the GDPR doesn’t care about data that isn’t stored or published. ChatGPT does neither for strings of text that can identify a person.

I would be interested to know how an LLM can work without storing data somewhere…

It doesn’t store factual data. It stores probabilistic data about language patterns. Basically, none of what it outputs is stored anywhere in the system. It is created anew based on inputs from the user, and it doesn’t (usually) retain that output or those inputs for future retrieval. Indeed, you can get ChatGPT to give different answers to the same prompt, demonstrating that it hasn’t stored the answers at all.

Anonymous Coward

May 10, 2024 at 12:46 am

Re: Re:

2 important points:

A: the storage of some data is not a GDPR requirement. It is just one of the processes that fall under it. Any operation on personal data is a processing. It could be collection, transmission, combination, structuring, use and many others as set, without limits, in GDPR article 4.2.

B: personal data is any kind of information that can help identify a person, directly or not. Whether the data is human understandable or not and whether it is made-up of factual/pseudonymized/statistical/probabilistic/gibberish information or not is irrelevant to GDPR as long as it can help – using the LLM – identify a person).
An IP address, an ID number, a name, a physical address… are all strings of texts that could be used to help identify a person. It is not difficult to have an LLM output strings of texts that are personal data. Ask ChatGPT a question about Donald Trump and most of the output string of text will be personal data related to him (can help identify him). A simple list of Donald Trump’s achievements, even without his name, is also his personal data if someone can identify him from it.

In the present case, the public figure’s name have been collected by the company, have been used to train (combined/structured/…) an algorithm. The algorithm itself (containing the personal data in a statistical/probabilistic format we cannot understand), is stored . Upon receiving a query from a browser/app, the algorithm then uses and organises the data to communicate its answer in HTML form to the browser/app.
All these operations on personal data (whether human readable or not) fall under GDPR.

Openai themselves state that they do process personal data of that public figure when they say that they can block any information about the data subject.

What is manifestly unfounded or excessive in exercising one’s right to erasure, a right enshrined in GDPR? Again, whether that data is human readable or not is irrelevant to GDPR.
Openai should have taken the GDPR’s right to erasure into account when designing ChatGPT.

Disclaimer: ChatGPT was used for some translation verification.

Anonymous Coward

May 10, 2024 at 8:31 am

Re: Re: Re:

I follow on my previous post.
Regarding the date of birth of that public figure, it seems that it was not present in the training dataset. So, not present either within the algorithm.

Still a wrong date of birth related to the public figure (erroneous personal data) was communicated by Openai to the user’s browser in HTML form and stored as a conversation within the person’s account.
It seems no one knows how the algorithm make this up and it could indeed be difficult for Openai to delete a data that was made-up from nothing.
But it could probably delete it from the conversation that contained it.

But GDPR also requires that personal data processed should be accurate. Therefore, Openai have to ensure that the data related to a person (personal data) and that is communicated to his browser and stored with the conversation is accurate and Openai must be able to demonstrate which steps it took to ensure this accuracy.

Will be interesting to follow-up this case…

Anonymous Coward

May 11, 2024 at 1:01 pm

Re: Re: Re:²

But GDPR also requires that personal data processed should be accurate. Therefore, Openai have to ensure that the data related to a person (personal data) and that is communicated to his browser and stored with the conversation is accurate and Openai must be able to demonstrate which steps it took to ensure this accuracy.

As someone who has to deal with the GDPR on a daily basis, I’m afraid you are wrong.

The definition of personal data and processing is very specific in the GDRP. Since no personal data is actually stored in an LLM there is no processing of it either even though the output from an LLM may look like personal data.

An analogy of what is happening is using statistics from census data:
* Use the most common surname + first name + average age + profession + city, what is the chance you get something that matches a real person?
* If there is a match, do the processing fall within the GDPR because the “personal data” was created from statistical data by coincidence?

The answer is no because no personal data was actually ever processed, which is why the GDPR is irrelevant for LLM’s.

Re: Re: Re:³

I do not think GDPR cares how personal data was generated : personal information is ANY information relating to a person who can be identified directly or indirectly.

Also, I do not agree that no personal data is stored within the algorithm. It is there, encoded and filed in a way only the algorithm can find and output.
If no personal data was stored within the algorithm, I believe the probability that it output mostly correct information about a public figure would be close to zero.

As for the storage, I agree I was wrong when I said “A: the storage of some data is not a GDPR requirement.” as personal data must be part of (or intended to form part of) a filing system.

Ninjasaid (profile)

May 9, 2024 at 8:53 pm

Re:

“If the string of text can help to identify a person, it is personal data for GDPR.”

can you show me an example of a string of text that can help identify a person? I’ve never seen any large language model do that.

Anonymous Coward

May 10, 2024 at 8:11 am

Re:

Except LLMs don’t store personal data, they store language data. That’s why they’re called LLMs and not LPDMs. Get a clue.

Ben-L (profile)

May 9, 2024 at 2:55 pm

UK's GDPR

This had me similarly scratching my head when I saw it. Turns out the UK’s version of GDPR gives companies an exemption if it is “manifestly unfounded or excessive” which could arguably apply here.

Anonymous Coward

May 10, 2024 at 8:14 am

Re:

From the link you provided:

What if the request is manifestly unfounded or excessive?
If requests are manifestly unfounded or excessive, in particular because they are repetitive, you can:

charge a reasonable fee taking into account the administrative costs of providing the information; or
refuse to respond.
You have to be able to demonstrate how a request is manifestly unfounded or excessive.

Nice try at mendacity, though.

Anonymous Coward

May 10, 2024 at 8:15 am

Re: Re:

Forgot to put that the quoted text is in relation to requests for correction of data held about an individual by the NHS (for example).

Sunday
12:15	Funniest/Most Insightful Comments Of The Week At Techdirt (3)
Saturday
12:00	This Week In Techdirt History: May 12th - 18th (11)
Friday
19:39	NetEase Backs Down On Requirement For Early Streamers Of 'Marvel' Game To Not Critique The Game (22)
15:15	Ctrl-Alt-Speech: Do You Really Want The Government In Your DMs? (6)
13:43	Court To Cops: If We Can't See The Drug Dog Do The Thing, We're Gonna Be Suppressing Some Evidence (65)
12:13	When Humanity Gets Messy, Sometimes the Best Tech Solution Is To Do Nothing (16)
10:47	AI Gun Detection Company Pitching Its Tech To Schools Sure Seems To Be The Sole Beneficiary Of A Lot Of Similarly-Crafted Legislation (10)
10:44	Daily Deal: Babbel Language Learning (All Languages) (0)
09:34	Yet Another Study Finds That Internet Usage Is Correlated With GREATER Wellbeing, Not Less (22)
05:28	Wireless Industry Fined Yet Again For Selling Very Limited 'Unlimited' Data Plans (7)

Can ChatGPT Violate Your Privacy Rights If It Doesn’t Store Your Data?

from the this-makes-no-sense dept

Comments on “Can ChatGPT Violate Your Privacy Rights If It Doesn’t Store Your Data?”

Add Your Comment Cancel reply

Comment Options:

What's this?

Techdirt Daily Newsletter

The Techdirt Greenhouse

Trending Posts

Sunday

Saturday

Friday

More

Email This Story

Tools & Services

Company

Contact

More