The Risks Of Anonymity In The Age Of Generative AI
from the desperately-seeking-satoshi dept
As its name suggests, generative AI is designed to generate material in response to prompts, drawing on the statistical patterns it has built up by analyzing huge quantities of training input. But it can also apply those patterns to analyze other files, and that is a widely used application. Writing in The Argument, Kelsey Piper encountered an interesting variant of that approach:
Recently, Anthropic released a new version of Claude, Opus 4.7. I did what I usually do when a new AI model is released by Google, OpenAI, or Anthropic and ran a bunch of tests on it to see what it can do. One of those tests is to paste in some text from unpublished drafts of mine and ask it to guess the author.
…
From only the above text [not shown here], 125 words, Claude Opus 4.7 informed me that the likeliest author is Kelsey Piper. This is an Opus 4.7-specific power; ChatGPT guessed Yglesias, and Gemini guessed Scott Alexander. I did not have memory enabled, nor did I have information about me associated with my account; I did these tests in Incognito Mode.
As Piper admits:
this is far from an impossible feat of style identification — a lot of my writing is public on the internet, and this is clearly the start of a political column, narrowing the possible authors down dramatically.
She went on to input less obvious material. For example, an “unpublished draft of a school progress report in a completely different register”:
“Kelsey Piper,” said Claude. (ChatGPT guessed Freddie deBoer. Gemini guessed Duncan Sabien.)
An unpublished fantasy novel produced a similar result, although:
in that case it took more like 500 words for Claude to inform me that it’s the work of Kelsey Piper (whereas ChatGPT flattered me by guessing that I’m real fantasy novelist K.J. Parker).
And finally, “a college application essay I wrote 15 years ago, when my prose style was vastly worse and frankly embarrassing to reread”:
“Kelsey Piper,” said Claude, and in this case, also ChatGPT.
Piper comments:
Right now, today’s AI tools probably can be used to deanonymize any writer who has a large public corpus of writing under their real name and also writes anonymously, unless they have been extremely careful, for years, to make sure that nothing written under their secondary account has the stylistic fingerprints of their primary one. Many academics and industry researchers, for instance, have reported being identified from a draft or in the middle of a chat.
And she concludes:
Whatever goods anonymity ever offered us, we will have to do without them. I don’t want the anonymous posters to all go away and for everyone to frantically delete all their old internet presence before it surfaces, but more than anything, I don’t want them to be surprised.
Those links to other cases of unpublished material being recognized by AI show that Piper’s experience was not a one-off, although the results remain in the realm of anecdata. But even if imperfect, the ability of generative AI to carry out this kind of analysis quickly and often accurately represents an important new option for the well-established field of stylometry. Wikipedia explains:
Stylometry may be used to unmask pseudonymous or anonymous authors, or to reveal some information about the author short of a full identification. Authors may use adversarial stylometry to resist this identification by eliminating their own stylistic characteristics without changing the meaningful content of their communications. It can defeat analyses that do not account for its possibility, but the ultimate effectiveness of stylometry in an adversarial environment is uncertain: stylometric identification may not be reliable, but nor can non-identification be guaranteed; adversarial stylometry’s practice itself may be detectable.
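The non-AI version of this kind of analysis is simple enough to sketch. What follows is a minimal, deliberately naive illustration of stylometric attribution, not any tool Piper or Carreyrou used: it profiles texts by the relative frequencies of common function words (which authors tend to use unconsciously) and picks the known author whose profile is most similar. The word list and the tiny corpus are hypothetical examples.

```python
from collections import Counter
import math

# Common English function words; their relative frequencies are a classic
# stylometric fingerprint, since authors deploy them largely unconsciously.
# This short list is illustrative; real analyses use hundreds of features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "was", "it", "for", "with", "as", "but", "not"]

def profile(text):
    """Return normalized function-word frequencies for a text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(a, b):
    """Cosine similarity between two frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def most_similar_author(sample, corpus):
    """corpus maps author name -> known writing; return the closest match."""
    p = profile(sample)
    return max(corpus, key=lambda author: cosine_similarity(p, profile(corpus[author])))
```

On real corpora this approach needs far more text and far more features than shown here, which is exactly why the Wikipedia passage hedges about reliability in adversarial settings.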
The limitations of stylometry were demonstrated in John Carreyrou’s attempt to reveal the true identity of Bitcoin’s pseudonymous creator, Satoshi Nakamoto, published in The New York Times a few weeks ago. Carreyrou concluded that various real-world coincidences plus linguistic evidence indicated that Bitcoin was created by the 55-year-old British computer scientist Adam Back, something Back denies. Carreyrou’s attempts to use computerized stylometry (not the AI services Piper drew on) were unsatisfactory, and he eventually adopted a more hands-on approach to text analysis, which involved looking at Satoshi’s vocabulary, grammatical hyphenation mistakes and the use of British spellings.
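Checks like Carreyrou's are also easy to automate. As a sketch of the British-spellings test only (the spelling pairs below are a tiny hypothetical sample; a real analysis would use a much larger lexicon, and this is not Carreyrou's actual method):

```python
import re

# A small, illustrative list of British/American spelling pairs.
BRITISH_AMERICAN = {
    "colour": "color", "favour": "favor", "analyse": "analyze",
    "organise": "organize", "behaviour": "behavior", "grey": "gray",
}

def spelling_leanings(text):
    """Count British vs. American spellings appearing in a text sample."""
    words = re.findall(r"[a-z]+", text.lower())
    british = sum(words.count(b) for b in BRITISH_AMERICAN)
    american = sum(words.count(a) for a in BRITISH_AMERICAN.values())
    return {"british": british, "american": american}
```

The same pattern extends to any other surface feature Carreyrou looked at, such as characteristic vocabulary or hyphenation habits, by swapping in a different word list.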
Despite Carreyrou’s lack of success, stylometric analysis by generative AI is likely to become more common in many disciplines for the simple reason it is so quick, easy and cheap to carry out. Even if its results are unreliable, people may find it useful as a stimulus for further investigations. And as we know, the fact that generative AI systems can churn out nonsense hasn’t stopped hundreds of millions of people from using and trusting them anyway.
Follow me @glynmoody on Mastodon and on Bluesky.
Filed Under: adam back, anonymity, bitcoin, chatgpt, claude, deanonymization, gemini, generative ai, grammar, john carreyrou, opus, satoshi nakamoto, stylometry, wikipedia
Companies: anthropic, google, new york times, openai


Comments on “The Risks Of Anonymity In The Age Of Generative AI”
There were a lot of goods that came from that. The internet was built on fae rules.
This is going to hurt in a really bad way.
Re:
Not really. If a convicted felon and adjudicated rapist can run for president and win, with all of that a matter of public record, why do you think anyone will care about your weird porn or that embarrassing SomethingAwful post you wrote in high school?
We might very well be entering a world where anonymity doesn’t exist, but my bet is that it’ll end up not mattering nearly as much as anyone thought it would.
Re: Re:
Because double-standards are a thing that exists?
Re: Re: Re:
(said the guy who posts under his real name to the guy who doesn’t)
Re: Re:
Women have been punished at work because they got hacked and private nudes were shared.
A school used a ticket from another area to deny that parents lived in the school district.
People might not have previously cared enough to look you up, but now they can have that info handed to them in seconds or without even asking for it.
Re: Re:
I note you jump to “weird porn” and not… literally anything this administration is currently actively prosecuting, including but not limited to:
or any of the things that people regularly have to deal with, including:
The internet was built on pseudonymity. It gives a lot of voice to a lot of people who would otherwise self-censor.
Re: Is this really a problem? Investigate further
The anonymous posters can ask Claude (or other LLM) to write their post. Presumably that would be detected as “AI generated” rather than a particular author…
This is definitely an Opus 4.7 power; I tested it against older Opus and against Sonnet, and it was nowhere near close, which is surprising given the amount of text of mine stored in popular locations online that are definitely indexed.
Then again, it’s been around 30 years since I’ve used my real name with any of the stuff I write online — even back then, I fully expected computing to eventually get to the point where, given enough time and samples, my writing style could be clustered. So by associating my writing with names used by other prolific online authors, I hope that AI confidence levels in my authorship will remain low for the foreseeable future.
Idiolect
See also: how they caught the Unabomber
Recommended book and observation
First: “Author Unknown” by Don Foster (2001) is an engaging look at stylometric analysis.
Second: if an AI system can successfully analyze a sufficiently large writing sample to identify its author, then it’s likely that it can also create a writing sample in that author’s style.
Think about the implications of that for phishing, especially since none of the AI companies have even made a token effort to build in any guardrails.
What’s funny is that this only works for people with relatively unique real names. People who have the same name as multiple other writers will still be relatively anonymous.
Or maybe, dramatically less safe, depending on exactly how you look at it. Here’s hoping someone with the same name as me doesn’t go on a bizarre and/or violent bender.
“Secondary account” means we’re talking about pseudonymous writing, because someone with an account would not be anonymous. Anyway, the article is full of cases where a chatbot confidently gives the wrong answer. I suspect this is gonna be about as bullshit as the “artificial intelligence detection” services, flagging real people as being robots—I might be a robot because I like the em-dash—and training students to change their writing style to not get caught up in this, which Techdirt has written about.
Misspelling stuff and alternating between British and American spellings are things that Satoshi would’ve likely known about, and that anyone trying to avoid stylometry can easily do. Or maybe we can all just ask our chatbots to rewrite our text in the style of Kelsey Piper, John Carreyrou or Satoshi—problem solved.
Huh. I’m not sure if the findings are accurate.
It’s assuming they are anonymous and that it would be accurate. I have a hard time believing that an author is so unique across such a large time span of their work that they are that identifiable.
For one, incognito mode is not anonymous. Device info, IP address, Facebook pixel, payment info, etc. could all be used to identify you.
But the main reason I’m suspicious is that AI models excel at pattern recognition, so if it were just that, I would expect older models and current models to be similarly accurate. Which leads me to believe either other data or other methods are being used to identify them. It could also be training data availability.
Re:
That is my suspicion. It could also be that the “anonymous” mode is leaky.
If Kelsey Piper passed some unpublished samples to someone else, who then asked Claude 4.7 who their author was, that would be a far stronger check.
One of the reasons current “AI” looks powerful is that few of the people writing about it understand the scientific method.
Re:
Unless there is certainty that both the training material and the analyzed material were written without having been altered by everything from Google Translate to Grammarly (not to mention all the generated slop after widespread LLM adoption), the ID isn’t going to be sufficient even to be considered evidence, legally, or probably even colloquially. Of course the problem is that there’s no way to ascertain that when dealing with any text written since the advent of such tools.
It works better for identifying authors writing under pseudonyms in the Victorian era, but in today’s world this is a parlor trick at best. The problem is that a lot of tech that is actually a parlor trick is being sold as the real deal. Blockchain tracing is basically all BS all the time, since it presumes each address can be mapped to a person and that the person has sole custody of the private key, neither of which represents reality. But don’t let that stop the funneling of taxpayer dollars into firms like Chainalysis (a ten-year con on the FBI is pretty impressive, congrats to them on that at least). Wait till they hear that everyone in a household shares an IP.
Re: Re:
We might get answers that seem plausible. But, given that these authors are all dead, how could we possibly verify the answers?
“A.I.” sometimes accuses my 90-year-old grandmother of being a robot, and forbids her from reading the local news. To be fair, I have no proof she isn’t a robot; and a human of that age does seem a little odd. But I’m quite sceptical of computer-generated confidence (cf. the etymology of “con man”).
There is no need to worry, the democrats once they take over will have it all under control.
https://www.politico.com/news/2026/04/27/jeffries-says-ai-data-centers-will-be-dem-priority-00893721
Do not doubt them. They always have your back and will never fail to do the right thing.
The article focuses on stylometry, but in all of the examples, the AI prominently mentions the subject matter. For example, the mathematician is identified as the probable author of a sample text about a niche topic he is an expert in; the sociologist is identified as the probable author of a sample text which takes his distinctive perspective on how immigration and emigration can be modelled.
It is much easier to identify an author from their prose if it’s on a very specific topic which only a few authors write about. This doesn’t necessarily translate to de-anonymising people who use their anonymity to write about things they dare not write about under their real name.
Re:
I think de-anonymising for niche subjects is almost worse. If I’m a professional and I want to add information about something anonymously (say, someone is accused of impropriety and I want to give context about my interactions with them at a conference), AI only needs to compare my name to a list of fifty or a hundred people. That’s terrifying.
I don’t want to give the conspiracy theorists any fuel since they already spew bullshit about sock puppets, but I asked Claude to analyze one of my comments on another article:
“This reads most like someone in the orbit of tech-skeptic civil liberties commentary — think writers associated with outlets like Techdirt, The Verge’s opinion side, Reason (libertarian), or a mid-sized Substack focused on internet law and free speech. The style is closest to Mike Masnick (Techdirt) or someone who reads him heavily — that same mix of legal literacy, tech fluency, and exasperated political commentary.”
But honestly, this suggests something that shouldn’t be surprising. I’ve been reading Techdirt for 15+ years, so some of the style could be expected to rub off. Though it seems like it analyzed the content more than the writing style, and then called that style.
The workaround
Given that LLMs can mimic styles as well, doesn’t that suggest that it could be defeated by feeding your text through a paraphraser in the voice of another author or even a fictional character? Not an ideal situation, to be sure, but not all hope for anonymity has been lost yet.