The Most Famous Blunder Of Content Moderation: Do NOT Quote The Princess Bride
from the inconceivable dept
We’ve written stories about people having difficulty recognizing when someone is joking around by quoting a movie. Sometimes it ends ridiculously, like the guy who was arrested for quoting Fight Club and had to spend quite some time convincing people he wasn’t actually looking to shoot up an Apple store. We’ve also talked a lot about the impossibility of doing content moderation well at scale. Here’s a story where the two collide (though in a more amusing way).
Kel McClanahan is a well-known lawyer in the national security/FOIA realm, and the other day he was lucky enough to have a discussion on Twitter with Cary Elwes, the actor perhaps best known for his role as Westley in the best movie ever made, The Princess Bride. Kel did what one does after getting to have such a discussion, which was to celebrate it on Twitter.
As one does once the discussion turns to The Princess Bride, people started quoting the movie, which remains one of the most quotable movies of all time.
At one point in the ensuing conversation, Kel had a chance to trot out one of those quotable lines:
That’s Kel saying “I’ll most likely kill you in the morning,” the classic line (SPOILERS!) that Westley says was told to him each night by the Dread Pirate Roberts.
Take a guess what happened next? Yup.

That’s Twitter telling Kel that his tweet, quoting The Princess Bride, violated its policies on “abuse and harassment” and asking him to delete it to get back into his account. Eventually, Twitter reversed course and gave Kel his account back.
It’s easy to laugh this off (because, well, it is funny). But, it’s also a useful lesson in the impossibility of content moderation. In general, absent any context, “I’ll most likely kill you in the morning” sure could come off as a threatening statement, one that could be seen as abusive or harassing. In many scenarios, that statement would be abusive or harassing and would make users on a social media platform feel unwelcome and threatened.
But, in context, it’s quite clear that this is a joke, a quote from a funny movie.
The issue is that so much of content moderation involves context. This is something that critics of content moderation (both those who want more and those who want less) never seem to fully grasp. How does a content moderator (whether AI or human) have enough context to handle all sorts of issues like this? Do you need to train your AI on classic movies? Do you need to make sure that everyone you hire has seen every popular movie and knows them by heart and can recognize when someone is quoting them?
How do you deal with a situation where someone tries to hide behind a quote but is actually threatening someone? (Not what Kel did here, but just noting that you can’t simply say “okay, leave this line up if it’s quoting a movie.”)
The point is that it’s ridiculously complicated.
Many people — especially policymakers and the media — seem to think that content moderation is obvious. You take down the bad stuff, and you leave up the good stuff.
But a ridiculous amount of content moderation involves trying to interpret statements where you don’t (or, more often, can’t) know the actual context. Is the comment between friends joking around? Is the comment made to be threatening? Is there a deeper meaning behind it? Is it quoting a movie? What if it’s an inside joke between people?
These things are not easy because there is no easy answer.
And that includes “do nothing.” Because if you do nothing at all, you end up in a world in which the world’s worst people embrace that inaction to genuinely threaten and harass people.
This is why I keep saying content moderation is impossible to do well. It’s not that people aren’t trying hard enough. It’s that it’s literally impossible to fully understand all the context. And it’s silly to expect that anyone can.
I asked Kel if he had any thoughts on all this, and here’s his take on the whole thing:
I’m bemused that Twitter’s AI suspended me for a comment that the purported target of the “threat” liked and responded to, but I guess if I’m going to go to Twitter Jail for posting a movie quote, the best way to do so is with the OG movie star in the thread.
Filed Under: content moderation, kel mcclanahan, masnick's impossibility theorem, quotes, the princess bride
Comments on “The Most Famous Blunder Of Content Moderation: Do NOT Quote The Princess Bride”
My Name is Halifaxidocious, you locked my Twitter account, prepare to cry
You keep using those words. I do not think they mean what you think they mean.
What do you expect me to say? “As You Wish”?
Re:
and yes, I fucked up, the first line should be NatlSecCnslrs not Halifaxidocious
Mike, to further reinforce your point (and because I haven’t seen you put it from this angle before):
If the ‘good vs bad’ really was obvious, why have civilizations failed to remove (or expel) the ‘bad’ for much more than 2000[1] years?
Maybe, just maybe, it’s extremely difficult to separate good from bad in general until well after the fact (and that’s assuming you even believe there is a common set of values that should define the good/bad).
[1] A somewhat arbitrary number, but it was easy to come up with. Obviously civilization is older than that, but I don’t feel like becoming a historian and looking up when it started. And furthermore, the Romans (and Greeks before them) had some pretty advanced civilization stuff going on.
Re:
Oh, defining ‘bad stuff’ is easy enough; the problem is that every person has their own definition, one that intersects with and differs from everyone else’s to varying degrees, some minor, some not so minor.
Let a single person make the definition and the task becomes trivial. You’d just better hope that their definition mostly matches your own, though; otherwise things might get unpleasant.
Re: Re:
Actually, I don’t think defining “bad stuff” is all that easy. Many, if not most, people have situation-dependent definitions of “good” versus “bad”, sometimes self-contradictory.
A: “Fred did X”
B: “Well, that’s just Fred being Fred. Nothing bad.”
A: “Mary did X” (same X)
B: “Mary is a bad girl and that is just plain evil”
How much behaviour is labelled “boys will be boys” while the same behaviour evokes “good girls just don’t do that”?
Or a person in a good mood sees something and thinks “no biggie,” while in a bad mood they’ll yell “stop that right now, you wicked, wicked child.”
But you are absolutely correct that even if you know exactly how one person will react to a situation, it doesn’t necessarily tell you anything about a different person’s reaction.
Re: Re:
We let straight white males decide what’s good and bad and look what happened. It’s time the women, the sexual minorities and the furries took over. At the very least we won’t do a worse job.
Re: Re: Re:
Sounds mostly reasonable; however, I have one small concern.
Since misdirections and distractions are like 99% of government activities, when someone screams “Look! A squirrel,” won’t that be significantly more distracting to any furries?
Although, considering how distractible our current lot are, I would not be too surprised if most of them were furries who had a thing for dressing up as a human. So maybe taking off the outer layer would be helpful.
Re: Re: Re:
But what about futanari?
Surely there’s a space for them in decisions, right? Right? Where’d everyone go? Ohh… squirrels.
It’s way worse when they use content moderation as an excuse…
Who, me? Bitter that I still can’t have my data or delete MY account unless I give unto Twitter my real-world identity?
You bet your ass I’m bitter.
I guess they only like anonymous users when they know who they are… makes as much sense as their content moderation “policies”.
CMAS?
Buttercup: But Westley, what about the C.M.A.S.s?
Westley: Content Moderation At Scale? I don’t think it exists.
BANNED!
Re:
Nonsense. You’re only saying that because no one has ever achieved it.
Re:
“BANNED!”
I don’t think that word means what you think it means.
Mel Brooks
Well at least he didn’t quote a Mel Brooks movie.
Re:
With this laurel, and hardy handshake…
Or were you thinking “Silent Movie”?
Re:
“We find the defendants incredibly guilty.”
There.
Re:
Two men enter, one man leaves.
Content moderation and banning
I got permanently banned from Twitter in March 2020 because I said I trusted Gov. Inslee on COVID, and some doofus said he trusted Trump, and I said, “Have fun dying.” Boom. Permanent ban. Appeal denied.
And yet all the misogynists and Nazis…
Context: The bane of filters, platforms, and the people who don’t understand either.
Prodigy was told to take down “defamatory” comments about Stratton Oakmont absent the context that those claims were true.
Well, that’s the thing about life: everything is in context.
Regarding that Paul Chambers guy mentioned in one of the linked articles, since it is apparently the opinion of the UK courts that it is illegal to transmit the movie quote he uttered over the internet… wouldn’t it follow that it is illegal to stream that movie to any UK address?
Twitter Nanny
They don’t like to be called Twitter Cops.
They don’t like when you use popular movie quotes.
They don’t like when you don’t use their popular movie quotes.
So, I fixed my small rant about Twitter Cops being jerks by inserting “the fine people at” in front of “Twitter Cops.” So much better.
They cannot do moderation well, but they are completely devoted to their way of fucking it up.
I got a temporary Facebook ban for mentioning the name of the song “Let’s Lynch the Landlord” by the Dead Kennedys in a thread about the band.
My tale of CMAS impossibility
This reminded me of when I got “zucced” on Facebook for quoting the groundbreaking graphic adventure game The Secret of Monkey Island in a Point-and-Click Adventure game group.
Somebody in the group said “You Fight Like A Dairy Farmer,” and I attempted to reply with “How appropriate, you fight like a cow” (both of which are quotes from the game, for anyone not in the know), but I was put in Facebook jail for two days for “Abuse and Harassment.” I appealed the decision in the court of Facebook Trust and Safety law, and the decision was upheld.
So I couldn’t use Facebook for two days for quoting a classic video game in a group where everyone would understand me and the context of the quote. If this isn’t evidence of Mike Masnick’s Impossibility Theorem, then I don’t know what is.
The best solution to this problem is usually to stop trying to evaluate based on single events with limited context, and instead look for patterns of behavior. Malicious users rarely do things once and then vanish, and non-malicious users rarely use threatening language over and over again, even in contextually appropriate ways.
More evidence increases the odds that context and level of intent will be accurately assessed – not just clearing innocent users, but also making it possible to pick out threats that are genuinely credible and imminent. The space is admittedly tricky and perfect certainty is unrealistic, but things can be improved by focusing the priority where it belongs with threats of violence: on dangerous and abusive users, not on the display of bad content.
Unfortunately, there are a lot of internal pressures that push moderation teams away from this approach. The fear of missing obvious individual examples, the boost to reportable numbers from rating 100 comments instead of 10 accounts, the desire for smaller discrete tasks to automate or send to vendors for scalability – all of these play against user-focused strategies in this space.
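To make the contrast concrete, here’s a minimal sketch of what account-level scoring might look like. To be clear, this is a hypothetical illustration, not any platform’s actual system; the function names, severity scores, and thresholds are all invented:

```python
from collections import defaultdict, deque
import time

# Hypothetical sketch: score accounts on cumulative behavior over a
# rolling window instead of issuing a verdict on each post in isolation.

REVIEW_THRESHOLD = 3.0            # arbitrary cutoff for human review
WINDOW_SECONDS = 30 * 24 * 3600   # look back 30 days

class AccountHistory:
    def __init__(self):
        self.events = deque()     # (timestamp, severity) pairs

    def record(self, severity, now=None):
        now = time.time() if now is None else now
        self.events.append((now, severity))
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] < now - WINDOW_SECONDS:
            self.events.popleft()

    def score(self):
        return sum(sev for _, sev in self.events)

histories = defaultdict(AccountHistory)

def handle_flagged_post(account_id, classifier_severity):
    """Accumulate evidence per account and only escalate once a pattern
    emerges, rather than acting on a single ambiguous post."""
    hist = histories[account_id]
    hist.record(classifier_severity)
    if hist.score() >= REVIEW_THRESHOLD:
        return "escalate_account_for_human_review"
    return "log_and_wait"
```

Under a scheme like this, a one-off Princess Bride quote that a classifier scores as mildly threatening just gets logged, while an account that keeps tripping the classifier accumulates enough evidence to warrant a human look.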
Re:
Plenty of people do things once and then go away. It was a whole thing with the Christchurch shootings, and (though less widespread) a copycat in NY earlier this year.
Less extreme: creating new accounts solely to post one or two things, then rinsing and repeating, is a common troll tactic to get around bans. Extending the timeline before they are banned will do nothing at all to handle them, and could actually help them out once such people figure out how many posts they can get away with before the banhammer comes down, especially if creating enough innocuous posts elsewhere is enough to delay it.
It wouldn’t exactly be hard to get some very objectionable content left up forever if we required that someone do objectionable things in multiple places before considering whether those things are objectionable.
I got banned from Twitter for a post replying to a COVID-misinformation troll; it went along these lines:
“Look up the actual statistic ‘A’ for countries X, Y, and Z. After you’ve done that, compare those values for ‘A’ — and smack yourself in the forehead for being an idiot.
Then look up statistic ‘B’ for countries X, Y, and Z. After you’ve done that, compare the values of ‘B’ for each of those countries — and then smack yourself in the forehead again for being such an idiot.”
Apparently (according to Twitter) I was advocating violence and self-harm. But the COVID-misinformation troll rolled on, untouched by any threat of Twitter moderation…
Recidivism, and connecting bad actors to past accounts, is an active area platforms work on – there are options for detecting and aggregating evidence from people doing this. Per media reports, the Christchurch perpetrator had racked up significant threats, white supremacy associations, leakage, and preparation through in-person and online behaviour stretching back years.
There are also some assumptions about trolls vs targeted violence perpetrators here – the two populations have substantial differences, depending on whether their behaviour is intent leakage or an attempt to harass and intimidate.
On content, of course it makes no sense to wait and watch the violations pile up – content review and removal should continue and if a person has done enough to justify a ban, that ban should be implemented. However, individual threats should lead to review of the cumulative evidence, including content that isn’t a strong violation. This is the standard threat assessment playbook for offline threats from law enforcement orgs, and the same principles apply.
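As a rough sketch of that last step, continuing the hypothetical model from the snippet above (again, invented names and thresholds, not any real system’s API): a clear individual threat pulls in the account’s whole history, including items that weren’t violations on their own.

```python
THREAT_SEVERITY = 2.0  # arbitrary: a single post this severe is a credible threat

def handle_threat_report(account_id, post_severity, past_posts):
    """past_posts: list of (severity, text) pairs for the account,
    including sub-threshold items kept for context rather than acted on."""
    if post_severity < THREAT_SEVERITY:
        return "route_to_normal_review_queue"
    # A credible individual threat triggers review of the cumulative
    # evidence -- not just posts that were violations by themselves.
    dossier = [text for severity, text in past_posts if severity > 0]
    return ("escalate_with_full_context", dossier)
```

The design point is simply that the unit of escalation is the account and its accumulated context, mirroring how offline threat assessment reviews everything known about a subject once a credible threat surfaces.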