The Most Famous Blunder Of Content Moderation: Do NOT Quote The Princess Bride

from the inconceivable dept

We’ve written stories about people having difficulty recognizing when someone is just joking around by quoting a movie. Sometimes it ends ridiculously, like the guy who was arrested for quoting Fight Club and had to spend quite some time convincing people he wasn’t actually planning to shoot up an Apple store. We’ve also talked a lot about the impossibility of doing content moderation well at scale. Here’s a story where the two collide (though in a more amusing way).

Kel McClanahan is a well-known lawyer in the national security/FOIA realm, and the other day on Twitter he was lucky enough to have a discussion with Cary Elwes, the actor perhaps best known for his role as Westley in the best movie ever made, The Princess Bride. Kel did what one does after getting to have such a discussion, which was to celebrate it on Twitter.

As happens whenever the discussion turns to The Princess Bride, people started quoting the movie, which remains one of the most quotable movies of all time.

At one point in the ensuing conversation, Kel had a chance to trot out one of those quotable lines:

That’s Kel saying “I’ll most likely kill you in the morning,” the classic line (SPOILERS!) that Westley says was told to him each night by the Dread Pirate Roberts.

Take a guess what happened next? Yup.

That’s Twitter telling Kel that his tweet, quoting The Princess Bride, violated its policies on “abuse and harassment” and asking him to delete it to get back into his account. Eventually, Twitter reversed course and gave Kel his account back.

It’s easy to laugh this off (because, well, it is funny). But, it’s also a useful lesson in the impossibility of content moderation. In general, absent any context, “I’ll most likely kill you in the morning” sure could come off as a threatening statement, one that could be seen as abusive or harassing. In many scenarios, that statement would be abusive or harassing and would make users on a social media platform feel unwelcome and threatened.

But, in context, it’s quite clear that this is a joke, a quote from a funny movie.

The issue is that so much of content moderation involves context. This is something that critics of content moderation (both those who want more and those who want less) never seem to fully grasp. How does a content moderator (whether AI or human) have enough context to handle all sorts of issues like this? Do you need to train your AI on classic movies? Do you need to make sure that everyone you hire has seen every popular movie and knows them by heart and can recognize when someone is quoting them?

How do you deal with a situation where someone tries to hide behind the quote — but is actually threatening someone? (Not what Kel did here, but just noting, you can’t just say “okay, leave this line if it’s quoting a movie”).

The point is that it’s ridiculously complicated.

Many people — especially policymakers and the media — seem to think that content moderation is obvious. You take down the bad stuff, and you leave up the good stuff.

But a ridiculous amount of content moderation involves trying to interpret statements where you don’t (or, more often, can’t) know the actual context. Is the comment between friends joking around? Is the comment made to be threatening? Is there a deeper meaning behind it? Is it quoting a movie? What if it’s an inside joke between people?

These things are not easy because there is no easy answer.

And that includes “do nothing.” Because if you do nothing at all, you end up in a world in which the worst people take advantage of that to genuinely threaten and harass others.

This is why I keep saying content moderation is impossible to do well. It’s not that people aren’t trying hard enough. It’s that it’s literally impossible to fully understand all the context. And it’s silly to expect that anyone can.

I asked Kel if he had any thoughts on all this, and here’s his take on the whole thing:

I’m bemused that Twitter’s AI suspended me for a comment that the purported target of the “threat” liked and responded to, but I guess if I’m going to go to Twitter Jail for posting a movie quote, the best way to do so is with the OG movie star in the thread.



Comments on “The Most Famous Blunder Of Content Moderation: Do NOT Quote The Princess Bride”

This comment has been deemed funny by the community.
BJones says:

My Name is Halifaxidocious, you locked my Twitter account, prepare to cry

“Violating our rules against Violence and harassment

You keep using those words. I do not think it means what you think it means.”

“By clicking Delete you acknowledge that your tweet violated the rules”

What do you expect me to say? “As You Wish”?

Anonymous Coward says:

Mike, to further reinforce your point (and because I haven’t seen you put it from this angle before):

Many people — especially policymakers and the media — seem to think that content moderation is obvious. You take down the bad stuff, and you leave up the good stuff.

If the ‘good vs bad’ really was obvious, why have civilizations failed to remove (or expel) the ‘bad’ for much more than 2,000[1] years?

Maybe, just maybe, it’s extremely difficult to separate good from bad in general until well after the fact (and that’s assuming you even believe there is a common set of values that should define good/bad).

[1] A somewhat arbitrary number, but it was easy to come up with. Obviously civilization is older than that, but I don’t feel like becoming a historian and looking up when. And furthermore, the Romans (and Greeks before them) had some pretty advanced civilization stuff going on.

This comment has been deemed insightful by the community.
That One Guy (profile) says:


Oh, defining ‘bad stuff’ is easy enough; the problem is that every person has their own definition that intersects with and differs from the definition held by everyone else to varying degrees, some minor, some not so minor.

Let a single person make the definition and the task becomes trivial. You’d just better hope that their definition mostly matches your own in that situation, however, otherwise things might get unpleasant.

Anonymous Coward says:

Re: Re:

Actually, I don’t think defining “bad stuff” is all that easy. Many, if not most, people have situation-dependent definitions of “good” versus “bad”, sometimes self-contradictory.

A: “Fred did X”
B: “Well, that’s just Fred being Fred. Nothing bad.”

A: “Mary did X” (same X)
B: “Mary is a bad girl and that is just plain evil”

How much behaviour is labelled “boys will be boys” while the same behaviour will evoke “good girls just don’t do that”?

Or when a person is in a good mood they see something and think “no biggie” while when they are in a bad mood they’ll yell “stop that right now, you wicked, wicked child”.

But you are absolutely correct that even if you know exactly how one person will react to a situation, it doesn’t necessarily tell you anything about a different person’s reaction.

Anonymous Coward says:

Re: Re: Re:

Sounds mostly reasonable, however I have one small concern.

Since misdirections and distractions are like 99% of government activities, when someone screams “Look! A squirrel,” won’t that be significantly more distracting to any furries?

Although, considering how distractible our current lot are, I would not be too surprised if most of them were furries who had a thing for dressing up as a human. So maybe taking off the outer layer would be helpful.

That Anonymous Coward (profile) says:

It’s way worse when they use content moderation as an excuse…

Who, me? Bitter that I still can’t have my data or delete MY account unless I give unto Twitter my real-world identity?
You bet your ass I’m bitter.

I guess they only like anonymous users when they know who they are… makes as much sense as their content moderation “policies”.

Darkness Of Course (profile) says:

Twitter Nanny

They don’t like to be called Twitter Cops.
They don’t like when you use popular movie quotes.
They don’t like when you don’t use their popular movie quotes.

So, I fixed my small rant about Twitter Cops being jerks by inserting “the fine people at” in front of “Twitter Cops.” So much better.

They cannot do moderation well, but they are completely devoted to their way of fucking it up.

Samuel Abram (profile) says:

My tale of CMAS impossibility

This reminded me of when I got “zucced” on Facebook for quoting the groundbreaking graphic adventure game The Secret of Monkey Island in a Point-and-Click Adventure game group.

Once, when I was in a Facebook Point-and-Click group, somebody said “You Fight Like A Dairy Farmer”. I attempted to reply with “How appropriate, you fight like a cow.” (both of which are quotes from the game The Secret of Monkey Island, for anyone not in the know), but I was put in Facebook jail for two days for “Abuse and Harassment”. I appealed the decision to the court of Facebook Trust and Safety law, which upheld it.

So I couldn’t use Facebook for two days for quoting a classic video game in a group where everyone would understand me and the context of the quote. If this isn’t evidence of Mike Masnick’s Impossibility Theorem, then I don’t know what is.

James Gresham (profile) says:

The best solution to this problem is usually to stop trying to evaluate based on single events with limited context, and instead look for patterns of behavior. Malicious users rarely do things once and then vanish, and non-malicious users rarely use threatening language over and over again, even in contextually appropriate ways.

More evidence increases the odds that context and level of intent will be accurately assessed – not just clearing innocent users but also making it possible to pick out threats that are genuinely credible and imminent. The space is admittedly tricky and perfect certainty is unrealistic, but things can be improved by focusing the priority where it should be with threats of violence – dangerous and abusive users, not the display of bad content.

Unfortunately there are a lot of internal pressures that push moderation teams away from this approach. The fear of missing obvious individual examples, the boost to reportable numbers from rating 100 comments instead of 10 accounts, desire for smaller discrete tasks to automate or send to vendors for scalability – all of these things play against user focused strategies in this space.
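The “patterns of behavior” approach described above can be sketched as a toy example. Everything here is invented for illustration — the `flag_events` helper, the threshold, and the event format are assumptions, not any platform’s actual system — but it shows the core idea: act on a user’s cumulative record, not on a single flagged post.

```python
# Illustrative sketch of pattern-based moderation: a single flagged post
# (like a movie quote) doesn't trigger action, but repeated flags on the
# same account escalate it for human review. Threshold is arbitrary.
from collections import defaultdict

REVIEW_THRESHOLD = 3  # hypothetical: escalate after this many flags

def flag_events(events):
    """events: list of (user, was_flagged) pairs in time order.
    Returns the set of users whose cumulative flags warrant review."""
    counts = defaultdict(int)
    escalate = set()
    for user, was_flagged in events:
        if was_flagged:
            counts[user] += 1
            if counts[user] >= REVIEW_THRESHOLD:
                escalate.add(user)
    return escalate

events = [
    ("kel", True),                                    # one quote trips the classifier
    ("troll", True), ("troll", True), ("troll", True),  # repeated abusive posts
]
print(flag_events(events))  # only "troll" crosses the threshold
```

Under this kind of scheme, a one-off false positive like Kel’s quote never reaches enforcement, while sustained abuse still surfaces — though, as the reply below notes, a fixed per-account threshold has its own failure modes.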

Anonymous Coward says:


Plenty of people do things once and then go away. It was a whole thing with the Christchurch shootings, and (though less widespread) with a copycat in NY earlier this year.

Less extreme: creating new accounts solely to post one or two things, then rinsing and repeating, is a common troll tactic to get around bans. Extending the timeline before they are banned will do nothing at all to handle them, and could actually help them out once such people figure out how many posts they can get away with before the banhammer comes down. Especially if creating enough innocuous posts elsewhere is enough to delay it.

It wouldn’t exactly be hard to get some very objectionable content left up forever if we require that people do objectionable things in multiple places before considering whether those things are objectionable.

BernardoVerda (profile) says:

I got banned from Twitter for a post replying to a covid mis-informing troll, that went along these lines:

“Look up the actual statistic ‘A’ for countries X, Y, and Z. After you’ve done that, compare those values for ‘A’ — and smack yourself in the forehead for being an idiot.

Then look up statistic B for countries X, Y, and Z. After you’ve done that compare the values of ‘B’ for each of those countries — and then smack yourself in the forehead again for being such an idiot.”

Apparently (according to Twitter) I was advocating violence and self-harm. But the covid mis-informing troll rolled on, untouched by any threat of Twitter moderation…

Anonymous Coward says:

Recidivism and connecting bad actors to past accounts is an active area platforms work on – there are options for detecting and aggregating evidence from people doing this. Media reports indicate the Christchurch perpetrator had racked up significant threats, white supremacy associations, leakage, and preparation through in-person and online behaviour stretching back years.

There are also some assumptions about trolls vs targeted violence perpetrators here – the two populations have substantial differences, depending on whether their behaviour is intent leakage or an attempt to harass and intimidate.

On content, of course it makes no sense to wait and watch the violations pile up – content review and removal should continue and if a person has done enough to justify a ban, that ban should be implemented. However, individual threats should lead to review of the cumulative evidence, including content that isn’t a strong violation. This is the standard threat assessment playbook for offline threats from law enforcement orgs, and the same principles apply.
