The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale

from the what's-in-a-name dept

Maybe someday AI will be sophisticated, nuanced, and accurate enough to help us with platform content moderation, but that day isn’t today.

Today it prevents an awful lot of perfectly normal and presumably TOS-abiding people from even signing up for platforms. A recent tweet from someone unable to sign up for an app because it didn't like her name, along with many, many, MANY replies from people who've had similar experiences, drove this point home.

Facebook, despite its insistence on users using real names, seems particularly bad at letting people actually use their real names.

But of course, Facebook is not the only place where censorship rules based on bare pattern matching interfere not just with speech but with speakers' ability to even get online to speak.

This dynamic is what's known as the Scunthorpe Problem. Scunthorpe is a town in the UK whose residents have had an appallingly difficult time using the Internet because a naughty word is contained within the town's name.

The Scunthorpe problem is the blocking of e-mails, forum posts or search results by a spam filter or search engine because their text contains a string of letters that are shared with another (usually obscene) word. While computers can easily identify strings of text within a document, broad blocking rules may result in false positives, causing innocent phrases to be blocked.

The problem was named after an incident in 1996 in which AOL’s profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town’s name contains the substring cunt. Years later, Google’s opt-in SafeSearch filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.
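To make the mechanism concrete, here is a minimal Python sketch of the kind of filter described above, assuming a hypothetical one-entry blocklist. `naive_filter` does bare substring matching, which is exactly what trips on "Scunthorpe"; `word_boundary_filter` only matches whole words. The names and the list are illustrative; this is not how AOL or Google actually built their filters.

```python
import re

# Hypothetical one-entry blocklist, for illustration only.
BLOCKLIST = ["cunt"]

def naive_filter(text: str) -> bool:
    """Block if any listed string appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)

def word_boundary_filter(text: str) -> bool:
    """Block only if a listed string appears as a whole word."""
    return any(
        re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE)
        for word in BLOCKLIST
    )

print(naive_filter("Scunthorpe"))          # True: the town is blocked
print(word_boundary_filter("Scunthorpe"))  # False: the town is allowed
```

Word boundaries remove this particular false positive, but not the underlying problem: the second function still flags every legitimate discussion of the word itself and misses trivial obfuscations such as inserted punctuation, because neither function knows what the text means.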

(A related dynamic, the Clbuttic Problem, creates issues of its own when, instead of outright blocking, software automatically replaces the allegedly naughty words with ostensibly less-naughty words instead. People attempting to discuss such non-prurient topics as Buttbuttin's Creed and the Lincoln Buttbuttination find this sort of officious editing particularly unhelpful.)
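The replacement variant fails the same way. A minimal sketch, assuming a hypothetical substitution table that swaps "ass" for "butt": applied to raw substrings it produces exactly the mangled words above, while restricted to whole words it leaves them alone. The table and function names are illustrative, not any vendor's actual filter, and this version does not try to preserve capitalization.

```python
import re

# Hypothetical substitution table, for illustration only.
REPLACEMENTS = {"ass": "butt"}

def naive_replace(text: str) -> str:
    """Swap every occurrence of each listed string, even inside longer words."""
    for bad, good in REPLACEMENTS.items():
        text = re.sub(re.escape(bad), good, text, flags=re.IGNORECASE)
    return text

def whole_word_replace(text: str) -> str:
    """Swap listed strings only where they stand alone as words."""
    for bad, good in REPLACEMENTS.items():
        text = re.sub(rf"\b{re.escape(bad)}\b", good, text, flags=re.IGNORECASE)
    return text

print(naive_replace("a classic assassination"))       # a clbuttic buttbuttination
print(whole_word_replace("a classic assassination"))  # a classic assassination
```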

While examples of these dynamics can be amusing, each is also quite chilling to speech, and to speakers wishing to speak.

It's not something we should be demanding more of, but every time people call for "AI" as a solution to online content challenges, these are the censoring problems the call invites.

A big part of the problem is that calls for "AI" tend to treat it like some magical incantation, as if just adding it will solve all our problems. But in the end, AI is just software. Software can be very good at doing certain things, like finding patterns, including patterns in words (and people's names). But it is not necessarily good at knowing what to make of those patterns.

More sophisticated software may be better at understanding context, or even sometimes learning context, but there are still limits to what we can expect from these tools. They are at best imperfect reflections of the imperfect humans who created them, and it’s a mistake to forget that they have not yet replicated, or replaced, human judgment, which itself is often imperfect.

Which is not to say that there is no role for software to help in content moderation. The things that software is good at can make it an important tool to help support human decision-making about online content, especially at scale. But it is a mistake to expect software to supplant human decision-making. Because, as we see from these accruing examples, when we over-rely on it, it ends up being real humans that we hurt.
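One way to put software in that supporting role, sketched in Python under loud assumptions (the blocklist, `Account`, `ReviewQueue`, and `register` are all hypothetical): let the pattern matcher flag a signup for a human moderator instead of rejecting it outright, so a false positive costs a moderator a few moments rather than locking a real person out.

```python
from dataclasses import dataclass, field

# Hypothetical blocklist, for illustration only.
BLOCKLIST = ["cock"]

@dataclass
class Account:
    name: str
    active: bool = True

@dataclass
class ReviewQueue:
    # (account name, matched string) pairs awaiting a human moderator
    items: list[tuple[str, str]] = field(default_factory=list)

def register(name: str, queue: ReviewQueue) -> Account:
    """Create the account unconditionally; route any pattern match to a human."""
    account = Account(name)
    lowered = name.lower()
    for word in BLOCKLIST:
        if word in lowered:
            # The software only points; a person decides.
            queue.items.append((name, word))
    return account

queue = ReviewQueue()
print(register("Wendy Cockcroft", queue).active)  # True
print(queue.items)  # [('Wendy Cockcroft', 'cock')]
```

The matcher here is just as naive as the one that caused the Scunthorpe Problem; the difference is only in what happens when it fires.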


Comments on “The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale”

Mason Wheeler says:

Facebook, despite its insistence on users using real names, seems particularly bad at letting people actually use their real names.

I remember the story, from around seven or eight years ago, of a guy named Mark Zuckerberg who had a heck of a time signing up for a Facebook account, because its automated filters kept flagging him as fraudulently attempting to impersonate their founder, despite multiple manual interventions and documentation proving that yes, this was in fact his real, legal name.

Christenson says:

Duplicate Problem

I once accidentally collected the prescription for a namesake of mine (first and last name) at a CVS pharmacy. The birthdate straightened it out.

But dayum, don’t you think I should be able to sign my name Tom, Dick, or Harry?? lol (or *Blue*, here’s grinning at TD!)

And Facebook, grow the fuck up, or I’ll have to shove something in someone’s Scunthorpe, just like in a Philip K Dick novel involving Wang computers, or was that an ee cummings poem?

Anonymous Coward says:

Re: Duplicate Problem

There was a story I saw online about someone who found two records in their student database, differing only by sex. Same name, birthdate, address. It ended up being two married students—last name and address shared due to marriage, and shared birthdates happen when most people start at the same age.

Handles are probably better than "real" names at avoiding these problems.

Wendy Cockcroft says:

Re: Re: Duplicate Problem

Indeed, I have to bowdlerize my own name on some platforms because their net nanny doesn’t like “Cockcroft.”

One twerp on Twitter told me I should change it, but hell, no. It’s my name and it’s up to all the stupid little weenies to grow the hell up. Then go look up British place names to find more things to be artificially offended about. The seaside ones are the funniest.

PaulT says:

Re: Re:

Typically, it’s so that other people can actually find you, since the entire point of social networking is to converse with people who know you IRL.

If you don’t care for that, fair enough, but it’s no mystery why people who want to talk to family and friends they may have previously lost contact with wish to make themselves easy to find.

PaulT says:

Re: Re: Re: Re:

What happens when you search on Facebook?

I have a very common name and you couldn’t find me on Google very easily, but if you search for my name on Facebook you will see me listed along with a recognisable photo. I’ll probably come up fairly early in the list if we were to share some contacts. I’ve caught up with a lot of lost acquaintances I made pre-social media that way, which may not have happened had I used some kind of unique pseudonym (since people who had lost contact wouldn’t know what to search for).

I do also know people who use pseudonyms exclusively on there, but they tend to be the people deliberately trying to keep old friends away from them, which is not the majority in my experience.

Ninja says:

It gets particularly annoying when you are playing a goddamn single-player game that MUST be connected and you can't go silly on names.

Old but gold:

Ppl need to stop being stupid moralists. Dicks, pussies and other bodily functions should have stopped being taboo for a long time now. Facebook and other platforms overmoderating are just a symptom of our stupid moralism.

Mark says:

I am the author of an open source program used by several thousand people worldwide in the science and engineering fields. I often get emails from people with questions about use or some feature of the program. Recently I had an exchange with a gentleman from Belgium (?) with the unfortunate last name of Niggerman. His emails were always filtered to the “Deleted” folder despite there being no rules set to do so. I could not even whitelist his email address.

Also, remember that story about some Christian-oriented browsing/publishing filter that changed well-known runner Tyson Gay's name to Tyson Homosexual and actor Dick van Dyke's name to Penis van Lesbian?

And who could forget the kerfuffle over the naming of the Harry Baals Government Center?
