The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale

from the what's-in-a-name dept

Maybe someday AI will be sophisticated, nuanced, and accurate enough to help us with platform content moderation, but that day isn't today.

Today it prevents an awful lot of perfectly normal and presumably TOS-abiding people from even signing up for platforms. A recent tweet from someone unable to sign up to use an app because it didn't like her name, as well as many, many, MANY replies from people who've had similar experiences, drove this point home.

Facebook, despite its insistence on users using real names, seems particularly bad at letting people actually use their real names.

But of course, Facebook is not the only instance where censorship rules based on bare pattern matching interfere not just with speech but with speakers' ability to even get online to speak.

This dynamic is what's known as the Scunthorpe Problem. Scunthorpe is a town in the UK whose residents have had an appallingly difficult time using the Internet due to a naughty word being contained within the town name.

The Scunthorpe problem is the blocking of e-mails, forum posts or search results by a spam filter or search engine because their text contains a string of letters that are shared with another (usually obscene) word. While computers can easily identify strings of text within a document, broad blocking rules may result in false positives, causing innocent phrases to be blocked.

The problem was named after an incident in 1996 in which AOL's profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town's name contains the substring cunt. Years later, Google's opt-in SafeSearch filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.
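To make the failure mode concrete, here is a minimal sketch of how such a filter can go wrong. It is not AOL's or Google's actual code, which isn't public; the blocklist and the function names `naive_is_blocked` and `word_boundary_is_blocked` are made up for illustration. The first function flags any text that contains a listed string anywhere inside it, while the second only flags listed strings that stand alone as whole words.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKLIST = ["cunt", "cialis"]


def naive_is_blocked(text: str) -> bool:
    """Flag text if a blocklisted string appears anywhere inside it."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def word_boundary_is_blocked(text: str) -> bool:
    """Flag text only when a blocklisted term appears as a whole word."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in BLOCKLIST) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    print(naive_is_blocked("Scunthorpe"))          # True: a false positive
    print(word_boundary_is_blocked("Scunthorpe"))  # False
```

Matching on whole words avoids the Scunthorpe case, but it is only a partial fix: a filter built this way would still reject anyone whose name happens to be a listed word in its own right, and it has no idea what the surrounding text actually means.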

(A related dynamic, the Clbuttic Problem, creates issues of its own when, instead of outright blocking, software automatically replaces the allegedly naughty words with ostensibly less-naughty words instead. People attempting to discuss such non-prurient topics as Buttbuttin's Creed and the Lincoln Buttbuttination find this sort of officious editing particularly unhelpful…)
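The replacement variant is just as easy to reproduce. The sketch below is an assumption about how such a filter might be written, not any particular product's code; the `REPLACEMENTS` table and the `replace_terms` function are hypothetical. With `whole_words=False` it swaps the listed string wherever it occurs, including in the middle of other words.

```python
import re

# Hypothetical replacement table, for illustration only.
REPLACEMENTS = {"ass": "butt"}


def replace_terms(text: str, whole_words: bool = False) -> str:
    """Swap listed strings for milder ones, keeping a leading capital."""
    alternatives = "|".join(re.escape(bad) for bad in REPLACEMENTS)
    # Without word boundaries, the pattern also matches inside longer words.
    pattern = rf"\b(?:{alternatives})\b" if whole_words else alternatives

    def swap(match: re.Match) -> str:
        found = match.group(0)
        mild = REPLACEMENTS[found.lower()]
        return mild.capitalize() if found[0].isupper() else mild

    return re.sub(pattern, swap, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(replace_terms("a classic"))         # a clbuttic
    print(replace_terms("Assassin's Creed"))  # Buttbuttin's Creed
    print(replace_terms("Assassin's Creed", whole_words=True))  # unchanged
```

Restricting the swap to whole words stops it from mangling "classic," but the software still has no way of telling whether a given use of a word is actually objectionable.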

While examples of these dynamics can be amusing, each is also quite chilling to speech, and to speakers wishing to speak.

It's not something we should be demanding more of, but every time people call for "AI" as a solution to online content challenges, these are the censoring problems the call invites.

A big part of the problem is that calls for "AI" tend to treat it like some magical incantation, as if just adding it will solve all our problems. But in the end, AI is just software. Software can be very good at doing certain things, like finding patterns, including patterns in words (and people's names…). But it's not necessarily good at knowing what to make of those patterns.

More sophisticated software may be better at understanding context, or even sometimes learning context, but there are still limits to what we can expect from these tools. They are at best imperfect reflections of the imperfect humans who created them, and it's a mistake to forget that they have not yet replicated, or replaced, human judgment, which itself is often imperfect.

Which is not to say that there is no role for software to help in content moderation. The things that software is good at can make it an important tool to help support human decision-making about online content, especially at scale. But it is a mistake to expect software to supplant human decision-making. Because, as we see from these accruing examples, when we over-rely on software, it ends up being real humans that we hurt.


Reader Comments


  • icon
    Mason Wheeler (profile), 31 Aug 2018 @ 12:07pm

    Facebook, despite its insistence on users using real names, seems particularly bad at letting people actually use their real names.

    I remember the story, from around 7-8-ish years ago, of a guy named Mark Zuckerberg who had a heck of a time signing up for a Facebook account, because its automated filters kept flagging him as fraudulently attempting to impersonate their founder, despite multiple manual interventions and appropriate documentation provided that yes, this was in fact his real, legal name.

  • icon
    Thad (profile), 31 Aug 2018 @ 12:24pm

    I've been on multiple political and current events sites that flag or bowdlerize Senator Chris Coons's name.

  • identicon
    Christenson, 31 Aug 2018 @ 12:34pm

    Duplicate Problem

    I once accidentally collected the prescription for a namesake of mine (first and last name) in a CVS pharmacy. The birthdate straightened it out.

    But dayum, don't you think I should be able to sign my name Tom, Dick, or Harry?? lol (or *Blue*, here's grinning at TD!)

    And Facebook, grow the fuck up, or I'll have to shove something in someone's Scunthorpe, just like in a Philip K Dick novel involving Wang computers, or was that an ee cummings poem?

    • identicon
      Anonymous Coward, 31 Aug 2018 @ 1:59pm

      Re: Duplicate Problem

      There was a story I saw online about someone who found two records in their student database, differing only by sex. Same name, birthdate, address. It ended up being two married students—last name and address shared due to marriage, and shared birthdates happen when most people start at the same age.

      Handles are probably better than "real" names at avoiding these problems.

      • identicon
        Wendy Cockcroft, 4 Sep 2018 @ 5:58am

        Re: Re: Duplicate Problem

        Indeed, I have to bowdlerize my own name on some platforms because their net nanny doesn't like "Cockcroft."

        One twerp on Twitter told me I should change it, but hell, no. It's my name and it's up to all the stupid little weenies to grow the hell up. Then go look up British place names to find more things to be artificially offended about. The seaside ones are the funniest.

  • icon
    discordian_eris (profile), 31 Aug 2018 @ 12:36pm

    I still do not understand why anyone uses their "real" name online. I've been online since the mid '80s, on the 'Net since the early '90s, never used my legal name. Never have, never will.

    • identicon
      Seymore Butts, 31 Aug 2018 @ 12:37pm

      Re:

      You mean your real name is not discordian_eris?

    • icon
      PaulT (profile), 31 Aug 2018 @ 1:22pm

      Re:

      Typically, it's so that other people can actually find you, since the entire point of social networking is to converse with people who know you IRL.

      If you don't care for that, fair enough, but it's no mystery why people who want to talk to family and friends they may have previously lost contact with wish to make themselves easy to find.

      • identicon
        Anonymous Coward, 31 Aug 2018 @ 2:01pm

        Re: Re:

        the entire point of social networking is to converse with people who know you IRL.

        That's not a fact, it's an opinion. Many people use it partially or entirely to converse with people they've never met.

      • icon
        Chris-Mouse (profile), 31 Aug 2018 @ 3:02pm

        Re: Re:

        That doesn't work very well for me. I happen to share a first and last name with a couple of moderately famous people. The result is that a Google search for my name pulls up at least ten pages of false leads before you find me.

        • icon
          PaulT (profile), 31 Aug 2018 @ 7:12pm

          Re: Re: Re:

          What happens when you search on Facebook?

          I have a very common name and you couldn't find me on Google very easily, but if you search for my name on Facebook you will see me listed along with a recognisable photo. I'll probably come up fairly early in the list if we were to share some contacts. I've caught up with a lot of lost acquaintances I made pre-social media that way, which may not have happened had I used some kind of unique pseudonym (since people who had lost contact wouldn't know what to search for).

          I do also know people who use pseudonyms exclusively on there, but they tend to be the people deliberately trying to keep old friends away from them, which is not the majority in my experience.

    • icon
      Bergman (profile), 31 Aug 2018 @ 1:48pm

      Re:

      Because if you login to some sites with a fake name they will delete your account?

  • icon
    Designerfx (profile), 31 Aug 2018 @ 12:57pm

    I have a cousin with the spampata last name

    I never get responses when I email her. Maybe I should check my spam folder.

  • identicon
    Anonymous Coward, 31 Aug 2018 @ 12:59pm

    AI Is Not A Silver Bullet

    The word "bullet" has been flagged as inappropriate; our AI engine suggests "Silver Suppository" as an alternative.

    • icon
      Bergman (profile), 31 Aug 2018 @ 1:49pm

      Re: AI Is Not A Silver Bullet

      I imagine it also suggests suppository points when editing documents?

      • identicon
        Anonymous Coward, 31 Aug 2018 @ 3:33pm

        Re: Re: AI Is Not A Silver Bullet

        "I imagine it also suggests suppository points when editing documents?"

        Nah, those are "power points"; don't you know anything?

        -- Micro Soft (which name is also banned as derogatory member dissing)

    • identicon
      Anonymous Coward, 1 Sep 2018 @ 7:11am

      Re: AI Is Not A Silver Bullet

      AI = Artificial (Political) Incorrectness

      Our singing group could never exist with today's P.C. filters.

      -- (They Say We're) The Monkees

      • identicon
        Anonymous Coward, 2 Sep 2018 @ 12:44pm

        Re: Re: AI Is Not A Silver Bullet

        "Our singing group could never exist with today's P.C. filters. -- (They Say We're) The Monkees"

        "... we're too busy singing (lipsyncing ?)
        To put anybody down."

    • identicon
      Anonymous Coward, 7 Sep 2018 @ 4:29am

      "I have yet to meet someone that can outsmart a silver suppository"

  • icon
    Ninja (profile), 31 Aug 2018 @ 1:04pm

    It gets particularly annoying when you are playing a goddamn single-player game that MUST be connected and you can't go silly on names.

    Old but gold: http://www.cracked.com/blog/5-reasons-diablo-iii-represents-gamings-annoying-future/

    Ppl need to stop being stupid moralists. Dicks, pussies and other bodily functions should have stopped being taboo for a long time now. Facebook and other platforms overmoderating are just a symptom of our stupid moralism.

  • icon
    Mononymous Tim (profile), 31 Aug 2018 @ 1:05pm

    I once worked with a guy named Dick Cummins, and no, Dick wasn't short (oh my, the innuendos keep cummin') for Richard.

    Some people's parents!

  • icon
    Ninja (profile), 31 Aug 2018 @ 1:07pm

    Aaaaaand, ironically my comment filled with all those words got held for moderation. Laughing like a maniac here lmao

    • icon
      Mike Masnick (profile), 31 Aug 2018 @ 3:12pm

      Re:

      Aaaaaand, ironically my comment filled with all those words got held for moderation. Laughing like a maniac here lmao

      Ha! Hilarious. I just cleared it... Sorry about that, but... yeah.

  • icon
    ECA (profile), 31 Aug 2018 @ 1:12pm

    AI is interesting..

    AI isn't a bad thing, but there is a problem with it..
    The better it is, the longer it is, the SLOWER it is..

    There are ways to make things faster, but then we ADD to the AI, and make it even slower..

  • identicon
    Anonymous Coward, 31 Aug 2018 @ 1:18pm

    I've run into problems several times where a Techdirt article has a cuss word in its URL, which makes it impossible to post in certain comment sections that limit vulgarity. I really do wish y'all would chill with the cuss words in article titles and URLs so I can share your work more frequently.

  • identicon
    Anonymous Coward, 31 Aug 2018 @ 1:20pm

    My friend, Norman Conquest and I have the same problem.

    -- Ben Dover

  • identicon
    Mark, 31 Aug 2018 @ 5:16pm

    I am the author of an open source program used by several thousand people worldwide in the science and engineering fields. I often get emails from people with questions about use or some feature of the program. Recently I had an exchange with a gentleman from Belgium (?) with the unfortunate last name of Niggerman. His emails were always filtered to the "Deleted" folder despite there being no rules set to do so. I could not even whitelist his email address.

    Also, remember that story about some Christian-oriented browsing / publishing filter that changed well-known runner Tyson Gay's name to Tyson Homosexual and actor Dick van Dyke's name to Penis van Lesbian?

    And who could forget the kerfuffle over the naming of the Harry Baals Government Center? https://en.wikipedia.org/wiki/Harry_Baals

  • icon
    NaBUru38 (profile), 1 Sep 2018 @ 8:41am

    Some years ago, the official IndyCar fantasy league website censored the word Ganassi, which is the name of a major team.

  • icon
    got_runs? (profile), 1 Sep 2018 @ 4:07pm

    Never use your real name.

  • icon
    Alasdair Fox (profile), 3 Sep 2018 @ 6:17am

    I knew a Dutch guy called Eggie Prick. I imagine he has some issues signing up on web sites!

  • identicon
    Anonymous Coward, 4 Sep 2018 @ 4:05am

    I once had an email flagged as spam because of the word "specialist". The filter thought I was selling Cialis :-/
