from the language-is-a-funny-thing dept
Multiple platforms have run into the phenomenon known as the “Scunthorpe problem.” In this famous case, a town whose name no one would ever mistake for offensive was deemed offensive by moderation blocklists simply because the name contains the word “cunt,” which many blocklists forbid.
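The mechanics behind the Scunthorpe problem can be illustrated with a minimal sketch. This is not any platform's actual filter: the one-term blocklist and the function names `substring_match` and `whole_word_match` are assumptions made up for illustration. It contrasts naive substring matching with matching on whole words only.

```python
import re

# Hypothetical one-term blocklist, for illustration only.
BLOCKLIST = ["cunt"]

def substring_match(text, blocklist=BLOCKLIST):
    """Flag text if a blocked term appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(term in lowered for term in blocklist)

def whole_word_match(text, blocklist=BLOCKLIST):
    """Flag text only if a blocked term appears as a standalone word."""
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in blocklist) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(substring_match("Greetings from Scunthorpe"))   # True: false positive
print(whole_word_match("Greetings from Scunthorpe"))  # False
```

Whole-word matching sidesteps the Scunthorpe case, but it cannot help when the blocked term is itself a legitimate word in another context, which is the harder problem this case study goes on to describe.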
Deploying automated blocklists can be even more challenging when dealing with specialized or niche content, which may use certain terms that are offensive outside of this specific context, but are essential to discussing and understanding the relevant subject matter. A paleontologists’ conference was derailed when the moderation blocklist made it impossible for participants to use words like “bone,” “pubic,” “stream,” and “beaver.”
Facebook has worked continuously to refine its moderation processes, but it still occasionally makes the wrong call when it comes to its blocklists. In January 2021, residents of (and visitors to) a Devon, England landmark were surprised to find their posts and comments vanishing from the site. After a little investigation, it became clear Facebook was deleting posts containing references to the landmark known as Plymouth Hoe.
In addition to being the name of a common garden tool (more on that in a moment), “hoe” also refers to a “sloping ridge shaped like an inverted foot or heel,” such as Plymouth Hoe, which is known locally as the Hoe. Users were temporarily forced to self-censor the harmless term to avoid moderation, either by adding unnecessary punctuation or dropping the “h.” It appeared Facebook’s automated processes believed these comments and posts were using a derogatory term for a romantic partner who is only in a relationship to better their own financial position.
Facebook soon apologized for the moderation error and stated it was “taking steps to rectify the error” and figure out what caused the mistaken moderation in the first place. Problem solved?
The same problem popped up again, this time affecting a New York gardening group. WNY Gardeners, a group with more than 8,000 members, is the latest to be affected by Facebook’s “hoe” pruning. A member responded to the prompt “most loved & indispensable weeding tool” with “Push pull hoe!” Not long after that, the member was informed by Facebook that the comment violated the site’s policy on bullying and harassment.
- How could blocklists and keyword searches be better utilized to detect and remove violations of site policies?
- How much collateral damage from automated moderation should be considered acceptable? Is this an acceptable trade-off for lower moderation costs, which often depend on more automated moderation and fewer human moderators?
- Can AI-based moderation more reliably detect actual violations (rather than innocuous uses of blocklisted terms) as the technology advances? What are the trade-offs with AI-based moderation tools as compared to simple blocklists?
- What mitigation measures might be put in place to deal with a blocklist that catches words with different meanings depending on context?
- Who should be in charge of reviewing a blocklist and how frequently should it be updated?
- Does prohibiting words like “hoe” make a significant dent in online harassment and abuse? Does the tech have the capability to “catch up” to (or surpass) the ability of humans to route around moderation efforts?
- Should more resources go to staffing human moderators in order to prevent errors and/or allow for a more robust challenge process that allows content to remain “live” until the challenge process has concluded?
- What ways might automation and human reviewers be used in combination to avoid the more egregious automated blocklist mistakes?
Resolution: Once again, Facebook has apologized for not recognizing the word “hoe” in contexts where it’s appropriate to use. But after two highly publicized incidents in less than a year, both involving the same word, Facebook has added human moderators to backstop automated calls on flagged terms like these in order to prevent unjustified removals of posts, accounts, or groups.
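A human backstop of this kind might be wired up roughly as follows. This is a hedged sketch, not a description of Facebook's system: the two term lists, the `Decision` type, and the `moderate` function are all hypothetical names invented for illustration. The idea is that clear-cut terms are removed automatically, while context-dependent terms like “hoe” are queued for a human moderator instead of being removed outright.

```python
import re
from dataclasses import dataclass

# Hypothetical term lists, for illustration only.
ALWAYS_BLOCK = {"exampleslur"}   # placeholder for terms with no innocent use
NEEDS_REVIEW = {"hoe"}           # terms whose meaning depends on context

@dataclass
class Decision:
    action: str    # "allow", "remove", or "human_review"
    matched: list  # blocklisted words found in the text

def moderate(text):
    """Route a post based on which list, if any, its words appear in."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hard = sorted(words & ALWAYS_BLOCK)
    if hard:
        return Decision("remove", hard)
    soft = sorted(words & NEEDS_REVIEW)
    if soft:
        # Ambiguous term: defer to a human instead of auto-removing.
        return Decision("human_review", soft)
    return Decision("allow", [])

print(moderate("Push pull hoe!").action)  # human_review
```

Whether a post stays visible while it waits in the review queue is a separate policy choice, one of the questions raised above about letting content remain “live” until a challenge has concluded.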
Originally posted to the Trust & Safety Foundation website.