Techdirt's think tank, the Copia Institute, is working with the Trust & Safety Professional Association and its sister organization, the Trust & Safety Foundation, to produce an ongoing series of case studies about content moderation decisions. These case studies are presented in a neutral fashion, not aiming to criticize or applaud any particular decision, but to highlight the many different challenges that content moderators face and the tradeoffs they result in. Find more case studies here on Techdirt and on the TSF website.

Content Moderation Case Studies: Using AI To Detect Problematic Edits On Wikipedia (2015)

from the ai-to-the-rescue? dept

Summary: Wikipedia is well known as an online encyclopedia that anyone can edit. This openness has enabled the creation of a massive corpus of knowledge that has earned high marks for accuracy, though at any given moment some content may be inaccurate, since anyone may have just made changes. Indeed, one of the key struggles Wikipedia has dealt with over the years is so-called "vandals," who change a page not to improve the quality of an entry but to deliberately degrade it.

In late 2015, the Wikimedia Foundation, which runs Wikipedia, announced an artificial intelligence tool called ORES (Objective Revision Evaluation Service), which it hoped would pre-score edits for the various volunteer editors so they could catch vandalism more quickly.

ORES brings automated edit and article quality classification to everyone via a set of open Application Programming Interfaces (APIs). The system works by training models against edit- and article-quality assessments made by Wikipedians and generating automated scores for every single edit and article.

What's the predicted probability that a specific edit be damaging? You can now get a quick answer to this question. ORES allows you to specify a project (e.g. English Wikipedia), a model (e.g. the damage detection model), and one or more revisions. The API returns an easily consumable response in JSON format:
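As a rough sketch of what "specify a project, a model, and one or more revisions" could look like in practice, the snippet below builds a request URL and reads the damaging probability out of a response. The base URL, the path layout, the field names in the sample payload, the revision ID, and the numbers are all assumptions made for illustration; the sample is hand-written, not output captured from ORES, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumed base URL and path layout for the scoring endpoint.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_score_url(project, model, rev_ids):
    """Build a scoring URL for one project, one model, and one or more revisions."""
    query = urlencode({
        "models": model,
        # Assumption: multiple revision IDs are joined with "|".
        "revids": "|".join(str(rev_id) for rev_id in rev_ids),
    })
    return f"{ORES_BASE}/{project}/?{query}"


# Hypothetical response body. The nesting and the values are illustrative only.
SAMPLE_RESPONSE = json.loads("""
{
  "enwiki": {
    "scores": {
      "123456": {
        "damaging": {
          "score": {
            "prediction": false,
            "probability": {"false": 0.92, "true": 0.08}
          }
        }
      }
    }
  }
}
""")


def damaging_probability(response, project, model, rev_id):
    """Return the predicted probability that the given revision is damaging."""
    score = response[project]["scores"][str(rev_id)][model]["score"]
    return score["probability"]["true"]


if __name__ == "__main__":
    print(build_score_url("enwiki", "damaging", [123456]))
    print(damaging_probability(SAMPLE_RESPONSE, "enwiki", "damaging", 123456))
```

A tool built on top of the service would swap the hand-written sample for the parsed body of a real HTTP response.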

The system was not designed, necessarily, to be user-facing, but rather as a system that others could build tools on top of to help with the editing process. Thus it was designed to feed some of its output into other existing and future tools.

Part of the goal of the system, according to the person who created it, Aaron Halfaker, was to hopefully make it easier to teach new editors how to be productive editors on Wikipedia. There was a concern that more and more of the site was controlled by an increasingly small number of volunteers, and new entrants were scared off, sometimes by the various arcane rules. Thus, rather than seeing ORES as a tool for automating content moderation, or as a tool for "quality control" over edits, Halfaker saw it more as a tool to help experienced editors better guide new, well-meaning, but perhaps inexperienced editors in ways to improve.

The motivation for Mr. Halfaker and the Wikimedia Foundation wasn't to smack contributors on the wrist for getting things wrong. "I think we who engineer tools for social communities, have a responsibility to the communities we are working with to empower them," Mr. Halfaker said. After all, Wikipedia already has three AI systems working well on the site's quality control, Huggle, STiki and ClueBot NG.

"I don't want to build the next quality control tool. What I'd rather do is give people the signal and let them work with it," Mr. Halfaker said.

The artificial intelligence essentially works on two axes. It gives edits two scores: first, the likelihood that it's a damaging edit, and, second, the odds that it was an edit made in good faith or not. If contributors make bad edits in good faith, the hope is that someone more experienced in the community will reach out to them to help them understand the mistake.

"If you have a sequence of bad scores, then you're probably a vandal," Mr. Halfaker said. "If you have a sequence of good scores with a couple of bad ones, you're probably a good faith contributor."
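The two-axis idea, and the rule of thumb about sequences of scores, can be sketched roughly as follows. This is a minimal illustration, not how ORES or any Wikipedia tool actually decides anything: the `EditScore` type, the 0.5 cutoff, the `bad_share` parameter, and the label strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EditScore:
    damaging: float   # estimated probability that the edit is damaging
    goodfaith: float  # estimated probability that the edit was made in good faith


THRESHOLD = 0.5  # hypothetical cutoff, chosen only for illustration


def triage(score, threshold=THRESHOLD):
    """Place a single edit on the two axes: damaging or not, good faith or not."""
    if score.damaging < threshold:
        return "ok"
    if score.goodfaith >= threshold:
        # Damaging but well-meant: a candidate for friendly outreach.
        return "good-faith mistake"
    return "likely vandalism"


def classify_contributor(scores, threshold=THRESHOLD, bad_share=0.5):
    """Apply the 'sequence of scores' rule of thumb to a contributor's edits."""
    if not scores:
        return "unknown"
    bad = sum(1 for s in scores if s.damaging >= threshold)
    # Mostly bad scores suggests a vandal; a couple of bad ones among good does not.
    if bad / len(scores) > bad_share:
        return "probably a vandal"
    return "probably good faith"


if __name__ == "__main__":
    print(triage(EditScore(damaging=0.8, goodfaith=0.9)))
    history = [EditScore(0.1, 0.9)] * 5 + [EditScore(0.8, 0.7)] * 2
    print(classify_contributor(history))
```

Keeping the two scores separate is what lets a reviewer tell a clumsy newcomer apart from a vandal, instead of treating every damaging edit the same way.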

Decisions to be made by Wikipedia:

  • How useful is artificial intelligence in helping to determine the quality of edits?
  • How best to implement a tool like ORES?
    • Should it automatically revert likely "bad" edits?
    • Should it be used for quality control?
    • Should it be a tool to just highlight edits for volunteers to review?
  • What is likely to encourage more editors to help keep Wikipedia up to date and free of vandalism?
  • What data do you train ORES on? How do you validate the accuracy of the training data?

Questions and policy implications to consider:

  • Are there issues when, because the AI has scored something, the tendency is to assume the AI must be "correct"? How do you make sure the AI is accurate?
  • Does AI help bring on new editors or does it scare away new editors?
  • Are there ways to prevent inherent bias from being baked into any AI moderation system, especially one trained by existing moderators?

Resolution: Halfaker, who later left Wikimedia to go to Microsoft Research, has published a few papers about ORES since it launched. In 2017, a paper by Halfaker and a few others noted that the tool was increasingly used over the previous three years.

The ORES service has been online since July 2015. Since then, usage has steadily risen as we've developed and deployed new models and additional integrations are made by tool developers and researchers. Currently, ORES supports 78 different models and 37 different language-specific wikis.

Generally, we see 50 to 125 requests per minute from external tools that are using ORES' predictions (excluding the MediaWiki extension that is more difficult to track). Sometimes these external requests will burst up to 400-500 requests per second.

One thing they noticed was that those using the ORES output often wanted to search through the metrics and set their own thresholds rather than accepting the hard-coded ones in ORES:

Originally, when we developed ORES, we defined these threshold optimizations in our deployment configuration. But eventually, it became apparent that our users wanted to be able to search through fitness metrics to choose thresholds that matched their own operational concerns. Adding new optimizations and redeploying quickly became a burden on us and a delay for our users. In response, we developed a syntax for requesting an optimization from ORES in realtime using fitness statistics from the models tests.
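To make the idea of choosing a threshold from fitness metrics concrete, here is a minimal sketch of one such optimization: find the score cutoff that catches the most damaging edits (highest recall) while keeping precision at or above a floor the tool author picks. This is not ORES's request syntax or implementation; the function names and the tiny labeled sample are invented for illustration.

```python
def precision_recall(scored, threshold):
    """Precision and recall when every edit scoring >= threshold is flagged.

    `scored` is a list of (probability, is_damaging) pairs.
    """
    tp = sum(1 for p, bad in scored if p >= threshold and bad)
    fp = sum(1 for p, bad in scored if p >= threshold and not bad)
    fn = sum(1 for p, bad in scored if p < threshold and bad)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def max_recall_at_precision(scored, min_precision):
    """Return (threshold, precision, recall) with the highest recall that
    still meets the precision floor, or None if no threshold qualifies."""
    best = None
    # Only the observed scores need to be tried as candidate thresholds.
    for threshold in sorted({p for p, _ in scored}):
        precision, recall = precision_recall(scored, threshold)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best


# Hypothetical test set: (model score, whether reviewers judged the edit damaging).
LABELED = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.40, False), (0.30, True), (0.10, False),
]

if __name__ == "__main__":
    # A patrolling tool might accept more false alarms than an auto-revert bot.
    print(max_recall_at_precision(LABELED, min_precision=0.75))
    print(max_recall_at_precision(LABELED, min_precision=0.9))
```

Different tools can call the same search with different floors, which is the kind of per-user operational choice the passage above describes.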

The project also appeared to be successful in getting built into various editing tools, and possibly inspiring ideas for new editing quality tools:

Many tools for counter-vandalism in Wikipedia were already available when we developed ORES. Some of them made use of machine prediction (e.g. Huggle, STiki, ClueBot NG), but most did not. Soon after we deployed ORES, many developers that had not previously included their own prediction models in their tools were quick to adopt ORES. For example, RealTime Recent Changes includes ORES predictions along-side their realtime interface and FastButtons, a Portuguese Wikipedia gadget, began displaying ORES predictions next to their buttons for quick reviewing and reverting damaging edits. Other tools that were not targeted at counter-vandalism also found ORES predictions, specifically that of article quality (wp10), useful. For example, RATER, a gadget for supporting the assessment of article quality began to include ORES predictions to help their users assess the quality of articles and SuggestBot [5], a robot for suggesting articles to an editor, began including ORES predictions in their tables of recommendations.

Many new tools have been developed since ORES was released that may not have been developed at all otherwise. For example, the Wikimedia Foundation product department developed a complete redesign on MediaWiki's Special:RecentChanges interface that implements a set of powerful filters and highlighting. They took the ORES Review Tool to its logical conclusion with an initiative that they referred to as Edit Review Filters. In this interface, ORES scores are prominently featured at the top of the list of available features, and they have been highlighted as one of the main benefits of the new interface to the editing community.

In a later paper, Halfaker explored, among other things, concerns about how AI systems like ORES might reinforce inherent bias.

A 2016 ProPublica investigation [4] raised serious allegations of racial biases in a ML-based tool sold to criminal courts across the US. The COMPAS system by Northpointe, Inc. produced risk scores for defendants charged with a crime, to be used to assist judges in determining if defendants should be released on bail or held in jail until their trial. This exposé began a wave of academic research, legal challenges, journalism, and organizing about a range of similar commercial software tools that have saturated the criminal justice system. Academic debates followed over what it meant for such a system to be "fair" or "biased". As Mulligan et al. discuss, debates over these "essentially contested concepts" often focused on competing mathematically-defined criteria, like equality of false positives between groups, etc.

When we examine COMPAS, we must admit that we feel an uneasy comparison between how it operates and how ORES is used for content moderation in Wikipedia. Of course, decisions about what is kept or removed from Wikipedia are of a different kind of social consequence than decisions about who is jailed by the state. However, just as ORES gives Wikipedia's human patrollers a score intended to influence their gatekeeping decisions, so does COMPAS give judges a similarly functioning score. Both are trained on data that assumes a knowable ground truth for the question to be answered by the classifier. Often this data is taken from prior decisions, heavily relying on found traces produced by a multitude of different individuals, who brought quite different assumptions and frameworks to bear when originally making those decisions.
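One of the mathematically defined criteria mentioned above, equality of false positives between groups, can be checked with a few lines of code. The sketch below computes, for each group, the share of non-damaging edits that were nonetheless flagged. The group names and the records are made up for illustration and say nothing about how ORES actually performs.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Per-group false positive rate.

    `records` is an iterable of (group, flagged, actually_damaging) tuples.
    The rate is: flagged-but-fine edits / all fine edits, within each group.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, flagged, actually_damaging in records:
        if not actually_damaging:
            negatives[group] += 1
            if flagged:
                false_positives[group] += 1
    return {group: false_positives[group] / n for group, n in negatives.items()}


# Hypothetical review data: (editor group, flagged by model, judged damaging).
RECORDS = [
    ("anonymous", True, False), ("anonymous", True, False),
    ("anonymous", False, False), ("anonymous", False, False),
    ("anonymous", True, True),
    ("registered", True, False), ("registered", False, False),
    ("registered", False, False), ("registered", False, False),
    ("registered", True, True),
]

if __name__ == "__main__":
    for group, rate in false_positive_rates(RECORDS).items():
        print(f"{group}: {rate:.2f}")
```

A gap between the groups' rates would mean that well-meaning editors in one group are wrongly flagged more often than those in the other, which is one of several competing ways "bias" can be defined.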

Companies: wikimedia


Comments on “Content Moderation Case Studies: Using AI To Detect Problematic Edits On Wikipedia (2015)”

Mike Masnick (profile) says:

Re: Re: Re: Re:

New post idea: Feed a couple of years of Techdirt posts to an AI and let it write a post based on that input.

A couple months ago, when a bunch of publications were doing that with GPT-3, I asked a few friends who had GPT-3 access about doing that, but by the time someone was willing to, too many people had already done exactly that.

bobob says:

Technology cannot solve social problems. AI should not be used at all to decide bail and at best, on wikipedia, it should serve only to flag poorly written (for some definition of "poorly written") articles, not make a final decision. AI is useful for many things, but not everything can be replaced by an algorithm. At least different people can look at the same thing, have differing opinions and resolve them in creative ways.
