Techdirt's think tank, the Copia Institute, is working with the Trust & Safety Professional Association and its sister organization, the Trust & Safety Foundation, to produce an ongoing series of case studies about content moderation decisions. These case studies are presented in a neutral fashion, not aiming to criticize or applaud any particular decision, but to highlight the many different challenges that content moderators face and the tradeoffs they result in. Find more case studies here on Techdirt and on the TSF website.

Content Moderation Case Study: Game Developer Deals With Sexual Content Generated By Users And Its Own AI (2021)

from the ai-to-the-unrescue? dept

Summary: Moderating user-generated content from humans is already quite tricky, but those challenges can reach a different level when artificial intelligence is generating content as well. While the cautionary tale of Microsoft’s AI chatbot Tay may be well known, other developers are still grappling with the challenges of moderating AI-generated content.

AI Dungeon wasn’t the first online text game to leverage the power of artificial intelligence. For nearly as long as gaming has been around, attempts have been made to pair players with algorithmically-generated content to create unique experiences.

AI Dungeon has proven incredibly popular with players, thanks to its use of powerful machine learning algorithms created by OpenAI, the latest version of which substantially expands the input data and is capable of generating text that, in many cases, is indistinguishable from content created by humans.

For its first few months of existence, AI Dungeon used an older version of OpenAI’s machine learning algorithm. It wasn’t until OpenAI granted access to the most powerful version of this software (Generative Pre-Trained Transformer 3 [GPT-3]) that content problems began to develop.

As Tom Simonite reported for Wired, OpenAI’s monitoring of AI Dungeon input and interaction uncovered some disturbing content being crafted by players as well as by the AI itself.

A new monitoring system revealed that some players were typing words that caused the game to generate stories depicting sexual encounters involving children. OpenAI asked Latitude to take immediate action. “Content moderation decisions are difficult in some cases, but not this one,” OpenAI CEO Sam Altman said in a statement. “This is not the future for AI that any of us want.”

While Latitude (AI Dungeon’s developer) had limited moderation methods during its first few iterations, its new partnership with OpenAI and the inappropriate content that followed made it impossible for Latitude to continue its limited moderation and leave this content unmoderated. It was clear that the inappropriate content wasn’t always a case of users feeding input to the AI to lead it towards generating sexually abusive content. Some users reported seeing the AI generate sexual content on its own without any prompts from players. What may have been originally limited to a few users specifically seeking to push the AI towards creating questionable content had expanded due to the AI’s own behavior, which assumed all input sources were valid and usable when generating its own text.

Company Considerations:

  • How can content created by a tool specifically designed to iteratively generate content be effectively moderated to limit the generation of impermissible or unwanted content?
  • What should companies do to stave off the inevitability that their powerful algorithms will be used (and abused) in unexpected (or expected) ways? 
  • How should companies apply moderation standards to published content? How should these standards be applied to content that remains private and solely in the possession of the user?
  • How effective are blocklists when dealing with a program capable of generating an infinite amount of content in response to user interaction?

Issue Considerations:

  • What steps can be taken to ensure a powerful AI algorithm doesn’t become weaponized by users seeking to generate abusive content?

Resolution: AI Dungeon’s first response to OpenAI’s concerns was to implement a blocklist that would prevent users from nudging the AI towards generating questionable content, as well as prevent the AI from creating this content in response to user interactions.

Unfortunately, this initial response generated a number of false positives, and many users became angry once it became apparent that their private content was being subjected to keyword searches and read by moderators.
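To illustrate how a keyword blocklist can produce false positives of this kind, here is a minimal Python sketch. The case study does not describe how Latitude's filter actually matched text, so everything here is an assumption for illustration: the `BLOCKED_TERMS` list, the function names, and the sample sentence are all hypothetical. It contrasts naive substring matching with whole-word matching.

```python
import re

# Hypothetical blocklist for illustration only; not Latitude's or OpenAI's actual list.
BLOCKED_TERMS = {"kill", "hell"}


def naive_flag(text):
    """Flag text if any blocked term appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def word_boundary_flag(text):
    """Flag text only when a blocked term appears as a whole word."""
    alternatives = "|".join(re.escape(term) for term in sorted(BLOCKED_TERMS))
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


story = "The bard shows off a new skill and collects a shell."
print(naive_flag(story))          # True: "skill" contains "kill"
print(word_boundary_flag(story))  # False: no blocked term appears as a whole word
```

Whole-word matching removes this particular class of false positive, but it is still context-blind: it cannot tell a harmless use of a listed word from a harmful one, and it misses anything phrased without the listed words.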

AI Dungeon’s creator made tweaks to filters in hopes of mitigating collateral damage. Finally, Latitude arrived at a solution that addressed overblocking but still allowed it access to OpenAI’s algorithm. This is from the developer’s latest update on AI Dungeon’s moderation efforts, published in mid-August 2021:

We’ve agreed upon a new approach with OpenAI that will allow us to shift AI Dungeon’s filtering to have fewer incorrect flags and allow users more freedom in their experience. The biggest change is that instead of being blocked from playing when input triggers OpenAI’s filter, those requests will be handled by our own AI models. This will allow users to continue playing without broader filters that go beyond Latitude’s content policies.

While the fix addressed the overblocking problem, it did create other issues for players, as AI Dungeon’s developer acknowledged in the same post. Users who were shunted to AI Dungeon’s AI would suffer lower performance due to slower processing. On the other hand, routing around OpenAI’s filtering system would allow AI Dungeon users more flexibility when crafting stories and limit false flags and account suspensions.
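The routing change described in the update can be sketched roughly as follows. This is a hedged illustration, not Latitude's implementation: `route_request`, the `Reply` type, and the three stand-in callables are hypothetical names, and the real filter and models are not public in the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Reply:
    text: str
    source: str  # "primary" or "fallback"


def route_request(
    prompt: str,
    provider_filter_flags: Callable[[str], bool],
    primary_model: Callable[[str], str],
    fallback_model: Callable[[str], str],
) -> Reply:
    """Send flagged prompts to the fallback model instead of blocking the player."""
    if provider_filter_flags(prompt):
        return Reply(fallback_model(prompt), "fallback")
    return Reply(primary_model(prompt), "primary")


# Stand-in callables so the sketch runs; none of these reflect real services.
def demo_filter(prompt: str) -> bool:
    return "forbidden" in prompt.lower()


def demo_primary(prompt: str) -> str:
    return f"[primary] {prompt}"


def demo_fallback(prompt: str) -> str:
    return f"[fallback] {prompt}"


reply = route_request("Open the forbidden door", demo_filter, demo_primary, demo_fallback)
print(reply.source)  # fallback
```

The design tradeoff matches the one in the post: a flagged request no longer ends the session, but it is served by a different model, which the developer said would be slower. A fuller version would presumably still check the developer's own content policies before replying; that step is omitted here.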

Originally posted to the Trust & Safety Foundation website.


Comments on “Content Moderation Case Study: Game Developer Deals With Sexual Content Generated By Users And Its Own AI (2021)”

Romane says:

Re: The overblocking might be likely a scunthorpe problem

User-generated content moderation is not obvious, you’re right, especially in the gaming sector with real online hate issues (misogyny, threats, harassment, etc.). Two Hat, Sentropy or Spectrum Labs can help! They use adaptive moderation / contextual moderation.

Anonymous Coward says:

The way AI generated stories work is you frequently re-generate prompts before you submit them.

The way the suspension worked was if you submitted a prompt the AI generated back into the AI to get the next prompt, you were auto-banned. It was a load of horseshit.

And of course, the now repeated phrase:
Won’t someone please think of the procedurally generated children?

nasch (profile) says:

Re: Re: Re:

There’s an argument to be made that it’s better to let people interested in it have all the fake child porn they want so there might be less demand for the real thing, and thus fewer children being abused. But there are zero plus or minus zero politicians who want to campaign on a platform of improved accessibility of child porn.

Scary Devil Monastery (profile) says:

Monkey see, Monkey do.

"Some users reported seeing the AI generate sexual content on its own without any prompts from players."

What really surprises me is that the developers failed to foresee that an algorithm dedicated to learning from online behaviors wouldn’t come up with Rule 34. I’m sure the techies who programmed it will be all too happy to invoke the Abigail Oath when they gleefully inform their employers that "learns to mimic human behavior" means exactly that.

Lostinlodos (profile) says:

But, sword and boobs?

Seriously. Medieval dungeon boobies.
This is not exactly new. And it fully should have been expected.
Stories of heroes in loincloths saving women and then fornicating date to pre-Rome.
Hell, by the 800s we had stories of naked women saving abused men (girl power!).

That this would not be a known aspect of user generated content day one…? Facepalm.
The choice of solution to this, is up to them. But only a prude fool would think it wouldn’t have come up.

Oh, and save the dirty nerd cliché! Some of the most brutally assaultive sexual stories are written by women for women.
Serious, go pick up a romance novel.
