Content Moderation Case Study: Game Developer Deals With Sexual Content Generated By Users And Its Own AI (2021)
from the ai-to-the-unrescue? dept
Summary: Moderating user-generated content from humans is already quite tricky, but the challenge reaches a different level when artificial intelligence is generating content as well. While the cautionary tale of Microsoft’s AI chatbot Tay may be well known, other developers are still grappling with the challenges of moderating AI-generated content.
AI Dungeon wasn’t the first online text game to leverage the power of artificial intelligence. For nearly as long as gaming has been around, attempts have been made to pair players with algorithmically-generated content to create unique experiences.
AI Dungeon has proven incredibly popular with players, thanks to its use of powerful machine learning models created by OpenAI. The latest version of these models was trained on substantially more data than its predecessors and can generate text that, in many cases, is indistinguishable from content created by humans.
For its first few months of existence, AI Dungeon used an older version of OpenAI’s machine learning software. It wasn’t until OpenAI granted access to the most powerful version of this software (Generative Pre-trained Transformer 3 [GPT-3]) that content problems began to develop.
As Tom Simonite reported for Wired, OpenAI’s monitoring of AI Dungeon input and interaction uncovered some disturbing content being crafted by players as well as by the game’s own AI.
A new monitoring system revealed that some players were typing words that caused the game to generate stories depicting sexual encounters involving children. OpenAI asked Latitude to take immediate action. “Content moderation decisions are difficult in some cases, but not this one,” OpenAI CEO Sam Altman said in a statement. “This is not the future for AI that any of us want.”
While Latitude (AI Dungeon’s developer) had relied on limited moderation methods during the game’s first few iterations, its new partnership with OpenAI and the inappropriate content that followed made it impossible for Latitude to continue that limited approach and allow this content to remain unmoderated. It was also clear that the inappropriate content wasn’t always a case of users feeding input to the AI to lead it toward generating sexually abusive material. Some users reported seeing the AI generate sexual content on its own, without any prompting from players. What may have originally been limited to a few users specifically seeking to push the AI toward questionable content had expanded due to the AI’s own behavior, since the model treated all input sources as valid and usable when generating its own text.
- How can content created by a tool specifically designed to iteratively generate content be effectively moderated to limit the generation of impermissible or unwanted content?
- What should companies do to stave off the inevitability that their powerful algorithms will be used (and abused) in unexpected (or expected) ways?
- How should companies apply moderation standards to published content? How should these standards be applied to content that remains private and solely in the possession of the user?
- How effective are blocklists when dealing with a program capable of generating an infinite amount of content in response to user interaction?
- What steps can be taken to ensure a powerful AI algorithm doesn’t become weaponized by users seeking to generate abusive content?
Resolution: Latitude’s first response to OpenAI’s concerns was to implement a blocklist that would prevent users from nudging the AI toward generating questionable content, as well as prevent the AI from creating this content in response to user interactions.
Unfortunately, this initial response generated a number of false positives, and many users became angry once it became apparent that their private content was being subjected to keyword searches and read by moderators.
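The false-positive problem with keyword blocklists is easy to see in miniature. The sketch below shows a naive substring-based filter of the general kind described; the blocked terms and sample inputs are hypothetical illustrations, not Latitude’s actual list or code.

```python
# A naive substring blocklist, illustrating why this approach over-blocks.
# The terms and examples below are hypothetical, for illustration only.

BLOCKLIST = ["kid", "child"]

def is_blocked(text: str) -> bool:
    """Flag text if any blocked term appears anywhere as a substring."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

# An innocent story about a baby goat ("kid") is falsely flagged:
print(is_blocked("The farmer fed the kid some hay."))  # True (false positive)
print(is_blocked("The knight drew his sword."))        # False
```

Matching only on whole words (e.g. with word-boundary regexes) reduces some of these errors, but no keyword list can resolve context-dependent meaning, which is why blocklists alone tend to both over- and under-block.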
AI Dungeon’s creator made tweaks to the filters in hopes of mitigating collateral damage. Finally, Latitude arrived at a solution that addressed over-blocking but still allowed it access to OpenAI’s models. This is from the developer’s latest update on AI Dungeon’s moderation efforts, published in mid-August 2021:
We’ve agreed upon a new approach with OpenAI that will allow us to shift AI Dungeon’s filtering to have fewer incorrect flags and allow users more freedom in their experience. The biggest change is that instead of being blocked from playing when input triggers OpenAI’s filter, those requests will be handled by our own AI models. This will allow users to continue playing without broader filters that go beyond Latitude’s content policies.
While the fix addressed the overblocking problem, it created other issues for players, as AI Dungeon’s developer acknowledged in the same post. Users whose requests were shunted to AI Dungeon’s own models would suffer lower performance due to slower processing. On the other hand, routing around OpenAI’s filtering system would give AI Dungeon users more flexibility when crafting stories and limit false flags and account suspensions.
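The routing pattern Latitude describes can be sketched as a simple fallback: requests that trip the upstream provider’s filter are served by a smaller in-house model instead of being blocked outright. All function names and the trigger condition below are hypothetical placeholders, not Latitude’s or OpenAI’s actual API.

```python
# Sketch of the fallback-routing pattern: flagged requests are handled
# by an in-house model rather than rejected. Everything here is a
# hypothetical stand-in for illustration.

def upstream_filter_flags(prompt: str) -> bool:
    """Stand-in for the upstream provider's content filter."""
    return "forbidden" in prompt.lower()

def generate_upstream(prompt: str) -> str:
    """Stand-in for the large upstream model (faster, higher quality)."""
    return f"[upstream model] continuation of: {prompt}"

def generate_in_house(prompt: str) -> str:
    """Stand-in for the smaller in-house model (slower, but governed
    only by the developer's own content policy)."""
    return f"[in-house model] continuation of: {prompt}"

def continue_story(prompt: str) -> str:
    if upstream_filter_flags(prompt):
        return generate_in_house(prompt)  # fall back instead of blocking
    return generate_upstream(prompt)
```

The trade-off the developer acknowledges falls directly out of this design: the fallback path keeps users playing, but at the cost of the slower, weaker model.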
Originally posted to the Trust & Safety Foundation website.