taylor.rhyne's Techdirt Profile



Posted on Techdirt - 2 September 2020 @ 12:07pm

Content Moderation Best Practices for Startups

To say content moderation has become a hot topic over the past few years would be an understatement. The conversation has quickly shifted from how best to deal with pesky trolls and spammers straight into the world of intensely serious topics like genocide and the destabilization of democracies.

While this discussion often centers around global platforms like Facebook and Twitter, even the smallest of communities can struggle with content moderation. Just a limited number of toxic members can have an outsize effect on a community’s behavioral norms.

That’s why the issue of content moderation needs to be treated as a priority for all digital communities, large and small. As evidenced by its leap from lower-order concern to front-page news, content moderation is deserving of more attention and care than most are giving it today. As I see it, it’s a first-class engineering problem that calls for a first-class solution. In practical terms, that means providing:

  1. accessible, flexible policies and procedures that account for the shades of gray moderators see day to day; and

  2. technology that makes those policies and procedures feasible, affordable, and effective.

Fortunately, this doesn’t have to be a daunting task. I’ve spent years having conversations with platforms that are home to tens or hundreds of millions of monthly active users, along with advisors spanning the commercial, academic, and non-profit sectors. From these conversations, I’ve created this collection of content moderation and community building best practices for platforms of all sizes.

Content policies

  1. Use understandable policies.

This applies to both the policies you publish externally and the more detailed, execution-focused version of these policies that helps your moderators make informed and consistent decisions. While the decisions and trade-offs underlying these policies are likely complex, once resolution is reached the policies themselves need to be expressed in simple terms so that users can easily understand community guidelines and moderators can more easily recognize violations.

When the rules aren’t clear, two problems arise: (i) moderators may have to rely on gut instincts rather than process, which can lead to inconsistency; and (ii) users lose trust because policies appear arbitrary. Consider providing examples of acceptable and unacceptable behaviors to help both users and moderators see the application of your policies in action (many examples will be more clarifying than just a few). Again, this is not to say that creating policies is an easy process; there will be many edge cases that make it challenging. We touch more on this below.

  1. Publicize policies and changes.

Don’t pull the rug out from under your users. Post policies in an easy-to-find place, and notify users when they change. How to accomplish the latter will depend on your audience, but you should make a good faith effort to reach them. For some, this may mean emailing; for others, a post pinned to the top of a message board will suffice.

  1. Build policies on top of data.

When your policies are called into question, you want to be able to present a thoughtful approach to their creation and maintenance. Policies based on intuition or haphazard responses to problems will likely cause more issues in the long run. Grounding your content policies on solid facts will make your community a healthier, more equitable place for users.

  1. Iterate.

Times change, and what works when you start your community won’t necessarily work as it grows. For instance, new vocabulary may come into play, and slurs can be reappropriated by marginalized groups as counterspeech. This can be a great opportunity to solicit feedback from your community to both inform changes and more deeply engage users. Keep in mind that change need not be disruptive: communities can absorb lots of small, incremental changes or clarifications to policies.

Harassment and abuse detection

  1. Be proactive.

Addressing abusive content after it’s been posted generally only serves to highlight flaws and omissions in your policies, and puts the onus of moderation on users. Proactive moderation can make use of automated initial detection and human moderators working in concert. Automated systems can flag potentially abusive content, after which human moderators with a more nuanced understanding of your community can jump in to make a final call.
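As a rough illustration of automated detection and human review working in concert, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `toy_score` function stands in for whatever classifier you actually use, and the two thresholds are placeholders that would need tuning on your own data.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical thresholds; real values would be tuned on your own data.
REVIEW_THRESHOLD = 0.5   # at or above this, a human takes a look
HOLD_THRESHOLD = 0.9     # at or above this, hide the post until reviewed


@dataclass
class Triage:
    text: str
    score: float
    visible: bool        # whether the post is shown while awaiting review
    needs_review: bool   # whether it was added to the human review queue


def triage(text: str, score_fn: Callable[[str], float], review_queue: List[str]) -> Triage:
    """Score a post automatically and route uncertain or risky ones to humans."""
    score = score_fn(text)
    needs_review = score >= REVIEW_THRESHOLD
    visible = score < HOLD_THRESHOLD
    if needs_review:
        review_queue.append(text)  # a moderator makes the final call
    return Triage(text, score, visible, needs_review)


def toy_score(text: str) -> float:
    """Stand-in for a real classifier; purely illustrative."""
    lowered = text.lower()
    if "i will find you" in lowered:
        return 0.95
    if "idiot" in lowered:
        return 0.6
    return 0.1


queue: List[str] = []
results = [triage(t, toy_score, queue) for t in (
    "Great write-up, thanks!",
    "Only an idiot would think that.",
    "I will find you.",
)]
```

In this sketch the automated step never issues a final verdict. It only decides whether a post stays visible while a moderator reviews it.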

  1. Factor in context.

Words or phrases that are harmful in one setting may not be in another. Simple mechanisms like word filters and pattern matching are inadequate for this task, as they tend to under-censor harmful content and over-censor non-abusive content. Having policies and systems that can negotiate these kinds of nuances is critical to maintaining a platform’s health.
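To make the failure mode concrete, the following Python sketch runs a deliberately naive word filter over two messages. The one-word blocklist and both sample messages are invented for illustration and do not describe any particular product.

```python
import re

BLOCKLIST = {"trash"}  # hypothetical one-word blocklist


def naive_filter(text: str) -> bool:
    """Return True if any blocklisted word appears as a whole word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKLIST for word in words)


benign = "Reminder: trash pickup moves to Tuesday this week."
abusive = "You are tr4sh and everyone here knows it."

over_censored = naive_filter(benign)         # flagged, although harmless
under_censored = not naive_filter(abusive)   # missed, although hostile
```

The harmless reminder is flagged while the hostile message, with one character swapped, slips through. Adding more patterns narrows one gap but tends to widen the other.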

  1. Create a scalable foundation.

Relying on human moderation and sparse policies may work when your goal is to get up and running, but can create problems down the road. As communities grow, the complexity of expression and behavior grows. Establishing policies that can handle increased scale and complexity over time can save time and money, and prevent harassment, in the long term.

  1. Brace for abuse.

There’s always the danger of persistent bad actors poisoning the well for an entire community. They may repeatedly test keyword dictionaries to find gaps, or manipulate naive machine learning-based systems to “pollute the well.” Investing in industrial-grade detection tooling early on is the most effective way to head off these kinds of attacks.

  1. Assess effectiveness.

No system is infallible, so you’ll need to build regular evaluations of your moderation system into your processes. Doing so will help you understand whether a given type of content is being identified correctly, identified incorrectly, or missed entirely. That last part is perhaps the biggest problem you’ll face. I recommend using production data to build evaluation sets, allowing you to track performance over time.
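One common way to run such an evaluation is to compare the system's flags against human labels and compute precision and recall. The Python sketch below assumes a small, hand-labeled evaluation set; the data is made up, and in practice the pairs would be sampled from production.

```python
from typing import Dict, List, Tuple

# Each pair is (human_label, system_flagged); True means "abusive".
# Hypothetical evaluation set; in practice it would be sampled from production.
EVAL_SET: List[Tuple[bool, bool]] = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
    (True, False), (False, False),
]


def evaluate(pairs: List[Tuple[bool, bool]]) -> Dict[str, float]:
    """Summarize correct flags, wrong flags, and abusive posts missed entirely."""
    tp = sum(1 for label, flagged in pairs if label and flagged)
    fp = sum(1 for label, flagged in pairs if not label and flagged)
    fn = sum(1 for label, flagged in pairs if label and not flagged)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flags that were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of abuse that was caught
    return {"precision": precision, "recall": recall, "missed": fn}


metrics = evaluate(EVAL_SET)
```

Re-running the same evaluation set after each model or policy change gives a comparable number over time. The `missed` count corresponds to the "missed entirely" case above.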

Moderation actions

  1. Act swiftly.

Time is of the essence. The longer an offensive post remains, the more harm can come to your users and your community’s reputation. Inaction or delayed response can create the perception that your platform tolerates hateful or harassing content, which can lead to a deterioration of user trust.

  1. Give the benefit of the doubt.

From time to time, even “good” community members may unintentionally post hurtful content. That’s why it’s important to provide ample notice of disciplinary actions like suspensions. Doing so will allow well-intentioned users to course-correct, and, in the case of malicious users, provide a solid basis for more aggressive measures in the future.

  1. Embrace transparency.

One of the biggest risks in taking action against a community member is the chance you’ll come across as capricious or unjustified. Regularly reporting anonymized, aggregated moderation actions will foster a feeling of safety among your user base.

  1. Prepare for edge cases.

Just as you can’t always anticipate new terminology, there will likely be incidents your policies don’t clearly cover. One recommendation for handling these types of hiccups is a process that triggers the use of an arbiter that holds final authority.

Another method is to imagine the content or behavior to be 10,000 times as common as it is today. The action you would take in that scenario can inform the action you take today. Regardless of the system you develop, be sure to document all conversations, debates, and decisions. And once you’ve reached a decision, formalize it by updating your content policy.

  1. Respond appropriately.

Typically, only a small portion of toxic content comes from persistent, determined bad actors. The majority of incidents are due to regular users having an off-day. That’s why it’s important to not apply draconian measures like permanent bans at the drop of a hat. Lighter measures like email or in-app warnings, content removal, and temporary bans send a clear signal about unacceptable behavior while allowing users to learn from their mistakes.
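A graduated response can be written down as a simple escalation ladder. The Python sketch below is one possible shape, not a recommended policy: the step names and the rule of one step per prior violation are assumptions made for the example.

```python
from typing import List

# Hypothetical escalation ladder, mildest first.
LADDER: List[str] = ["warning", "content_removal", "temporary_ban", "permanent_ban"]


def next_action(prior_violations: int) -> str:
    """Pick a sanction based on how many earlier violations the user has."""
    if prior_violations < 0:
        raise ValueError("prior_violations cannot be negative")
    step = min(prior_violations, len(LADDER) - 1)  # cap at the harshest step
    return LADDER[step]


first_offense = next_action(0)
repeat_offense = next_action(2)
persistent_offender = next_action(10)
```

A real policy would likely also weigh severity and how recent the earlier violations were. The point here is only that the harshest measure is reached by steps.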

  1. Target remedies.

Depending on the depth of your community, a violation may be limited to a subgroup within a larger group. Be sure to focus on the problematic subgroup to avoid disrupting the higher-level group.

  1. Create an appeals process.

In order to establish and build trust, it’s important to create an equitable structure that allows users to appeal when they believe they’ve been wrongly moderated. As with other parts of your policies, transparency plays a big role. The more effort you put into explaining and publicizing your appeals policy up front, the safer and stronger your community will be in the long run.

  1. Protect moderators.

While online moderation is a relatively new field, the stresses it causes are very real. Focusing on the worst parts of a platform can be taxing psychologically and emotionally. Support for your moderators in the form of removing daily quotas, enforcing break times, and providing counseling is good for your community and the ethical thing to do.

And if you’re considering opening a direct channel for users to communicate with Trust & Safety agents, be aware of the risks. While it can help dissipate heightened user reactions, protecting moderators here is also critical. Use shared, monitored inboxes for inbound messages and anonymized handles for employee accounts. Use data to understand which moderators are exposed to certain categories or critical levels of abusive content. Lastly, provide employees with personal online privacy-protecting solutions such as DeleteMe.


  1. Maintain logs.

Paper trails serve as invaluable reference material. Be sure to keep complete records of flagged content including the content under consideration, associated user or forum data, justification for the flag, moderation decisions, and post mortem notes, when available. This information can help inform future moderation debates and identify inconsistencies in the application of your policies.
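The fields listed above map naturally onto a structured log record. Here is a minimal Python sketch, assuming one JSON object per moderation decision; the class name, field names, and sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now() -> str:
    """Timestamp in ISO 8601 format, always in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ModerationLogEntry:
    content: str                       # the content under consideration
    user_id: str                       # associated user data
    forum: str                         # associated forum data
    flag_reason: str                   # justification for the flag
    decision: str                      # the moderation decision
    post_mortem: Optional[str] = None  # notes added later, when available
    logged_at: str = field(default_factory=_utc_now)


entry = ModerationLogEntry(
    content="Only an idiot would think that.",
    user_id="user-123",
    forum="general",
    flag_reason="personal insult",
    decision="content_removal",
)
line = json.dumps(asdict(entry))  # one line, ready to append to a log file
```

Keeping every entry in the same shape makes it easier to search past decisions and spot inconsistent outcomes for similar content.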

  1. Use metrics.

Moderation is possibly the single most impactful determinant of a community user’s experience. Measurement of its effectiveness should be subject to the same rigor you’d apply to any other part of your product. By evaluating your moderation process with both quantitative and qualitative data, you’ll gain insight into user engagement, community health, and the impact of toxic behavior.

  1. Use feedback loops.

A final decision on a content incident need not be the end of the line. Don’t let the data you’ve collected through the process go to waste. Make it a part of regular re-evaluations and updates of content policies to not only save effort on similar incidents, but also to reinforce consistency.

Most importantly, though, your number one content moderation concern should be strategic in nature. As important as all of these recommendations are for maintaining a healthy community, they’re nothing without an overarching vision. Before you define your policies, think through what your community is, who it serves, and how you’d like it to grow. A strong sense of purpose will help guide you through the decisions that don’t have obvious answers and, of course, help attract the audience you want.

This collection of best practices is by no means the be-all and end-all of content moderation, but rather a starting point. This industry is constantly evolving and we’ll all need to work together to keep best practices at the frontier. If you have any comments or suggestions, feel free to share on this GitLab repo.

Let’s help make the internet a safer, more respectful place for everyone.

Taylor Rhyne is co-founder and Chief Operating Officer of Sentropy, an internet security company building machine learning products to detect and fight online abuse. Rhyne was previously an Engineering Project Manager at Apple on the Siri team where he helped develop and deploy advanced Natural Language Understanding initiatives.
