liz.woolery's Techdirt Profile


Posted on Techdirt - 29 March 2019 @ 12:13pm

Three Lessons In Content Moderation From New Zealand And Other High-Profile Tragedies

Following the terrorist attacks on two mosques in Christchurch, New Zealand, social media companies and internet platforms have faced renewed scrutiny and criticism for how they police the sharing of content. Much of that criticism has been directed at Facebook and YouTube, both platforms where video of the shooter’s rampage found a home in the hours after the attacks. The footage was filmed with a body camera and depicts the perpetrator’s attacks over 17 minutes. The video first appeared on Facebook Live, the social network’s real-time video streaming service. From there, Facebook says, it was uploaded to a file-sharing site, a link was posted to 8chan, and the video began to spread.

While the world struggles to make sense of these horrific terrorist attacks, details about how tech companies handled the shooter’s video footage and written manifesto have been shared, often by the companies themselves. These details, combined with the public discourse on and reaction to what the New York Times called “a mass murder of, and for, the internet,” have made clear three fundamental facts about content moderation, especially when it comes to live and viral content:

1. Automated Content Analysis is Not a Magic Wand

If you remember nothing else about content moderation, remember this: There is no magic wand. There is no magic wand that can be waved to instantly remove all of the terrorist propaganda, hate speech, graphic violence, or otherwise objectionable content. There are some things that automation and machine learning are really good at: functioning within a specific, narrowly defined environment (rather than at massive scale) and identifying repeat occurrences of the exact same (completely unaltered) content, for example. And there are some things they are really bad at: interpreting nuance, understanding slang, and minimizing discrimination and social bias, among many others. But perfect enforcement of a complex rule against a dynamic body of content is not something that automated tools can achieve. For example, the simple addition of a watermark was enough to defeat automated tools aimed at removing video of the New Zealand shooter.
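
To make that brittleness concrete, here is a minimal sketch, assuming the simplest possible detector: one that compares an exact cryptographic hash of each upload against a blocklist of known-bad files. This is not any platform's actual pipeline, but it shows why even a trivial alteration defeats exact matching.

```python
# Minimal sketch, not any platform's actual pipeline: exact-match detection
# compares a cryptographic hash of each upload against a blocklist of hashes
# of known-bad files. Any alteration at all -- even a tiny appended marker,
# standing in here for a visual watermark -- changes the hash completely.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original_video = b"...raw bytes of the original clip..."  # placeholder content
watermarked = original_video + b"WM"                      # trivially altered copy

blocklist = {sha256(original_video)}                      # hashes of known-bad uploads

for name, upload in [("original", original_video), ("altered", watermarked)]:
    status = "blocked" if sha256(upload) in blocklist else "missed"
    print(f"{name}: {status}")
# original: blocked
# altered: missed   <- a trivial change defeats exact matching
```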

Some, then, have suggested banning all live video. However, that overlooks activists’ use of live streams to hold governments accountable and report on corruption as it happens, among other uses. Further, the challenges of automated content analysis are by no means limited to video. As a leaked email from Google to its content moderators reportedly warned: “The manifesto will be particularly challenging to enforce against given the length of the document and that you may see various segments of various lengths within the content you are reviewing.”
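
The manifesto illustrates a related difficulty: matching excerpts of varying length inside arbitrary posts. The sketch below uses word-shingle overlap, an approach assumed purely for illustration rather than anything Google or another company is known to deploy, and it shows why such matching is error-prone in both directions.

```python
# Assumed illustration only (not any company's actual system): flag a post
# when enough of its word n-grams ("shingles") also appear in a known
# document. Short excerpts and paraphrases fall below any threshold, while
# legitimate quotation in news reporting can land above it.
def shingles(text: str, n: int = 5) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(post: str, known_doc: str, n: int = 5) -> float:
    post_shingles = shingles(post, n)
    if not post_shingles:
        return 0.0
    return len(post_shingles & shingles(known_doc, n)) / len(post_shingles)

KNOWN_DOC = "..."   # placeholder for the document being enforced against
THRESHOLD = 0.3     # arbitrary; tuning it trades missed excerpts for false positives

def should_flag(post: str) -> bool:
    return overlap_ratio(post, KNOWN_DOC) >= THRESHOLD
```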

All of this is to reiterate: There is no magic wand and there never will be. There is absolutely a role for automated content analysis when it comes to keeping certain content off the web. Use of PhotoDNA and similar systems, for example, has reportedly been effective at ensuring that child pornography stays off platforms. However, the nuance, news value, and intricacies of most speech should give pause to those calling for mass implementation of automated content removal and filtering.
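
PhotoDNA’s internals are not public, but systems of that family are generally described as comparing perceptual fingerprints of known material and treating anything within a small distance as a match. The sketch below illustrates only that distance-threshold idea; the fingerprinting step itself is assumed and omitted.

```python
# Generic sketch of perceptual-fingerprint matching with a distance threshold.
# The fingerprinting step is assumed and omitted; this only illustrates the
# "close enough counts as a match" idea that makes such systems more robust
# to small alterations than exact hashing.
def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")        # number of differing bits

def matches_known(fingerprint: int, known: set, max_distance: int = 5) -> bool:
    return any(hamming_distance(fingerprint, k) <= max_distance for k in known)

known_fingerprints = {0b1011011001011100}     # fingerprints of known material
slightly_altered = 0b1011011001011110         # one bit off, e.g. a re-encoded copy

print(matches_known(slightly_altered, known_fingerprints))   # True: still caught
```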

2. The Scale, Speed, and Iterative Nature of Online Content, Particularly in This Case, Is Enormous

It is a long-standing fact of the internet that it enables communication on a vast scale. Reports from YouTube and Facebook about the New Zealand attack seem to indicate that this particular incident was unprecedented in its volume, speed, and variety. Both companies have dedicated content moderation staff, and it would be easy to fall into the trap of thinking that those staff could handily keep up with multiple copies of a single live video. But that overlooks a couple of realities:

  • The videos are not carbon copies of each other. Any number of changes can make identifying variations of a piece of content difficult. The iterations could include different audio, animation overlays, cropping, color filters, use of overlaid text and/or watermarks, and the addition of commentary (as in news reporting). Facebook alone reported 800 “visually distinct” videos.
  • Other content (the normal, run-of-the-mill stuff) continues to be posted and must be addressed by the same staff now scrambling to keep up with the 17 copies of the video being uploaded every second to that single platform (Facebook in this case; YouTube’s numbers were somewhat lower, but still reached one upload every second, culminating in hundreds of thousands of copies). The quick arithmetic sketched just after this list shows what those rates add up to.
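
As a back-of-the-envelope illustration, using only the per-second rates reported above and treating them, unrealistically, as constant for a full day, the totals are staggering:

```python
# Back-of-the-envelope totals using only the per-second rates reported above,
# treated (unrealistically) as constant for a full day.
facebook_rate = 17            # copies uploaded per second (Facebook)
youtube_rate = 1              # roughly one upload per second (YouTube)
seconds_per_day = 24 * 60 * 60

print(f"Facebook: ~{facebook_rate * seconds_per_day:,} uploads/day")  # ~1,468,800
print(f"YouTube:  ~{youtube_rate * seconds_per_day:,} uploads/day")   # ~86,400
```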

It’s worth noting here that not a single person reported the live video stream to Facebook for review. The video was reportedly viewed “fewer than 200 times” while it was live, but the first report came 12 minutes after the stream ended, a full 29 minutes after the broadcast began. That’s a lot of time for a video to be shared and reposted by people motivated to ensure it spread widely, not only on Facebook, but on other sites as well.

In addition to illustrating the challenges of automated content review, the New Zealand attacks demonstrated weaknesses in the companies’ own systems, particularly when dealing with emergencies at scale. YouTube, for example, was so overwhelmed by the flood of videos that it opted to circumvent its standard human review process to hasten their removal. Facebook, too, struggled. The company has a process for handling particularly sensitive content, such as an individual threatening to commit suicide; however, that process wasn’t designed to address a live-streamed mass shooting and likely could not easily be adapted to this emergency.

3. We Need Much Greater Transparency

As non-governmental and civil society organizations have hammered home for years, there needs to be more transparency from tech companies about their policies, processes, and practices that impact user rights. One of the more promising developments from 2018 in this space was the release of reports by YouTube, Twitter, and Facebook providing a quick peek under the hood with respect to their content enforcement. While there is still a long way to go from the first year’s reports, the reaction to their publication shows a hunger for and deep interest in further information from tech companies about their handling of content.

Among companies’ next steps should be transparency around specific major incidents, including the New Zealand attacks. Social media platforms are still reeling from over a week of whack-a-mole with a heavy side of criticism. But once they are able to identify trends or data points across the incident, those findings should be shared publicly and contextualized appropriately. For example, how did Facebook identify and handle the 800 distinct versions of the video? Did those include uses of the video in news reporting? How was the Global Internet Forum to Counter Terrorism, an entity formed to share information on images and videos between companies, engaged?

One challenge for companies when providing transparency into their policies and practices is doing so without providing a roadmap to those looking to circumvent the platforms’ systems. However, extant transparency reporting practices (around, for example, government requests for user data) suggest companies have found a balance between transparency and security, tweaking their reports over time and contextualizing the data within their larger efforts.

What’s Next?

There are no quick fixes. There are no magic wands. We will continue to debate and discuss and argue about whether the tech companies “did the right thing” as they responded to the New Zealand shooter’s video, his manifesto, and public reaction to both. But as we do so, we need transparency and insight into how those companies have responded, and we need a shared understanding of the tools and realities of the problem.

As details have emerged about the attacks in New Zealand and how they played out on social media, much of the analysis around internet companies’ handling of user content has fallen into one of two buckets:

  • Tech companies aren’t doing enough and refuse to use their enormous money/power/tools/resources to address the problem; or
  • The problem is unsolvable because the volume of content is too great for platforms to handle effectively.

The problem with presenting the issue as this dichotomy is that it overlooks (really, completely ignores) the fact that an unknown number of viewers watched the live video but did not report it to Facebook. Perhaps some viewers were confused and maybe others believed it was a joke. But the reality is that some people will always choose to use technology for harm. Given that fact, the question that will ultimately guide this debate and shape how we move forward is: What do we want our social media networks to be? Until we can answer that question, it will be hard, if not impossible, to address all of these challenges.

Reposted from the Center for Democracy & Technology

Posted on Techdirt - 31 January 2018 @ 01:29pm

We Need To Shine A Light On Private Online Censorship

On February 2nd, Santa Clara University is hosting a gathering of tech platform companies to discuss how they actually handle content moderation questions. Many of the participants have written short essays about the questions that will be discussed at this event — and over the next few weeks we’ll be publishing many of those essays, including this one.

In the wake of ongoing concerns about online harassment and harmful content, continued terrorist threats, changing hate speech laws, and the ever-growing user bases of major social media platforms, tech companies are under more pressure than ever before with respect to how they treat content on their platforms—and often that pressure is coming from different directions. Companies are being pushed hard by governments and many users to be more aggressive in their moderation of content, to remove more content and to remove it faster, yet are also consistently coming under fire for taking down too much content or lacking adequate transparency and accountability around their censorship measures. Some on the right, like Steve Bannon and FCC Chairman Ajit Pai, have complained that social media platforms are pushing a liberal agenda via their content moderation efforts, while others on the left are calling for those same platforms to take down more extremist speech. Free expression advocates, meanwhile, are deeply concerned that companies’ content rules are so broad as to impact legitimate, valuable speech, or that overzealous attempts to enforce those rules are accidentally causing collateral damage to wholly unobjectionable speech.

Meanwhile, there is a lot of confusion about what exactly the companies are doing with respect to content moderation. The few publicly available insights into these processes, mostly from leaked internal documents, reveal bizarrely idiosyncratic rule sets that could benefit from greater transparency and scrutiny, especially to guard against discriminatory impacts on oft-marginalized communities. The question of how to address that need for transparency, however, is difficult. There is a clear need for hard data about specific company practices and policies on content moderation, but what does that look like? What qualitative and quantitative data would be most valuable? What numbers should be reported? And what is the most accessible and meaningful way to report this information?

Part of the answer to these questions can be found by looking to the growing field of transparency reporting by internet companies. The most common kind of transparency report that companies voluntarily publish gives detailed numbers about government demands for information about the companies’ users—showing, for example, how many requests were received, from what countries or jurisdictions, what kind of data was requested, and whether they were complied with or not. As reflected in this history of the practice published by our organization, New America’s Open Technology Institute (OTI), transparency reporting about government demands for data has exploded over the past few years, so much so that projects like the Transparency Reporting Toolkit by OTI and Harvard’s Berkman Klein Center for Internet & Society have emerged to try to define consistent standards and best practices for such reporting. Meanwhile, a decent number of companies have also started publishing reports about the legal demands they receive for the takedown of content, whether copyright-based or otherwise.

However, almost no one is publishing data about what we’re talking about here: voluntary takedowns of content by companies based on their own terms of service (TOS). Yet especially now, as private censorship gets even more aggressive, the need for transparency also increases. This need has led to calls from a variety of corners for companies to report on content moderation. For example, a working group of the Freedom Online Coalition, composed of representatives from industry, civil society, academia, and government, called for meaningful transparency about companies’ content takedown efforts, complaining that “there is very little transparency” around TOS enforcement mechanisms. The 2015 Ranking Digital Rights Corporate Accountability Index found that every company surveyed received a failing grade with respect to reporting on TOS-based takedowns; companies fared only slightly better in the 2017 Index. Finally, David Kaye, the United Nations Special Rapporteur on the promotion and protection of the right to freedom of opinion and expression, called for companies to “disclose their policies and actions that implicate freedom of expression.” Specifically, he observed that “there are … gaps in corporate disclosure of statistics concerning volume, frequency and types of request for content removals and user data, whether because of State-imposed restrictions or internal policy decisions.”

The benefits to companies issuing such transparency reports around their content moderation activities would be significant: For those companies under pressure to “do something” about problematic speech online, this is an opportunity to outline the lengths to which they have gone to do just that; for companies under fire for “not doing enough,” a transparency report would help them express the size and complexity of the problems they are addressing, and explain that there is no magic artificial intelligence wand they can wave and make online extremism and harassment disappear; and finally, public disclosure about content moderation and terms of service practices will go a long way toward building trust with users—a trust that has crumbled in recent years. Putting aside the benefit to companies, though, there is the even more significant need of policymakers and the public. Before we can have an intelligent conversation about hate speech, terrorist propaganda, or other worrisome content online, or formulate fact-based policies about how to address that content, we need hard data about the breadth and depth of those problems, and about the platforms’ current efforts to solve those problems.

While there have been calls for publication of such information, there has been little specificity with respect to what exactly should be published. No doubt this is due, in great part, to the opacity of individual companies’ content moderation policies and processes: It is difficult to identify specific data that would be useful without knowing what data is available in the first place. Anecdotes and snippets of information from companies like Automattic and Twitter offer a starting point for considering what information would be most meaningful and valuable. Facebook has said it is entering a new era of transparency for the platform. Twitter has published some data about content removed for violating its TOS, Google followed suit for some of the content removed from YouTube, and Microsoft has published data on “revenge porn” removals. While each of these examples is a step in the right direction, what we need is a consistent push across the sector for clear and comprehensive reporting on TOS-based takedowns.

Looking to the example of existing reports about legally mandated takedowns, data that shows the scope and volume of content removals, account removals, and other forms of account or content interference/flagging would be a logical starting point. Information about content that has been flagged for removal by a government actor—such as the U.K.’s Counter Terrorism Internet Referral Unit, which was granted “super flagger” status on YouTube, allowing the agency to flag content in bulk—should also be included, to guard against undue government pressure to censor. More granular information, such as the number of takedowns in particular categories of content (whether sexual content, harassment, extremist speech, etc.), or specification of the particular term of service violated by each piece of taken-down content, would provide even more meaningful transparency. This kind of quantitative data (i.e., numbers and percentages) would be valuable on its own, but would be even more helpful if paired with qualitative data to shed more light on the platforms’ opaque content moderation practices and tell users a clear story about how those processes actually work, using compelling anecdotes and examples.
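
As a purely illustrative sketch of what the underlying data might look like, the record below uses invented field names rather than any company’s actual schema; aggregating records of this shape would produce the quantitative reporting described above.

```python
# One possible shape (purely illustrative; field names are this sketch's own,
# not any company's schema) for a per-action record that could be aggregated
# into the kind of TOS-takedown transparency report described above.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TakedownRecord:
    action: str                       # "content_removed", "account_suspended", "flagged", ...
    policy_violated: str              # the specific term of service invoked
    category: str                     # "harassment", "extremist speech", "sexual content", ...
    detection_source: str             # "user_report", "automated", "government_referral", ...
    government_entity: Optional[str] = None   # e.g., a referral unit, when applicable
    appealed: bool = False
    reinstated: bool = False

# Counting such records by category, policy, and detection source yields the
# quantitative side of a report; selected, anonymized examples supply the
# qualitative side.
example = TakedownRecord(
    action="content_removed",
    policy_violated="violent extremism policy",
    category="extremist speech",
    detection_source="government_referral",
    government_entity="Counter Terrorism Internet Referral Unit",
)
```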

As has already and often happened with existing transparency reports, this data will help keep companies accountable. Few companies will want to demonstrably be the most or least aggressive censor, and anomalous data such as huge spikes around particular types of content will be called out and questioned by one stakeholder group or another. It will also help ensure that overreaching government pressure to take down more content is recognized and pushed back on, just as in current reporting it has helped identify and put pressure on countries making outsized demands for users’ information. And most importantly, it will help drive policy proposals that are based on facts and figures rather than on emotional pleas or irrational fears—policies that hopefully will help make the internet a safer space for a range of communities while also better protecting free expression.

Unquestionably, the major platforms have become our biggest online gatekeepers when it comes to what we can and cannot say. Whether we want them to have that power or not, and whether we want them to use more or less of that power in regard to this or that type of speech, are questions we simply cannot answer until we have a complete picture of how they are using that power. Transparency reporting is our first and best tool for gaining that insight.

Kevin Bankston is the Director of the Open Technology Institute at New America. Liz Woolery is Senior Policy Analyst at the Open Technology Institute at New America.
