kevin.bankston's Techdirt Profile


Posted on Techdirt - 11 July 2018 @ 10:46am

How We Can 'Free' Our Facebook Friends

In the wake of the recent privacy controversy over Facebook and Cambridge Analytica, internet users and policymakers have had a lot of questions on the topic of “data portability”: Is my social network data really mine? Can I take it with me to another platform if I’m unhappy with Facebook? What does the new European privacy law, the General Data Protection Regulation (GDPR), demand in terms of my being able to export my data? What even counts as my data that I should be able to download or share, and as my friends’ data that I shouldn’t?

There’s a growing consensus that being able to easily move your data between social platforms, and perhaps even being able to communicate between different platforms, is necessary to promote competition online and enable new services to emerge. But that raises some difficult technical and policy questions about how to balance such portability and interoperability with your and your friends’ privacy interests—and how to guarantee that new privacy efforts don’t have the unintended consequence of locking in current platforms’ dominance by locking down their control over your data.

To investigate a potential path forward, New America’s Open Technology Institute partnered with Mozilla to host an event earlier this month, “A Deep Dive Into Data Portability: How Can We Enable Platform Competition and Protect Privacy at the Same Time?” It included a tutorial from OTI’s senior policy technologist Ross Schulman on the basic terminology and technologies at issue—for instance, distinguishing between “data portability” and “interoperability,” and explaining what the heck an “Application Programming Interface,” or “API,” is.

The event opened with a forceful keynote from David Cicilline, who’s a congressman for Rhode Island and the top Democrat on the House Judiciary Committee’s Antitrust Subcommittee. “We need pro-competitive policies that give power back to Americans in the form of more rights and greater control over their data,” Cicilline argued. “This starts by taking on walled gardens that block startups and other competitors from entering the market through high switching costs.”

Echoing a Wired op-ed he had previously co-authored, Cicilline highlighted how “[p]eople who may want to leave Facebook are less likely to do so if they aren’t able to seamlessly rebuild their network of contacts, photos, and other social graph data on a competing service or communicate across services.” Just as Congress gave cellphone users the right to “number portability”—lessening the switching cost of changing your cell carrier by giving you the ability to take your phone number with you—Cicilline argued that social network users should have the right to portability of their social media data. Unless we “free the social graph,” as one commentator put it, we may find ourselves locked into the current platform ecosystem with no chance of meaningful competitors emerging.

Importantly, Facebook has offered a feature called “Download Your Information” (DYI) since 2010. This lets users download all of the content they’ve ever posted on Facebook as a browsable HTML archive. (As described in our tech tutorial, other providers like Twitter and Google offer similar options.) However, Facebook’s download feature was originally designed as a personal archiving tool, rather than for easy porting of your data to another service. Indeed, when it was launched, Facebook clearly stated that “[t]his file and the information contained within it, is designed for an individual’s use and not for developers or other services.” That said, over the past couple of months, in response to both the Cambridge Analytica scandal and its data portability obligations under the GDPR, Facebook has revamped the DYI tool to be more portability-friendly. Most notably, Facebook now allows users to download their data in the structured JSON data format (see the tutorial for what that is!) instead of in unstructured HTML, making it much easier to move the data between different services.
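The practical difference between the two formats is worth spelling out. A minimal sketch, with field names that are purely illustrative (not Facebook's actual export schema), shows why structured JSON is so much friendlier to a receiving service than an HTML archive: the importing service can read fields directly instead of scraping them out of page markup.

```python
import json

# A hypothetical post record in the style of a structured JSON export.
# The field names here are made up for illustration -- they are not
# Facebook's actual Download Your Information schema.
exported_post = '''
{
  "timestamp": 1528761600,
  "title": "Shared a photo",
  "data": [{"post": "Hello from the beach!"}]
}
'''

record = json.loads(exported_post)  # trivially machine-readable
# A receiving service can map each field directly into its own data model,
# with no HTML scraping required.
print(record["data"][0]["post"])
```

By contrast, the same post buried in a browsable HTML archive would have to be parsed out of presentation markup, which breaks whenever the page layout changes.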

But here comes the irony: The one thing you can’t download from Facebook is the one thing you’d most need if you wanted to move to a competing social network—your friends’ contact information, or any other unique information that would help you reconnect with them on another service. Instead, all you get is a list of their names, which isn’t very helpful for re-identifying specific individuals, considering how common many names are. Indeed, as was highlighted during the event, Facebook has long treated its possession of your friends’ contact information as a key competitive advantage, making it difficult for users to collect or export it.

For example, when users were first able to share an email address with friends on their profile page, it was displayed as a graphic rather than as text so that it couldn’t be cut and pasted. Some users may also recall when Facebook, in 2012, temporarily replaced users’ non-Facebook addresses with new “@facebook.com” addresses by default, making it harder to obtain off-Facebook contact information about your friends. And although there’s a hard-to-find setting where Facebook users can allow their friends to download their contact information, it is by default set not to allow such downloading—one of the rare Facebook settings that defaults away from, rather than toward, more sharing with friends.

Facebook has consistently justified its attempts to restrict sharing contact info as a privacy and security measure, but the alignment with its own business goals was always more than a little convenient. In addition, it’s also rather ironic, considering that a huge part of Facebook’s meteoric growth was driven by importing contact information from other services, especially Gmail (which led to a dispute between Google and Facebook back in 2010, when Google briefly cut off Facebook’s ability to access Google contacts over its API because Facebook wasn’t reciprocally allowing other services to access contact information on Facebook). Convenient and ironic or not, Facebook’s reticence to share contact information has only been bolstered by recent events: It was, of course, users’ ability to export data about their friends to outside apps that was at the root of the Cambridge Analytica scandal that has put Facebook in the privacy hot-seat. Meanwhile, thanks to GDPR’s privacy requirements, Facebook would now probably need to get affirmative consent from your friends before letting you export their email addresses, even if they arguably didn’t have to before.

There were no easy answers to this privacy-versus-portability conundrum coming out of our panel discussion. However, there were a few critical takeaways in terms of things that Facebook can and should do now to promote portability—and which are in its own interest to do, as it may face unwanted regulatory action if it doesn’t.

Help Set Clear Technical Standards. Easy portability of data between services will require open standards that everyone can use. Facebook’s offering downloadable data in the JSON file format is a good start, but it and other social networks should consider using the Activity Streams 2.0 open standard, a particular JSON-based format for exporting social media items. Facebook helped develop the standard at the World Wide Web Consortium, but right now only decentralized social network tools like Mastodon use it. On top of that, Facebook and all the other major cloud and social platforms should contribute to the open source Data Transfer Project, which aims to establish a common framework for easily moving data directly between services with just a few clicks and without having to download the data yourself. Google and Microsoft are already participating; others should, too.
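To make the standard concrete: Activity Streams 2.0 represents social media items as JSON objects with a shared W3C vocabulary. The `@context` URL below is part of the spec; the actor name and note content are invented for illustration. A sketch of a single "post a note" activity:

```python
import json

# A minimal Activity Streams 2.0 object -- the W3C's JSON-based vocabulary
# for social media items. The "@context" value is defined by the spec;
# the person and note below are hypothetical.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": {"type": "Person", "name": "Alyssa"},
    "object": {
        "type": "Note",
        "content": "Trying out a decentralized network!",
        "published": "2018-07-11T10:46:00Z",
    },
}

# Any service that understands the vocabulary can consume this directly.
print(json.dumps(activity, indent=2))
```

Because every conforming service interprets `type`, `actor`, and `object` the same way, an exported stream of such objects can be imported elsewhere without bespoke translation code for each pair of platforms.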

Solve the Graph Portability Problem. Social platforms should allow you to export your friends’ contact information—or, if they can’t due to privacy restrictions, otherwise provide unique identifiers or other information sufficient to easily re-identify your friends on another platform. Your social graph is yours, and we need a standardized way to move that graph around. Some ideas that came out of the panel: Facebook could ask all users to give consent for their friends to export their contact information as part of Download Your Information—or at least give friends the power to ask each other for that permission. Or, Facebook could allow users to download some other unique piece of a friend’s data, like the URL of their profile or their unique Facebook user ID number. If that raises security concerns, the data could perhaps be “hashed” to obscure it while maintaining its usefulness as a unique identifier, as Josh Constine at TechCrunch has suggested. Facebook and others could maybe even petition the European Data Protection Board for an interpretation of the GDPR that would clearly allow such sharing for competition purposes. There are a range of possible solutions; the only certainty is that Facebook needs to start identifying and testing approaches now.
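The hashing idea can be sketched in a few lines. The point is that two services holding the same hashed value can match the same friend without ever exchanging the raw identifier. One caveat worth flagging: a plain hash of a small, enumerable ID space can be brute-forced, so a real design would likely use a keyed hash (such as HMAC with a shared secret) rather than the bare SHA-256 shown here.

```python
import hashlib

def hashed_id(user_id: str) -> str:
    """Return a SHA-256 digest of a user ID.

    Deterministic: the same ID always produces the same digest, so two
    services can match a shared friend by comparing digests -- without
    exchanging the raw ID. (A bare hash of an enumerable ID space is
    brute-forceable; a production design would use a keyed hash.)
    """
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()

# The same input always yields the same identifier...
assert hashed_id("100004123456789") == hashed_id("100004123456789")
# ...while different IDs yield different digests.
assert hashed_id("100004123456789") != hashed_id("100004123456788")
```

This preserves the digest's usefulness as a matching key while keeping the underlying ID out of the export itself.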

Allow Competitive Apps to Use the Facebook Platform. Data portability—letting someone download their data and transfer it elsewhere—isn’t the only way that people can leverage their Facebook data on another service. There’s also interoperability—the ability to use the Facebook Platform API to run an app that can make use of your Facebook data on an ongoing basis. The problem is that Facebook’s policy for app developers has long required that in order to make full use of the API, apps “must not replicate core Facebook features or functionality, and must not promote [their] other apps that do so.” For example, “your app is not eligible… if it contains its own in-app chat functionality or its own user generated feed” akin to Facebook’s messaging product or Facebook’s newsfeed. If Facebook doesn’t want to continue to be viewed by the public and by regulators as a platform monopolist, it needs to remove this anti-competitive provision and allow users to easily make use of their Facebook data on interoperable competing services.

Some of these steps would be easy for Facebook to take. Others would be more challenging. But all are worthwhile, and ultimately necessary, for ensuring an internet ecosystem that continues to be open, innovative, and competitive.

Reposted from New America’s Weekly Newsletter.

Posted on Techdirt - 26 April 2018 @ 01:38pm

Facebook And Google Finally Take First Steps On Road To Transparency About Content Moderation

As internet platforms are aggressively expanding their “moderation” of problematic content in response to increased pressure from policymakers and the public, how can we best hold them accountable and make sure that these private censorship regimes are fair, proportionate, accurate and unbiased?

As we wrote in our last piece for Techdirt at the beginning of the year, right before the first Content Moderation and Removal at Scale Conference in Santa Clara, there is a dire need for meaningful transparency and accountability around content moderation efforts in order to ensure that the new rulers of our virtual public squares–practically governments in their own right, with billions of citizens–are using their power to moderate speech responsibly. This need has only grown as the pressure on Facebookistan and Googledom to deal with the extremists, white supremacists, and fake news operations on their platforms has also grown, and as questions about whether they are abusing their power by not taking down enough content–or by taking down too much–have proliferated.

This trend was most evident in the recent Congressional hearings prompted by the Cambridge Analytica scandal, where some lawmakers rebuked Facebook CEO Mark Zuckerberg for not doing enough to keep certain content off the platform, while others raised concerns that Facebook had demonstrated political bias against the right when determining what content to take down. Similar concerns were voiced by Republicans at today’s hearing in the House Judiciary Committee focused on examining major internet platforms’ content moderation practices (despite the fact that claims of anti-conservative bias have been thoroughly debunked). Such concerns are not limited to the right wing, though–charges of racially-biased censorship have also been leveled from the left.

In response to these growing pressures–and in no small part thanks to years of consistent demands from free expression advocates–Google and Facebook this week both took major strides towards “doing the right thing” and promoting greater transparency around their content moderation practices, in ways that mirror what we were advocating for in our previous article.

First, on Monday afternoon, Google released the industry’s first detailed transparency report focused on content moderation, giving statistics about YouTube content removals based on violation of the service’s Community Guidelines. Among other things, the report highlights the total number of videos removed in the last quarter of 2017 (a staggering 8,284,039 videos), the percentage of videos flagged by human users versus YouTube’s automated flagging systems (the robots flagged four times as many videos as the humans), and a percentage breakdown of the different reasons human flaggers had flagged content (whether it was spam, sexual content, hate speech, terrorist content, etc.). This is the first time any company has published this sort of data at this level of detail–and now that YouTube has taken the first step, it certainly won’t be the last company to do so.

Soon after YouTube’s trailblazing transparency report, on Tuesday morning, Facebook made a trailblazing announcement of its own. The company published a much more comprehensive version of its Community Standards, including the detailed internal guidelines the company uses to make moderation decisions, and highlighting the “spirit” of their content policies in order to generate greater understanding about why and how the company removes content. In addition, for the first time, the company is giving users the ability to appeal takedown decisions made on individual posts. Posts that are appealed will be reviewed by a human moderator on the company’s appeals team within 24 hours. Prior to this announcement, users could appeal the removal of pages and groups, but the introduction of this process for individual posts is a valuable step towards providing users with greater agency over their content and more engagement in the moderation process.

Taken together, these moves have sharply increased both the quantitative transparency (Google’s numbers) and the qualitative transparency (Facebook’s explanations) around content takedowns, while also improving due process around those takedowns (Facebook’s new appeals). These are both critical first steps, but there is definitely more to be done. For example, although YouTube published a significant amount of data related to the types of objectionable content removed as a result of human flaggers, it does not produce similar data for content flagged by automated flagging systems, which is especially concerning since automated systems flagged the vast majority of objectionable content. Meanwhile, although Facebook’s introduction of an appeals process is a valuable step towards providing users with stronger due process, it currently only applies to hate speech, graphic violence, and nudity/sexual activity, which have been the most controversial categories of objectionable content. In order for this process to be truly impactful, it needs to apply to all forms of content that are being taken down–and the process needs to give impacted users a way to argue their case for why their content should stay up.

Going forward, Facebook and Google also need to take a page out of each other’s books. Like Google, Facebook needs to start reporting quantitative data on its takedowns and how they have impacted different categories of objectionable content, not only for itself but for its other products like WhatsApp and Instagram. Similarly, Google needs to provide users with greater qualitative insight into the guidelines that drive content takedowns, just as Facebook has. Google should also expand its takedown reporting to include other products and services such as Google+ and the Google Play store. Doing so could help pressure Apple to similarly report on takedowns in the App Store, further expanding transparency reporting in this space.

And that’s the real value of these new steps, beyond the transparency itself: Google and Facebook’s new efforts will hopefully push the rest of the industry to compete with them on transparency. Google’s first innovations around transparency reporting on government surveillance demands nearly a decade ago helped set the stage for a domino effect of widespread adoption once the Snowden surveillance scandal broke, as detailed in this timeline and case study on the spread of that reporting practice. In this political moment of “techlash” that has now been turbo-charged by the Cambridge Analytica scandal, the adoption of strong content moderation transparency practices may happen even faster–but only if policymakers and advocates keep demanding it. That includes voices that have been pressing on this issue for years, such as the ACLU of Northern California, the Electronic Frontier Foundation, our own organization, the Open Technology Institute, and the Ranking Digital Rights project (which just yesterday released its third annual ranking of how well tech companies are protecting users’ human rights. Spoiler alert: they’re not doing so great). And since we’re catching this practice at its beginning, perhaps with the right pressure we can not only get all the companies to issue reports but also get them to standardize their reporting formats. Otherwise we may end up with the same crazy quilt of formats that we have in other areas of transparency reporting, which makes it that much harder to meaningfully compare and combine data.

More than pressure, though, we’ll also need continued dialogue with the companies, to better understand how their content moderation and reporting processes do and don’t work, what their biggest challenges are when moderating at scale, and where they think the technology and practice of content moderation and reporting is heading. That’s why our organization along with many others is co-hosting the second Content Moderation at Scale Conference in Washington, DC on May 7, where representatives from a wide range of tech companies both big and small will be talking in detail and on the record about their internal content moderation processes (the conference will be livestreamed and Techdirt’s Mike Masnick will be co-running a session on some of the challenges of content moderation).

We may see even more dominoes fall at that conference, with fresh new announcements about increased transparency and due process around content moderation on even more platforms. Let’s hope so, because internet users deserve to know more about exactly when and how their online expression is censored.

Posted on Techdirt - 31 January 2018 @ 01:29pm

We Need To Shine A Light On Private Online Censorship

On February 2nd, Santa Clara University is hosting a gathering of tech platform companies to discuss how they actually handle content moderation questions. Many of the participants have written short essays about the questions that will be discussed at this event — and over the next few weeks we’ll be publishing many of those essays, including this one.

In the wake of ongoing concerns about online harassment and harmful content, continued terrorist threats, changing hate speech laws, and the ever-growing user bases of major social media platforms, tech companies are under more pressure than ever before with respect to how they treat content on their platforms—and often that pressure is coming from different directions. Companies are being pushed hard by governments and many users to be more aggressive in their moderation of content, to remove more content and to remove it faster, yet are also consistently coming under fire for taking down too much content or lacking adequate transparency and accountability around their censorship measures. Some on the right like Steve Bannon and FCC Chairman Ajit Pai have complained that social media platforms are pushing a liberal agenda via their content moderation efforts, while others on the left are calling for those same platforms to take down more extremist speech. Free expression advocates, for their part, are deeply concerned that companies’ content rules are so broad as to impact legitimate, valuable speech, or that overzealous attempts to enforce those rules are accidentally causing collateral damage to wholly unobjectionable speech.

Meanwhile, there is a lot of confusion about what exactly the companies are doing with respect to content moderation. The few publicly available insights into these processes, mostly from leaked internal documents, reveal bizarrely idiosyncratic rule sets that could benefit from greater transparency and scrutiny, especially to guard against discriminatory impacts on oft-marginalized communities. The question of how to address that need for transparency, however, is difficult. There is a clear need for hard data about specific company practices and policies on content moderation, but what does that look like? What qualitative and quantitative data would be most valuable? What numbers should be reported? And what is the most accessible and meaningful way to report this information?

Part of the answer to these questions can be found by looking to the growing field of transparency reporting by internet companies. The most common kind of transparency report that companies voluntarily publish gives detailed numbers about government demands for information about the companies’ users—showing, for example, how many requests were received, from what countries or jurisdictions, what kind of data was requested, and whether they were complied with or not. As reflected in this history of the practice published by our organization, New America’s Open Technology Institute (OTI), transparency reporting about government demands for data has exploded over the past few years, so much so that projects like the Transparency Reporting Toolkit by OTI and Harvard’s Berkman Klein Center for Internet & Society have emerged to try to define consistent standards and best practices for such reporting. Meanwhile, a decent number of companies have also started publishing reports about the legal demands they receive for the takedown of content, whether copyright-based or otherwise.

However, almost no one is publishing data about what we’re talking about here: voluntary takedowns of content by companies based on their own terms of service (TOS). Yet especially now, as private censorship gets even more aggressive, the need for transparency also increases. This need has led to calls from a variety of corners for companies to report on content moderation. For example, a working group of the Freedom Online Coalition, composed of representatives from industry, civil society, academia, and government, called for meaningful transparency about companies’ content takedown efforts, complaining that “there is very little transparency” around TOS enforcement mechanisms. The 2015 Ranking Digital Rights Corporate Accountability Index found that every company surveyed received a failing grade with respect to reporting on TOS-based takedowns; companies fared only slightly better in the 2017 Index. Finally, David Kaye, the United Nations Special Rapporteur on the promotion and protection of the right to freedom of opinion and expression, called for companies to “disclose their policies and actions that implicate freedom of expression.” Specifically, he observed that “there are … gaps in corporate disclosure of statistics concerning volume, frequency and types of request for content removals and user data, whether because of State-imposed restrictions or internal policy decisions.”

The benefits to companies issuing such transparency reports around their content moderation activities would be significant: For those companies under pressure to “do something” about problematic speech online, this is an opportunity to outline the lengths to which they have gone to do just that; for companies under fire for “not doing enough,” a transparency report would help them express the size and complexity of the problems they are addressing, and explain that there is no magic artificial intelligence wand they can wave and make online extremism and harassment disappear; and finally, public disclosure about content moderation and terms of service practices will go a long way toward building trust with users—a trust that has crumbled in recent years. Putting aside the benefit to companies, though, there is the even more significant need of policymakers and the public. Before we can have an intelligent conversation about hate speech, terrorist propaganda, or other worrisome content online, or formulate fact-based policies about how to address that content, we need hard data about the breadth and depth of those problems, and about the platforms’ current efforts to solve those problems.

While there have been calls for publication of such information, there has been little specificity with respect to what exactly should be published. No doubt this is due, in great part, to the opacity of individual companies’ content moderation policies and processes: It is difficult to identify specific data that would be useful without knowing what data is available in the first place. Anecdotes and snippets of information from companies like Automattic and Twitter offer a starting point for considering what information would be most meaningful and valuable. Facebook has said it is entering a new era of transparency for the platform. Twitter has published some data about content removed for violating its TOS, Google followed suit for some of the content removed from YouTube, and Microsoft has published data on “revenge porn” removals. While each of these examples is a step in the right direction, what we need is a consistent push across the sector for clear and comprehensive reporting on TOS-based takedowns.

Looking to the example of existing reports about legally-mandated takedowns, data that shows the scope and volume of content removals, account removals, and other forms of account or content interference/flagging would be a logical starting point. Information about content that has been flagged for removal by a government actor—such as the U.K.’s Counter Terrorism Internet Referral Unit, which was granted “super flagger” status on YouTube, allowing the agency to flag content in bulk—should also be included, to guard against undue government pressure to censor. More granular information, such as the number of takedowns in particular categories of content (whether sexual content, harassment, extremist speech, etc.), or specification of the particular term of service violated by each piece of taken-down content, would provide even more meaningful transparency. This kind of quantitative data (i.e., numbers and percents) would be valuable on its own, but would be even more helpful if paired with qualitative data to shed more light on the platforms’ opaque content moderation practices and tell users a clear story about how those processes actually work, using compelling anecdotes and examples.
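As a sketch of what one entry in such a quantitative report might contain, consider the following structure. Every field name and number here is hypothetical, invented to illustrate the categories discussed above; it is not drawn from any company's actual report.

```python
# A hypothetical per-category entry in a TOS-takedown transparency report.
# All names and figures are illustrative, not real company data.
report_entry = {
    "period": "2018-Q1",
    "category": "hate_speech",
    "items_flagged": 120000,
    # Breaking out flag sources surfaces government referrals separately,
    # guarding against undue state pressure to censor.
    "flag_source": {"user": 30000, "automated": 85000, "government": 5000},
    "items_removed": 48000,
    "appeals_filed": 3200,
    "appeals_granted": 900,
}

# Derived figures like removal rates make reports comparable across
# categories and across companies.
removal_rate = report_entry["items_removed"] / report_entry["items_flagged"]
print(f"{removal_rate:.0%} of flagged items removed")  # prints "40% of flagged items removed"
```

Pairing structured entries like this with qualitative explanations of each category's rules would give researchers both the numbers and the story behind them.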

As has already and often happened with existing transparency reports, this data will help keep companies accountable. Few companies will want to demonstrably be the most or least aggressive censor, and anomalous data such as huge spikes around particular types of content will be called out and questioned by one stakeholder group or another. It will also help ensure that overreaching government pressure to take down more content is recognized and pushed back on, just as in current reporting it has helped identify and put pressure on countries making outsized demands for users’ information. And most importantly, it will help drive policy proposals that are based on facts and figures rather than on emotional pleas or irrational fears—policies that hopefully will help make the internet a safer space for a range of communities while also better protecting free expression.

Unquestionably, the major platforms have become our biggest online gatekeepers when it comes to what we can and cannot say. Whether we want them to have that power or not, and whether we want them to use more or less of that power in regard to this or that type of speech, are questions we simply cannot answer until we have a complete picture of how they are using that power. Transparency reporting is our first and best tool for gaining that insight.

Kevin Bankston is the Director of the Open Technology Institute at New America. Liz Woolery is Senior Policy Analyst at the Open Technology Institute at New America.
