No, Tech Companies Can't Easily Create A 'ContentID' For Harassment, And It Would Be A Disaster If They Did
from the not-how-it-works dept
"If Twitter, Facebook or Google wanted to stop their users from receiving online harassment, they could do it tomorrow." See? Just like that. Snap your fingers and, boom, harassment goes away. Except, no, it doesn't. Sarah Jeong has put together a fantastic response to Valenti's magical tech thinking, pointing out that ContentID doesn't work well and that harassment is different anyway. As she notes, the only reason ContentID "works" at all (and we use the term "works" loosely) is that it's a pure fingerprinting algorithm, matching content against a database of claimed copyright-covered material. That's very different from sorting out "harassment," which involves a series of subjective determinations.
When money is on the line, internet companies somehow magically find ways to remove content and block repeat offenders. For instance, YouTube already runs a sophisticated Content ID program dedicated to scanning uploaded videos for copyrighted material and taking them down quickly – just try to bootleg music videos or watch unofficial versions of Daily Show clips and see how quickly they get taken down. But a look at the comments under any video and it’s clear there’s no real screening system for even the most abusive language.
If these companies are so willing to protect intellectual property, why not protect the people using your services?
Furthermore, Jeong goes into great detail about how ContentID isn't even particularly good on the copyright front, as we've highlighted for years. It creates both Type I and Type II errors: pulling down plenty of content that isn't infringing, while still letting through plenty of content that is. Add in the even more difficult task of determining "harassment," which is far less identifiable than probable copyright infringement, and you would undoubtedly increase both types of errors to a hilarious degree -- likely shutting down many perfectly legitimate conversations, while doing little to stop actual harassment.
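The point about both error types can be made concrete with a toy calculation. All the numbers below are invented for illustration, but they show the structural problem: when the thing you're filtering for is rare and hard to define, even a filter with seemingly decent accuracy buries real harassment under false positives while still letting real cases through.

```python
# Toy illustration with invented numbers -- not a measurement of any
# real platform or filter. Shows how Type I and Type II errors
# compound when the target category is rare and subjective.

def filter_outcomes(total_msgs: float, base_rate: float,
                    true_positive_rate: float,
                    false_positive_rate: float):
    """Return (benign messages wrongly flagged, harassing messages
    missed, share of all flags that are real harassment)."""
    harassing = total_msgs * base_rate
    benign = total_msgs - harassing
    caught = harassing * true_positive_rate          # correct flags
    missed = harassing - caught                      # Type II errors
    wrongly_flagged = benign * false_positive_rate   # Type I errors
    precision = caught / (caught + wrongly_flagged)
    return wrongly_flagged, missed, precision

# Assume 1M messages/day, 1% actually harassing, and a filter that
# catches 90% of harassment but also flags 5% of benign speech.
fp, missed, precision = filter_outcomes(1_000_000, 0.01, 0.90, 0.05)
print(f"benign messages wrongly removed: {fp:,.0f}")           # 49,500
print(f"harassing messages that slip through: {missed:,.0f}")  # 1,000
print(f"share of flags that are real harassment: {precision:.0%}")  # 15%
```

Under these assumptions, roughly 85% of everything the filter removes is legitimate speech -- and a thousand genuinely harassing messages still get through every day. Make the category more subjective (lowering the true positive rate and raising the false positive rate) and both numbers get worse at once.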
None of this is to suggest that harassment online isn't a serious problem. It is. And it's also possible that some enterprising folks may figure out some interesting, unique and compelling ways of dealing with it, sometimes via technological assistance. But this sort of "magic bullet" thinking is as dangerous as it is ridiculous -- because it often leads to reframing the debate, sometimes to the point of shifting liability away from those actually responsible (whether copyright infringers or harassers) and onto the intermediaries who merely provide a platform for communication.
The more aggressive the tool, the greater the chance it will filter out communications that aren’t harassing — particularly, communications one wishes to receive. You can see this in the false positives flagged by systems like Content ID. For example, there’s the time that Content ID took down a video with birds chirping in the background, because it matched an avant-garde song that also had some birds chirping in the background. Or the time NASA’s official clips of a Mars landing got taken down by a news agency. Or the time a livestream was cut off because people began singing "Happy Birthday." Or when a live airing on UStream of the Hugo Awards was interrupted mid-broadcast as the awards ceremony aired clips from Doctor Who and other shows nominated for Hugo Awards.
In the latter case, UStream used something similar but not quite the same as Content ID—one in which blind algorithms automatically censored copyrighted content without the more sophisticated appeals process that YouTube has in place. Robots are not smart; they cannot sense context and meaning. Yet YouTube’s appeals system wouldn’t translate well to anti-harassment tools. What good is a system where you must report each and every instance of harassment and then follow through in a back-and-forth appeals system?