Scribd's Takedown Of The Public Domain Mueller Report Is A Preview Of The EU's Future Under The Copyright Directive

2019-04-22

For years now, people who understand this stuff have been screaming from the rooftops that automated filtering leads to all sorts of legitimate content being taken down -- and yet, the EU went ahead and approved the EU Copyright Directive and its mandatory filters anyway (and, if you're still repeating the lie that it does not require filters, a quick reminder that multiple politicians who supported the Directive have now admitted that of course it requires filters, so don't even bother).

And it didn't take long for a new example of why demanding filters for copyright purposes is incredibly stupid to show up. Last Thursday, the DOJ finally released the redacted version of the Mueller Report. The document is obviously in the public domain as a work of the federal government -- and tons of publishers rushed to get book versions on store shelves, as they do with every big government report, often turning them into best sellers despite their availability for free.

Among the many places where the digital version of the report was made available was Scribd, the sort of YouTube-for-PDFs that remains annoying but very popular. And what happened? Well, of course, Scribd took the report down, claiming it violated someone's copyright.

However, in a taste of what’s to come with the EU’s views on copyright enforcement, the online document portal Scribd took down multiple copies of the Mueller Report claiming that their algorithms identified it as a copyrighted work. Scribd thought the Mueller Report was copyrighted because there was no one to think otherwise—the company uses an algorithm to make determinations about intellectual property violations.

Filters to the rescue in censoring perfectly legitimate public domain content. It wasn't hard to guess what actually happened. With so many big name publishers offering up their own versions in a rush, some probably used their standard processes to alert Scribd to the text they publish, in order to prevent uploads of actually pirated books. But, here, there's no such thing as a "pirated" copy. Indeed, Scribd eventually admitted this is exactly what happened:

A spokeswoman for the company says that “a leading global publisher” released the report as a book, fooling Scribd’s systems into thinking the report was owned by the publisher. The company identified 32 copies of the document, all of which were removed and then reinstated, she said.

Quartz, which noticed its own copy was taken down due to this automated failure, says that it received a weird sort of apology email from Scribd that basically says "filters, amirite?"

Users affected by the takedown received an email from the company indicating that “Scribd’s BookID copyright protection system has disabled access” to their documents, even though the email admits, “This does not necessarily mean that an infringement has occurred” or that the uploaders “have done anything wrong.” The email continued, “Like all automated systems, it will occasionally identify legitimate content as a possible infringement. Unfortunately, the volume of content in Scribd’s library prohibits us from reaching out for verification before BookID disables content. Scribd frequently updates BookID in order to reduce false positives.”

As Quartz notes, this is now the future for basically all content in the EU:

The Council of the European Union recently approved legislation known as Article 13. It’s a law that requires internet platforms to police content for copyright violations before it goes up, rather than only after it is reported as infringing by a third party as they do now. In practice, it is expected to lead to more scenarios like the one here: public domain and other legal uses of work get blocked or taken down by an unaccountable system of corporate dragnets based on code so poorly written and algorithms so poorly trained that it mistakes the most talked-about and most widely shared public-domain documents for copyrighted works.

The good folks over at EFF commented on this with even stronger language:

It is impossible for a copy of the Mueller report to be infringement since it cannot be copyrighted. Filters like BookID don’t work, and in this case, actively work against the public interest.

None of this is a surprise to anyone who actually understands the internet, but because some people falsely believe that the internet has been bad for artists, now we get to deal with the fact that automated filters are going to go censorship crazy.

