Scribd's Takedown Of The Public Domain Mueller Report Is A Preview Of The EU's Future Under The Copyright Directive

from the what-a-mess dept

For years now, people who understand this stuff have been screaming from the rooftops that automated filtering leads to all sorts of legitimate content being taken down — and yet, the EU went ahead and approved the EU Copyright Directive and its mandatory filters anyway (and, if you’re still repeating the lie that it does not require filters, a quick reminder that multiple politicians who supported the Directive have now admitted that of course it requires filters, so don’t even bother).

And it didn’t take long for a new example to emerge of why demanding filters for copyright purposes is incredibly stupid. Last Thursday, the DOJ finally released the redacted version of the Mueller Report. As a work of the federal government, the document is obviously in the public domain, and tons of publishers rushed to get book versions on store shelves, as they do with every big government report, often turning them into best sellers despite their availability for free.

Among the many places that the digital version of the report was made available was Scribd, the sort of YouTube-for-PDFs that remains annoying but very popular. And what happened? Well, of course, Scribd took the report down claiming it violated someone’s copyright.

However, in a taste of what’s to come with the EU’s views on copyright enforcement, the online document portal Scribd took down multiple copies of the Mueller Report claiming that their algorithms identified it as a copyrighted work.

Scribd thought the Mueller Report was copyrighted because there was no one to think otherwise; the company uses an algorithm to make determinations about intellectual property violations.

Filters to the rescue in censoring perfectly legitimate public domain content. It wasn’t hard to guess what actually happened. With so many big name publishers offering up their own versions in a rush, some probably used their standard processes to alert Scribd to the text they publish, a process meant to prevent uploads of actually pirated books. But here, there’s no such thing as a “pirated” copy. Indeed, Scribd eventually admitted this is exactly what happened:

A spokeswoman for the company says that “a leading global publisher” released the report as a book, fooling Scribd’s systems into thinking the report was owned by the publisher. The company identified 32 copies of the document, all of which were removed and then reinstated, she said.
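Scribd has not published how BookID works internally. As a rough illustration of how a reference-matching filter can be “fooled” this way, here is a minimal Python sketch assuming a simple approach: a claimant registers a reference text, both texts are split into overlapping five-word “shingles”, and any upload sharing enough shingles with the reference is flagged. The function names, the shingle size and the 0.5 threshold are all hypothetical.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive words (a "shingle") in the text."""
    words = text.lower().split()
    return {
        hashlib.sha1(" ".join(words[i:i + k]).encode()).hexdigest()
        for i in range(max(len(words) - k + 1, 1))
    }


def is_flagged(upload: str, reference: str, threshold: float = 0.5) -> bool:
    """Flag an upload when enough of its shingles also occur in a registered reference."""
    upload_shingles = shingles(upload)
    overlap = upload_shingles & shingles(reference)
    return len(overlap) / len(upload_shingles) >= threshold


# Hypothetical stand-ins: the publisher's edition adds its own foreword,
# but its body is the same public domain text that users upload.
report = "the public domain text of the report " * 20
book_edition = "a new foreword owned by the publisher " + report

print(is_flagged(report, reference=book_edition))  # True: the public domain copy is flagged
print(is_flagged("an original essay about pelicans over a beach", reference=book_edition))  # False
```

Nothing in this logic asks whether the registered reference is itself copyrightable; it only asks whether an upload matches something a claimant registered, which is exactly the gap the publisher’s edition fell through.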

Quartz, which noticed its own copy was taken down due to this automated failure, says that it received a weird sort of apology email from Scribd that basically says “filters, amirite?”

Users affected by the takedown received an email from the company indicating that “Scribd’s BookID copyright protection system has disabled access” to their documents, even though the email admits, “This does not necessarily mean that an infringement has occurred” or that the uploaders “have done anything wrong.”

The email continued, “Like all automated systems, it will occasionally identify legitimate content as a possible infringement. Unfortunately, the volume of content in Scribd’s library prohibits us from reaching out for verification before BookID disables content. Scribd frequently updates BookID in order to reduce false positives.”

As Quartz notes, this is now the future for basically all content in the EU:

The Council of the European Union recently approved legislation known as Article 13. It’s a law that requires internet platforms to police content for copyright violations before it goes up, rather than only after it is reported as infringing by a third party as they do now. In practice, it is expected to lead to more scenarios like the one here: public domain and other legal uses of work get blocked or taken down by an unaccountable system of corporate dragnets based on code so poorly written and algorithms so poorly trained that it mistakes the most talked-about and most widely shared public-domain documents for copyrighted works.

The good folks over at EFF commented on this with even stronger language:

It is impossible for a copy of the Mueller report to be infringement since it cannot be copyrighted. Filters like BookID don’t work, and in this case, actively work against the public interest.

None of this is a surprise to anyone who actually understands the internet, but because some people falsely believe that the internet has been bad for artists, now we get to deal with the fact that automated filters are going to go censorship crazy.

Companies: scribd


Comments on “Scribd's Takedown Of The Public Domain Mueller Report Is A Preview Of The EU's Future Under The Copyright Directive”

64 Comments
Mason Wheeler (profile) says:

None of this is a surprise to anyone who actually understands the internet, but because some people falsely believe that the internet has been bad for artists, now we get to deal with the fact that automated filters are going to go censorship crazy.

We can start by "censoring" all of Europe until they come to their senses and repeal this mess. Make it nice and simple: "if you’re capable of inflicting EU Copyright Directive liability upon us, you can’t use our platform. Period." The faster the geofence gains widespread adoption, the quicker the nightmare will all be over, because they will end up with no choice but to acknowledge that they need us more than we need them.

Scary Devil Monastery (profile) says:

Re: Re: Re:

" You’re right, copyright had nothing to do with the Mueller report. And yet copyright-based mechanisms took it down anyway."

Because, as governments are increasingly noticing, copyright still fulfills its old role as a censorship tool. Why even pretend to follow constitutional requirements on open government when all you have to do is say "Copyright" and bury whatever information you want, until the report’s author has been dead for 70 years?

Anonymous Coward says:

Unfortunately, the volume of content in Scribd’s library prohibits us from reaching out for verification before BookID disables content.

I hate writing like this. It’s not unfortunate. They’re choosing to disable the content prior to contacting the alleged copyright owner. There’s no law in the US (yet) that requires anything like ContentID or BookID. You only have to respond to DMCA requests, which it’s clear none of these were.

T. Ray LeFlambe says:

Re: There's no law against actively protecting copyright.

There’s no law in the US (yet) that requires anything like ContentID or BookID. You only have to respond to DMCA requests, which it’s clear none of these were.

Clearly you’re against ANY copyright enforcement.

Pirates simply won’t / don’t / can’t see ANY cause, ANY justification, ANY consideration for actively protecting work.

But Scribd is a "platform" that at least supposedly is for authors, so protecting work is a high priority.

Thanks for so well pointing up the pirate / author conflict. You are THE WHY of copyright enforcement.

PaulT (profile) says:

Re: Re: There's no law against actively protecting copyright.

"Pirates simply won’t / don’t / can’t see ANY cause, ANY justification, ANY consideration for actively protecting work"

Whereas copyright worshippers will accept, nay demand, the destruction of opportunities for independent artists and the stripping of rights from everybody – even (or often especially) those not involved with a platform, let alone piracy – for the illusion of total control.

There is a middle ground, but you reject it to chase a fantasy. You have to lie about and misrepresent even the words of the people you are quoting in order to respond, so dishonest are you and your ilk. So the rest of us fight, because demands based on lies and fantasy exact too high a price for the meagre return you imagine.

Scary Devil Monastery (profile) says:

Re: Re: Re: There's no law against actively protecting copyright

"Better keep that strawman away from any open flames. It is so desiccated, it is liable to spontaneously combust."

Bobmail/Baghdad Bob/Blue’s straw men have, by now, been withered, set aflame, burned down, and pissed on to put them out…

Only for Baghdad Bob to painstakingly gather the sodden husks and try to make a new straw man out of the shattered remnants. It’s pretty clear just how much of a dishonest asshole he is when he turns what amounts to actual government censorship into a straw man argument about why anyone wanting to read the Mueller Report must be a pirate.

PaulT (profile) says:

Re: Re: Re:2 There's no law against actively protecting copyr

"a straw man argument about why anyone wanting to read the Mueller Report must be a pirate"

Even better when you consider that it’s not possible to pirate the Mueller Report since it is by definition public domain.

He’s now attacking people for wanting to access their own property.

Scary Devil Monastery (profile) says:

Re: Re: Re:3 There's no law against actively protecting c

"Even better when you consider that it’s not possible to pirate the Mueller Report since it is by definition public domain. He’s now attacking people for wanting to access their own property."

Oh, Bobmail/Baghdad Bob/Blue doesn’t stop there. He’s attacked people for wanting control over their own hardware, for wanting control over their own personal space, for wanting control over their own house and home.

I believe an argument he pushed back on Torrentfreak long ago ended up in a blanket condemnation of personal control which by extension would mean any rape case should end up with an instant condemnation over the victim not immediately removing her clothes when asked to, and the resulting violence being all her fault.

Uriel-238 (profile) says:

Re: Re: "Against ANY copyright enforcement."

Some of us recognize that no copyright enforcement would better serve the public than the copyright system we have.

And no, pirates can see cause to actively protect work, but not at the expense of the immense number of false-positive takedowns we have, and not to the degree that copyright has been established. Heck, the rights holders can’t just protect their work; they also force commercial viewing on their customers in the name of higher profits. And then they cheat their talent, and overwork and underpay their developers.

At this point IP law is nothing more than a rent seeking racket. IP today suppresses way more science and useful art than it promotes.

So yes, scuttle the lot, says I, and wright a fleet new from the keel on up. Run a fair ship so’s the crew would be glad to serve. Flog them not, or expect to wake to mutiny.

Anonymous Coward says:

Re: Re: Re: "Against ANY copyright enforcement."

Actually, all I ask is that the same takedown procedure that is applied to sites targeted by copyright complaints also be applied to the sites of companies that have received unresolved consumer complaints.

That way, the MegaCorp Record Company gets a nice little notice from their provider that their domain, megacorp.example.com, and all of its content has been deleted because Joe Smedleigh of Lobster’s Armpit, NB sent in a consumer complaint that MegaCorp sold him a vinyl LP with a scratch scratch scratch in it in 1971, which is of course breach of contract and breach of implied warranty, therefore illegal. We’re sorry, but if the provider were not to pull the entire record company’s site, they may lose safe harbour under the Digital Millennium Consumer Act and guilty until proven innocent…

After all, isn’t protecting consumers from being swindled more important and more worthy of swift action? It would make no sense at all if the authors of outright mail and wire fraud were protected by more due process rights than copyright cases.

btr1701 (profile) says:

Re: Re: Re:

Unless Scribd has geo-blocked the EU, there is a law there they need to comply with using something like BookID.

Scribd doesn’t have to geoblock the EU. It’s not their responsibility to play enforcement cop for the EU. Scribd can continue business as usual and if the EU doesn’t like how they’re doing it the EU can geoblock Scribd. But Scribd has no legal responsibility to do it for them.

Anonymous Coward says:

Re: Re: Re: Re:

But Scribd has no legal responsibility to do it for them.

Kind of true and kind of false. Scribd does do business in the EU (having many citizens of the EU as part of their subscription service, which is explicitly billed as available in the EU) so that technically they do have such a responsibility according to EU law (which, as a company which does business in the EU, they are obligated to follow). At least right now.

If they canceled the availability of their subscription to EU residents, then it enters a bit more of a grey area legally (and one which has not been strongly defined anywhere tbh), further complicated by the vast number of offerings on Scribd for which the copyright is owned by people based in the EU. The safest (by far) legal option would be to geo-block the EU, though if they wanted to throw the resources at it they could attempt to litigate the issue themselves to produce (more or less) the first boundary on what actions are required to avoid falling under EU jurisdiction. Now, practically speaking, Scribd could decide that as none of their assets are physically present in the EU they could just ignore EU law and the EU would not have any real ability to punish them. Then again, Scribd is made up of people, which further complicates things if they don’t want to make it dangerous for their employees to travel.

Anonymous Coward says:

Re: Re: Re:2 Re:

No, "doing business" in the EU here means having a physical presence there: an office, a data center, etc. Merely letting any and all users, regardless of physical location, sign up and use their platform does not automatically make them subject to a foreign country’s laws. If that were true, then Google/Facebook/Reddit/Twitter/etc. would be required to abide by the laws of China, Russia, and North Korea by removing all content those countries deem objectionable. They do not. If those countries ever attempted to force them to, said companies would fart in their direction and ignore them.

JP the Recurring Zombie (profile) says:

Re: Re:

never complained about Scribd’s filters before

Scrolls to top of Techdirt’s page

Shoot, what’s that newfangled feature called?? Ah yes, search!

Searches for "Scribd filter"
About 153 results (0.17 seconds)

Yep, nothing to see here about Scribd and filters prior to Mike’s recent whiny snowflake liberal conspiracy post!

T. Ray LeFlambe says:

Nerd Harder. -- Re-Write Harder.

1) Wasn’t "censorship". According to you at other times when it suits your slant, private persons / corporations by definition cannot "censor". So a mere take-down. Your over-the-top "sky is falling" critique is invalid.

2) A temporary take-down that, particularly in the instant case, did not leave it unavailable, and was reversed in a timely manner too.

3) Yes, Nerd Harder. Just because "AI" or "filters" aren’t perfect doesn’t mean they can’t be adequate. Humans will be required to fix only the few rare failures.

4) In contrast to your assertions, actually the system worked pretty well for the purpose of protecting apparently "copyrighted" work. The system noted the work as coming from a publisher, identified copies on its own Scribd servers, and expeditiously removed them.

So from the perspective of supporting copyright, the "filters" worked well and quickly. But YOU HATE THAT PURPOSE, and hate it working at all, so you revile the "filters".

5) You did little more than copy-paste the story, slant, and headline from QZ! Why read Techdirt at all? Scribd’s filters are already good enough to catch YOU in copy-theft! It will be a good day when your lame re-writes are taken down.

Anonymous Coward says:

Re: Nerd Harder. -- Re-Write Harder.

According to you at other times when it suits your slant, private persons / corporations by definition cannot "censor"

It becomes censorship when it extends from laws and the government. The only reason Scribd has such a filter is because of the demands of copyright law. When it takes down a PDF, it informs the uploader that it is doing so in order to comply with the law. It is removing content due to the fear of state-enforced legal repercussions – so yes, the word "censorship" is very much on the table.

The system noted the work as coming from a publisher, identified copies on its own Scribd servers, and expeditiously removed them.

You appear to be under the impression that if a work is "coming from a publisher" then it is a natural assumption that everything it contains is copyrighted and that nobody else has the right to publish it.

As we see in this and countless other cases, that is not true. That is precisely the sort of skewed perception of copyright that leads to situations like this in the first place – and the sort that large corporate publishers would love everyone to believe, since it just presumes they own and control every word that touches a page. They do not.

So from the perspective of supporting copyright, the "filters" worked well and quickly

The work was not under copyright so no the filters did not work well from that perspective. They failed to "support copyright" utterly and completely.

Mason Wheeler (profile) says:

Re: Nerd Harder. -- Re-Write Harder.

Wasn’t "censorship". According to you at other times when it suits your slant, private persons / corporations by definition cannot "censor".

I don’t believe I’ve ever seen Mike express such an idiotic sentiment. Can you provide a citation?

Yes, Nerd Harder. Just because "AI" or "filters" aren’t perfect doesn’t mean they can’t be adequate. Humans will be required to fix only the few rare failures.

When you’re dealing with something as, well, world-wide as the World Wide Web, no matter how tiny the failure rate is, failures are not "rare" in any objective sense.

They say Facebook has over a billion users. Let’s just say, pulling a wild guess out of nowhere, that the average user makes one post per day. That means that, assuming an absolutely minuscule error rate of 0.1%, they’re going to literally have over a million mistakes every single day.
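The arithmetic is easy to check. A minimal Python sketch using the comment’s own guesses (a billion users, one post each per day, a 0.1% error rate; these are assumptions for illustration, not measured data), with the error rate kept as an integer ratio so the result is exact:

```python
# All three inputs are the comment's own guesses, not measured data.
users = 1_000_000_000        # "over a billion users"
posts_per_user_per_day = 1   # one post per user per day
errors_per_thousand = 1      # a 0.1% error rate, kept as an integer ratio

decisions_per_day = users * posts_per_user_per_day
mistakes_per_day = decisions_per_day * errors_per_thousand // 1000

print(f"{mistakes_per_day:,} mistaken filter decisions per day")
# prints "1,000,000 mistaken filter decisions per day"
```

Because the error count scales linearly with volume, every extra user or post adds to the absolute number of mistakes even when the rate itself stays "tiny".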

When you get that big, it’s really true: filters that aren’t perfect are not adequate.

In contrast to your assertions, actually the system worked pretty well for the purpose of protecting apparently "copyrighted" work. The system noted the work as coming from a publisher, identified copies on its own Scribd servers, and expeditiously removed them.

Except the work should never have been in their "copyrighted works" system in the first place, as it’s impossible to copyright. Once something breaks down at an earlier stage in the process, it doesn’t matter how correctly the later stages work.

There’s a principle in computer programming known as "fail fast." The idea is that the best thing you can do is detect errors as quickly as possible and then deal with them immediately–crashing the program if necessary–because the further you proceed with an error still active inside the system, the more damage it can do, and because the closer the visible failure is to the cause, the easier it is to see what needs to be fixed. This case is a perfect illustration of what happens when the fail-fast principles are not applied.
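To make the fail-fast point concrete, here is a minimal Python sketch applied to a hypothetical reference-registration step: a claim on a public domain work is rejected at the moment it enters the system, instead of surfacing later as dozens of wrongful takedowns. `register_reference`, `PUBLIC_DOMAIN_TITLES`, `reference_db` and the exception class are illustrative names, not part of any real BookID interface, and a title lookup is a deliberately naive stand-in for real public domain detection.

```python
class PublicDomainClaimError(ValueError):
    """Raised when a claimant tries to register a public domain work as theirs."""


# Hypothetical allowlist of public domain titles; real detection is much
# harder than a simple title lookup.
PUBLIC_DOMAIN_TITLES = {"the mueller report"}

# Hypothetical store mapping registered titles to the party claiming them.
reference_db: dict[str, str] = {}


def register_reference(title: str, claimant: str) -> None:
    """Register a reference work, failing fast on claims that can never be valid."""
    if title.strip().lower() in PUBLIC_DOMAIN_TITLES:
        # Reject at the entry point, before the bad claim can drive any takedowns.
        raise PublicDomainClaimError(
            f"{title!r} is public domain; {claimant} cannot claim it"
        )
    reference_db[title] = claimant


try:
    register_reference("The Mueller Report", claimant="A Big Publisher")
except PublicDomainClaimError as err:
    print("rejected:", err)
```

The design choice is where the check sits: validating at registration means the error is caught next to its cause, whereas a matcher that trusts every registered reference can only "fail" later, after legitimate uploads are already gone.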

Anonymous Coward says:

Re: Re: Nerd Harder. -- Re-Write Harder.

Clearly, the BookID system failed, as public domain material was allowed to be submitted. Luckily, the simple existence of BookID suggests a solution: a public domain filter. Before a work can be registered with BookID, it should be run against a filter of the entire body of public domain works, to eliminate errors. Luckily, since the public domain, unlike the ever-growing body of copyrighted works, is fixed and unchanging, that filter should be much easier to create. Of course, I would be remiss if I ignored the possibility of copyrighted works being added to the filter’s filter, so the public domain filter would need its own filter as well. Then I suppose that filter would again require a public domain filter, and so on down the line. I remain confident, however, that we will reach the asymptote of filtering filters prior to exhausting all computational capacity on the planet.

stderric (profile) says:

Re: Re: Re: Nerd Harder. -- Re-Write Harder.

I remain confident, however, that we will reach the asymptote of filtering filters prior to exhausting all computational capacity on the planet.

I believe it’s possible to nest countably infinite filtering filters if you use blockchain. Uncountably infinite ff’s require blockchain plus 5G, I believe.

Stephen T. Stone (profile) says:

Re:

1.) The takedown was done at the behest of EU law — copyright law, to be exact — thereby making it a legal action against a public domain document and thus government-sponsored censorship.

2.) Even a temporary takedown can cause damage under the right circumstances.

3.) The “rare failures” would likely not be so “rare” if more people would fight against bogus takedowns. That they do not is less a matter of the takedown notice being legit and more a matter of “I lack both the time and the money to deal with a lawsuit”. (Or they may not know they can contest a takedown.) As for the “Nerd Harder” doctrine: No amount of “nerding” can teach a takedown bot about Fair Use and context.

4.) The problem with the filter was not that it worked — it was that it worked when it should not have.

5.) Whine somewhere else, child. The grown-ups need to talk.

Anonymous Coward says:

Far too many non-copyrighted published works claim to be copyrighted, sometimes due to the addition of a short introduction, addendum, or other insignificant content. Some publishers claim copyright on things that are 100% public domain, like audio CDs of pre-1923 recordings, which generally list their printing date as the copyright date.

With so many public domain works being published as copyrighted content, how is a platform like Scribd supposed to figure out which is which?

Bamboo Harvester (profile) says:

C'mon, now...

"With so many big name publishers offering up their own versions in a rush, some probably used their standard processes to alert Scribd to the text it publishes to prevent uploads of actually pirated books."

… I was born early in the morning, but it wasn’t this morning.

Of course the big name publishers copyrighted their work: annotations, typeface, etc. And made sure that the actual PD text was included, for this exact reason.

Mason Wheeler (profile) says:

Re: Re:

"At this point"? That’s been the case ever since the 1970s, when they radically revamped the laws with a new, poisonous philosophy of ownership culture. For a significant percentage of Techdirt readers, including myself, copyright law has been actively working against the public interest for their entire lives!

Stephen T. Stone (profile) says:

Re: Re:

As an example: All of Michael Jackson’s work would already be in the public domain if copyright terms had not expanded to “life and several decades beyond the death of an artist”. Neither I nor anyone else who listened to his music growing up or saw him perform in his prime will see his work enter the public domain in our lifetimes.

An untold number of copyrighted works are lost, and may be lost in the future, because no one could adequately archive them. Artists who want to use bits of music, movies, and texts from the early-to-mid 20th Century must still license that material. We have lost the ability to spread culture in a way where everyone has an equal opportunity to (legally) enjoy it.

In a just world, the death of the public domain would be treated as a crime against humanity. But we live in a world where a public domain document can be censored because someone else published a copy of it as a book…and a world where lawmakers will never think about whether copyright law has gone too far.

Anonymous Coward says:

Of course there’s no way a filter can test whether content is fair use, commentary, or review. For example, many videos on YouTube contain clips for the purposes of TV, film, and video game reviews. Expect millions of videos to be blocked before they are even shown when the new laws come into force in the EU.
At the moment, most websites leave most videos up unless they receive a complaint or a DMCA notice about the video.
There may be thousands of documents blocked or taken down all the time; because they are not as well known as the Mueller report, we don’t know about it.

Anonymous Coward says:

Say Cheese

Scribd removed a photo I had posted on their page because the photo, of pelicans flying over a beach, was described as taken with a Leica camera, and a (named) person working for Leica objected, saying I used the word Leica without their permission. Akin to saying that, in writing a book, I could not use the word Ford or Mercedes to describe driving to New York.

The photo had been on my site for five years.

Scribd said I had violated their “terms of service”, which makes no sense at all.

I asked Scribd to close my account and remove everything I had posted over several years. They did not.

Upshot: I have not purchased and will not purchase anything else from either Leica or Scribd, seeking instead more intelligent people not inclined to censor non-offensive communication.

Anonymous Coward says:

Kill those responsible and install real people. Not next year, not after 6 years of investigations into whether or not a committee should be considered regarding the possibility of exploring a rollback, now.
It’s the only way this can be rolled back before people are even able to read about what happened anymore – as history is sure to be "copyrighted" when it’s less-than-flattering to these governments.

Nothing else will work.
