Google Report: 99.95 Percent Of DMCA Takedown Notices Are Bot-Generated Bullshit Buckshot
from the overplaying-their-hand dept
Google, being the search giant that it is, has been banging the drum for some time about the silly way the DMCA has been abused by those who wield it like a cudgel. Here at Techdirt, we too have described the many ways that the well-intentioned DMCA, and the way it's implemented by service providers, has deviated from its intended purpose. Still, the vast majority of our stories discuss deliberate attempts by human beings to silence critics and competition using the takedown process. Google, on the other hand, has been far more focused on statistics for DMCA takedown notices that show a wanton disregard for what the process was supposed to be used for. That makes sense of course, as the abuse of the takedown process is a burden on the search company. In that first link, for instance, Google noted that more than half the takedown notices it was receiving in 2009 were mere attempts by one business to target a competitor, while over a third of the notices contained nothing in the way of a valid copyright dispute.
But if those numbers were striking in 2009, Google's latest comment to the Copyright Office (see our own comment here) on what's happening in the DMCA 512 notice-and-takedown world shows some stats for takedown notices received through its Trusted Copyright Removal Program... and makes the whole ordeal look completely silly.
A significant portion of the recent increases in DMCA submission volumes for Google Search stem from notices that appear to be duplicative, unnecessary, or mistaken. As we explained at the San Francisco Roundtable, a substantial number of takedown requests submitted to Google are for URLs that have never been in our search index, and therefore could never have appeared in our search results. For example, in January 2017, the most prolific submitter submitted notices that Google honored for 16,457,433 URLs. But on further inspection, 16,450,129 (99.97%) of those URLs were not in our search index in the first place. Nor is this problem limited to one submitter: in total, 99.95% of all URLs processed from our Trusted Copyright Removal Program in January 2017 were not in our index.
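For readers who want to check the scale themselves, here is a minimal sketch of the arithmetic behind the most prolific submitter's figures. The two counts are taken from the quote above; the variable names are our own. Worked through, the counts come out to roughly 99.96%, a hair below the 99.97% in the filing, which doesn't change the point: only a few thousand of more than 16 million URLs were ever in the index.

```python
# Counts as quoted from Google's comment for January 2017.
honored_urls = 16_457_433   # URLs in notices from the most prolific submitter
not_in_index = 16_450_129   # of those, URLs never in the search index

# URLs that actually were in the index, and the share that were not.
in_index = honored_urls - not_in_index
share_not_in_index = not_in_index / honored_urls * 100

print(in_index)                      # 7304
print(f"{share_not_in_index:.2f}%")  # 99.96%
```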
Now, because Google is Google, the company doesn't generally have a great deal of sympathy heaped upon it by the public, never mind by copyright protectionists. But, come on, this is simply nuts. When the share of claims coming through the system that don't even pertain to results Google lists can be logically rounded up to 100%, that's putting a burden on a company for no valid reason whatsoever. Even if you hate Google, or distrust it, it should be plain as day that it's unfair for it to have to wade through all this muck just to appease the entertainment industries.
And, it's important to note that these aren't all of the notices received, but just those coming through the Trusted Copyright Removal system -- meaning that these are organizations that are supposed to have at least some credibility not to be submitting totally bogus notices. But, apparently, they don't actually give a damn.
The problem, as you may have already guessed, is that most of these claims are being generated through automated systems designed to shotgun-blast DMCA notices with reckless abandon.
These numbers are simply staggering with only a tiny number of millions of requests reflecting actual pages in the search index. Rather, 99.95% of the processed URLs from Google’s trusted submitter program are machine-generated URLs that do not involve actual pages in the search index. Given that data, Google notes that claims that the large number of requests correlates to infringing content on the Internet is incorrect:
Nor is the large number of takedown requests to Google a good proxy even for the volume of infringing material available on the Internet. Many of these submissions appear to be generated by merely scrambling the words in a search query and appending that to a URL, so that each query makes a different URL that nonetheless leads to the same page of results.
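To get a feel for how quickly that kind of scrambling inflates the numbers, here is a rough sketch. Google's comment doesn't describe the submitters' actual tooling, so everything here is an assumption for illustration: the `example.com` search URL, the sample query, and the helper names `scrambled_urls` and `query_terms` are all hypothetical, as is the premise that the target site ignores word order.

```python
from itertools import permutations
from urllib.parse import parse_qs, quote_plus, urlparse

# Hypothetical site, used only for illustration.
BASE_URL = "https://example.com/search?q="


def scrambled_urls(query):
    """Return one URL per distinct ordering of the words in `query`."""
    words = query.split()
    orderings = sorted(set(permutations(words)))
    return [BASE_URL + quote_plus(" ".join(order)) for order in orderings]


def query_terms(url):
    """Return the set of search terms in a URL, ignoring their order."""
    q = parse_qs(urlparse(url).query)["q"][0]
    return frozenset(q.split())


urls = scrambled_urls("some movie title free download")
print(len(urls))                            # 120 distinct URLs
print(len({query_terms(u) for u in urls}))  # 1 underlying set of terms
```

A single five-word query yields 120 different URLs, every one of which could be listed separately in a notice even though, on a site that ignores word order, they would all land on the same page of results.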
The claim by the entertainment industry that one can see what a problem piracy is by looking at the sheer volume of DMCA notices sent to search engines shall hereby be declared dead, having been buried by the industry's fellow takedown-notice-filers. That claim never made much sense, but these stats completely sever any link between takedown notice numbers and actual piracy. And there needs to be a remedy for this, whether it's punishment for the abusers or rules for how notices can be filed. Because these numbers are ridiculous.