Google Report: 99.95 Percent Of DMCA Takedown Notices Are Bot-Generated Bullshit Buckshot

from the overplaying-their-hand dept

Google, being the search giant that it is, has been banging the drum for some time about the silly way the DMCA has been abused by those that wield it like a cudgel. Here at Techdirt, we too have described the many ways that the well-intentioned DMCA, and the way it’s implemented by service providers, has deviated from its intended purpose. Still, the vast majority of our stories discuss deliberate attempts by human beings to silence critics and competition using the takedown process. Google, on the other hand, has been far more focused on statistics for DMCA takedown notices that show wanton disregard for what the process was supposed to be used for. That makes sense of course, as the abuse of the takedown process is a burden on the search company. In that first link, for instance, Google noted that more than half the takedown notices it was receiving in 2009 were merely attempts by one business to target a competitor, while over a third of the notices contained nothing in the way of a valid copyright dispute.

But if those numbers were striking in 2009, Google’s latest comment to the Copyright Office (see our own comment here) on what’s happening in the DMCA 512 notice-and-takedown world shows some stats for takedown notices received through its Trusted Copyright Removal Program… and makes the whole ordeal look completely silly.

A significant portion of the recent increases in DMCA submission volumes for Google Search stem from notices that appear to be duplicative, unnecessary, or mistaken. As we explained at the San Francisco Roundtable, a substantial number of takedown requests submitted to Google are for URLs that have never been in our search index, and therefore could never have appeared in our search results. For example, in January 2017, the most prolific submitter submitted notices that Google honored for 16,457,433 URLs. But on further inspection, 16,450,129 (99.97%) of those URLs were not in our search index in the first place. Nor is this problem limited to one submitter: in total, 99.95% of all URLs processed from our Trusted Copyright Removal Program in January 2017 were not in our index.

Now, because Google is Google, the company doesn’t generally have a great deal of sympathy heaped upon it by the public, never mind by copyright protectionists. But, come on, this is simply nuts. When the share of claims coming through the system that don’t even pertain to results Google actually lists can be logically rounded up to 100%, that’s putting a burden on a company for no valid reason whatsoever. Even if you hate Google, or distrust it, it should be plain as day that it’s unfair for it to have to wade through all this muck just to appease the entertainment industries.

And it’s important to note that this isn’t all of the notices received, just those coming through the Trusted Copyright Removal system, meaning that these are organizations that are supposed to have at least some credibility and not be submitting totally bogus notices. But, apparently, they don’t actually give a damn.

The problem, as you may have already guessed, is that most of these claims are being generated through automated systems designed to shotgun-blast DMCA notices with reckless abandon.

These numbers are simply staggering, with only a tiny fraction of the millions of requests reflecting actual pages in the search index. Rather, 99.95% of the processed URLs from Google’s trusted submitter program are machine-generated URLs that do not involve actual pages in the search index. Given that data, Google notes that claims that the large number of requests correlates to infringing content on the Internet are incorrect:

Nor is the large number of takedown requests to Google a good proxy even for the volume of infringing material available on the Internet. Many of these submissions appear to be generated by merely scrambling the words in a search query and appending that to a URL, so that each query makes a different URL that nonetheless leads to the same page of results.
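To make the word-scrambling trick concrete, here is a minimal Python sketch. It is an illustration only, not how any actual submitter's bot or Google's systems work: the `example.com` search URL, the `q` parameter, and both function names are hypothetical, and it assumes the target site ignores word order in a query.

```python
from itertools import permutations
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical search endpoint; not taken from Google's filing.
BASE = "https://example.com/search"


def scrambled_urls(query: str) -> list[str]:
    """Build one URL per ordering of the words in the query."""
    words = query.split()
    return [f"{BASE}?{urlencode({'q': ' '.join(p)})}" for p in permutations(words)]


def canonical_key(url: str) -> tuple[str, tuple[str, ...]]:
    """Collapse a URL to (host + path, sorted query words), ignoring word order."""
    parsed = urlparse(url)
    words = parse_qs(parsed.query).get("q", [""])[0].lower().split()
    return (parsed.netloc + parsed.path, tuple(sorted(words)))


urls = scrambled_urls("free movie download")
unique_urls = set(urls)
unique_pages = {canonical_key(u) for u in urls}

# Six differently worded URLs, one underlying page of results.
print(len(unique_urls), "distinct URLs ->", len(unique_pages), "distinct page")
```

Three words already yield six URLs for a single page, and the count grows factorially with query length, which is why a raw tally of submitted URLs says little about how much distinct content is involved.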

The claim by the entertainment industry that one can see what a problem piracy is by looking at the sheer volume of DMCA notices sent to search engines shall hereby be declared dead, having been buried by the industry’s fellow takedown-notice-filers. That claim never made much sense, but these stats completely sever any link between takedown notice numbers and actual piracy. And there needs to be a remedy for this, whether it’s punishment for the abusers or rules for how notices can be filed. Because these numbers are ridiculous.

Companies: google


Comments on “Google Report: 99.95 Percent Of DMCA Takedown Notices Are Bot-Generated Bullshit Buckshot”

Anonymous Coward says:

Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

This is the STATISTICS in “lies, damn lies, and statistics”, on a level with Tiger Repellent that’s 100% effective because haven’t seen any tigers since using it.

Means ZERO if Google hasn’t indexed it. If a pirate site puts “norobots.txt” up, then presumably Google doesn’t index it! To claim that this proves anything is the “damn lies” part of the above phrase.

Nonetheless, we all know that piracy is going on.

FesteringPussPocket says:

Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

Uh… hold up.

There’s absolutely NO pirating going on here.

Pirating requires use of Ships at sea, helmed by bearded peg-legged Somalians with slingshots.

File-sharing is NOT, nor has it ever been, pirating.

People who file-share are the same folks that recorded music off the radio, converted vinyl to 8-track or cassette.
They recorded movies off of HBO and handed them out to friends and family that didn’t have HBO.
They bought DVDs or Blu-Rays and then ripped them to recordable media so that their kids wouldn’t destroy the originals.
They opted to watch a rip or cam-cord version of a movie to see if it met the hype before spending a 4th of a day’s wages for a trip to the movie theater.

In any case, file-sharing has caused exactly $0.00 loss for the entire Movie and Music industry globally.

In most cases, those industries’ profits would actually be smaller if it weren’t for the file-sharing exposing people to content they wouldn’t otherwise see.

Thanks for playing “What lies, damned lies and double-damned lies, will the Jurassic media content industry say next?”

Jason says:

Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

Are you sure you read the whole article? The entire point was that “99.95% of all URLs” that the “trusted” submitter sent in for removal from Google’s search results were not in their index, and therefore “could never have appeared” in search results in the first place.

Whether the site of interest manually excludes themselves from Google’s indexing is irrelevant… the supposedly trustworthy requesting parties are overwhelmingly flooding Google with invalid takedown requests. Is Google supposed to de-list URLs that they never listed in the first place?

Anonymous Coward says:

Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

You forgot your “Football” pseudonym, jackass.

Not that it matters. It’s the same out_of_the_blue bullshit. Did you finally decide to take a whiff of oxygen after having your lips surgically attached to Cary Sherman’s phallus?

DannyB (profile) says:

Force the problem back on the DMCA filers

If processing of DMCA gets slower and slower, how can they possibly complain?

Google could undoubtedly produce millions of bogus requests that could fill hundreds of boxes on the docket of a court challenge. If the other side or the court would object to how burdensome this is, then Google could ask one to consider that this is just a sampling, and imagine how burdensome it is for Google. It is objectively unreasonable that Google could have infinite resources and infinite processing speeds for increasing bogus DMCA requests.

The court needs to set a precedent. The legislators need to fix the broken DMCA to impose a statutory penalty for every bogus DMCA. And the “legitimate” DMCA filers, if there even is such a thing, need to get behind this, since it is in their interest for Google to be able to process these hypothetical “legitimate” DMCA takedowns.

FesteringPussPocket says:

Re: Force the problem back on the DMCA filers

Here’s the solution to the bogus DMCA problem.
File a bogus claim, the copyright for the “claimed” media is immediately placed into public domain, if the claimant actually owned the copyright in the first place. Once placed into public domain, nobody can ever claim copyright on the content ever again.

If the claimant did not own the copyright, then the claimant owes 10 times what a copyright violation costs for each and every one of the invalid claims. A nice little 1.5 million per violation, that should dry up the “bogus” DMCA spammers.

Oh, now there’s a thought.
Since the submissions are being placed over the internet, wouldn’t that make it “wire-fraud”?
And if the entire RIAA/MPAA groups are doing this, then they can ALL be charged for each and every bogus claim (wire-fraud), where each invalid claim is a count of the wire-fraud.

Let’s get us some MPAA/RIAA executives to sit in prison for hundreds if not thousands of lifetimes because of the egregiously large number of false claims.

I think that would also suffice for malicious intent behind the false claims.

Jinxed (profile) says:

I may be conflating two separate issues in my post, but this is troubling not just for Google, but for those who use Google services and have been booted for “copyright infringement”.

What’s missing from the Td article is how much of these bogus claims go against users, doing nothing more than providing videos (mostly under Fair Use).

Even if this assessment conflates two separate issues, the reality is bogus DMCA takedowns affect everyone, at some point.

In 1990, when I was first introduced to the ramifications of copyright and software, I tried my best to voice my opposition to the vague threats issued by the entertainment industry, but could do nothing but watch the knee-jerk protectionism of our government pass a bad bill into law.

I now fully understand why it’s called “Checks” and balances.

Not an Electronic Rodent (profile) says:

Re: Re:

What’s missing from the Td article is how much of these bogus claims go against users, doing nothing more than providing videos (mostly under Fair Use).

This is a good point – it’d be an interesting ancillary statistic to see how much of the 0.05% of valid URLs are actually anywhere close to valid copyright claims. I’m guessing that, even if you left in anything that even might be valid if you tilt your head and squint really hard, you’d struggle to rise above 0.025% valid notices.

Steve Carr (user link) says:

search engine

Freedom of speech and freedom of the internet. Net neutrality was a way for the government to get its greedy hands on the internet. Stop the Government from spying on everybody. Use the search engine that does not change its results for political reasons and respects your privacy, just good old-fashioned results that are not tracked. Have a great day

Anonymous Coward says:

Re: search engine

Thanks for the heads up, I just tried it! Searched for a business in the area that I needed info on. Lookseek had no results. Same search in Google pointed me to their manta, yellowpages, whitepages, yelp, buzzfile, a local business listing page, and enough other relevant pages to fill the first results screen.

Unfortunately I don’t have the luxury of using a search engine that doesn’t work, so… back to Google.

Anonymous Coward says:

How is this legal?

I just don’t understand how or why Google hasn’t tried to go to court over this. Especially in the realm of something like YouTube, where the constant push for automated takedowns regularly removes legitimate parody and commentary.

I mean seriously, the DMCA process is supposed to have protections to prevent fraudulent claims, right? And iirc you do have to show it was willful, right? How is an almost 100% rate of inapplicable notices, across literally millions of them, anything BUT willful?

That One Guy (profile) says:

Re: How is this legal?

Because those ‘protections’ have more holes than a target at a gun range hosting a ‘Free bullets’ day, to the point that it is effectively impossible for them to trigger, barring the accused literally admitting in court that they knew that they were filing a bogus DMCA claim and did it anyway, and even then I wouldn’t put good odds on their being punished to any real extent.

The fact that the law theoretically requires a statement made under penalty of perjury, and bots, which cannot make one, are allowed to send DMCA claims should be all the demonstration you need of how pathetic the ‘protections to prevent fraudulent claims’ are.

The law was meant from the get-go to be entirely one-sided, it’s ‘legal’ because it’s working as intended.

James Brandes - Digital Copyright Consultancy says:

There is undoubtedly a major issue with DMCA abuse concerning Google.

The major problem is that Google are deluged with DMCA Notices every single day and have to cut corners (usually via automation as well). This is then exacerbated by the fact that many anti-piracy agencies work for hundreds of clients (they wouldn’t be profitable otherwise) and thus have to rely on automated bots that dynamically generate URLs.

As long as the site is already known as a pirate site/on a blacklist, it’s automatically approved for deletion from Google’s search index. Ultimately, this means that a DMCA Notice could be erroneous and yet the content is still removed. This happens very frequently.

James Brandes – Digital Copyright Consultancy

ARIO says:

just tell me what ranges of IP I need to block

I own one of those bad sites !!
and dmca bots are driving me crazy

do you know what is dmca bot?
amazon/google/… bot?

what do I need to limit to get fewer dmca complaints?

of course I will put this as freelancer project but maybe you guys have any idea

I have a site which posts every single scene release plus all the Important encodes from ipt bots to rm bots

I post 2 to 2.5 K/day different files
to 5 different hosts

You can guess amount of dmca reports I get/day?!
