A Numerical Exploration Of How The EU's Article 13 Will Lead To Massive Censorship

from the it's-not-good-folks dept

One of the key talking points from those in favor of Article 13 in the EU Copyright Directive is that people who claim it will lead to widespread censorship are simply making it up. We’ve explained many times why this is untrue, and how any time you put in place a system for taking down content, tons of perfectly legitimate content gets caught up in it. Some of this is from malicious takedowns, but much of it is just because algorithms make mistakes. And when you make mistakes at scale, bad things happen. Most of you are familiar with the concept of “Type 1” and “Type 2” errors in statistics. These can be more simply described as false positives and false negatives. Over the weekend, Alec Muffett decided to put together a quick “false positive” emulator to show how much of an impact this would have at scale and tweeted out quite a thread, which has since been un-threaded into a webpage for easier reading. In short, at scale, the “false positive” problem is pretty intense. A ton of non-infringing content is likely to get swept up in the mess.

Using a baseline of 10 million pieces of content, a much higher than reality level of accuracy (99.5%), and an assumption that 1 in 10,000 items is “bad” (i.e., “infringing”), you end up with a ton of legitimate content taken down to stop just a bit of infringement.

So basically in an effort to stop 1,000 pieces of infringing content, you’d end up pulling down 50,000 pieces of legitimate content. And that’s with an incredible (and unbelievable) 99.5% accuracy rate. Drop the accuracy rate to a still optimistic 90%, and the results are even more stark:

Now we’re talking about pulling down one million legitimate, non-infringing pieces of content in pursuit of just 1,000 infringing ones (many of which the system still misses).
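The arithmetic behind those two scenarios can be sketched in a few lines of Python. This is not Muffett's emulator, just a minimal sketch of the expected counts. It assumes that "accuracy" means the filter gets each item right with the same probability whether the item is infringing or not. The function name and parameters are made up for illustration.

```python
def expected_filter_outcomes(total_items, one_in, accuracy):
    """Expected counts for a filter that classifies every item correctly
    with probability `accuracy` (assumed equal for both kinds of item)."""
    infringing = total_items // one_in        # e.g. 1 in 10,000 of 10 million
    legitimate = total_items - infringing
    return {
        "infringing_caught": infringing * accuracy,
        "infringing_missed": infringing * (1 - accuracy),
        "legitimate_removed": legitimate * (1 - accuracy),  # false positives
    }


for accuracy in (0.995, 0.90):
    outcome = expected_filter_outcomes(10_000_000, 10_000, accuracy)
    summary = ", ".join(f"{name}: {count:,.0f}" for name, count in outcome.items())
    print(f"accuracy {accuracy:.1%} -> {summary}")
```

Under those assumptions the false positives come out at just under 50,000 for 99.5% accuracy and just under one million for 90%, in line with the figures above.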

Of course, I can hear the howls from the usual crew, complaining that the 1 in 10,000 number is unrealistic (it’s not). Lots of folks in the legacy copyright industries want to pretend that the only reason people use big platforms like YouTube and Facebook is to upload infringing material, but that’s laughably wrong. It’s actually a very, very small percentage of such content. And, remember, of course, Article 13 will apply to basically any platform that hosts content, even ones that are rarely used for infringement.

But, just to humor those who think infringement is a lot more widespread than it really is, Muffett also ran the emulator with a scenario in which 1 out of every 500 pieces of content is infringing and (a still impossible) 98.5% accuracy. It’s still a disaster:

In that totally unrealistic scenario with a lot more infringement than is actually happening and with accuracy rates way above reality, you still end up pulling down 150,000 non-infringing items… just to stop fewer than 20,000 infringing pieces of content.
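Another way to read that scenario is to ask what share of everything the filter removes was actually infringing. The sketch below works that out. It is an illustration rather than Muffett's code, it again assumes the 98.5% accuracy applies equally to infringing and non-infringing items, and `takedown_precision` is a hypothetical helper name.

```python
def takedown_precision(total_items, one_in, accuracy):
    """Return (infringing removed, legitimate removed, share of removals
    that were truly infringing), assuming symmetric accuracy."""
    infringing = total_items // one_in
    legitimate = total_items - infringing
    true_positives = infringing * accuracy           # infringing items caught
    false_positives = legitimate * (1 - accuracy)    # legitimate items removed
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


caught, wrongly_removed, precision = takedown_precision(10_000_000, 500, 0.985)
print(f"{caught:,.0f} infringing removed, {wrongly_removed:,.0f} legitimate removed")
print(f"share of takedowns that were actually infringing: {precision:.1%}")
```

With these inputs only about 12% of removed items are infringing, so roughly seven out of every eight takedowns hit legitimate content.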

Indeed, Muffett then figures out that with a 98.5% accuracy rate, a platform where 1 in 67 items is infringing sits at the “break even” point: the filter catches roughly as many non-infringing items (about 147,000) as infringing ones. But that still means censoring nearly 150,000 pieces of non-infringing content.
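The break-even point can also be worked out directly rather than by trial and error. If p is the fraction of items that infringe and a is the accuracy (assumed here to be the same for both kinds of item), break-even is where p × a = (1 − p) × (1 − a), which simplifies to p = 1 − a. The following is a minimal sketch of that check, with a hypothetical function name:

```python
def break_even_prevalence(accuracy):
    """Fraction of infringing items at which the filter removes as many
    legitimate items as infringing ones.

    Solving p * accuracy == (1 - p) * (1 - accuracy) for p, the
    p * accuracy terms cancel and leave p = 1 - accuracy.
    """
    return 1 - accuracy


accuracy = 0.985
total_items = 10_000_000
p = break_even_prevalence(accuracy)

caught = total_items * p * accuracy                      # infringing removed
wrongly_removed = total_items * (1 - p) * (1 - accuracy)  # legitimate removed

print(f"break-even at about 1 in {1 / p:.0f} items")
print(f"{caught:,.0f} infringing vs {wrongly_removed:,.0f} legitimate removed")
```

In other words, under this assumption a filter only stops removing more legitimate items than infringing ones once the share of infringing content exceeds the filter's own error rate.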

This is one of the major problems that people don’t seem to comprehend when they talk about filtering (or even human moderating) content at scale. Even at impossibly high accuracy rates, a “small” percentage of false positives leads to a massive amount of non-infringing content being taken offline.

Perhaps some people feel that this is acceptable “collateral damage” to deal with the relatively small amount of infringement on various platforms, but to deny that it will create widespread censorship of legitimate and non-infringing content is to deny reality.


Comments on “A Numerical Exploration Of How The EU's Article 13 Will Lead To Massive Censorship”

31 Comments
Anonymous Coward says:

the carpetbombing incentive

Since many of the takedown artists reporting supposedly-infringing content are companies for hire, it’s to their advantage to set up their key-word algorithms to cast a very wide net and cause as much “collateral damage” as possible, since there is essentially no penalty, and there are many benefits to reap by showing their clients an apparently huge work output of “infringing” takedowns that required very little time and effort to produce.

A casual glance at the Chilling Effects/Lumen database will easily show that many of the named page links are sloppily concocted keyword searches that don’t even link to the actual content they claim to, and in many cases use long lists of keyword searches that have no perceptible relationship to the protected content.

Anonymous Coward says:

Re: Re: Just to be fair...

Personally I favor taking it a step further. Make them have to sign and prove they have the rights to what they claim to have first. If they get three strikes that means that the ownership of the copyright is given to public domain as punishment and there is a carte blanche on reproduction on all of their works. They abused their privilege of copyright now they have lost it completely.

That would make them ‘get a team of lawyers before every filing’ cautious and that is very much a good thing.

ECA (profile) says:

MORE WORK FOR THE WICKED?

So now this is another way to get Server farms to watch over and EDIT THINGS??
So what is going to happen?
ASK YOUTUBE.. Go out and look at EVERY video? and see if it infringes? Or TAKE DOWN and dont care?? and deal with the SIMPLE CONSUMERS??
Anyone got a phone number to youtube/google??
(NOW you know why they dont have a direct phone number)
(press one to talk to another computer)

DO YOU REALLY want to create JOBS?? LET humans do the checking and verification of ALL DATA on the net.
ENFORCE THAT and we will NEVER run out of jobs..

Anonymous Coward says:

‘to deny that it will create widespread censorship of legitimate and non-infringing content is to deny reality’

but this is exactly what the entertainment and copyright industries want. remember, they thrive on make believe, on made up stuff, not on reality and expect their way of thinking to be the only way of thinking. they won’t be happy until they have got complete control of the best media distribution platform on the planet at the moment. everything they have condemned to date will magically become the best thing since sliced bread, simply because they will be able to use it themselves how they want, for what they want and CHARGE for that use! and be prepared to pay more than high street prices for media downloaded, even though you’ll be using YOUR broadband connection, your device(s), your disks, your software, your burner and your printer. the cost to you will escalate considerably while their costs will diminish. and you will need permission and have to pay fees to get to the sites to download the stuff!!

Anonymous Coward says:

Re: The real numbers

That’s the thing though, they aren’t actually too far off. Copyright laws are written such that nearly everything is automatically granted a copyright from the moment of its creation, which makes most user generated content (with the exception of short comments like these) be either 1) infringing on existing copyright or 2) newly copyrighted works. Everything really is copyrighted, even if that is both irrelevant and ignored for almost everything.

Anonymous Coward says:

Re: Re: The real numbers

The original joke aside, there is a huge difference between “everything is infringing” and “everything is copyrighted”. Neither of which are true.

A factual statement isn’t copyrightable (theoretically). An analysis of a copyrighted work may be fair use, and so not infringing. Neither of which are amenable to an automated filter.

That One Guy (profile) says:

"How much of that is ours?" "...very little?" "Don't care then."

While it’s worthwhile to highlight the massive negative impact on speech and content in the flailing about to get those dastardly infringers, the problem is that the ones pushing such plans almost certainly do not care.

They’ll see a hundred and fifty thousand ‘innocent’ posts killed for every twenty thousand that are actually infringing and completely ignore the first number, caring only that twenty thousand infringing posts were removed.

After all, if the content isn’t theirs, and there is no penalty for false accusations or removal, then why would they care?

That One Guy (profile) says:

Re: Re: "How much of that is ours?" "...very little?" "Don't care then."

‘One rule for me, and another for thee’ I suspect, where claims made against any large group (political, entertainment, what have you) are given the benefit of the doubt and treated differently than any content flagged that was put up by the rabble. Not to mention a large platform is much more likely to pay attention to a counter-notice made by one of them, such that any incorrect takedowns are likely to only be in effect for a short amount of time.

It does happen occasionally though, and it tends to be downright hilarious when it does as the ones pushing for filters that they insist are ‘easy’ suddenly find themselves on the receiving end of those filters.
