Copyright

by Mike Masnick

Tue, Jul 10th 2018 9:51am


Filed Under:
algorithms, censorship, censorship machines, copyright, copyright directive, eu, eu copyright directive, false positives, filters, type 1 errors, type 2 errors



A Numerical Exploration Of How The EU's Article 13 Will Lead To Massive Censorship

from the it's-not-good-folks dept

One of the key talking points from those in favor of Article 13 in the EU Copyright Directive is that people who claim it will lead to widespread censorship are simply making it up. We've explained many times why this is untrue, and how any time you put in place a system for taking down content, tons of perfectly legitimate content gets caught up in it. Some of this is from malicious takedowns, but much of it is just because algorithms make mistakes. And when you make mistakes at scale, bad things happen. Most of you are familiar with the concept of "Type 1" and "Type 2" errors in statistics. These can be more simply described as false positives and false negatives, respectively. Over the weekend, Alec Muffett decided to put together a quick "false positive" emulator to show how much of an impact this would have at scale, and tweeted out quite a thread, which has since been un-threaded into a webpage for easier reading. In short, at scale, the "false positive" problem is pretty intense. A ton of non-infringing content is likely to get swept up in the mess.

Using a baseline of 10 million pieces of content, an accuracy rate far higher than anything in reality (99.5%), and an assumption that 1 in 10,000 items is "bad" (i.e., "infringing"), you end up with a ton of legitimate content taken down to stop just a bit of infringement.
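
A minimal sketch of the expected-count arithmetic behind this baseline scenario. It assumes (our simplification, not necessarily how Muffett's emulator works) that a single accuracy figure applies to both kinds of content: the filter catches an infringing item with probability equal to the accuracy, and wrongly flags a legitimate item with probability one minus the accuracy. The names `expected_outcome` and `FilterOutcome` are hypothetical.

```python
# Minimal sketch of the expected-count arithmetic for a content filter.
# ASSUMPTION (ours): one "accuracy" figure applies to both classes, i.e. the
# filter catches an infringing item with probability `accuracy` and wrongly
# flags a legitimate item with probability 1 - accuracy. Muffett's emulator
# may model this differently; the names here are hypothetical.
from typing import NamedTuple


class FilterOutcome(NamedTuple):
    true_positives: float   # infringing items correctly taken down
    false_negatives: float  # infringing items the filter misses (Type 2)
    false_positives: float  # legitimate items wrongly taken down (Type 1)
    true_negatives: float   # legitimate items correctly left up


def expected_outcome(total: int, infringing_rate: float, accuracy: float) -> FilterOutcome:
    """Expected counts when a filter with the given accuracy scans `total` items."""
    infringing = total * infringing_rate
    legitimate = total - infringing
    return FilterOutcome(
        true_positives=infringing * accuracy,
        false_negatives=infringing * (1 - accuracy),
        false_positives=legitimate * (1 - accuracy),
        true_negatives=legitimate * accuracy,
    )


# Baseline from the post: 10 million items, 1 in 10,000 infringing, 99.5% accuracy.
baseline = expected_outcome(10_000_000, 1 / 10_000, 0.995)
print(f"infringing items caught:  {baseline.true_positives:,.0f}")
print(f"legitimate items removed: {baseline.false_positives:,.0f}")
```

With these inputs it reports 995 infringing items caught and 49,995 legitimate items removed, which is where the roughly 50,000 figure comes from.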

So basically, in an effort to stop 1,000 pieces of infringing content, you'd end up pulling down 50,000 pieces of legitimate content. And that's with an incredible (and unbelievable) 99.5% accuracy rate. Drop the accuracy rate to a still optimistic 90%, and the results are even starker.

Now we're talking about pulling down one million legitimate, non-infringing pieces of content in pursuit of just 1,000 infringing ones (many of which the system still misses).
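
The 90% figures can be checked with expected-count arithmetic, assuming (as a simplification, not necessarily the emulator's model) that one accuracy figure a applies to infringing and legitimate items alike, with N items of which a fraction p is infringing:

```latex
\begin{aligned}
N &= 10^{7}, \quad p = 10^{-4}, \quad a = 0.90 \\
\text{infringing caught} &= Npa = 1{,}000 \times 0.90 = 900 \\
\text{infringing missed} &= Np(1-a) = 1{,}000 \times 0.10 = 100 \\
\text{legitimate removed} &= N(1-p)(1-a) = 9{,}999{,}000 \times 0.10 = 999{,}900
\end{aligned}
```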

Of course, I can hear the howls from the usual crew, complaining that the 1 in 10,000 number is unrealistic (it's not). Lots of folks in the legacy copyright industries want to pretend that the only reason people use big platforms like YouTube and Facebook is to upload infringing material, but that's laughably wrong. Infringing material is actually a very, very small percentage of what gets uploaded. And, remember, of course, Article 13 will apply to basically any platform that hosts content, even ones that are rarely used for infringement.

But, just to humor those who think infringement is a lot more widespread than it really is, Muffett also ran the emulator with a scenario in which 1 out of every 500 pieces of content is infringing, with (a still impossible) 98.5% accuracy. It's still a disaster.

In that totally unrealistic scenario, with a lot more infringement than is actually happening and with accuracy rates way above reality, you still end up pulling down 150,000 non-infringing items... just to stop fewer than 20,000 infringing pieces of content.
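
Assuming, as a simplification, that one accuracy figure a applies to both infringing and legitimate items, with N items of which a fraction p is infringing, the 1-in-500 scenario works out as follows:

```latex
\begin{aligned}
N &= 10^{7}, \quad p = \tfrac{1}{500}, \quad a = 0.985 \\
\text{infringing caught} &= Npa = 20{,}000 \times 0.985 = 19{,}700 \\
\text{legitimate removed} &= N(1-p)(1-a) = 9{,}980{,}000 \times 0.015 = 149{,}700
\end{aligned}
```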

Indeed, Muffett then works out that, at a 98.5% accuracy rate, a platform would need 1 in 67 items to be infringing before the filter "breaks even," catching roughly as many infringing items as it wrongly removes non-infringing ones (about 147,000 of each). But that still means censoring nearly 150,000 pieces of non-infringing content.
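
The break-even point can be derived directly, assuming one accuracy figure a for both kinds of content and an infringing fraction p among N items: set the expected number of infringing items caught equal to the expected number of legitimate items removed, and solve for p.

```latex
\begin{aligned}
Npa &= N(1-p)(1-a) \\
pa &= 1 - a - p + pa \\
p &= 1 - a = 1 - 0.985 = 0.015 \approx \tfrac{1}{66.7}
\end{aligned}
```

N cancels, so under this assumption the break-even infringement rate depends only on the accuracy. At exactly p = 0.015 and N = 10 million, both sides equal 10,000,000 × 0.015 × 0.985 = 147,750; rounding to 1 in 67 gives about 147,000 infringing items caught and about 147,800 legitimate items removed.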

This is one of the major problems that people don't seem to comprehend when they talk about filtering (or even human moderation of) content at scale. Even at impossibly high accuracy rates, a "small" percentage of false positives leads to a massive amount of non-infringing content being taken offline.
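
One way to quantify this is to ask what share of everything a filter removes is actually infringing (in statistics, the filter's precision). The sketch below computes that share for the three scenarios in this post. It assumes, as a simplification, that one accuracy figure applies to both infringing and legitimate items, and `share_of_removals_infringing` is a hypothetical helper name.

```python
# Sketch: of everything a filter removes, what fraction is actually infringing?
# ASSUMPTION (ours): the same accuracy applies to infringing and legitimate
# items. The helper name is hypothetical.


def share_of_removals_infringing(infringing_rate: float, accuracy: float) -> float:
    """Expected fraction of takedowns that hit infringing content (precision)."""
    caught = infringing_rate * accuracy                        # true positives per item scanned
    wrongly_removed = (1 - infringing_rate) * (1 - accuracy)   # false positives per item scanned
    return caught / (caught + wrongly_removed)


SCENARIOS = [
    ("1 in 10,000 infringing, 99.5% accurate", 1 / 10_000, 0.995),
    ("1 in 10,000 infringing, 90% accurate", 1 / 10_000, 0.90),
    ("1 in 500 infringing, 98.5% accurate", 1 / 500, 0.985),
]

for label, rate, acc in SCENARIOS:
    share = share_of_removals_infringing(rate, acc)
    print(f"{label}: {share:.2%} of removals are infringing")
```

Under that assumption, only about 2%, 0.09%, and 12% of removals, respectively, would hit infringing material; everything else the filter takes down is a false positive.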

Perhaps some people feel that this is acceptable "collateral damage" to deal with the relatively small amount of infringement on various platforms, but to deny that it will create widespread censorship of legitimate and non-infringing content is to deny reality.

Reader Comments

  • Anonymous Coward, 10 Jul 2018 @ 10:23am

    yeah but that stuff being censored won't be our stuff cause we're gonna sell it on CD and DVD
    - the IAA's

  • Anonymous Coward, 10 Jul 2018 @ 10:28am

    And in reality, the legacy industry will demand that the filters are changed to capture the few false negatives, even if it means ten times more false positives.

  • Anonymous Coward, 10 Jul 2018 @ 10:35am

    the carpetbombing incentive

    Since many of the takedown artists reporting supposedly-infringing content are companies for hire, it's to their advantage to set up their keyword algorithms to cast a very wide net and cause as much "collateral damage" as possible, since there is essentially no penalty and many benefits to reap: they can show their clients an apparently huge output of "infringing" takedowns that required very little time and effort to produce.

    A casual glance at the Chilling Effects/Lumen database will easily show that many of the named page links are sloppily concocted keyword searches that don't even link to the actual content they claim to, and in many cases use long lists of keyword searches that have no perceptible relationship to the protected content.

  • Carlie Coats, 10 Jul 2018 @ 10:41am

    Just to be fair...

    False takedown notices should be subject to the same penalties as copyright infringement.

    IMNHO.

    • Anonymous Coward, 10 Jul 2018 @ 10:51am

      Re: Just to be fair...

      Honestly, if they are not, I plan on obtaining rights to something obscure and filing takedowns against literally everyone, on everything. Let them deal with having their accounts locked due to too many violations.

  • Ninja, 10 Jul 2018 @ 11:02am

    Collateral damage? I can't see it under the pile of my own greed. - MAFIAA

    • Anonymous Coward, 10 Jul 2018 @ 11:55am

      Re:

      Sure, we might lose a million legitimate works of art, but it's not as if copyright exists to promote the creation of new works. How many of the affected individuals would even know they have a copyright? OTOH we've gotta protect the God-given right of corporations to profit.

  • Anonymous Coward, 10 Jul 2018 @ 11:42am

    I don't know if this was posted before... Canadian cpa's commercial about *IAA taking advice.

    https://www.youtube.com/watch?v=xknM7g9a7-g

  • ECA, 10 Jul 2018 @ 12:16pm

    MORE WORK FOR THE WICKED?

    So now this is another way to get Server farms to watch over and EDIT THINGS??
    So what is going to happen?
    ASK YOUTUBE.. Go out and look at EVERY video? and see if it infringes? Or TAKE DOWN and dont care?? and deal with the SIMPLE CONSUMERS??
    Anyone got a phone number to youtube/google??
    (NOW you know why they dont have a direct phone number)
    (press one to talk to another computer)

    DO YOU REALLY want to create JOBS?? LET humans do the checking and verification of ALL DATA on the net.
    ENFORCE THAT and we will NEVER run out of jobs..

    • Anonymous Coward, 10 Jul 2018 @ 12:51pm

      Re: MORE WORK FOR THE WICKED?

      Your style of alternating quickly between talking and yelling comes across as severely bipolar. And paragraphs are a thing, have been for a very long time.

      • Anonymous Coward, 10 Jul 2018 @ 2:38pm

        Re: Re: MORE WORK FOR THE WICKED?

        Yeah, ECA has a... unique style. And yet somehow manages to be more coherent on the whole than some other notable visitors. xD

  • Anonymous Coward, 10 Jul 2018 @ 12:49pm

    'to deny that it will create widespread censorship of legitimate and non-infringing content is to deny reality'

    but this is exactly what the entertainment and copyright industries want. remember, they thrive on make believe, on made up stuff, not on reality and expect their way of thinking to be the only way of thinking. they won't be happy until they have got complete control of the best media distribution platform on the planet at the moment. everything they have condemned to date will magically become the best thing since sliced bread, simply because they will be able to use it themselves how they want, for what they want and CHARGE for that use! and be prepared to pay more than high street prices for media downloaded, even though you'll be using YOUR broadband connection, your device(s), your disks, your software, your burner and your printer. the cost to you will escalate considerably while their costs will diminish. and you will need permission and have to pay fees to get to the sites to download the stuff!!
