Techdirt's think tank, the Copia Institute, is working with the Trust & Safety Professional Association and its sister organization, the Trust & Safety Foundation, to produce an ongoing series of case studies about content moderation decisions. These case studies are presented in a neutral fashion, not aiming to criticize or applaud any particular decision, but to highlight the many different challenges that content moderators face and the tradeoffs those decisions involve. Find more case studies here on Techdirt and on the TSF website.

Content Moderation Case Study: Using Hashes And Scanning To Stop Cloud Storage From Being Used For Infringement (2014)

from the cloud-storage-scanning dept

Summary: Since the rise of the internet, the recording industry has been particularly concerned about how the internet can and will be used to share infringing content. Over time, the focus of that concern has shifted as the technology (as well as copyright law) has changed. In the early 2000s, most of the concern centered on file sharing applications, services and sites, such as Napster, LimeWire, and The Pirate Bay. After 2010, however, much of the emphasis switched to so-called “cyberlockers.”

Unlike file sharing apps, which let people share files directly from their own computers via intermediary technologies, a cyberlocker was more like a hard drive on the internet. The problem was that some users would store large quantities of music files there and then make them available for unlicensed downloading.

Some cyberlockers were built directly around this use case. At the same time, cloud storage companies were trying to build legitimate businesses that let consumers and businesses store their own files in the cloud rather than on their own hard drives. Technologically, however, there is little to distinguish a cloud storage service from a cyberlocker, and as the entertainment industry became more vocal about the issue, some services started to change their policies.

Dropbox is one of the best-known cloud storage companies. Wishing to avoid comparisons to cyberlockers built around the sharing of infringing works, the company put in place a system that made it more difficult to use the service to share works in an infringing manner, while keeping it useful for storing personal files.

Specifically, if Dropbox received a DMCA takedown notice for a specific file, the company would create a hash of it (a computer-generated identifier that is the same for all identical files). Then, whenever a user shared a file from their Dropbox with someone else (for example, by creating a shareable link), Dropbox would hash that file and check the result against its database of hashes of files that had previously been the subject of DMCA takedown notices.
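The mechanism described above can be illustrated with a minimal sketch. Dropbox has not said which hash function or data store it uses, so the choice of SHA-256, the in-memory set, and the names `ShareGate`, `record_takedown`, and `can_share` are all hypothetical assumptions made for illustration only. The key property shown is that the check runs only when a file is shared, and only against hashes of files already named in takedown notices.

```python
import hashlib


class ShareGate:
    """Hypothetical sketch of share-time hash blocking (not Dropbox's actual code)."""

    def __init__(self) -> None:
        # Hex digests of files previously named in DMCA takedown notices.
        self.blocked_hashes: set[str] = set()

    @staticmethod
    def file_hash(data: bytes) -> str:
        # SHA-256 is an assumption; identical bytes always produce identical digests.
        return hashlib.sha256(data).hexdigest()

    def record_takedown(self, data: bytes) -> None:
        # Called when a takedown notice targets a shared link to this file.
        self.blocked_hashes.add(self.file_hash(data))

    def can_share(self, data: bytes) -> bool:
        # Checked only when a user tries to create a share link;
        # files that are never shared are never compared.
        return self.file_hash(data) not in self.blocked_hashes


gate = ShareGate()
song = b"bytes of a song named in a takedown notice"
gate.record_takedown(song)
print(gate.can_share(song))          # blocked: identical copy
print(gate.can_share(song + b"!"))   # allowed: any byte change alters the hash
```

Because this kind of exact hash matching only catches byte-for-byte identical copies, it avoids inspecting file contents beyond computing a digest, but it also means even a trivially modified file produces a different hash and would not be blocked.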

This got some attention in 2014 when a Twitter user pointed out that he had been blocked from sharing a file because of this system, raising concerns that Dropbox was looking at everyone’s files.

Dropbox quickly clarified that it was not scanning every file, nor was it looking at everyone’s files. Rather, it was using an automated process to check files that were being shared to see whether they matched files that had previously been subject to a DMCA takedown notice:

“There have been some questions around how we handle copyright notices. We sometimes receive DMCA notices to remove links on copyright grounds. When we receive these, we process them according to the law and disable the identified link. We have an automated system that then prevents other users from sharing the identical material using another Dropbox link. This is done by comparing file hashes. We don’t look at the files in your private folders and are committed to keeping your stuff safe.”

Decisions to be made by Dropbox:

  • How proactive does the company need to be to remain on the compliant side of copyright law?
  • Will blocking the sharing of files that might be shared for non-infringing purposes make the service less useful to users?
  • What steps are necessary to avoid being accused of supporting infringement by traditional copyright industries?

Questions and policy implications to consider:

  • There may be legitimate, non-infringing reasons to share a file that in other contexts may be infringing.
  • Is it appropriate for a company to block that possibility?
  • What measures could be put in place to allow for those possibilities?
  • The recording and movie industries have a history of being aggressive litigants against technologies used for infringement. What level of response is appropriate for new startups and technology companies?
  • Will there be limitations on innovation to services like cloud storage imposed by the need to avoid angering certain industries?

Resolution: Dropbox has continued to use a similar setup, and for the most part has avoided being compared to traditional cyberlockers. Since 2014, the issue of DMCA takedowns leading to future blocking of files has not received all that much attention either. There have been a few articles and forum discussions about how it works, with some users looking for workarounds, but for the most part this technological setup appears to have prevented Dropbox from being considered a cyberlocker-style site for infringing file sharing.

Originally published on the Trust & Safety Foundation website.

Companies: dropbox


Comments on “Content Moderation Case Study: Using Hashes And Scanning To Stop Cloud Storage From Being Used For Infringement (2014)”

Anonymous Coward says:

Re: Re:

1) If you’re running a site which regularly gets large amounts of child porn, you’re either big enough to do something in-house (which doesn’t have Cloudflare invade your privacy), or you’re running your site horribly, horribly wrong.

2) I really don’t like these sorts of tools / scanners. Wikipedia got blocked in the United Kingdom in the past because it had an image of a naked child on a band’s album cover from decades ago. There’s also the issue of cartoon images. Even if their filters aren’t that bad now, filters usually converge on censoring everything.

3) As detailed in some articles, criminals can trivially bypass these filters with simple modifications to content. They also don’t work on video. They provide a false sense of security, and are often overstated as a panacea.

4) These filters are curated by a non-transparent and unaccountable non-profit organization (NCMEC), which has supposedly gotten a teenager in Costa Rica arrested for posting a cartoon image on her blog.
A European free speech organization, Article19, and two other organizations wrote a letter to them and to a collaborative body they’re part of (INHOPE) to try to dissuade them from performing such actions in the future.

JoeCool (profile) says:

Similar situation

I had a similar situation with MediaFire. After FileDen shut down, I switched to MediaFire as my primary cyberlocker for sharing (only files I can share… nothing illegal). MediaFire took down one of my files when they got a DMCA notice that was clearly just doing partial matching on file names. I was sharing a PSP port of the open source emulator Basilisk, while the file was flagged as being the movie Basilisk. I changed the name of the file and re-uploaded it. Lesson – give your files names that can’t possibly match to anything commercial lest the stupid bots used for sending DMCA notices take notice.

Anonymous Coward says:

Re: Similar situation

Lesson – give your files names that can’t possibly match to anything commercial

Lesson: Bust up the common language into an ever-shrinking public domain for everyone — and an ever-growing list of reserved words owned by unthinking media corps and their lawyer-bot armies under a bastard quasi-trademark regime!

Intellectual property. Rule of law.

Anonymous Coward says:

Re: Re: Similar situation

More like rule of thought. Basic language is a requirement to communicate.

Though I’m sure IP maximalists would love being able to claim that they own every possible sentence that could ever be produced. The sad reality for them is that our society has a need to communicate to function properly.

IP needs to go the way of the dodo. It’s nothing more than a tool to oppress and stymie at this point.
