The Story Behind Facebook Threatening To Sue Developer Into Oblivion For Highlighting Useful Facebook Data

from the how-nice-of-them dept

Facebook’s lawyers have been getting pretty nasty lately. We recently covered the company’s threats against the creator of a useful Greasemonkey script, and now a developer named Pete Warden has shared the sordid details of his legal run-in with Facebook — where they threatened to sue him for his activity aggregating publicly available data found on Facebook.

You should read the full story, but basically, he built a simple crawler for public Facebook info, initially for his own purposes. He made sure that Facebook’s robots.txt didn’t block such crawlers — and he also emailed someone at Facebook (who he had dealt with before), but didn’t hear back from anyone. As his crawler worked, it started collecting a bunch of interesting data, and so he set up a website to let people explore some of this (again, public) data.
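For readers who have never done the robots.txt check described above, here is a minimal sketch of what it can look like using Python's standard-library `urllib.robotparser`. This is not Warden's actual crawler code, and the sample rules, the `example-crawler` user agent, and the `example.com` URLs are all hypothetical placeholders rather than anything taken from Facebook's real robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
# This is NOT Facebook's actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "example-crawler" and the URLs below are made-up names.
    for url in ("https://example.com/people/directory",
                "https://example.com/private/data"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "example-crawler", url))
```

A real crawler would typically load the live file with `set_url()` and `read()` instead of a hard-coded string. Passing this check only tells you what the site's published crawler instructions say, which is a separate question from the legal one discussed below.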

After playing with some of the data himself, he started making some interesting maps and charts with the data, and did a simple analysis of geographic locations of Facebook friend connections to show people what you could do with the data. He noted that if others (such as professional researchers) wanted to dig into the data, he would let them access a version of the data set (with identifying info stripped). The chart he released got picked up by a variety of sites and quickly got passed around.

And that’s when the lawyers called:

On Sunday around 25,000 people read the article, via YCombinator and Reddit. After that a whole bunch of mainstream news sites picked it up, and over 150,000 people visited it on Monday. On Tuesday I was hanging out with my friends at Gnip trying to make sense of it all when my cell phone rang. It was Facebook’s attorney.

He was with the head of their security team, who I knew slightly because I’d reported several security holes to Facebook over the years. The attorney said that they were just about to sue me into oblivion, but in light of my previous good relationship with their security team, they’d give me one chance to stop the process. They asked and received a verbal assurance from me that I wouldn’t publish the data, and sent me on a letter to sign confirming that. Their contention was robots.txt had no legal force and they could sue anyone for accessing their site even if they scrupulously obeyed the instructions it contained. The only legal way to access any web site with a crawler was to obtain prior written permission.

Mathew Ingram reported on the data getting forced down, and got a statement from Facebook that seems to miss the point:

Andrew Noyes, manager of public policy communications at Facebook, said in an email that Warden “aggregated a large amount of data from over 200 million users without our permission, in violation of our terms. He also publicly stated he intended to make that raw data freely available to others.” Noyes also noted that Facebook’s statement of rights and responsibilities says that users agree not to collect users’ content or information “using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.”

But I still don’t see what the legal argument is. At best, I could see them terminating his account for disobeying the terms of service — but even then the whole thing doesn’t make much sense. The data is publicly available and, as Peter notes, it’s pretty much standard practice for people to aggregate and analyze such data. However, he also pointed out that he couldn’t afford to be a legal test case, and so he gave in and negotiated with Facebook to remove the data.

In the end, though, this shows Facebook’s rather schizophrenic view towards data and privacy. On the one hand, it tries to push everyone to open up their info, but then if anyone does anything useful with it, they threaten to sue?



Comments on “The Story Behind Facebook Threatening To Sue Developer Into Oblivion For Highlighting Useful Facebook Data”

Richard (profile) says:

since forever

weeeelll this is an old battle, remember the aggregators of the late 90’s? hotspot err altavista (before Fast bot) and so many others I can’t remember. I don’t really know what happened with any of those cases except that companies were sued out of business. The search engines (also aggregators) lasted and the others ended up on the penny exchange. I think the tried and true “sued into oblivion” strategy is the real story here. I mean, that’s a massive failure of the legal system. It’s denying justice to the poor and that’s unconstitutional.

Beta says:

I know logic isn't involved, but...

If Facebook’s entire argument is based on his using an automated tool to gather this information, then he could crowdsource it: announce his plan to Facebook users and invite them to contribute information which they collect by hand.

And by “could”, I mean “could have”. By which I mean that once he made the announcement, it’d be hard to prove that the data in his possession hadn’t come from big crowds of helpful Facebookers.

david G (user link) says:

Re: I know logic isn't involved, but...

“And by “could”, I mean “could have”. By which I mean that once he made the announcement, it’d be hard to prove that the data in his possession hadn’t come from big crowds of helpful Facebookers.”

One problem, with today’s standard you are guilty until you prove YOU DID NOT get it by other means.

Dave G (user link) says:

I saw this yesterday and it really irked me

I saw this yesterday on another site. I went and read the blog and this really irked me. I wish people would get together and say enough when we see these types of abuses. Some people had arguments that their EULA states you can’t spider their site without previous permission, but I say, then don’t allow it in your robots.txt file. You can’t play the open card, then shut the door when it doesn’t serve your purpose. I feel the same way about people who set up RSS feeds, then state you cannot use it in an open manner in some blurb on the website, but not in the feed itself.

John Doe says:

All about control...

They only want you to open up your info if THEY can control it. They don’t want anyone else to have it; you must come to them to get it. If it is that useful, they will want to charge for it.

Personally I don’t believe they have a legal leg to stand on, but our court system is for the rich as the rest of us can’t afford to fight.

JackSombra (profile) says:

“On the one hand, it tries to push everyone to open up their info, but then if anyone does anything useful with it, they threaten to sue? “
The reason is very simple if you ask yourself a simple question, how does facebook make money?

Via two methods. The obvious one is advertising, the second, not so obvious method is selling info like what was collected by this guy. He was cutting into their revenue stream, hence the trigger happy (but imo toothless) lawyers

Anonymous Coward says:

Re: Re:

it is a question of speed and volume, just like a library compared to a torrent. as a single person clicking and making notes you might get a few dozen pieces of information. a bot running 24 hours per day will collect much more data, more than anyone would personally need. scale is key.

V for Vendetta says:

Facebook data leak - download all files here

The original work, released a few days ago, was done on a Unix machine, and therefore, used Unix compression, which is woefully inadequate when compared to even WinRAR.

So I have all the original Facebook data, decompressed them, and tested three Windows-based compressors – WinRAR won out (the other contestants were 7-Zip and WinZIP).

The original data are merely huge text files, and came in at a hefty 15GB. With WinRAR, I was able to get that to just a bit over 2GB.

If you would like the files, you can download them yourselves from RS, much faster I suspect than from a torrent. Here are the links:

YOU MUST DOWNLOAD all five files to get the data. Click FREE USER button if not a Premium Member.

It’s ALL public information, so is all legal – kinda fun to peruse through, though not exciting.

