by Timothy Lee
Fri, Nov 30th 2007 2:41pm
Slashdot reports that a pair of computer scientists has figured out how to de-anonymize the "anonymous" data set that Netflix released as part of its million-dollar contest to improve its recommendation algorithm. The researchers found that the set of less-popular movies a user has rated tends to uniquely identify that user. By comparing movie ratings on IMDB with the ratings in the Netflix data set, they were often able to pair a particular IMDB user with the corresponding Netflix user. Once that match is made, the researchers instantly have access to all of that user's Netflix ratings, which Netflix users presumably expected to remain private. While movie ratings might seem innocuous at first glance, the authors point out that one's movie ratings can reveal potentially embarrassing personal details, including a user's views on politics, religion, and homosexuality.
This isn't the first time a company has released "anonymous" data about its users that turned out not to be so anonymous. Last year, AOL got in a lot of hot water when it released a data set of search queries that turned out to be quite easy to link back to the users who conducted the searches.
The lesson here is that companies should be very reluctant to release private customer data, even if they believe they have "anonymized" it. Anonymization is surprisingly difficult, and you can never be sure you've done it successfully; it's always possible that someone will find a way to link records back to the people they represent. Wherever possible, companies that need to release data should either aggregate it in a way that avoids revealing information about individuals, or carefully limit who has access to the data sets so that they don't become publicly available. Simply stripping out the "username" field doesn't cut it.
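To make the linking step concrete, here is a toy sketch in Python of how a handful of rarely rated movies can single out one record. This is not the researchers' actual algorithm; the data, the function names (`rarity_weights`, `match_score`, `best_match`), the 1/count weighting, and the 2x margin are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical "anonymized" release: record id -> {movie: rating}.
anonymized = {
    "user_1": {"Blockbuster A": 5, "Blockbuster B": 4,
               "Obscure Film X": 2, "Obscure Film Y": 5},
    "user_2": {"Blockbuster A": 5, "Blockbuster B": 4, "Obscure Film Z": 3},
    "user_3": {"Blockbuster A": 4, "Blockbuster B": 4},
}

# Hypothetical public profile: ratings someone posted under their own name.
public_profile = {"Blockbuster A": 5, "Obscure Film X": 2, "Obscure Film Y": 5}


def rarity_weights(records):
    """Weight each movie by 1 / (number of records that rated it)."""
    counts = Counter(movie for ratings in records.values() for movie in ratings)
    return {movie: 1.0 / count for movie, count in counts.items()}


def match_score(profile, ratings, weights):
    """Sum the weights of movies where both sides gave the same rating."""
    return sum(
        weights.get(movie, 0.0)
        for movie, rating in profile.items()
        if ratings.get(movie) == rating
    )


def best_match(profile, records, margin=2.0):
    """Return the record id that clearly stands out, or None if ambiguous."""
    weights = rarity_weights(records)
    ranked = sorted(
        ((match_score(profile, ratings, weights), rid)
         for rid, ratings in records.items()),
        reverse=True,
    )
    if not ranked or ranked[0][0] == 0:
        return None
    best_score, best_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    # Only claim a match when the best score beats the runner-up by the margin.
    return best_id if best_score >= margin * runner_up else None


print(best_match(public_profile, anonymized))
print(best_match({"Blockbuster A": 5}, anonymized))
```

In this toy data, agreeing on a blockbuster says almost nothing because nearly everyone rated it, while agreeing on two obscure films is enough to pick out a single record. A profile containing only the blockbuster rating matches two records equally well, so the sketch declines to name anyone.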