by Timothy Lee
Fri, Nov 30th 2007 2:41pm
Slashdot reports that a pair of computer scientists have figured out how to de-anonymize the "anonymous" data set that Netflix released as part of its million-dollar contest to improve its recommendation algorithm. The researchers found that the set of less-popular movies a user has rated tends to uniquely identify that user. By comparing movie ratings on IMDB with the ratings in the Netflix data set, the researchers were often able to pair a particular IMDB user with the corresponding Netflix user. Once that link is made, the researchers instantly have access to all of that user's Netflix ratings, which Netflix users presumably expected to remain private. While movie ratings might seem innocuous at first glance, the authors point out that one's movie ratings can reveal potentially embarrassing personal details, including a user's views on politics, religion, and homosexuality.

This isn't the first time a company has released "anonymous" data about its users that turned out not to be so anonymous. Last year, AOL got in a lot of hot water when it released a data set of search queries that turned out to be quite easy to link back to the users who conducted the searches.

The lesson here is that companies should be very reluctant to release private customer data, even if they believe they have "anonymized" it. Anonymization is surprisingly difficult, and you can never be sure you've done it successfully; it's always possible that someone will find a way to link records back to the people they represent. Wherever possible, companies that need to release data should either aggregate it in a way that avoids revealing information about individuals, or carefully limit who has access to the data sets so that they don't become publicly available. Simply stripping out the "username" field doesn't cut it.
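To make the linkage idea concrete, here is a toy sketch in Python. It is not the researchers' actual algorithm, and every name and data value in it is made up for illustration: `anonymized` stands in for a released data set with usernames stripped, and `public_profile` stands in for ratings someone posted under their own name elsewhere. The sketch assumes that a match on a rarely rated movie should count for more than a match on a blockbuster, which is the intuition described above.

```python
import math
from collections import Counter

# Hypothetical "anonymized" release: usernames replaced by opaque record ids.
anonymized = {
    "record_1": {"Blockbuster A": 5, "Blockbuster B": 4,
                 "Obscure Documentary": 5, "Cult Film X": 2},
    "record_2": {"Blockbuster A": 4, "Blockbuster B": 4, "Indie Drama Y": 3},
    "record_3": {"Blockbuster A": 5, "Blockbuster B": 3, "Foreign Film Z": 4},
}

# Hypothetical ratings the same person posted publicly under a real name.
public_profile = {"Blockbuster A": 5, "Obscure Documentary": 5, "Cult Film X": 2}


def rarity_weights(dataset):
    """Give each movie a weight that shrinks as more records rate it."""
    counts = Counter(movie for ratings in dataset.values() for movie in ratings)
    return {movie: 1.0 / math.log(1 + n) for movie, n in counts.items()}


def score(public, candidate, weights):
    """Sum the weights of movies both sides rated within one star of each other."""
    total = 0.0
    for movie, rating in public.items():
        if movie in candidate and abs(candidate[movie] - rating) <= 1:
            total += weights.get(movie, 0.0)
    return total


def best_match(public, dataset):
    """Return the highest-scoring record id along with every record's score."""
    weights = rarity_weights(dataset)
    scores = {rid: score(public, ratings, weights) for rid, ratings in dataset.items()}
    return max(scores, key=scores.get), scores


match, scores = best_match(public_profile, anonymized)
print(match, scores)
```

In this toy data, all three records share the blockbuster, so it barely separates them; the two obscure titles are what single out one record. A real attack would also have to cope with noisy ratings and decide when the top score is far enough ahead of the runner-up to count as a match, which this sketch does not attempt.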