by Timothy Lee
Fri, Nov 30th 2007 2:41pm
Slashdot reports that a pair of computer scientists have figured out how to de-anonymize the "anonymous" data set that Netflix released as part of its million-dollar contest to improve its recommendation algorithm. The researchers found that the set of less-popular movies a user has rated tends to uniquely identify that user. By comparing movie ratings on IMDB with the ratings in the Netflix data set, the researchers were often able to pair a particular IMDB user with a single corresponding Netflix user. That, in turn, gave the researchers instant access to all of that user's Netflix ratings, which Netflix users presumably expected to remain private. While movie ratings might seem innocuous at first glance, the authors point out that a person's movie ratings can often reveal potentially embarrassing personal details, including his or her views on politics, religion, and homosexuality.

This isn't the first time a company has released "anonymous" data about its users that turned out not to be so anonymous. Last year, AOL got in a lot of hot water when it released a data set of search queries that turned out to be quite easy to link back to the users who conducted the searches.

The lesson here is that companies should be very reluctant to release private customer data, even if they believe they have "anonymized" it. Anonymization is surprisingly difficult, and you can never be sure you've done it successfully; it's always possible that someone will find a way to link records back to the people they represent. Wherever possible, companies that need to release data should either aggregate it in a way that doesn't reveal information about individuals, or carefully limit who has access to the data sets so that they don't become publicly available. Simply stripping out the "username" field doesn't cut it.
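The matching idea behind this kind of attack can be sketched in a few lines. This is a minimal illustration, not the researchers' actual algorithm: the toy data set, the user IDs, the movie titles, and the scoring rule are all hypothetical. The sketch assumes that each agreeing rating is weighted by 1/log(1 + the number of users who rated that movie), so obscure titles count for much more than blockbusters, and that a match is only trusted when one candidate clearly beats the runner-up (the assumed `margin` parameter).

```python
import math
from collections import Counter

# Hypothetical "anonymized" release: opaque IDs mapped to {title: stars}.
# All IDs, titles, and ratings are invented for illustration.
released = {
    "u1": {"Hit 1": 5, "Hit 2": 4, "Cult A": 2},
    "u2": {"Hit 1": 4, "Hit 2": 5, "Cult B": 4, "Art House C": 5},
    "u3": {"Hit 1": 3, "Hit 2": 4, "Cult A": 3},
    "u4": {"Hit 1": 5, "Hit 2": 3},
}


def movie_support(dataset):
    """Count how many records in the release rated each movie."""
    counts = Counter()
    for ratings in dataset.values():
        counts.update(ratings.keys())
    return counts


def match_score(aux, record, support, tolerance=1):
    """Score how well publicly known ratings (aux) fit one released record.

    Each shared movie whose ratings differ by at most `tolerance` stars
    adds 1 / log(1 + support), so rarely rated movies weigh far more than
    popular ones. This weighting is an illustrative assumption.
    """
    score = 0.0
    for movie, stars in aux.items():
        if movie in record and abs(record[movie] - stars) <= tolerance:
            score += 1.0 / math.log(1 + support[movie])
    return score


def best_match(aux, dataset, margin=0.5):
    """Return the ID of the best-fitting record, or None if it is ambiguous.

    Assumes the dataset holds at least two records.
    """
    support = movie_support(dataset)
    scores = sorted(
        ((match_score(aux, rec, support), uid) for uid, rec in dataset.items()),
        reverse=True,
    )
    (top_score, top_id), (second_score, _) = scores[0], scores[1]
    # Only claim a match when the top candidate clearly beats the runner-up.
    return top_id if top_score - second_score >= margin else None


# A hypothetical public profile, e.g. ratings posted elsewhere under a real name.
public_profile = {"Hit 1": 4, "Cult B": 4, "Art House C": 4}
match = best_match(public_profile, released)
print(match)            # u2
print(released[match])  # every rating in u2's "anonymous" record

# Rating only a blockbuster is not identifying: several records tie.
print(best_match({"Hit 1": 5}, released))  # None
```

With only a popular title to go on, several records tie and the sketch declines to name anyone; two rarely rated titles are enough to single out one record, along with every other rating it contains. Note that nothing in the matching relies on a username field.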