by Timothy Lee
Fri, Nov 30th 2007 2:41pm
Slashdot reports that a pair of computer scientists has figured out how to de-anonymize the "anonymous" data set that Netflix released as part of its million-dollar contest to improve its recommendation algorithm. The researchers found that the set of less-popular movies a user has rated tends to uniquely identify that user. By comparing movie ratings on IMDB with the ratings in the Netflix data set, the researchers were often able to pair a particular IMDB user with the corresponding Netflix user. Once that link is made, the researchers instantly have access to all of that user's Netflix ratings, which Netflix users presumably expected to remain private. While movie ratings might seem innocuous at first glance, the authors point out that one's movie ratings can often reveal potentially embarrassing personal details, including a user's views on politics, religion, and homosexuality.

This isn't the first time a company has released "anonymous" data about its users that turned out not to be so anonymous. Last year, AOL got in a lot of hot water when it released a data set of search queries that turned out to be quite easy to link back to the users conducting the searches.

The lesson here is that companies should be very reluctant to release private customer data, even if they believe they have "anonymized" it. Anonymization is surprisingly difficult, and you can never be sure you've done it successfully; it's always possible that someone will find a way to link records back to the people they represent. Wherever possible, companies that need to release data should either aggregate it in a way that avoids revealing information about individuals, or carefully limit who has access to the data sets so that they don't become publicly available. Simply stripping out the "username" field doesn't cut it.
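To make the linking idea concrete, here is a toy sketch in Python. It is not the researchers' actual algorithm, just an illustration under simple assumptions: movies rated by fewer people in the "anonymous" data set get more weight, a rating counts as a match if it is within one star, and a link is only claimed when the best candidate clearly beats the runner-up. The function names, the weighting formula, the `min_gap` threshold, and all of the sample data are hypothetical.

```python
import math
from collections import Counter


def rarity_weights(anon_records):
    """Weight each movie by how few records rated it (rarer = heavier)."""
    counts = Counter(m for ratings in anon_records.values() for m in ratings)
    return {movie: 1.0 / math.log(1 + n) for movie, n in counts.items()}


def match_score(public_ratings, candidate, weights):
    """Sum the weights of movies both profiles rated within one star."""
    score = 0.0
    for movie, rating in public_ratings.items():
        if movie in candidate and abs(candidate[movie] - rating) <= 1:
            score += weights.get(movie, 0.0)
    return score


def best_match(public_ratings, anon_records, min_gap=1.5):
    """Return the record ID that stands out from the rest, or None.

    min_gap is an assumed threshold: the top score must be at least
    min_gap times the second-best score before we claim a link.
    """
    weights = rarity_weights(anon_records)
    scored = sorted(
        ((match_score(public_ratings, r, weights), rid)
         for rid, r in anon_records.items()),
        reverse=True,
    )
    if not scored or scored[0][0] <= 0:
        return None
    if len(scored) == 1:
        return scored[0][1]
    (top, rid), (second, _) = scored[0], scored[1]
    return rid if top >= min_gap * second else None


# Made-up "anonymized" records: the username is gone, the ratings remain.
anon = {
    "user_001": {"Blockbuster A": 4, "Blockbuster B": 5,
                 "Obscure Documentary": 5, "Cult Film": 2},
    "user_002": {"Blockbuster A": 4, "Blockbuster B": 3},
    "user_003": {"Blockbuster A": 5, "Blockbuster B": 5, "Indie Drama": 4},
}

# Made-up ratings one person posted publicly under their own name.
public = {"Blockbuster A": 4, "Obscure Documentary": 5, "Cult Film": 1}

print(best_match(public, anon))
```

In this sample data the two obscure titles are enough to single out one record, while a public profile that only rated the blockbuster matches all three records equally and yields no link. That is the point made above: the popular movies tell you almost nothing, and the unpopular ones act like a fingerprint.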