by Timothy Lee
Fri, Nov 30th 2007 2:41pm
Slashdot reports that a pair of computer scientists has figured out how to de-anonymize the "anonymous" data set that Netflix released as part of its million-dollar contest to improve its recommendation algorithm. The researchers found that the set of less-popular movies a user has rated tends to uniquely identify that user. By comparing movie ratings on IMDB with the ratings in the Netflix data set, the researchers were often able to uniquely pair a particular IMDB user with a corresponding Netflix user. And that meant the researchers would instantly have access to all of that user's Netflix ratings, which Netflix users presumably expected to remain private. While movie ratings might seem innocuous at first glance, the authors point out that one's movie ratings can often reveal potentially embarrassing personal details, including a user's views on politics, religion, and homosexuality.

This isn't the first time a company has released "anonymous" data regarding its users that turned out not to be so anonymous. Last year, AOL got in a lot of hot water when it released a data set of search queries that turned out to be quite easy to link back to the users conducting the searches.

The lesson here is that companies should be very reluctant to release private customer data, even if they believe they have "anonymized" it. Anonymization is surprisingly difficult, and you can never be sure you've done it successfully; it's always possible that someone will find a way to link records back to the people they represent. Wherever possible, companies needing to release data should either aggregate it in a way that avoids revealing information about individuals or carefully limit who has access to the data sets, so that they don't become publicly available. Simply stripping out the "username" field doesn't cut it.
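To make the linking step concrete, here is a minimal Python sketch of the general idea described above: score each "anonymous" record against one person's public ratings, weighting rarely rated movies more heavily, and accept a match only when one record clearly stands out. This is an illustration under assumptions, not the researchers' actual algorithm. The function name `link_profile`, the one-star tolerance, the `min_margin` threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def link_profile(public_ratings, anonymized, min_margin=1.5):
    """Return the ID of the anonymized record that best matches
    public_ratings, or None if no record clearly stands out.

    public_ratings: dict mapping movie title -> rating (e.g. from a public site)
    anonymized:     dict mapping record ID -> dict of movie title -> rating
    min_margin:     hypothetical threshold; the best score must be at least
                    this many times the runner-up's score
    """
    # Count how many anonymized records rated each movie.
    counts = defaultdict(int)
    for ratings in anonymized.values():
        for movie in ratings:
            counts[movie] += 1

    # Score each record: every roughly matching rating (within one star,
    # an assumed tolerance) contributes 1 / popularity, so agreement on
    # an obscure movie counts far more than agreement on a blockbuster.
    scores = {}
    for record_id, ratings in anonymized.items():
        score = 0.0
        for movie, rating in public_ratings.items():
            if movie in ratings and abs(ratings[movie] - rating) <= 1:
                score += 1.0 / counts[movie]
        scores[record_id] = score

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # nothing overlaps at all
    if len(ranked) > 1 and ranked[0][1] < min_margin * ranked[1][1]:
        return None  # ambiguous: the best match does not stand out
    return ranked[0][0]


# Toy data, invented for illustration only.
anonymized = {
    "rec_1": {"Blockbuster A": 5, "Blockbuster B": 4,
              "Obscure Film X": 5, "Obscure Film Y": 2},
    "rec_2": {"Blockbuster A": 5, "Blockbuster B": 4},
    "rec_3": {"Blockbuster A": 4, "Blockbuster B": 3, "Obscure Film Z": 1},
}

# A public profile that includes two obscure titles links to one record.
obscure_match = link_profile(
    {"Blockbuster A": 5, "Obscure Film X": 5, "Obscure Film Y": 2},
    anonymized,
)

# A public profile containing only popular titles stays ambiguous.
popular_match = link_profile(
    {"Blockbuster A": 5, "Blockbuster B": 4},
    anonymized,
)
```

With this toy data, the profile containing the two obscure titles resolves to `rec_1`, while the blockbuster-only profile matches all three records equally and returns `None`. Note that the data set contains no usernames at all; the ratings themselves act as the identifier.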