We've pointed out time and time again that there's really no such thing as an anonymized dataset. Given the data, it's almost always easy enough to connect at least some of it back to a real person. It looks like there's now some research to support that. Steven Hoy points us to a new paper where some researchers wrote an algorithm that takes anonymized data from social networks and connects it back to names and addresses:
We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.
Basically, the researchers are saying that anonymized data isn't really anonymous -- and social networks that insist they're "safe" because they've anonymized the data are being somewhat disingenuous.
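To get a feel for why stripping names off a social graph doesn't buy much, here's a toy sketch of the general idea. To be clear, this is not the algorithm from the paper, just a simplified illustration. It assumes an attacker has a second, public graph with real names and already knows who a couple of the "anonymous" accounts are. The function names (`build_graph`, `reidentify`), the made-up users, and the matching rule are all hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn a list of undirected edges into a dict of node -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def reidentify(anon, aux, seeds):
    """Spread a few known identities through an 'anonymized' graph.

    anon:  anonymized graph (node ids are meaningless labels)
    aux:   public graph with real names
    seeds: dict of anon id -> real name the attacker already knows
    """
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        used = set(mapping.values())
        for node in sorted(anon):
            if node in mapping:
                continue
            # Real names of this node's neighbours that are already identified.
            known = {mapping[n] for n in anon[node] if n in mapping}
            # Score every unclaimed name by how many of those neighbours it shares.
            scores = {}
            for name in aux:
                if name in used:
                    continue
                overlap = len(known & aux[name])
                if overlap:
                    scores[name] = overlap
            if not scores:
                continue
            ranked = sorted(scores.values(), reverse=True)
            # Only accept a match when one candidate clearly beats the rest.
            if len(ranked) == 1 or ranked[0] > ranked[1]:
                best = max(scores, key=scores.get)
                mapping[node] = best
                used.add(best)
                changed = True
    return mapping


# Public graph with names (made-up data).
aux = build_graph([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("bob", "dave"), ("carol", "erin"),
])

# The same friendships, "anonymized" by swapping names for opaque ids.
anon = build_graph([
    ("u1", "u2"), ("u1", "u3"), ("u2", "u3"),
    ("u2", "u4"), ("u3", "u5"),
])

# The attacker only knows two accounts up front.
seeds = {"u1": "alice", "u2": "bob"}

print(reidentify(anon, aux, seeds))
```

Starting from two known accounts, the friendships alone are enough to pin down the other three in this tiny example. Real networks are far messier, and the two graphs never line up this neatly, which is why the paper reports an error rate rather than a perfect match.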