New Study Shows Anonymous Data Isn't Very Anonymous At All

from the hear-that? dept

We've pointed out time and time again that there's really no such thing as an anonymized dataset. Given enough data, it's almost always easy enough to connect at least some of it back to a real person. It looks like there's now some research to support that. Steven Hoy points us to a new paper where some researchers wrote an algorithm that takes anonymized data from social networks and connects it back to names and addresses of individuals:
We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.
Basically, the researchers are saying that anonymized data isn't really anonymous -- and social networks that insist they're "safe" because they've anonymized the data are being somewhat disingenuous.
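
For anyone wondering how a graph with the names stripped out can give people away, here is a toy sketch in Python. To be clear, this is not the researchers' algorithm, just a simplified illustration of the general idea of starting from a few known accounts and matching everyone else by who their friends are. The names, the two tiny graphs, the `propagate` function, and the "at least two shared identified neighbors" threshold are all made up for the example, and it assumes the two networks have exactly the same friendships, which real networks never do.

```python
# Toy illustration only: NOT the algorithm from the paper.
# All names, graphs, and the threshold below are hypothetical.

def propagate(anon, aux, seeds, threshold=2):
    """Greedily extend a seed mapping from anonymized ids to names.

    anon: dict of anonymized node id -> set of neighbor ids
    aux:  dict of name -> set of neighbor names (the public graph)
    seeds: dict of anonymized id -> name for a few known accounts
    """
    mapping = dict(seeds)
    used = set(mapping.values())
    changed = True
    while changed:
        changed = False
        for node in sorted(anon):
            if node in mapping:
                continue
            # Names of this node's neighbors that are already identified.
            known = {mapping[n] for n in anon[node] if n in mapping}
            # Score each unclaimed name by how many of those it also knows.
            scores = {
                name: len(known & friends)
                for name, friends in aux.items()
                if name not in used
            }
            if not scores:
                continue
            best = max(scores.values())
            winners = [name for name, s in scores.items() if s == best]
            # Accept only a clear, sufficiently strong winner.
            if best >= threshold and len(winners) == 1:
                mapping[node] = winners[0]
                used.add(winners[0])
                changed = True
    return mapping


# Public graph with names attached (hypothetical).
aux = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"alice", "carol", "erin"},
    "erin": {"bob", "dave"},
}

# The same friendships, "anonymized" by swapping names for numbers.
anon = {
    1: {2, 3, 4},
    2: {1, 3, 5},
    3: {1, 2, 4},
    4: {1, 3, 5},
    5: {2, 4},
}

# Suppose two accounts are already known by other means.
seeds = {1: "alice", 2: "bob"}

result = propagate(anon, aux, seeds)
print(result)
```

Starting from just two known accounts, the sketch ends up putting a name on all five nodes, because each newly identified person helps pin down their neighbors. With no seeds at all it identifies nobody, which is why having a second network to cross-reference matters so much.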

Filed Under: anonymity, anonymous data, social networks

Reader Comments

  1. Ray, 29 Mar 2009 @ 7:28pm

    It appears, and I say 'appears' advisedly since they have not posted the full results, that the more sites you post on that sell your "anonymized" information, the better the chance that you can be matched up with what you might consider your private data.

    There's really nothing new about this if that is true. It is a standard spy methodology for identifying what is going on someplace: just get a lot of data points and see what the pattern is. I first saw the effects of that back in the late '70s, when the monitoring of CB radios (about half the unit used them) and the telephones gave away the supposedly secret plans for a military training operation that most of us only knew tiny pieces of prior to the exercise. Or, in the late '80s, a security test group identified what was going on in a supposedly secret building by using license plates, normal phone listening, and hanging around local businesses people went to for lunch and drinks.

    So the next question is, what happens if you use a different IP address (not the same company/town, but a different company which has a different town listed) and a different user name for each social site? I think (but would not bet on it) that it would be much harder to cross-reference without analyzing postings carefully over a long period of time, though not impossible, since most people have unique habits that act like a signature.
