One of the key themes to come out of the revelations concerning NSA surveillance is defenders of the program claiming "it's just metadata." This is wrong on multiple levels. First of all, only some of the revealed programs involve "just metadata." The so-called "business records" data is metadata, but other programs, such as PRISM, can also include actual content. But even if we were talking about "just metadata," the idea that it is somehow no big deal, and that people have nothing to worry about, is ridiculous to anyone who knows even the slightest thing about metadata. In fact, anyone who claims that "it's just metadata" in an attempt to minimize what's happening is basically revealing that they haven't the slightest clue about what metadata is. Here are a few examples of why.
Just a few months ago, Nature published a study all about how much a little metadata can reveal, entitled Unique in the Crowd: The privacy bounds of human mobility, by Yves-Alexandre de Montjoye, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. The basic conclusion: metadata reveals a ton, and even "coarse datasets" provide almost no anonymity:
A simply anonymized dataset does not contain name, home address, phone number or other obvious identifier. Yet, if individual's patterns are unique enough, outside information can be used to link the data back to an individual. For instance, in one study, a medical database was successfully combined with a voters list to extract the health record of the governor of Massachusetts [27]. In another, mobile phone data have been re-identified using users' top locations [28]. Finally, part of the Netflix challenge dataset was re-identified using outside information from The Internet Movie Database [29].
All together, the ubiquity of mobility datasets, the uniqueness of human traces, and the information that can be inferred from them highlight the importance of understanding the privacy bounds of human mobility. We show that the uniqueness of human mobility traces is high and that mobility datasets are likely to be re-identifiable using information only on a few outside locations. Finally, we show that one formula determines the uniqueness of mobility traces providing mathematical bounds to the privacy of mobility data. The uniqueness of traces is found to decrease according to a power function with an exponent that scales linearly with the number of known spatio-temporal points. This implies that even coarse datasets provide little anonymity.
Some of the figures they presented show how easy it is to track individuals and their locations, which can paint a pretty significant and revealing portrait of who they are and what they've done.
In an interview, one of the authors of the paper basically said that your metadata effectively creates a "fingerprint" that is unique to you and easy to match to your identity:
"We use the analogy of the fingerprint," said de Montjoye in a phone interview today. "In the 1930s, Edmond Locard, one of the first forensic science pioneers, showed that each fingerprint is unique, and you need 12 points to identify it. So here what we did is we took a large-scale database of mobility traces and basically computed the number of points so that 95 percent of people would be unique in the dataset."
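To make that computation concrete, here is a minimal Python sketch of the idea de Montjoye describes: pick a person, reveal a handful of their (place, hour) points, and count how many traces in the dataset are consistent with those points. This is not the authors' code. The names `unicity` and `points_needed`, the trace format (a dict mapping each user to a set of points), and the random-sampling estimate are all assumptions made for illustration.

```python
import random


def unicity(traces, p, trials=1000, seed=0):
    """Estimate how often p known points single out exactly one trace.

    traces: dict mapping a user id to a set of (place, hour) tuples.
    This layout is a hypothetical stand-in for real mobility data.
    """
    rng = random.Random(seed)
    candidates = [user for user, points in traces.items() if len(points) >= p]
    if not candidates:
        raise ValueError("no trace has at least p points")
    unique = 0
    for _ in range(trials):
        target = rng.choice(candidates)
        # Pretend an outside observer learned p points from the target's trace.
        known = set(rng.sample(sorted(traces[target]), p))
        # Count every trace that contains all of the known points.
        matches = sum(1 for points in traces.values() if known <= points)
        if matches == 1:
            unique += 1
    return unique / trials


def points_needed(traces, threshold=0.95, max_points=10):
    """Smallest p at which the estimated unicity reaches the threshold."""
    for p in range(1, max_points + 1):
        try:
            if unicity(traces, p) >= threshold:
                return p
        except ValueError:
            break  # no trace is long enough to sample p points
    return None


# Toy data: two people share one antenna and hour, a third does not.
toy_traces = {
    "a": {(1, 8), (2, 9)},
    "b": {(1, 8), (3, 9)},
    "c": {(4, 8), (5, 9)},
}
print(points_needed(toy_traces))
```

On real data the traces would hold thousands of points per person, but the logic is the same: the fewer points it takes to reach the threshold, the less anonymity the "anonymized" dataset actually offers.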
Others are discovering the same thing. Ethan Zuckerman, who recently co-taught a class with one of the authors of the paper above, Cesar Hidalgo, wrote about how two students in the class worked with Hidalgo to create a project called Immersion, which takes your Gmail metadata ("just metadata") and maps out your social network. As Zuckerman notes, his own use of Immersion reveals some things that could be questionable or dangerous.
He discusses some bits of metadata that are "obvious," which would make him easily identifiable, but which probably aren't that "questionable." However, he also notes some potentially problematic things:
Anyone who knows me reasonably well could have guessed at the existence of these ties. But there’s other information in the graph that’s more complicated and potentially more sensitive. My primary Media Lab collaborators are my students and staff – Cesar is the only Media Lab node who’s not affiliated with Civic who shows up on my network, which suggests that I’m collaborating less with my Media Lab colleagues than I might hope to be. One might read into my relationships with the students I advise based on the email volume I exchange with them – I’d suggest that the patterns have something to do with our preferred channels of communication, but it certainly shows who’s demanding and receiving attention via email. In other words, absence from a social network map is at least as revealing as presence on it.
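To show how little it takes to build that kind of map, here is a rough Python sketch that reads only the From, To and Cc headers of a set of messages and counts who appears alongside whom. It is not Immersion's actual code. The function name `contact_graph`, the co-occurrence rule for drawing an edge, and the example addresses are all assumptions for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses
from itertools import combinations


def contact_graph(raw_headers, me):
    """Build contact counts and contact-to-contact edges from headers alone.

    raw_headers: iterable of header blocks as strings (no body needed).
    me: the mailbox owner's address, left out of the graph.
    Returns (volume, edges): messages per contact, and how often each
    pair of contacts appears on the same message.
    """
    volume = Counter()
    edges = Counter()
    for raw in raw_headers:
        msg = message_from_string(raw)
        fields = (
            msg.get_all("From", [])
            + msg.get_all("To", [])
            + msg.get_all("Cc", [])
        )
        people = {addr.lower() for _, addr in getaddresses(fields) if addr}
        people.discard(me.lower())
        volume.update(people)
        # Hypothetical rule: two contacts are linked each time they share a message.
        for pair in combinations(sorted(people), 2):
            edges[pair] += 1
    return volume, edges


# Made-up headers; no subject lines or message bodies are read.
headers = [
    "From: me@example.org\nTo: alice@example.org, bob@example.org\n\n",
    "From: alice@example.org\nTo: me@example.org\nCc: bob@example.org\n\n",
    "From: carol@example.org\nTo: me@example.org\n\n",
]
volume, edges = contact_graph(headers, "me@example.org")
print(volume.most_common())
print(edges.most_common())
```

Nothing here touches the content of a single email, yet the output already shows who gets the most attention and which contacts cluster together, which is exactly the kind of inference Zuckerman describes.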
Separately, more than two years ago, we wrote about how a German politician named Malte Spitz got access to all of the metadata that Deutsche Telekom had on him over a period of six months, and then worked with the German newspaper Die Zeit to put together an amazing visualization that lets you track six months of his life entirely via his metadata, combined with public information, such as his Twitter feed.
While this all came out over two years ago, Spitz recently wrote a NYT op-ed piece about how this "just metadata" situation means that it's tough to trust the US government:
In Germany, whenever the government begins to infringe on individual freedom, society stands up. Given our history, we Germans are not willing to trade in our liberty for potentially better security. Germans have experienced firsthand what happens when the government knows too much about someone. In the past 80 years, Germans have felt the betrayal of neighbors who informed for the Gestapo and the fear that best friends might be potential informants for the Stasi. Homes were tapped. Millions were monitored.
Although these two dictatorships, Nazi and Communist, are gone and we now live in a unified and stable democracy, we have not forgotten what happens when secret police or intelligence agencies disregard privacy. It is an integral part of our history and gives young and old alike a critical perspective on state surveillance systems.
"Just metadata" isn't "just" anything, other than a massive violation of basic privacy rights.