Anyone Brushing Off NSA Surveillance Because It's 'Just Metadata' Doesn't Know What Metadata Is

from the your-metadata-reveals-quite-a-bit dept

One of the key themes to come out of the revelations about NSA surveillance is defenders of the programs claiming “it’s just metadata.” This is wrong on multiple levels. First of all, only some of the revealed programs involve “just metadata.” The so-called “business records” data is metadata, but other programs, such as PRISM, can also include actual content. But even if we were talking only about metadata, the idea that it’s somehow no big deal, and that people have nothing to worry about, is ridiculous to anyone who knows even the slightest thing about metadata. In fact, anyone who claims that “it’s just metadata” in an attempt to minimize what’s happening is basically revealing that they haven’t the slightest clue about what metadata is. Here are a few examples of why.

Just a few months ago, Nature’s Scientific Reports published a study all about how much a little metadata can reveal, entitled Unique in the Crowd: The privacy bounds of human mobility, by Yves-Alexandre de Montjoye, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. The basic conclusion: metadata reveals a ton, and even “coarse datasets” provide little anonymity:

A simply anonymized dataset does not contain name, home address, phone number or other obvious identifier. Yet, if individual’s patterns are unique enough, outside information can be used to link the data back to an individual. For instance, in one study, a medical database was successfully combined with a voters list to extract the health record of the governor of Massachusetts [27]. In another, mobile phone data have been re-identified using users’ top locations [28]. Finally, part of the Netflix challenge dataset was re-identified using outside information from The Internet Movie Database [29].

All together, the ubiquity of mobility datasets, the uniqueness of human traces, and the information that can be inferred from them highlight the importance of understanding the privacy bounds of human mobility. We show that the uniqueness of human mobility traces is high and that mobility datasets are likely to be re-identifiable using information only on a few outside locations. Finally, we show that one formula determines the uniqueness of mobility traces providing mathematical bounds to the privacy of mobility data. The uniqueness of traces is found to decrease according to a power function with an exponent that scales linearly with the number of known spatio-temporal points. This implies that even coarse datasets provide little anonymity.

Some of the figures they presented show how easy it is to track individuals and their locations, which can paint a pretty significant and revealing portrait of who they are and what they’ve done.

In an interview, one of the authors of the paper basically said that your metadata effectively creates a “fingerprint” that is unique to you and easy to match to your identity:

“We use the analogy of the fingerprint,” said de Montjoye in a phone interview today. “In the 1930s, Edmond Locard, one of the first forensic science pioneers, showed that each fingerprint is unique, and you need 12 points to identify it. So here what we did is we took a large-scale database of mobility traces and basically computed the number of points so that 95 percent of people would be unique in the dataset.”
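To make the “how many points until you’re unique” idea concrete, here is a minimal sketch of that kind of measurement on a toy dataset. It is not the authors’ code: the `unicity` function, the `(place, hour)` representation of a point, and the three made-up traces are all assumptions for illustration. For each trial it picks a random person, reveals `p` random points from their trace, and checks whether anyone else’s trace also contains all of those points.

```python
import random


def unicity(traces, p, trials=1000, seed=0):
    """Estimate the fraction of traces pinned down uniquely by p known points.

    traces maps a (hypothetical) user id to a set of (place, hour) points.
    """
    rng = random.Random(seed)
    # Only users with at least p recorded points can be tested.
    eligible = [user for user, points in traces.items() if len(points) >= p]
    if not eligible:
        raise ValueError("no trace has at least p points")

    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        # The "outside information": p points we happen to know about this user.
        known = set(rng.sample(sorted(traces[user]), p))
        # Count every trace in the dataset that is consistent with those points.
        matches = sum(1 for points in traces.values() if known <= points)
        if matches == 1:
            unique += 1
    return unique / trials


# Made-up traces: three people who all start the day at the same tower.
traces = {
    "a": {("tower1", 8), ("tower2", 12), ("tower3", 19)},
    "b": {("tower1", 8), ("tower2", 12), ("tower4", 19)},
    "c": {("tower1", 8), ("tower5", 12), ("tower6", 19)},
}

for p in (1, 2, 3):
    print(p, unicity(traces, p))
```

Even in this tiny example the pattern from the paper shows up in miniature: one known point often matches several people, while a handful of points singles someone out.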

Others are discovering the same thing. Ethan Zuckerman, who recently co-taught a class with one of the authors of the paper above, Cesar Hidalgo, wrote about how two students in the class created a project called Immersion, with Hidalgo, which takes your Gmail metadata (“just metadata”) and maps out your social network. As Zuckerman notes, his own use of Immersion reveals some things that could be questionable or dangerous.

He discusses some bits of metadata that are “obvious,” which would make him easily identifiable, but which probably aren’t that “questionable.” However, he also notes some potentially problematic things:

Anyone who knows me reasonably well could have guessed at the existence of these ties. But there’s other information in the graph that’s more complicated and potentially more sensitive. My primary Media Lab collaborators are my students and staff – Cesar is the only Media Lab node who’s not affiliated with Civic who shows up on my network, which suggests that I’m collaborating less with my Media Lab colleagues than I might hope to be. One might read into my relationships with the students I advise based on the email volume I exchange with them – I’d suggest that the patterns have something to do with our preferred channels of communication, but it certainly shows who’s demanding and receiving attention via email. In other words, absence from a social network map is at least as revealing as presence on it.
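For a sense of how little it takes to get from email headers to a social graph, here is a rough sketch. It is an assumption about the general approach, not how Immersion is actually implemented: the `contact_graph` function and the sample messages are hypothetical. It reads only the From, To and Cc headers, never the body, and counts how often each pair of addresses appears on the same message.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def contact_graph(raw_messages):
    """Count how often each pair of addresses shares a message (headers only)."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Collect every address header; the message body is never inspected.
        fields = (
            msg.get_all("From", [])
            + msg.get_all("To", [])
            + msg.get_all("Cc", [])
        )
        people = sorted({addr.lower() for _, addr in getaddresses(fields) if addr})
        # Every pair of people on the same message gets a heavier edge.
        for i, first in enumerate(people):
            for second in people[i + 1:]:
                edges[(first, second)] += 1
    return edges


# Made-up messages for illustration.
sample = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\n"
    "Cc: carol@example.org\nSubject: lunch\n\nThe body is never read.",
    "From: bob@example.org\nTo: alice@example.org\n"
    "Subject: re: lunch\n\nAlso ignored.",
]

graph = contact_graph(sample)
for pair, weight in graph.most_common():
    print(pair, weight)
```

Run over years of mail instead of two messages, the edge weights start to show exactly the kind of thing Zuckerman describes: who gets attention, and who is conspicuously missing.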

Separately, more than two years ago, we wrote about how a German politician named Malte Spitz got access to all of the metadata that Deutsche Telekom had on him over a period of six months, and then worked with the German newspaper Die Zeit to put together an amazing visualization that lets you track six months of his life entirely via his metadata, combined with public information, such as his Twitter feed.
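As a rough illustration of why location metadata alone is so revealing, here is a small sketch that guesses someone’s home and workplace from nothing but timestamped cell tower records. Everything in it is an assumption for the sake of the example: the `likely_home_and_work` function, the record format, the tower names and the hour cutoffs are made up, and none of it reflects the actual Deutsche Telekom data or the Die Zeit visualization.

```python
from collections import Counter
from datetime import datetime


def likely_home_and_work(records):
    """Guess home and work towers from (ISO timestamp, tower id) records.

    Assumed heuristic: the tower seen most at night is "home", and the
    tower seen most during weekday office hours is "work".
    """
    night, office = Counter(), Counter()
    for stamp, tower in records:
        when = datetime.fromisoformat(stamp)
        if when.hour >= 22 or when.hour < 6:
            night[tower] += 1
        elif when.weekday() < 5 and 9 <= when.hour < 17:
            office[tower] += 1
    home = night.most_common(1)[0][0] if night else None
    work = office.most_common(1)[0][0] if office else None
    return home, work


# Made-up records: no names, no call contents, just times and towers.
records = [
    ("2013-07-01T23:10:00", "tower_A"),
    ("2013-07-02T02:30:00", "tower_A"),
    ("2013-07-02T10:15:00", "tower_B"),
    ("2013-07-02T14:40:00", "tower_B"),
    ("2013-07-02T19:00:00", "tower_C"),
    ("2013-07-03T05:45:00", "tower_A"),
]

print(likely_home_and_work(records))
```

A home and a workplace are frequently enough to put a name on an “anonymous” record, which is the same point the mobility study makes about users’ top locations.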

While this all came out over two years ago, Spitz just recently wrote a New York Times op-ed about how this “just metadata” situation means that it’s tough to trust the US government.

In Germany, whenever the government begins to infringe on individual freedom, society stands up. Given our history, we Germans are not willing to trade in our liberty for potentially better security. Germans have experienced firsthand what happens when the government knows too much about someone. In the past 80 years, Germans have felt the betrayal of neighbors who informed for the Gestapo and the fear that best friends might be potential informants for the Stasi. Homes were tapped. Millions were monitored.

Although these two dictatorships, Nazi and Communist, are gone and we now live in a unified and stable democracy, we have not forgotten what happens when secret police or intelligence agencies disregard privacy. It is an integral part of our history and gives young and old alike a critical perspective on state surveillance systems.

“Just metadata” isn’t “just” anything, other than a massive violation of basic privacy rights.


Comments on “Anyone Brushing Off NSA Surveillance Because It's 'Just Metadata' Doesn't Know What Metadata Is”

John Fenderson (profile) says:

How to explain?

This is a very nice explanation of why metadata collection is just as great of an invasion as collecting the message contents themselves (it’s a greater invasion, actually). However, it’s still a detailed, complex explanation.

I’ve been trying to figure out a quick, simple, “sound bite” way of explaining this. It’s something that people don’t seem to instinctively understand, I think because they focus on a single tree at a time, when the problem is really the forest.

John Fenderson (profile) says:

Re: Re: How to explain?

That doesn’t get to the meat of it, though. The real intrusion is not that simple patterns like that can be detected. It’s that when you’ve gathered and stored a lot of individual data points, even “anonymized”, it’s possible to learn almost everything there is to learn about you: from the obvious stuff like your name and where you live/work/play, to who your friends are, to what your political opinions are, even things like what kinds of genetic diseases you may be predisposed to. The power is actually mind-boggling: it’s possible to learn things about you that even you don’t know.

The Real Michael says:

Re: Re: Re: How to explain?

If you honestly believe that all they’re storing is metadata, I’ve got a bridge to sell you. They’re storing ALL communications (e-mail, text, phone, etc.), GPS monitoring systems, transactions, everything they can get their hands on. My guess is they’re creating a massive profiling/monitoring database on everybody. They can already remotely activate microphones and cameras in mobile phones; Snowden mentioned that he could eavesdrop on anyone’s communications at any time.

ethorad (profile) says:

Re: How to explain?

The official explanation seems to be:

Metadata is data which describes other data and is thus completely harmless. As an example, Snowden didn’t release any of the actual data held by the NSA, only information about it – thus he only released metadata which is … completely … um … harmless … hang on, scratch that. Guys? We need a better explanation!

Anonymous Coward says:

the NSA, USA, UK and everyone else involved in this scandalous treatment of ordinary people, are relying on that very fact. if people knew exactly what it is, what can be gleaned from it by the machines that can read just that and how incriminating the most seemingly innocent exchanges can be taken, things would be different. the biggest problem is that all those governmental officers and politicians involved want it all to continue, want everyone to just poo-poo it off so they can continue with the extortionate salaries whilst stomping around like bulls on heat telling everyone how important they are. most of them are not needed! they are only important to themselves and then only because they have made themselves important. if they really wanted to be important they should try doing things that really are needed, like stopping world food shortages or banning whaling!!

ECA (profile) says:

OK, some things, to think about

1. corp you think they DONT KNOW ENOUGH?? think about credit agencies, add that Target credit card..and they know Every purchase you made.
2. SPAM..with small amounts of Data, they can get information on you that would scare the >>>> out of you. and empty your bank.
3. the USA gov, has LESS info on you than the CORPS do. the IRS cant even track Corp Tax accounts.

IF the Gov. grabbed the Corp info, Credit agencies..and merged it with Social sec. They would have all the info they need.

The problem they are TRYING to do. tends to be with communication. The problem is that there is so much, HOW do you sort it out.
consider an adversary, that WAS RAISED AND taught how to send communications without tracking..Direct communications and Message drops. interlinked groups that have no REAL connections, except by message drops..

Phone spying wont help.(they might buy a Burner phone, make 1 message and throw it away) NET spying wont help.( there are MANY games/site/… to send data over the net, that its astounding). trying to track illegals, is hard enough. and if they are Staying below radar..its even harder.

out_of_the_blue says:

Same with Google for "mere" search terms and websites.

One can deduce a lot from many little bits, and anyone with access to the record holds some degree of power over you. Google will claim it doesn’t identify individuals, but of course that’s just another lie waiting to be revealed.

Don’t leave out “The Google” when you write of privacy rights. Corporations don’t have any right to track persons, EITHER. As a minion wrote just today: “It’s almost as though these entities assume they have an innate right to access personal data without … any consideration for the rights of those whose information they’re sweeping up.”

Unlike corporatist Mike, I’d bet most people believe that corporations CAN violate your rights just as much as gov’t does, and often more so, besides annoyingly.

That One Guy (profile) says:


Better be careful blue, I have it on good authority, from third-hand sources that a google is hiding out in your closet, as well as the one currently camped out under your bed.

Just a suggestion, but if you ever want to, you know, be taken seriously, learn the difference between giving a company information, and having the information taken whether you like it or not by the government. Until then, you’ll just continue to be seen as the willfully blind, paranoid individual that you are now.

Rikuo (profile) says:

Re: Same with Google for "mere" search terms and websites.

Shut the fuck up about Google. You condemn Google for harvesting this data, but NOT ONCE have you criticized the US government for doing the same thing.
Everyone here on Techdirt agrees with you about the amount of data that Google collects, but we’re not worried about the Big G. We’re worried about the Other Big G, Uncle Sam, which is infinitely scarier than a single corporation. Yet, for some reason, you can’t process the thought that there are worse things than Google out there.

Strafe says:

Re: Re: Same with Google for "mere" search terms and websites.

His post was the equivalent of “Pay no attention to the man behind the curtain!” It’s a common tactic often employed by trolls. The goal is to derail the conversation and/or shift focus to something or someone else. If humans were more like cats, easily distracted by a moving laser dot, this troll tactic might work. Fortunately we’re smarter than that and can see right through out_of_the_blue’s worn out charades. Well, most of us are smarter than a cat anyways. Looking at the replies, it’s obvious some aren’t lol. :p

JMT says:

Re: Same with Google for "mere" search terms and websites.

“Don’t leave out “The Google” when you write of privacy rights.”

Let’s compare shall we?

Google’s data gathering supports a whole range of really useful products and services that most people seem to love; there is no evidence of Google ever nefariously abusing their ability to collect their users’ data; and if you don’t like them or their actions, you simply don’t use them.

On the other hand, the USG’s data gathering has yet to be proven useful to the general public other than to provide a false sense of security to those paranoid about terrorism; there is a long and sordid history of governments and law enforcement abusing their access to such data; and if you don’t like all this you’re shit out of luck.

Do us all a favour and don’t ever compare these two things again, ok?

Bernardo Verda says:

Re: Re: Re: Re:

“What’s scary is that I’m all flustered about the surveillance state, and all my friends are like “meh. We knew that. Why’s this a problem?””

The Techdirt comment system needs another click-able button, right alongside “Insightful” and “Funny”
— this one should be labelled “Depressing”…

Uriel-238 (profile) says:

Yeah, not sure about the Google-hate.

Google’s business model was about having their data block but processing it without direct human access to it.

So, granted, Google’s network reconnoiters you in very much the same way that NSA does with PRISM, using information from emails, searches, etc. to determine your interests in order to target relevant advertising at you (and augment the usefulness of their many services). Google was doing a great job of demonstrating the benign side of surveillance.

Also, Google’s gone to great lengths to defend their core database from court subpoenas. Really it should be truly inaccessible, though we’ve also had the occasional Google-tech stalker get fired over misuse of it.

But now thanks to the FISC, that core database is a threat to the privacy of all its users. Hopefully this won’t destroy the Google business model.

Uriel-238 (profile) says:

Re: Yeah, not sure about the Google-hate.

It occurs to me that the whole NSA thing could go a long way to earn back some love if…

a) their surveillance database was inadmissible in court, ergo it couldn’t be used to prosecute anyone for anything (the DoD is going to disappear terrorists anyway), and…

b) they did as Google did and provided a whole bevy of services to the surveilled populace (instant statistical analyses, calendars, phone books, email and so on), all covered ad-free from the NSA budget.

This isn’t a perfect solution, but it certainly would make the whole ordeal a bit more tolerable.

John Fenderson (profile) says:

Re: Yeah, not sure about the Google-hate.

“So, granted, Google’s network reconnoiters you in very much the same way that NSA does with PRISM, using information from emails, searches, etc.”

Well, no, not even remotely in the same way as the NSA does. But that aside, a huge difference is that you can avoid the vast majority of Google’s spying by not using their services. You can avoid almost all Google spying by blocking access to their domains.

You can’t avoid NSA spying.

Wally (profile) says:

Best Explanation...

Most digital cameras provide metadata on .jpg files or .raw files. This information includes a lot of details from the DPI resolution to the f-stop your camera used to take a shot.

Our smart phones and electronic devices have such data on them…the metadata stored on your smart phone can include and does store the IP address, MAC address, your name, your registered name, your phone number, who you last called, who last called you, who last texted you, who you last texted, your credit card info, how many minutes each call was, how much space your texting used and the last text message sent or received. Any questions so far?

Spaceman Spiff (profile) says:

It's just metadata!

“In fact, anyone who claims that “it’s just metadata” in an attempt to minimize what’s happening is basically revealing that they haven’t the slightest clue about what metadata is.”

In reality, anyone who asserts this is disingenuous in the extreme. They know what metadata is, but they want to minimize the general public’s exposure to what this really means for them!

Woadan (profile) says:

TechDirt had already told us about this:

As Jane Mayer at the New Yorker recently explained, the metadata issue is the one we should be most frightened about:

“The public doesn’t understand,” [mathematician and former Sun Microsystems engineer Susan Landau] told me, speaking about so-called metadata. “It’s much more intrusive than content.” She explained that the government can learn immense amounts of proprietary information by studying “who you call, and who they call. If you can track that, you know exactly what is happening – you don’t need the content.”

FM Hilton (profile) says:

Words and definitions

I think they created the word “metadata” to make the spying look as harmless as possible.

After all, it doesn’t look bad, it looks official and geeky enough so that ordinary people don’t look twice at it.
“I don’t understand that word, but it’s not a bad word.”

We do need a better word to describe the concept:
“Everything you have done in the past, everywhere you’ve been, every single penny you have spent, everyone you know or have talked to..all in our database.”

Or as Wikipedia states: “The term metadata refers to “data about data”. The term is ambiguous, as it is used for two fundamentally different concepts (types)”.

Ambiguous, indeed.

anonymous coward says:

This is entirely what more people need to understand. My cousin works in actuarial science..very smart guy…statistician..he talks about things that I can’t possibly understand and I’m not a dim bulb…his hobby is combining calculus and statistics… He told me the other day when we were discussing the NSA shitstorm that:

Metadata composes 99% of the information that’s actually present within any data pool, 1% is the initial data itself, but only about 5% of the metadata is actually visible, that is until you begin to collect more data samples, at which point the metadata from every new sample begins to build off of the previous metadata, filling in logic gaps, allowing for new virtual layers to be added, much like solving a sudoku puzzle. He told me the payoff per additional data sample for classical data analysis is logarithmic, meaning that every new piece of information will become less valuable than the piece before it, with 100% of the data never being acquired for a given operation through metadata analysis alone.

However, with multidimensional integration of virtual data sets (metadata), particularly with extremely large sample sizes such as the one the NSA is amassing, it is possible to use information about the information to infer the original content, and so much, much more. The types of things that would become comprehensible with the right framework are terrifying and unimaginable. Imagine being able to know almost anything about what is happening, has happened, could happen: this is the power of metadata. Anyone who claims otherwise is full of absolute shit.
