Did The NSA Think The Public Can't Do Math? Attempt To Downplay Data Collection Fails Miserably

Last week we wrote about the NSA’s ridiculous attempt to justify its surveillance efforts, including this really wacky callout designed to show just how “little” data the NSA collects.

Scope and Scale of NSA Collection

According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However, of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world’s traffic in conducting their mission — that’s less than one part in a million. Put another way, if a standard basketball court represented the global communications environment, NSA’s total collection would be represented by an area smaller than a dime on that basketball court.

This was bizarre on a number of levels, not the least of which is the wacky basketball court-to-dime scale. Next time, maybe we can play “is it bigger than a breadbox” with the NSA. But what any of this actually means hasn’t been at all clear. Since the NSA has already redefined basic English words like “collect,” “target,” “datamine,” and “relevant,” there’s no telling what is meant by “touch.” However, some are starting to dig into the numbers, and contrary to the NSA’s attempt to suggest that this is “nothing to fear,” a bit of analysis certainly suggests they’re collecting quite a bit of info.

First up, we have Jeff Jarvis, who highlights a bunch of important comparative datapoints including that Sandvine claims that only 2.9% of US traffic is communication traffic and 68.8% of all email is spam — meaning that it’s entirely possible that the NSA collects nearly all non-spam email and it would still be within its 1.6% number. He also points out that 62% of traffic on the internet is considered entertainment, and we can assume that the NSA doesn’t need to collect every copy of Game of Thrones that people are passing around (I’m sure one or two will do the job). He similarly points out that Google itself claims to only index approximately 0.004% of traffic on the internet, suggesting that the NSA may be collecting more info than Google indexes by two orders of magnitude.

Meanwhile, Sean Gallagher, over at Ars Technica, digs a bit deeper into the numbers, suggesting that the NSA’s data collection is closer to being on par with Google’s, but still larger:

The dime on the basketball court, as NSA describes it, is still 29.21 petabytes of data a day. That means NSA is “touching” more data than Google processes every day (a mere 20 petabytes).
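The NSA’s own figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the callout (1,826 petabytes per day, 1.6% touched, 0.025% of that selected for review); the variable names are made up for illustration, and 1 PB is treated as 1,000 TB.

```python
# Figures quoted in the NSA callout; the names are illustrative only.
TOTAL_PB_PER_DAY = 1826        # "1,826 Petabytes of information per day"
TOUCHED_SHARE = 1.6 / 100      # "NSA touches about 1.6% of that"
SELECTED_SHARE = 0.025 / 100   # "only 0.025% is actually selected for review"

# Volume "touched" per day, in petabytes.
touched_pb = TOTAL_PB_PER_DAY * TOUCHED_SHARE

# Volume selected for review per day, in petabytes.
reviewed_pb = touched_pb * SELECTED_SHARE

# Reviewed data as a fraction of all Internet traffic.
reviewed_share_of_total = TOUCHED_SHARE * SELECTED_SHARE

print(f"touched:  {touched_pb:.2f} PB/day")
print(f"reviewed: {reviewed_pb * 1000:.2f} TB/day")
print(f"reviewed share of all traffic: {reviewed_share_of_total * 100:.4f}%")
print(f"parts per million: {reviewed_share_of_total * 1e6:.0f}")
```

The touched volume comes out to roughly 29.2 petabytes per day, in line with the figure Gallagher cites. Multiplying the two stated percentages gives 0.0004% of total traffic, or about 4 parts per million, which is ten times the 0.00004% printed in the callout and not “less than one part in a million.” At least one of the published figures appears to be off.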

Gallagher also looks much more closely at the recently revealed details of the Xkeyscore program, to show how that 1.6% of “touched” internet communications can cover pretty much everything important.

As a result, if properly tuned, the packet analyzer gear at the front-end of XKeyscore (and other deep packet inspection systems) can pick out a very small fraction of the actual packets sent over the wire while still extracting a great deal of information (or metadata) about who is sending what to who. This leaves disk space for “full log data” on connections of particular interest.

In other words, while the 1.6% number was put forth by the NSA to try to make people think this is no big deal, when you look at what it means, it suggests it’s a very big deal indeed. In fact, the NSA may be collecting even more information than people had believed before.

out_of_the_blue says:

Re: No, AC it SHOWS Google is 2/3 size of NSA!

@ “So, does this mean OotB can’t claim Google is the real spy we should be worried about anymore?”

“29.21 petabytes of data a day. That means NSA is “touching” more data than Google processes every day (a mere 20 petabytes).” — Since you obviously don’t understand numbers: 20 / 29.21 is about 2/3. — Google, ONE corporation, is doing 2/3 as much SPYING as the national-scale NSA is!

Thanks for the query! You’re welcome!

Anonymous Coward says:

Re: Re: No, AC it SHOWS Google is 2/3 size of NSA!

Ahh, the wonders of twisting math. Google /processes/ that much data. The NSA is storing more than Google is processing. How much of that 20 PBs is Google actually sifting for data? Remember to show your work and all citations from valid sources. (Note: Your ass is not a valid source.)

out_of_the_blue says:

Now, what does "index" mean? -- Spying on activities.

“Google itself claims to only index” — Umm, doesn’t matter what number follows: Google only “indexes” what Google wants to index, and that’s actual activities of persons, which can be compressed by tokenizing: for instance, Google probably doesn’t store the full text of every url, as “I’m sure one … will do the job”.

Now, on the other hand, this merits exclams: “Google processes every day (a mere 20 petabytes)”!!! Especially when just tossed in to compare with NATIONAL scale collection at NSA!!! — That’s a HEAP of data gathered EVERY DAY about you and me — mostly you who foolishly just let it, don’t do what you can to fight the Data Monster. So I say that for Google TOO “it’s a very big deal indeed” that “can cover pretty much everything important.”

Where Mike sez: “Any system that involves spying on the activities of users is going to be a non-starter. Creeping the hell out of people isn’t a way of encouraging them to buy. It’s a way of encouraging them to want nothing to do with you.” — So why doesn’t that apply to The Google?

Anonymous Coward says:

Re: Now, what does "index" mean? -- Spying on activities.

Because consensual indexing of public facing websites and using a dragnet to scoop up cellphone metadata are totally the same thing. Don’t want google to have your e-mail metadata? Don’t use g-mail, problem solved. Don’t want the NSA to have it? Good luck with that.

Anonymous Coward says:

At least one of Sandvine's numbers is wrong

Any quoted statistic on spam that puts it anywhere under 95% of all email traffic is wrong. While incident rates vary across country, network, domain, MX, recipient, MTA, OS, and myriad other factors (I’ve seen as high as 99.6%, as low as 92%, both within the last year), and while the average varies based on how you compute it (volume, messages, etc.) it’s pretty clear that “the number” is somewhere above 95%, and it’s just a question of where in that last 5% of the continuum that it falls.

But that aside: clearly the NSA is in a rather unique position to filter all that and pluck out the non-spam: they have the largest corpus of email traffic of anybody, anytime, anywhere. That, combined with basic traffic analysis, combined with their vast computing resources, actually makes the problem somewhat tractable — if tedious.

In other words, they’re best-positioned to toss the junk and keep what’s useful. And the more email they vacuum up, the better positioned they are.

Phil62 says:

Re: At least one of Sandvine's numbers is wrong

But if a lot of people start encrypting things like spam, Game of Thrones, and LOLcat pictures and forwarding these, the NSA will keep those encrypted files, hoping for the day they can decrypt them. Do they have that much storage space? If so, encrypt more spam.

Anonymous Coward says:

Re: Doesn't really matter how large it is if it's our most sensitive junk...

The other obvious (and horrible) joke is:

Having taken a page from the NSA’s PR playbook pedophiles now claim ‘In their child molestation mission, NAMBLA only touches about 1.6% of their bodies.’

*PS. Data dolls would have a better ring to it.

Anonymous Coward says:

You know, one thing I don’t think I’ve seen mentioned about the NSA vacuuming up pretty much all emails.

Most places send confirmation emails. Do a secure transaction on Paypal? They automatically send you a confirmation email for the amount and destination. Buy something on Amazon, or Barnes and Noble, or Staples, or pretty much any online store and they send you a confirmation email saying “this is what you bought”. Often times price is included, sometimes they have the tracking number right in the email for anyone to go to the shipping company’s website and plug in the number and see where the shipment is.

So even without the NSA being able to monitor https transactions, they can monitor what you buy online, where you buy it, for how much, and in some cases, when you get it. They can also monitor who you donate to using paypal, and how much.

All from just reading your email.

Anonymous Coward says:

ok, someone help me with the math.

The first quote says:

According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day.

The second quote says:

The dime on the basketball court, as NSA describes it, is still 29.21 petabytes of data a day.

1.826 vs 29.21 PB

So how much traffic on the internet is there? and how much are they actually processing?

John Fenderson (profile) says:

Re: Re: Re: Re:

Yeah, the British convention of using commas as decimal markers really threw me when I first encountered it as a young sprout.

For even more number fun, the old-timey British “billion” (a million millions) and the US “billion” (a thousand millions) differ by three orders of magnitude. This has since been reconciled — it’s always a thousand millions now — but sometimes it can cause confusion when reading older texts.

Anonymous Coward says:

Yes, it was a stupid statement.

Yeah, I ignore those stats right away. Comparing apples to orange-colored stars.

I don’t care if the NSA isn’t collecting, scheduled to review, or have reviewed a 480MB YouTube video. I do care if they’re collecting/scheduling to/ or have reviewed any of my .0000002 MB e-mails.

Anonymous Coward says:

Re: Yes, it was a stupid statement.

“I don’t care if the NSA isn’t collecting, scheduled to review, or have reviewed a 480MB YouTube video. I do care if they’re collecting/scheduling to/ or have reviewed any of my .0000002 MB e-mails.”

You send emails that are less than 1 byte?

But yeah, some traffic is much more important than other traffic. Although I don’t want them spying on my videos EITHER. The NSA does not need to know about ANY of my traffic, because there is no probable cause.

McCrea (profile) says:

Re: Re: Re: Yes, it was a stupid statement.

Point is that percentage of traffic is completely the wrong metric.

Part of my premise is that of “all the data transmitted on the Internet”, most of it is public: such as typical videos.

My example even excluded the fact that said YouTube video is subject to thousands of times more transmissions than the single transmission of an e-mail to a single recipient.

RD says:

So who is worse?

“He similarly points out that Google itself claims to only index approximately 0.004% of traffic on the internet, suggesting that the NSA may be collecting more info than Google indexes by two orders of magnitude.”

But according to out_of_his_giant_blue_ass, raising these points and making articles about them is “trivial” while Google is the DEVIL! DEVIL! DEVIL!

Anonymous Coward says:

The longer this goes on about the spying, NSA, congress critters, and the Obama administration the worse it looks. All I hear coming from official channels is how it is all legal, we’re not changing, and we’re not going to tell you what is really going on.

One official comes out and says this, the next one that, when compared they’re both lies and misdirections. Attempts at clearing the air to the public come down to schemes on how to hide and continue.

What really worries me is we haven’t heard it all and there are more blockbusters coming according to Glenn.

This whole thing reeks of scandal. Of government gone bat shit crazy and expecting the citizens to just accept it. I know for sure I can’t trust what I am hearing coming out of Washington. There’s been too many lies and half truths.

We are not to the bottom of all this yet. We are not hearing the truth at all. There is nothing in all these supposed revelations I am comfortable with given I know I’m being lied to up front.

LivingInNavarre (profile) says:


Maybe I’m way off the mark but if it’s so obvious they will mostly ignore all that streaming media wouldn’t it be easy to start embedding messages into said media? Hell, for that matter you could just run subtitles to give the orders for Jihad.

Ooops, think I gave them something to think about. Now they’ll have to collect all data!

Anonymous Coward says:

Something else interesting I read yesterday. Some fellow claims he has internet through his phone with AT&T. He and his buddy, both in the same room, decide to set up encryption for email.

But they are having trouble getting the keys to accept each other. Then it becomes clear why. There’s a man-in-the-middle interfering with the transfers of numbers for the keys. Very shortly after they get the numbers to jibe, the originating guy gets an email from his buddy in the same room saying they’ve agreed…but his buddy didn’t send that email.

Within a short period of time, AT&T sends an email saying he needs to update his browser. Why would AT&T do that in email? Why wouldn’t it be a web browser pop up?

I don’t know what to make of it. Maybe it’s more tinfoil hat stuff. Maybe he was dead on the money given how little NSA likes encryption.

Anonymous Coward says:

Avg email size

Considering the size of your average email is well under 100 KILOBYTES (more likely in the 10-50 KB range), the NSA, even with their downplayed numbers, are collecting every email sent to/from/within the US and elsewhere.

And yes, the American public are a majority of idiots. They will believe practically anything the media and govt tell them. “It must be true, I saw it on TV or read it in the paper, etc…”

For those of you tech savvy enough to figure it out, the TOR project is a great way to keep prying eyes from your private parts. May be slow, but so far, it’s secure. The more people that use TOR, the more difficult it becomes to trace what we are doing within it.
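The email-size argument in this comment can be put into rough numbers. The sketch below is a back-of-the-envelope calculation, not a claim about what the NSA actually stores: it assumes decimal units (1 PB = 10^15 bytes), takes the 29.21 petabytes per day from the Ars Technica quote, and uses the commenter’s guessed 10-50 KB range and 100 KB ceiling as the average message size. The function name is hypothetical.

```python
# Back-of-the-envelope: how many average-sized emails fit in the "touched" volume?
# Assumes decimal units; the email sizes are the commenter's guesses, not measurements.
BYTES_PER_PB = 10**15
BYTES_PER_KB = 10**3

# Daily "touched" volume from the Ars Technica quote, converted to bytes.
touched_bytes_per_day = 29.21 * BYTES_PER_PB


def emails_per_day(avg_email_kb: float) -> float:
    """Number of emails of the given average size that fit in the daily volume."""
    return touched_bytes_per_day / (avg_email_kb * BYTES_PER_KB)


for size_kb in (10, 50, 100):
    print(f"{size_kb:>3} KB average: {emails_per_day(size_kb):.3e} emails/day")
```

Even at the 100 KB ceiling, that is room for hundreds of billions of messages per day. Whether it covers “every email” depends on the real daily email volume, which none of the figures quoted here provide.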

Dadwhiskers says:

All this is stupidity!

Generally, laws are there to influence people to not harm others, or themselves, through the possibility of punishment. If you are actively violating law(s), you are probably harming others, and/or yourself. Although some may have some childish rationalization about it that they use to mollify their conscience.

If you are stupid enough to use open email as a way of communicating with others about your illegal activities, you are too stupid to be allowed to circulate in society, because your very presence there (given your level of stupidity) will degrade society. I would rather have you in jail for your illegal activities than to have you drooling on my shoes (a metaphor, in case you are one of them) wherever we cross paths.

What the NSA is doing, is to just make it somewhat more tedious for “terrorists” to operate, much as the Secret Service makes it very, very difficult, though not impossible, to assassinate the President. The dangerous “terrorists” already know about encryption, and the many other ways from which they may choose to communicate covertly.

Some rambling: I would assume, they (both the NSA and Secret Service) do stumble across a drooler now and then. Only a true drooler would ever think it would be in any way beneficial to assassinate the president, or to kill a bunch of innocents, as the “terrorists” are wont to do. Though some are more technically literate than others, and would be able to function covertly. Now, more focused rambling.

Anyway, it is important that both the droolers and those who are able to operate very covertly be thwarted, and that is what the NSA is attempting to do. I don’t know what some people may be buying on the internet, or putting in their emails and texts that they are fearful that “Big Brother” will discover, but the NSA uses filters much as Google does to target relevant emails etc. No one is (nor could be) reading all that stuff, and no one cares what it says, unless it trips a filter. Hint, don’t buy 12 tons of ammonium nitrate, or discuss bombing an embassy. Then someone may read it.

I know that it is really the “principle” of the thing (intrusion into our privacy) that gets people riled up, but in the end, THE INTRUSION IS NOT GOING TO STOP (figger it out, DUH!), as long as there are serious “threats” out there. The NSA doesn’t care about “your” emails etc., or even about your privacy, that they are actually not violating, because they aren’t actually reading your emails, but they do care about “threats”.

This is where it starts to get sticky, because the definition of what a threat is can change. But I see no way that it can come to include my purchase of 2 pounds of organic alfalfa seeds for sprouting, or a bathroom scale – my most recent internet purchases. That’s about as insidious as it gets for me, and about as insidious as it ever gets for most people. That is not going to trip any filters I can imagine, and so, none of the stuff I do on the internet is ever going to be read by anyone, and even if it is, I don’t care. I don’t do things that harm people (violate laws – well, you know, generally). I just don’t want bombs going off around me. If I ever do decide to blow up an embassy, I’ll encrypt my emails.

If you discuss things in your emails that you would find embarrassing (cute little love notes etc.), no one at the NSA is ever going to read them. If I were to send such things I wouldn’t care if they did read them, as I’m not particularly bashful. If you’re a porn perv – and a pornography “obsession” is a perversion (rather, the result of a sort of mental illness) that is not good for your peace of mind, and being such, harms you – or you are an internet stalker, bent on maltreatment of others, then the filters may (and perhaps should – it bears discussion) come to include such activities. But the real question, again, is where will it stop?

Well, none of us really knows the answer to that question, except that it will not stop until the “government” wants it to. And who is the government? Ultimately, it’s you. So, do keep complaining, but get real about it and quit sniveling about if the government could potentially know what you buy on the internet, or what web sites you visit or what’s in your innocent emails. Potentially – they don’t read your emails, though they could. With a little effort, I could probably capture your email.

The real thing is to work on getting some sort of reality based controls put in place, for when it gets to the point that the NSA finds that its “intrusions” are no longer worth the effort. Then “they” will allow us to put effective control measures into place, but as long as “they” believe they can “protect” the U.S. by doing what they see as necessary, the intrusions will not stop, and personally, I don’t want them to. It would be very dangerous for the current level of “intrusions” on our privacy to stop just now. I don’t want bombs going off around me. Do you? Devil’s advocate, signing out.

