'Anonymized Data' Is A Gibberish Term, And Rampant Location Data Sales Is Still A Problem
from the doing-nothing-helpful dept
That’s a particular problem when it comes to user location data, which has been repeatedly abused by everybody from stalkers to law enforcement. The data, which is collected by wireless companies, app makers and others, is routinely bought and sold up and down a major chain of different companies and data brokers providing layers of deniability. Often with very little disclosure to or control by the user (though companies certainly like to pretend they’re being transparent and providing user control of what data is traded and sold).
For example, last year a company named Veraset handed over billions of location data records to the DC government as part of a COVID tracking effort, something revealed courtesy of a FOIA request by the EFF. While there’s no evidence the data was abused in this instance, EFF technologist Bennett Cyphers told the Washington Post Veraset is one of countless companies allowed to operate so non-transparently. Nobody even knows where the datasets they’re selling and trading are coming from:
“A lot of these data brokers’ existence depends on people not knowing too much about them because they’re universally unpopular,” Cyphers said. “Veraset refuses to reveal even how they get their data or which apps they purchase it from, and I think that’s because if anyone realized the app you’re using … also opts you into having your location data sold on the open market, people would be angry and creeped out.”
While a long list of companies continue to insist that the massive scale this data is bought and sold at is no big deal because the data is “anonymous,” experts (with mixed success) keep pointing out that’s not really true:
“If you look at a map of where a device spends its time, you can learn a lot: where you sleep at night, where you work, where you eat lunch, what bars and parks you go to,” Cyphers said. Because of that, he added, it’s extremely simple “to associate one of these location traces to a real person.”
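To make Cyphers’ point concrete, here is a minimal sketch of how little it takes to pull a likely home and workplace out of an “anonymous” trace. Everything in it is an assumption for illustration: the function names, the night and office-hour windows, the roughly 100-meter grid, and the made-up pings. It is not how any particular broker or analyst works.

```python
from collections import Counter
from datetime import datetime

def grid_cell(lat, lon, precision=3):
    # Round to 3 decimals (very roughly 100 m) so repeat visits share a bucket.
    return (round(lat, precision), round(lon, precision))

def infer_home_and_work(pings):
    """pings: iterable of (datetime, lat, lon) for ONE pseudonymous device ID."""
    night = Counter()    # where the device sits overnight
    workday = Counter()  # where it sits during weekday office hours
    for ts, lat, lon in pings:
        cell = grid_cell(lat, lon)
        if ts.hour >= 22 or ts.hour < 6:                   # assumed "asleep" window
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:       # assumed office hours
            workday[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = workday.most_common(1)[0][0] if workday else None
    return home, work

# Made-up pings for a single device; no name or phone number attached.
pings = [
    (datetime(2021, 11, 1, 23, 15), 38.9072, -77.0369),
    (datetime(2021, 11, 2, 2, 40), 38.9071, -77.0368),
    (datetime(2021, 11, 2, 10, 5), 38.8977, -77.0362),
    (datetime(2021, 11, 2, 14, 30), 38.8978, -77.0363),
    (datetime(2021, 11, 2, 12, 10), 38.9001, -77.0400),
]
home, work = infer_home_and_work(pings)
print("likely home cell:", home)
print("likely work cell:", work)
```

A home cell plus a work cell is often enough to narrow a trace down to a household, which is why a random device ID on its own does not make the data anonymous.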
After major location data scandals at both Securus and wireless carriers, it looked like we might see actual reform on this front, but those efforts have largely stalled. Bills specifically targeting location data have gone nowhere. The occasional fines levied against such companies are a tiny fraction of the revenues made from the data in the first place. And our 20-year effort to have anything even vaguely resembling a useful federal privacy law for the internet era remains mired in gridlock thanks to a massive coalition of cross-industry lobbying opposition with a near-unlimited budget.
Which means most of these companies are going to keep collecting and selling access to this data, while pretending they don’t sell access, that the data they collect is anonymous and harmless, and that absolutely any oversight or transparency requirements are unnecessary. And the parade of scandals, breaches, and abuse of this data will continue, until eventually there’s a scandal so large that the problem can no longer be cavalierly brushed aside.
Filed Under: anonymized data, location data, privacy
Comments on “'Anonymized Data' Is A Gibberish Term, And Rampant Location Data Sales Is Still A Problem”
I’m pretty sure there’s already overlap between those two groups.
… in both duties and in criminal charges.
As long as the data contains any reference or can be tied to the real world it isn’t anonymized.
TL;DR: Anonymized data isn’t.
You know if you asked these brokers/companies how they specifically anonymized the data, you would be a) ignored, b) lied to or c) denied due to ‘trade secrets’.
If nothing else, I’d like to know how a data set is anonymized, what data fields are in the set and to be able to see my own ‘anonymized’ data records.
A way to do it properly is to only aggregate generic statistics, without information that may be used as a proxy, such as time of day or, of course, IP address. You can say "the traffic spiked Thanksgiving Weekend, 65% of it from Chrome". That of course reduces the value of the data, but it has some uses.
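A rough sketch of that kind of aggregate-only reporting, in Python. The record layout, the `browser_share` name and the `min_count` threshold are all made up for illustration; folding small groups into "other" is an extra precaution I added, not something claimed above.

```python
from collections import Counter

def browser_share(records, min_count=5):
    """Return the percentage share per browser and nothing else.

    Only the "browser" key is read, so timestamps, IP addresses and
    device IDs never reach the output. Groups smaller than min_count
    (an assumed threshold) are folded into "other" so a rare browser
    cannot single out one visitor.
    """
    counts = Counter(r["browser"] for r in records)
    total = sum(counts.values())
    merged = Counter()
    for browser, n in counts.items():
        merged[browser if n >= min_count else "other"] += n
    return {b: round(100 * n / total, 1) for b, n in merged.items()}

# Made-up raw log rows; the IPs are from a documentation-only address range.
records = (
    [{"browser": "Chrome", "ip": "203.0.113.7", "ts": "2021-11-25T10:00"}] * 13
    + [{"browser": "Firefox", "ip": "203.0.113.8", "ts": "2021-11-25T11:00"}] * 6
    + [{"browser": "Lynx", "ip": "203.0.113.9", "ts": "2021-11-25T12:00"}] * 1
)
shares = browser_share(records)
print(shares)
```

The output can back a statement like "65% of it from Chrome" without any per-visitor row ever leaving the building.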
There may be....
"a scandal so large that the problem can no longer be cavalierly brushed aside", but I doubt that there is a scandal so large that it cannot be forgotten, given enough time.
The abusers know this, so if such a scandal hits, they will orchestrate a massive "do something" runaround that doesn’t actually achieve anything and keep it going until we the people forget the original scandal.
Had to take one of those dumbass Online Security Training things at work last week.
Got marked wrong for putting "first name" under "can be used to identify an individual."
Which just tells me the test was written by somebody whose first name isn’t Thaddeus.
The first thing I would’ve asked this clown was "Did your parents give you first and last names? Did they use these names throughout your childhood? And did you answer to either of those names because you knew you were being singled out from everyone else… i.e. identified?"
If you get a deer-in-the-headlights look, or more likely, a ration of "I’m in charge, don’t question me!", tell the people who hired this asshat that he’s incompetent. (Being as this was online, substitute "company" for "he" or "him".)
Re: Re: Re:
I’ve learned not to bother.
At one company I worked at (okay, it was GoDaddy), in my first week I had to take this online security test which, among other things, recommended that you comply with the "mixed case plus symbols" requirement by starting your password with a capital letter and ending with an exclamation point.
I e-mailed the security team to let them know this was terrible advice. Never heard back.
That was…2013, I think? I’m sure they’re not using that test anymore (because it was Flash-based), but that doesn’t mean whatever test they’re using now is any better.
Re: Re: Re: Re:
Isn’t that how everyone complies with such requirements? That, or underscores as spaces if using multiple words.
Asking people to make gibberish passwords is the terrible advice, because they won’t remember them and will have to resort to writing them down or using the same one everywhere.
That’s very interesting. How then does anyone even know the datasets aren’t totally fake?
For one, statistics can give hints. For example, one red flag in accounting is all digits occurring a nearly equal number of times, as opposed to being biased towards the "lower half", which is an effect of how sums work and which digits end up more common.
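This sounds like a leading-digit check along the lines of Benford's law, where a first digit d is expected with frequency log10(1 + 1/d); that reading is my assumption, and the helper names and sample data below are invented for illustration. A mismatch is only a hint worth a closer look, not proof of fakery, and plenty of legitimate datasets do not follow the law.

```python
import math
from collections import Counter

def leading_digit(x):
    # First significant digit of a non-zero number, via scientific notation.
    return int(f"{abs(x):.15e}"[0])

def benford_expected(d):
    # Expected frequency of leading digit d under Benford's law.
    return math.log10(1 + 1 / d)

def leading_digit_report(values):
    """Map each digit 1-9 to (observed frequency, Benford expectation)."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    n = len(digits)
    return {d: (counts[d] / n, benford_expected(d)) for d in range(1, 10)}

# Powers of two are a classic sequence that tracks Benford closely.
natural = leading_digit_report([2 ** k for k in range(1, 101)])
# Every leading digit equally often: the "too even" red flag.
flat = leading_digit_report(range(100, 1000))

for d in range(1, 10):
    print(d, f"natural={natural[d][0]:.3f}", f"flat={flat[d][0]:.3f}",
          f"expected={natural[d][1]:.3f}")
```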
Now that sounds like an excellent way to make a “fraudulent” business which can’t actually be convicted of anything fraudulent.
Pretend you have a way to correlate email addresses with consumption of various products. Refuse to reveal anything about the sources, only say you’re using generative AI to analyze the source data. In reality, the data is entirely fabricated.
If you manage to sell this to a data broker, good job, you have made the world just slightly better.
". . . thanks to corruption." would be much simpler. Maybe not as explanatory, but just as accurate.
Gotta disagree with you
It’s not gibberish, it is a perfectly good euphemism, like ‘retrenchment’ for ‘fired’, or ‘wet work’ for ‘murder’. Just another way of protecting delicate, snooping ears from reality.
Re: Gotta disagree with you
I’ve always liked "rejuvenation area" as a euphemism for "deforested area"…
Wasn't long ago,
that the adverts I would see on Roku, and the internet, trying to show my location were off by over 40 miles, and farther in the past, it was around 100 miles.
At this time, they are within 10 miles, and even naming my town.
But who has released this data?
My router? It is registered in my name and has a number in it specific to this device, which the ISP needs.
Or even the government forcing this data to come out and matching up all the random data. It has been shown over time that, with enough data, they can finally figure out enough about all of us.
What logic could interconnect us with the whole system so that we are easily identified? Between your router and modem, which both have nice intricate numbers, what would it take, and who would have to be hacked, to get this?