It Took Just 5 Minutes Of Movement Data To Identify 'Anonymous' VR Users

from the no-such-thing-as-anonymous dept

As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong because the data itself is “anonymized” — or stripped of personal identifiers like social security numbers. But time and time again, studies have shown how this really is cold comfort, given it takes only a little effort to pretty quickly identify a person based on access to other data sets. Yet most companies, many privacy policy folk, and even government officials still like to act as if “anonymizing” your data means something.

The latest case in point: new research out of Stanford (first spotted by the German website Mixed) found that it took researchers just five minutes of examining the movement data of VR users to identify them in the real world. The paper says participants using an HTC Vive headset and controllers watched five 20-second clips from a randomized set of 360-degree videos, then answered a set of questions in VR that were tracked in a separate research paper.

The movement data (including height, posture, head movement speed, and what participants looked at and for how long) was then plugged into three machine learning algorithms, which, from a pool of 511 participants, were able to correctly identify 95% of users “when trained on less than 5 min of tracking data per person.” The researchers went on to note that while VR headset makers (like every other company) assure users that “de-identified” or “anonymized” data will protect their identities, that’s really not the case:

“In both the privacy policy of Oculus and HTC, makers of two of the most popular VR headsets in 2020, the companies are permitted to share any de-identified data,” the paper notes. “If the tracking data is shared according to rules for de-identified data, then regardless of what is promised in principle, in practice taking one’s name off a dataset accomplishes very little.”
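
To make the point concrete, here is a toy sketch of why stripping names does so little. It is not the Stanford team's method or data: the users, the five-number "movement signature," the noise level, and every function name below are invented for illustration, and the classifier is a plain nearest-centroid match rather than any of the three algorithms the paper used. The idea is only that if each person's measurements cluster tightly around their own habits, an unlabeled session can be matched back to whoever it most resembles.

```python
import math
import random

# Hypothetical stand-ins for traits like height, posture, and head speed.
FEATURES = 5


def make_user_profiles(n_users, rng):
    """Give each synthetic user a fixed, made-up 'movement signature'."""
    return [[rng.gauss(0.0, 1.0) for _ in range(FEATURES)] for _ in range(n_users)]


def record_session(profile, rng, noise=0.05):
    """Simulate one noisy tracking sample drawn around a user's signature."""
    return [value + rng.gauss(0.0, noise) for value in profile]


def centroid(samples):
    """Average a list of equal-length feature vectors, column by column."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def identify(sample, centroids):
    """Return the index of the known user whose centroid is nearest."""
    return min(range(len(centroids)), key=lambda i: math.dist(sample, centroids[i]))


def run_experiment(n_users=50, train_sessions=10, test_sessions=5, seed=0):
    """Train on labeled sessions, then try to re-identify unlabeled ones."""
    rng = random.Random(seed)
    profiles = make_user_profiles(n_users, rng)
    centroids = [
        centroid([record_session(profile, rng) for _ in range(train_sessions)])
        for profile in profiles
    ]
    correct = 0
    total = 0
    for true_id, profile in enumerate(profiles):
        for _ in range(test_sessions):
            anonymized = record_session(profile, rng)  # no name attached
            correct += identify(anonymized, centroids) == true_id
            total += 1
    return correct / total


if __name__ == "__main__":
    print(f"re-identified {run_experiment():.0%} of 'anonymous' sessions")
```

Real tracking data is far messier than this, so the number the script prints says nothing about actual headsets. What carries over is the shape of the problem: the identifier is the behavior itself, not the name column.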

If you don’t like this study, there’s just an absolute ocean of research over the last decade making the same point: “anonymized” or “de-identified” doesn’t actually mean “anonymous.” Researchers from the University of Washington and the University of California, San Diego, for example, found that they could identify drivers based on just 15 minutes’ worth of data collected from brake pedal usage alone. Researchers from Stanford and Princeton universities found that they could correctly identify an “anonymized” user 70% of the time just by comparing their browsing data to their social media activity.

The more data that’s available to researchers (or corporations or governments), the easier it is to identify you. And with hacks, data leaks, and breaches dumping an endless ocean of existing datasets into the public domain, and no serious rules of the road governing things like the collection of location and other sensitive data, it shouldn’t be too hard to see how the idea of “privacy” is a myth. Especially if the company is, say, Facebook, which is now tying your entire online Facebook experience to VR whether you like it or not.

It’s all something to keep in mind for whenever the U.S. gets off its ass and finally crafts a meaningful privacy law for the internet era. Especially given that “don’t worry, your data is anonymized!” will be an endless refrain by industry as they try to ensure any rules are as feeble as possible.

Comments on “It Took Just 5 Minutes Of Movement Data To Identify 'Anonymous' VR Users”

This comment has been deemed insightful by the community.
Upstream (profile) says:

Don't hold your breath

…for whenever the U.S. gets off its ass and finally crafts a meaningful privacy law for the internet era.

This may never happen because:

The more data that’s available to researchers (or corporations or governments), the easier it is to identify you.

Anonymous Coward says:

Re: Don't hold your breath

no serious rules of the road governing things like the collection of location and other sensitive data.

Will never happen. There’s too much money to be made for the corps to stop doing it on their own. Even if there wasn’t, governments around the world would ensure that the price would go up to keep the tap wide open. If for nothing else so that governments can have their scapegoat when the plebs get angry about it, like they do every once in a blue moon.

If you were actually serious about stopping it, the first thing would be a general ban on data collection in consumer products. No, I don’t care about the "experience" needing to be optimized, or development feedback. You idiots got greedy and now it’s time to take your toys away.

Another rule: General ban on using consumer devices to auction off eyeballs. That should have never been permitted in the first place. Using the visitor’s browser as a bot to make money should have died a quick death. Both in how much it slows down page loading and the fact the resources are being stolen from them without compensation. I don’t think everyone visiting Walmart in-person would consent to whoring themselves out for 5-30 minutes (because humans are slower than computers) to advertisers at the edge of each aisle just to enter one. No one should be allowed to demand that of virtual visitors.

You can’t do anything about the data that is already out there, as such there’s no ban on selling the info they already have. It would be a never ending endeavor for the courts to try and stop it. But the data collection can be observed, and therefore stopped, on consumer devices.

the idea of "privacy" is a myth.

It’s only a myth because people don’t give a shit about their own safety. Let alone anyone else’s. They don’t care if someone else is in that photo they sent to Facebook. They are selfish and believe that the person should be honored to be seen on their account. They don’t care if someone could use all of the tweets they’ve made to figure out their lifestyle and falsely portray themselves to take advantage of them. They have to post that location update. My employees don’t wanna pay for tracking their location, audio / video recordings, and time using my workplace app? Well, I need better employees then. Poisoning? Nope, they have to post that food pic. Theft? Nope, gotta post about the fact they left the front door unlocked and how funny it is. Rape? Nope, gotta post about going out to get drunk right now, at this specific bar, alone. Murder? Nope they actually took the damn cellphone with them and actually asked Siri where to bury the body, how to destroy evidence, etc.

Go ahead and try to avoid these things where you have input, you’ll find out just how selfish society really is. Hell, some of them may even try to re-educate you, or worse, punish you over it.

Bruce C. says:

The irony of it all...

On the internet, everyone who gets your data stream knows you’re a dog.

I’m still trying to decide whether law enforcement should have access to this kind of info. On the one hand, it’s a huge government intrusion, on the other hand it would (eventually) allow courts to be more strict about data for probable cause and search warrants and get rid of the "my years of experience" justification when not backed by data.

That One Guy (profile) says:

'You first.'

Anyone who tries to argue that data like that is anonymous or not a privacy concern because it will be ‘anonymised’ should be presented with a ‘put up or shut up’ challenge, where they either admit that they’re wrong or lying or have their data given that treatment and then pored over by a third party to see how much they could learn about them from ‘anonymous’ data, which would show that they are wrong or lying.