Privacy Laws Giving Big Internet Companies A Convenient Excuse To Avoid Academic Scrutiny
from the that's-unfortunate dept
For years we’ve talked about how the fact that no one really understands privacy leads to very bad attempts at regulating privacy in ways that do more harm than good. They often don’t do anything that actually protects privacy, and instead screw up lots of other important things, from competition to free speech. In fact, in some ways, there’s a big conflict between open internet systems and privacy. There are ways to get around that (usually by moving the data from centralized silos out towards the ends of the network), but that’s rarely happening in practice. I mean, more than thirteen years ago, we were writing about the inherent conflict between Facebook’s (then) open social graph and privacy. Yet, at the time, Facebook was cheered on for opening up its social graph. It was creating a more “open” internet, an internet that others could build upon.
But, of course, over the years things have changed. A lot. In 2018, after the Cambridge Analytica scandal, Mark Zuckerberg more or less admitted that the world was telling Facebook to lock everything down again:
I do think early on on the platform we had this very idealistic vision around how data portability would allow all these different new experiences, and I think the feedback that we’ve gotten from our community and from the world is that privacy and having the data locked down is more important to people than maybe making it easier to bring more data and have different kinds of experiences.
As we pointed out in response, this was worrisome thinking, because it would likely take us away from a better world in which the data is more controlled by end users. Instead, so many people have now come to think that “protecting privacy” means making the big internet companies lock down our data, rather than the much better approach, which would be giving us full control over our own data. Those are two different things that only sometimes look alike.
I say all of that as preamble in suggesting people read an excellent Protocol article by Issie Lapowsky, which — in a very thoughtful and nuanced way — highlights the unfortunate conflict between academic researchers trying to study the big internet companies and the companies’ insistence that they need to keep data private. We’ve touched on this topic before ourselves, in covering the still ongoing fight between Facebook and NYU regarding NYU’s Ad Observer project.
That project involves getting individuals to install a browser extension that shares data back to NYU about what ads the user sees. Facebook insists that the extension violates its privacy rules, and points to how much trouble Facebook got in (and the massive fines it paid) over the Cambridge Analytica mess. Though, as we explained then, the scenarios are quite different.
Lapowsky’s article goes further — noting how Facebook told her that the Ad Observer project was collecting data without the user’s permission, which worried the PhD student who was working on the project. It turns out that was false. The project only collects data from the user who installs it and agrees (giving permission) to collect the data in question.
But the story and others in the article highlight an unfortunate situation: the somewhat haphazard demands on the big internet companies to “protect privacy” are now providing convenient excuses to those same companies to shut down academic research on those companies and their practices. In some cases there are legitimate concerns. For example, as the article notes, there were concerns about how much Facebook is willing to share regarding ad targeting. That information could be really important for those studying disinformation or civil rights issues. But… it could also be used in nefarious ways:
Facebook released an API for its political ad archive and invited the NYU team to be early testers. Using the API, Edelson and McCoy began studying the spread of disinformation and misinformation through political ads and quickly realized that the dataset had one glaring gap: It didn’t include any data on who the ads were targeting, something they viewed as key to understanding advertisers’ malintent. For example, last year, the Trump campaign ran an ad envisioning a dystopian post-Biden presidency, where the world is burning and no one answers 911 calls due to “defunding of the police department.” That ad, Edelson found, had been targeted specifically to married women in the suburbs. “I think that’s relevant context to understanding that ad,” Edelson said.
But Facebook was unwilling to share targeting data publicly. According to Satterfield, that could make it too easy to reverse-engineer a person’s interests and other personal information. If, for instance, a person likes or comments on a given ad, it wouldn’t be too hard to check the targeting data on that ad, if it were public, and deduce that that person meets those targeting criteria. “If you combine those two data sets, you could potentially learn things about the people who engaged with the ad,” Satterfield said.
Legitimate concern… but it also allows the company to shield data that could be really useful to academics. Of course, it doesn’t help that so many people are so distrustful of these big companies that no matter what they do it will be portrayed (sometimes by the very same people) as evil. It was just a few weeks ago that we saw people screaming both when the big internet companies were willing to cave in and pay Rupert Murdoch the Australian link tax… and when they refused to. Both options were painted as evil.
So, sharing data will inevitably be presented by some as violating people’s privacy, while not sharing data will be presented as hiding from researchers and trying to avoid transparency. And there’s probably some truth in every angle of these stories.
Of course, that all leaves out a better approach that these companies could take: give more power to the end users themselves to control their own data. Let the users decide what data is shared and what is not. Let the users decide where and how that data is stored (even if it’s not on the platform itself). But, instead, we just have people yelling about how these companies both have to protect everyone’s privacy and give access to researchers to see what they’re doing with all this data. I don’t think the “middle ground” laid out in the article is all that tenable. Right now, that middle ground basically amounts to creating special exceptions in which academics are “allowed” (under strict conditions) to get access to that data.
The problem with that framing is that the big internet companies still end up in control of the data, rather than the end users. The situation with NYU seems like a perfectly good example. Facebook shouldn’t have to share data from people who don’t consent, but with the Ad Observer, it’s all people who are actually consenting to handing over their own data, and Facebook shouldn’t be in the business of blocking that — even if it’s inevitable that some reporter at some future date will try to spin that into a story claiming that Facebook “violated” privacy because these researchers convinced people to turn over their own info.