Don't Let This Get Lost In The Shuffle: The Data Transfer Project Is Expanding, And Could Help Create Real Competition Online

from the this-is-important dept

While lots of people are angling to break up the big internet companies in the belief that doing so will lead to more competition, we’ve long argued that such a plan is unlikely to work. If you truly want more competition, you need to end the ability of these companies to lock up your data. That means allowing third parties access so that the data is not stuck in silos, and so that users themselves have both control and alternative options that they can easily move to.

That’s why we were quite interested a year ago when Google, Facebook, Microsoft and Twitter officially announced the Data Transfer Project (which initially began as a Google project, but expanded to those other providers a year ago). The idea was that the companies would make it ridiculously easy to let users automatically transfer their own data (via their own control) to a different platform. While some of the platforms had previously allowed users to “download” all their data, this project was designed to be much more: to make switching from one platform to another much, much easier — effectively ending the siloing of data and (worse) the lock-in effects that help create barriers to competition. As we noted last year:

But the really important thing that this may lead to is not so much about transferring your data between one of the giant platforms, but hopefully in opening up new businesses which would allow you to retain much greater control over your data, while limiting how much the platforms themselves keep. This is something we’ve talked about in the past concerning the true power of data portability. Rather than having it tied up in silos connected to the services you use, wouldn’t it be much better if I could keep a “data bank” of my data in a place that is secure — and where if and when I want to I can allow various services to access that data in order to provide the services I want?

In other words, for many years I’ve complained about how we’ve lost the promise of cloud computing in just building up giant silos of data connected to the various online services. If we can separate out the data layer from the service layer, then we can get tremendous benefits, including (1) more end-user control over their own data (2) more competitive services and (3) less power to dominate everything by the biggest platforms. Indeed, we could even start to move towards a world of protocols instead of platforms.

So it’s good news that the latest announcement about the project is that it’s expanding once again. While the headlines are that Apple has joined the program (to round out the biggest internet companies), it’s also notable that two other very interesting, but much smaller, players are joining as well: the federated Mastodon project and Tim Berners-Lee’s Solid, which is an attempt to build the kind of “protocols, not platforms” approach that we keep advocating for.

There are still many open questions about how well all of this will work, but if you believe in true competition among internet services, this is the project to pay attention to, as it has the highest likelihood of actually creating such competition. Plans to “break up” big tech just create a few more data silos and effectively lock in some pre-selected (slightly smaller) giants, thanks to network effects. What the Data Transfer Project does is flip the equation. It makes it so that more competition can thrive without taking away the network effects that make the internet so powerful. It’s the most interesting and most compelling approach to generating actual competition among internet services.

I still hope that the project goes even further in knocking down silos and opening up for competition, but it’s already quite encouraging. Of course, it got almost no attention at all because anti-trust is sexy, whereas companies opening themselves up to competition through technological means is apparently boring.

Companies: apple, facebook, google, microsoft, solid

Comments on “Don't Let This Get Lost In The Shuffle: The Data Transfer Project Is Expanding, And Could Help Create Real Competition Online”

26 Comments
low level schlub (called so by "PaulT") says:

It's not better when mega-corporations openly conspire.

They don’t actually compete now, but have the territory divided up, exactly like the organized crime that they duplicate.

Google has search (and mobile, Android), Amazon products, Microsoft desktop, Apple their bunch of fanboys immune to either of the other OSs, and so on.

Know what cooperating corporations spell for consumers, though? — CONTROL.

Anonymous Coward says:

Re: Re:

They don’t actually compete now, but have the territory divided up, exactly like the organized crime that they duplicate.

Please show which portions of the US can and cannot access each of these services. If you can’t, then they haven’t divided up the territory and your claim is false.

Google has search

There’s also Bing, DuckDuckGo, Yahoo, Ask, Dogpile, etc…

(and mobile, Android)

You forgot to mention their major competitor, Apple. Or the fact that Android is actually free for any phone maker to slap on their phone.

Amazon products

Uh, can you be more specific?

Are we talking marketplaces? Because there’s also: Craigslist, ebay, aliexpress, walmart, etc…

Are we talking video services? Because there’s also: Netflix, Hulu, Youtube, Vimeo, etc…

Are we talking music services? Because there’s also: Google Music, Apple Music, Spotify, Pandora, etc…

Are we talking Cloud computing? Because there’s also: Google, Azure, etc…

Are we talking home devices and IoT things? Because there’s also Google, Apple, Microsoft, and a crap ton of others.

Microsoft desktop

Don’t forget Apple’s Mac OS, and 50+ different flavors of Linux.

Apple their bunch of fanboys immune to either of the other OSs

Fanboys is not evidence that there is a conspiracy, it means there’s a subset of people who really like that product. Just like there are Ford fanboys who are "immune" to every single other car manufacturer.

and so on

So you got nothing then, eh?

Know what cooperating corporations spell for consumers, though? — CONTROL.

Depends on the cooperation. Microsoft cooperating with Nintendo and Sony means cross-platform play to all consumers, i.e. less control.

Your arguments are….weird and invalid.

low level schlub (called so by "PaulT") says:

Exactly HOW am I better off with MORE corps having my details?

Besides that you’re again promising future benefits in order to BLOCK any effective action against CURRENT problems, you are simply proposing that consumers all be containerized for better exploitation by commercial interests and tracked more fully by gov’t.

You cannot be serious meaning that you think this is good for the public.

James Burkhardt (profile) says:

Re: Exactly HOW am I better off with MORE corps having my detail

Aside from the idea that it shouldn’t be an all-or-nothing proposition, you should also be able to retract access to that information, so that only the companies you need to have access to the information have it when they need it.

What is effective action in your eyes? The proposals seem to be to give big corporations complete control over your data and lock them in so they can legally exploit that data.

Your opposition is based on the assumption that under the distributed protocol model we see the same types of collection and exploitation seen under the current model, but that somehow goes away under a stricter silo system. Because all the issues you complain about aren’t fixed by any current enhanced privacy proposal.

Anonymous Coward says:

Re: Re: Exactly HOW am I better off with MORE corps having my de

"Retract?"

There is no "retracting" of any data. Only a promise against reusing it.

You can’t provably remove the data from FaceBook’s, or anyone else’s, server. A legal injunction would be your best option, but they can just as easily move the data and its processing out of any jurisdiction to render such an injunction worthless literally at the push of a button.

The only real option you have is to simply not play the game. I.e. Never give them the data in the first place. Of course there are so many idiots in the US and elsewhere who don’t care about uploading data that doesn’t belong to them, that giving no data at all is currently an impossibility.

Maybe making it a requirement during a data breach to divulge the identities of the uploaders so that affected parties can sue them for reparations might fix this. (Companies can avoid liability by claiming the individual uploaders who gave them the exposed data. It’s about the only way you’d get such a law passed by Congress.) At the very least, a few big lawsuits under such a law would give those who don’t care about the privacy of others pause when hitting that submit button.

All the Data Transfer Project does is cover up the real issue. It’s not about who has your data, it’s about them having it and abusing it in the first place. All the cover up does is redistribute the data amongst the abusers, attempts to legitimize the abuse, and profits gained from said abuse, with the excuse of "well it’s OK as long as X doesn’t do it, right?", and lowers the barrier to entry. Allowing old victims to gain new abusers, and old abusers access to new victims.

Anonymous Coward says:

Re: Exactly HOW am I better off with MORE corps having my detail

Think of it this way: instead of data being locked up with the service, you’d have your own personal data broker who would hold your private/public data. You would then grant various services the right to access that data on a limited basis.

While this does mean you are opening yourself up to yet another provider to abuse/leak your data, it puts much more limited access and usage controls on the plethora of businesses that currently control all access to their piece of that data.

A side benefit here is that if one of these data brokers gets compromised somehow, the only data lost is the data that belongs to their customers; other people can continue to use all those services without being at risk — and the services have offloaded the risk of holding your PII, and so can offer their services more affordably.

This also means that if any of these services gets sold/goes bankrupt/etc. they can’t resell your data (they don’t have it) and they can’t disappear your data (they don’t have it).

And if a dedicated data broker closes down, your data management IS their service, so they can’t sell it on without your permission. You, however, can migrate your data to a new provider and remove it from the original provider during their restructuring.

While there are definitely problems with this solution, there are also MANY benefits, including holding your data offshore under a government whose rules you agree with. At that point, it doesn’t matter what country the service providers reside in, because all they can do is stop providing you service; they can’t hand your data off to local government or prevent you from accessing it.
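The grant-and-revoke model described in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: the `DataBroker` class, its method names, and the example service and field names are all hypothetical, and nothing here reflects how the Data Transfer Project or Solid actually implement access control.

```python
class DataBroker:
    """Hypothetical personal data store that hands out scoped read access."""

    def __init__(self, data):
        self._data = dict(data)   # the user's own data, held by the broker
        self._grants = {}         # service name -> set of fields it may read

    def grant(self, service, fields):
        """Allow a service to read the listed fields (and nothing else)."""
        self._grants.setdefault(service, set()).update(fields)

    def revoke(self, service):
        """Withdraw all access previously granted to a service."""
        self._grants.pop(service, None)

    def read(self, service, field):
        """Return a field's value only if the service holds a grant for it."""
        if field not in self._grants.get(service, set()):
            raise PermissionError(f"{service} may not read {field}")
        return self._data[field]


# Made-up example data and service name, for illustration only.
broker = DataBroker({"email": "me@example.com", "birthday": "1990-01-01"})
broker.grant("photo_service", ["email"])
print(broker.read("photo_service", "email"))  # allowed: field was granted
broker.revoke("photo_service")                # later reads now raise PermissionError
```

Note that revocation in this sketch only blocks future reads. It does nothing about copies a service made while it had access, which is the objection raised elsewhere in this thread.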

Scary Devil Monastery (profile) says:

Re: Re: Exactly HOW am I better off with MORE corps having my de

"Think of it this way: instead of data being locked up with the service, you’d have your own personal data broker who would hold your private/public data. You would then grant various services the right to access that data on a limited basis."

Nice if it worked but fails on a few practical considerations:

1) Data storage and administration isn’t free. So we’re looking at a solution which one way or the other requires the user to pay.

2) For said data to make any sense whatsoever means that the platform to which you extend "gathering" rights will have to reveal the questions it asks and the answers it receives to a 3rd party commercial entity. Something which may in many jurisdictions mean a tripartite contract must be established for every platform the user wants to access.

3) The user in heaviest need of this type of service is unfortunately often the user who doesn’t really care that they’ve casually clicked their way past any expectation to privacy and data ownership with twenty different platforms.

4) The idea that you can retain your data while letting anyone else view it is absurd to begin with. You should assume that if a given piece of information is shown to anyone else that information will be public property tomorrow. The saying that "Three may keep a secret if two of them are dead" is as true today as it was when old Benjie first coined the phrase.

Christenson says:

But, badly-behaved data collectors!

I love the idea of controlling my own data….

but…what’s to stop Google/Microsoft/Apple from vacuuming up whatever passes through their hands????

Data is either private, shared between a very small number of players and encrypted elsewhere, or completely public. I don’t think there is a middle ground, just as there don’t seem to be any cardinals between aleph-0 (the number of integers) and C (the number of reals between 0 and 1).

OTOH, if I am running a mastodon server, I’d love to hand the advertising/sales problem over to Google.

Anonymous Coward says:

Re: But, badly-behaved data collectors!

At least in the EU under GDPR, there are significantly different regulations between being a data broker, a data repository and a data service provider. What would stop GMA from grabbing it all is that they suddenly have WAY more liability and reporting requirements. If they can divest themselves of that, at least on paper, their businesses become much more efficient. So it’s in their best interests to handle the data as little as possible so that they lower their shareholder risk.

James Burkhardt (profile) says:

Re: But, badly-behaved data collectors!

What’s to stop them from vacuuming it up now? Nothing. That is literally one of the problems we face.

The goal is that if we move to a protocol for accessing, sorting, filtering, and displaying the data stream, we can also move to a world where that data doesn’t have to be stored on a Google or Amazon or Facebook server, but can be stored in a third-party location, potentially including my own server in my home. Where concepts like privacy can actually be points of competition. Where we can, in real time, revoke access to that third-party storage.

You are right to say that once you reveal that information, it’s not generally private. Privacy is a trade-off: if you want to use one of these services, you need to provide some information. And you will need to trust that it only accesses the data you approve. But that trust already exists as you use that service now. Your ability to adjust where the data is stored, however, allows you to revoke that access in more meaningful ways.

christenson says:

Re: Re: But, badly-behaved data collectors!

Thanks. And yeah, we definitely want protocols; a single point of failure (Facebook, Google, Equifax, etc, but also common implementations and common hardware) for massive amounts of trust is a recipe for disasters.

Mason Wheeler is also right that subtle problems with the exported data could just lead to more lock-in, and that’s before I get into discovering that Publius and Alexander Hamilton are the same person.

Scary Devil Monastery (profile) says:

Re: But, badly-behaved data collectors!

"but…what’s to stop Google/Microsoft/Apple from vacuuming up whatever passes through their hands???? Data is either private, shared between a very small number of players and encrypted elsewhere, or completely public. I don’t think there is a middle ground…"

Correct. Possibly due to the insane idea of copyright, a large number of people who ought to know better now believe it is possible to shout their secrets from the nearest rooftop and somehow magically keep anyone not on their dance card from hearing and retaining the information.

"Data privacy" is dead from the second you go online while leaving a discernible trail. Hence why everyone should use a VPN as standard, load their browsers with anti-fingerprinting and scriptblocking plug-ins, and realize that anything typed in will run a risk of becoming public knowledge.

Because unless the platform you’re logging on to is both aware that your data is valuable and is heavily motivated to possess good client-side security, that data is no longer private.

Anonymous Coward says:

Re: Re:

A bunch of us have been shouting "protocols instead of platforms" ever since .com was added to the Internet. Back in the 90s when Masnick started blogging, the likes of TBL and Vint Cerf as well as the Open Source movement were pretty adamant about such things. It’s why DNS is configured as it is, as well as POP3, IMAP, HTTP, and other open protocols… right down to sockets. It was all designed to be layers of protocols, such that any one protocol could be swapped out for something new without impacting the entire stack.

Then the server/client system was replaced with "the cloud" which was great in theory, but in practice was just load balanced global server silos for ALL data and protocols for a particular service. And suddenly, the only thing available to end users was service platforms with private APIs instead of protocols with services hanging off the end.

Anonymous Coward says:

Re: Re: Re:

Part of what led us to "the cloud" was the drive (starting somewhere in the 80s to 90s I reckon, and ramping up as the Great Information Security Flame War intensified) to firewall networks and take the control of which protocols could reach an endpoint out of that endpoint’s hands. This "denied by default" posture, while a good decision from a tactical, information-security standpoint, led to a situation where only a very small set of protocols are allowed to cross the "border posts" between networks. The winners in this were the widely used client-server protocols of the time: HTTP(S), POP3 (largely replaced by IMAP now), DNS (out of sheer necessity), and FTP (in PASV mode). SSH was frequently permitted as well, but not always (due to its tunneling capability or non-inspectability compared to HTTPS); if you were some other protocol, though, you had to be prepared for being unreachable from a significant subset of endpoints (take IRC or BitTorrent for instance) or being restricted to proxied flows (SMTP, which you can only use to reach a border-guard-controlled mail transfer agent). Furthermore, the addition of secondary screening (application-layer security proxies) to these border posts made trying to "masquerade" protocols by hiding them within the normal flows of say HTTPS a dicey effort, at best.

In addition, the notion of unsolicited inbound traffic was treated as an insecure horror by these newfound border police forces (through the use of stateful firewalling and address translation), further cementing the dominance of centralized, client-server models over decentralized, "host anywhere" protocols. Furthermore, even if you had enough control over the border police to get them to send the unsolicited visitors to you (port forwarding), the relative unwillingness of ISPs to assign static addresses to consumer Internet endpoints posed an additional barrier, requiring the development and use of dynamic DNS updating over HTTP(S) as the mechanisms for this that are native to the DNS protocol will not work in a restrictive environment where DNS is proxied, or DNS servers are otherwise locked down.

With all these challenges, and the desire of users to access the services they wished coming into conflict with what the border guards were yammering about, everything got multiplexed into valid HTTP running over port 80 (later to be replaced with HTTPS over port 443), stunting the growth of new protocols to environments where a) the application could receive official blessing and paperwork from all the border posts involved (site-to-site VPN protocols, SIP), b) the protocol was never intended to cross a border to begin with (SMB, NFS, and so on, internal chat systems), or c) the protocol was either client-server OR equipped with ways to pierce address translation, AND was intended for use by end consumers only, outside of managed environments (online gaming, "olden days" chat/instant messaging).

Mason Wheeler (profile) says:

That’s why we were quite interested a year ago when Google, Facebook, Microsoft and Twitter officially announced the Data Transfer Project (which initially began as a Google project, but expanded to those other providers a year ago). The idea was that the companies would make it ridiculously easy to let users automatically transfer their own data (via their own control) to a different platform.

Have you actually looked at Facebook’s "ridiculously easy" data? I downloaded mine a few months ago, and looking at it from the perspective of a programmer, it’s garbage. It’s exactly what I would do if I wanted to set up a system specifically designed to look like openness to an unskilled outside observer (such as a politician or regulator) while being worthless for the purpose of actually enabling data transfer to a competitor.

The devil, it has been said, is in the details, and when you look at the details of the data Facebook gives you, (and what they don’t give you), you definitely see a diabolical entity emerge. The most important subtle little problem is that there are no unique identifiers.

For example, in your Friends data, it gives you the name of each Friend, and a few bits of data they’ve shared, but no username or other token that identifies them specifically. Then in your Comments data, it says which post you commented on, and the name of the person who posted it… but without a unique identifier you have no way of knowing if this Bob Smith is the same Bob Smith in your Friends list or someone else who happens to have that name.

You may say "well sure, but how likely are you to have two friends by the same name, or go commenting on someone’s post with the same name as one of your friends?" And you’d probably be right… but that’s exactly what makes this such a subtly evil problem. Because it looks just fine to any individual user, but if you try to use the data for its primary intended purpose–to facilitate competition by enabling people to move to a competing system–the lack of unique identifiers makes it impossible to reconstruct the social graph. If I’m running the MasonBook network and I import data from Dave, Fred, and Janet, and all of them have a friend named Bob Smith, I have no way to determine if they’re all friends with the same person or not.
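A small sketch makes the Bob Smith problem concrete. Everything in it is made up for illustration: the export structures, the `u-1001` style identifiers, and the function names are assumptions, not the actual format of any Facebook export.

```python
from collections import defaultdict

# Hypothetical exports where friends are listed by display name only,
# as in the export described above (no unique identifier).
exports_by_name = {
    "Dave": ["Bob Smith"],
    "Fred": ["Bob Smith"],
    "Janet": ["Bob Smith"],
}

# The same exports if each friend also carried a (made-up) stable identifier.
exports_by_id = {
    "Dave": [("u-1001", "Bob Smith")],
    "Fred": [("u-1001", "Bob Smith")],
    "Janet": [("u-2002", "Bob Smith")],
}


def merge_by_name(exports):
    """Build friend -> importers using the display name, the only key available."""
    graph = defaultdict(set)
    for owner, friends in exports.items():
        for name in friends:
            graph[name].add(owner)
    return dict(graph)


def merge_by_id(exports):
    """Build friend -> importers using the stable identifier as the key."""
    graph = defaultdict(set)
    for owner, friends in exports.items():
        for uid, _name in friends:
            graph[uid].add(owner)
    return dict(graph)


# By name, all three importers collapse onto a single "Bob Smith" node,
# whether or not they know the same person.
print(merge_by_name(exports_by_name))
# By identifier, Janet's Bob Smith is kept distinct from Dave's and Fred's.
print(merge_by_id(exports_by_id))
```

With names alone, the importing service has to either merge every Bob Smith into one node or treat every one as a stranger. Both choices get the social graph wrong for some users, and no amount of cleverness on the importing side can recover information the export never contained.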

Facebook’s "participation" in the Data Transfer Project is nothing but transparency theater, to borrow a concept from the world of security. It’s just more of the same from a company that’s never bothered to even pretend they’re not being evil.

Mike Masnick (profile) says:

Re: Re:

Have you actually looked at Facebook’s "ridiculously easy" data? I downloaded mine a few months ago, and looking at it from the perspective of a programmer, it’s garbage. It’s exactly what I would do if I wanted to set up a system specifically designed to look like openness to an unskilled outside observer (such as a politician or regulator) while being worthless for the purpose of actually enabling data transfer to a competitor.

You are talking about a different program and, in fact, explaining why THIS program is important. This is NOT about "data export." You’re talking just about the export functionality, which is exactly as you describe. A mess and useless. That’s why I’m against proposals that are just about exporting your data.

What this is about is easy, one-click TRANSFER of data from one service to another, without you having to see the data. This is one-click "okay, I want to use those guys, not you any more". THAT is why it’s a big deal.

What you’re talking about is something different, and the reason why this project is so important.

Christenson says:

Subtle interference....

I still see quite a bit of merit in the idea that Facebook has a bunch of incentives to subtly interfere with data transfer….

and disambiguating John Smiths #1 through #4 and remembering that one of them calls themselves Jane Smith on the new platform is a non-trivial problem….

OTOH, if Techdirt’s leading edge technical opinion is the future, and objections to the current FTC settlement are the trend, then the giants have strong incentives to make this work, particularly if they can exert the sort of control Google exerts over Android. These organizations aren’t "old". They can afford to hedge their bets.

Google is also powered by a better page rank algorithm, for users, and targeted advertising that, per Techdirt, is no more cost-effective than untargeted or very broadly targeted advertising, so only by opening up their platform can they find a way to stay on top of their money source.

Anonymous Coward says:

Easy data transfer only theoretically does away with those network effects, though. Let’s say Social Media Platform X does something shitty. So shitty that you want to move to alternative Social Media Platform Y. So you move, but nobody else that you know moves. Your friends don’t move over, none of your close or extended family move over, the company that you work for doesn’t move over, etc. because they actually don’t give a fuck and choose to permanently use what they’ve become accustomed to.

The Data Transfer Project can only enable competition if a large enough number of people are fully conscious of their newfound abilities to easily transfer their data, and if those people are willing to make the jump and accept some change. I don’t trust the mostly-tech-illiterate masses to care enough.

christenson says:

Re: network effects

I agree in part, and disagree in part, as follows:
If you have truly open protocols, I can be on my mastodon implementation, and my friend can hang out on Facebook.

The software can make it clear that when I send something to my friend, I have crossed a trust boundary, but otherwise make it seamless…just like my web browser goes to all kinds of websites. There’s also the possibility that my mastodon instance aggregates the Facebook access, so it’s not entirely clear to Facebook the access to my friend’s account is coming from me.

But a deep problem on two levels:
a) how to make the situation I described possible.
b) how not to get non-technical individuals overwhelmed with the access control details.
[ I work for a company with some very complicated role-based permission structures, and the result ends up being a few of us just get whatever permissions we ask for. It really doesn’t work.]

Anonymous Coward says:

Re: Re: network effects

Would you like me to give you a formula for success? It’s quite simple, really: Double your rate of failure. You are thinking of failure as the enemy of success. But it isn’t at all. You can be discouraged by failure or you can learn from it, so go ahead and make mistakes. Make all you can. Because remember that’s where you will find success.

Anonymous Coward says:

Call me a skeptic

This whole thing sounds like theater, nothing more than smoke and mirrors to make it seem that the industry is "open" to fend off the latest round of anti-technology luddites.

Data exchange and transfer are antithetical to the corporate world where "lock-in" is king. There have been a handful of pseudo-standards over the years meant to enable exchange of information between disparate systems but none have been successful. One of the latest examples is SCIM, now on version 2, which has seen the mildest of adoption and only by those companies interested in helping transfer data into their systems. You can become "SCIM Compliant" by implementing only the client side.

I work in the data synchronization industry, where the goal is to transfer data such as that contained in Facebook’s and others’ silos into other systems and (optionally) back again, keeping data in sync between disparate systems. This industry exists because no single platform vendor has done anything to enable this on their own. The industry will continue to exist even if this "data export" becomes a real thing (largely because social platforms are not something our customers care about), but I have a very hard time believing these companies will willingly build anything that actually works as advertised and sends their precious data to a competitor.

The data, after all, is their product and their most valuable asset. If everyone has the same data then there is no competitive advantage. If anything, perhaps this whole dog and pony show is partly due to the proliferation of this data in recent years and an admission that everyone already has the data anyway.
