The Decentralized Web Could Help Preserve The Internet's Data For 1,000 Years. Here's Why We Need IPFS To Build It.

from the protocols-not-platforms dept

The internet economy runs on data. As of 2019, there were over 4.13 billion internet users generating more than 2.5 quintillion bytes of data per day. By the end of 2020, there will be 40 times more bytes of data than there are stars in the observable universe. And all of this data is powering a digital revolution: the data-driven internet economy already accounted for 6.9% of U.S. GDP in 2017. The internet data ecosystem supports a bustling economy, ripe with opportunity for growth, innovation, and profit.

There’s just one problem: while user-generated data is the web’s most valuable asset, internet users themselves have almost no control over it. Data storage, data ownership, and data use are all highly centralized under the control of a few dominant corporate entities on the web, like Facebook, Google, and Amazon. And all that centralization comes at a steep cost to the ordinary internet user. Today’s internet ecosystem, while highly profitable for a few corporations, creates incentives for major platforms to censor end users who have nowhere else to go. It is also incompatible with data privacy, insecure against cybercrime, and extremely fragile.

The web’s fragility in particular presents a big problem for its long-term sustainability: we’re creating datasets that will be important to humanity 1,000 years from now, but we aren’t safeguarding that data in a way that is future-proof. Link rot plagues the web today, with one study finding that over 98% of web links decay within 20 years. We are exiting the plastic era and entering the data era, but at this rate our data won’t outlast our disposable straws.

To build a stronger, more resilient and more private internet, we need to decentralize the web by putting users back in control of their data. The web that we deserve isn’t the centralized web of today, but the decentralized web of tomorrow. And the decentralized web of tomorrow will need to last the next 1,000 years, or more.

Our team has been working for several years to make this vision of a decentralized web a reality by changing the way that apps, developers, and ordinary internet users make and share data. We couldn’t be doing this today without the InterPlanetary File System (IPFS)—a crucial tool in our toolbox that’s helping us tear down the major technological hurdles to building a decentralized web. To see why, we need to understand both the factors driving centralization on the web today, and how IPFS changes the game.

In fact, I want to make a bold prediction: in the next one to two years, we’re going to see every major web browser shipping with an IPFS peer by default. This has already started with the recent announcement that Opera for Android will now support IPFS out of the box. This kind of deep integration is going to catalyze a whole range of new user and developer experiences in both mobile and desktop browsers. Perhaps more importantly, it is going to help us all safeguard our data for future netizens.

Here’s how:

With the way the web works now, if I want to access a piece of data, I have to go to a specific server location. Content on the internet today is indexed and browsed based on where it is. Obviously, this method of distributing data puts a lot of power into the hands of whoever owns the location where data is stored, just as it takes power out of the hands of whoever generates data. Major companies like Google and Amazon became as big as they are by assuming the role of trusted data intermediaries, routing all our internet traffic to and through their own central servers where our data is stored.

Yet, however much we may not like “big data” collecting and controlling the internet’s information, the current internet ecosystem incentivizes this kind of centralization. We may want a freer, more private and more democratic internet, but as long as we continue to build our data economy around trusted third-party intermediaries who assume all the responsibilities of data storage and maintenance, we simply can’t escape the gravitational pull of centralization. Like it or not, our current internet incentives rely on proprietary platforms that disempower ordinary end users. And as Mike Masnick has argued in his essay "Protocols, Not Platforms: A Technological Approach to Free Speech", if we want to fix the problems with this web model, we’ll have to rebuild the internet from the protocol layer up.

That’s where IPFS comes in.

IPFS uses “content-addressing,” an alternative way of indexing and browsing data that is based, not on where that data is, but on what it is. On a content-addressable network, I don’t have to ask a central server for data. Instead, the distributed network of users itself can answer my data requests by providing precisely the piece of data requested, with no need to reference any specific storage location. Through IPFS, we can cut out the data intermediaries and establish a data sharing network where information can be owned by anyone and everyone.
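
To make the contrast concrete, here is a minimal sketch of content addressing in Python. It is illustrative only: real IPFS content IDs (CIDs) are built from multihashes over a Merkle DAG of blocks, not bare SHA-256 hex digests, and the "network" here is just a dictionary standing in for a distributed hash table.

```python
import hashlib

# A toy content-addressed store: the key is derived from the data itself,
# not from the location where the data happens to live.
store = {}

def add(data: bytes) -> str:
    """Store data under its own SHA-256 digest and return that address."""
    address = hashlib.sha256(data).hexdigest()
    store[address] = data
    return address

def get(address: str) -> bytes:
    """Fetch by content address, verifying the bytes match the hash."""
    data = store[address]
    if hashlib.sha256(data).hexdigest() != address:
        raise ValueError("data does not match its address")
    return data

addr = add(b"Hello, decentralized web!")
assert get(addr) == b"Hello, decentralized web!"
```

Because the address itself proves the content is authentic, any peer holding the bytes can answer the request; there is no need to trust one particular server.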

This kind of distributed data economy undermines the big data business model by reinventing the incentive structures of web and app development. IPFS makes decentralization workable, scalable and profitable by putting power in the hands of end users instead of platforms. Widespread adoption of IPFS would represent the major upgrade to the web that we need to protect free speech, resist surveillance and network failure, promote innovation, and empower the ordinary internet user.

Of course, the decentralized web still needs a lot of work before it is as user-friendly and accessible as the centralized web of today. But already we’re seeing exciting use cases for technology built on IPFS.

To get us to this exciting future faster, Textile makes it easier for developers to utilize IPFS to its full potential. Some of our partners are harnessing the data permanence that IPFS enables to build immutable digital archives that could withstand server failure and web decay. Others are using our products (e.g., Buckets) to deploy amazing websites, limiting their reliance on centralized servers and allowing them to store data more efficiently.

Textile has been building on IPFS for over three years, and the future of our collaboration on the decentralized web is bright. To escape the big data economy, we need the decentralized web. The improvements brought by IPFS, release after release, will help make the decentralized web a reality by making it easier to onboard new developers and users. As IPFS continues to get more efficient and resilient, its contribution to empowering the free and open web we all deserve will only grow. I can’t wait for the exponential growth we’ll see as this technology continues to become more and more ubiquitous across all our devices and platforms.

Carson Farmer is a researcher and developer with Textile.io. Twitter: @carsonfarmer

Filed Under: archiving, decentralized web, ipfs, platforms, preserving data, protocols


Reader Comments




  • Anonymous Anonymous Coward (profile), 5 May 2020 @ 3:39pm

    I like this idea, but...

    Question, how will IP maximalists feel about this development? My feeling is that they will make sure all their IP is sequestered on their servers. Quite possibly to their disadvantage.

    Of course it will make 'taking down' things they claim are their IP more difficult as there won't be any easy 'place' to go after it. To that end, I bet they fight this advancement tooth and nail, even to the point of trying to buy legislation against it.

    "Your file, and all of the blocks within it, is given a unique fingerprint called a cryptographic hash."

    Another question is what is the relationship between the unique fingerprints and IP addresses?


    • Scary Devil Monastery (profile), 6 May 2020 @ 2:10am

      Re: I like this idea, but...

      "Question, how will IP maximalists feel about this development?"

      How do you think? From the pov of the incumbent gatekeepers the ideal situation is one where every book and media recording in the world is burned at six month intervals to make room for the Next Big Thing.

      Old accessible entertainment and media is a competitor. It's that simple.


  • Anonymous Coward, 5 May 2020 @ 4:29pm

    I don't get it.

    If this is more private, how exactly is changing the location of my information from "Google's servers" to "an arbitrary number of strangers' devices" better?

    Presumably the owner of the data would also gatekeep access to their own information, but through what mechanism would they be able to recover access to their data in the inevitable event that a criminal phishes the end-user for the key, or their computer dies, or any number of other situations in which exclusive access to someone's key-holding device (presumably their PC and/or phone) is compromised?


    • PaulT (profile), 6 May 2020 @ 1:09am

      Re:

      "If this is more private, how exactly is changing the location of my information from "Google's servers" to "an arbitrary number of strangers' devices" better?"

      I think you're looking at the wrong definition of "private". The idea is not to make things private in terms of who can access the data; the idea is to move the data from corporate control to control by private entities.


      • Anonymous Coward, 6 May 2020 @ 12:59pm

        Re: Re:

        Corporations are private entities.


        • Anonymous Coward, 6 May 2020 @ 2:22pm

          Re: Re: Re:

          Wrong, corporations are public entities owned by the public, that is, shareholders.


          • Ehud Gavron (profile), 6 May 2020 @ 6:02pm

            Re: Re: Re: Re:

            Wrong, corporations are public entities owned by the public, that is, shareholders.

            Please don't be rude while displaying arrogance and a lack of understanding.

            Corporations are entities which allow some portion to be owned by non-qualified public investors. They are not "public" in the sense that they are owned by "the public" such as a utility.

            Shareholders -- depending on the stock -- may have zero voting, zero dividends, and zero rights -- except, of course to receive voluminous writings and to resell their shares, hopefully for more than they paid for them.

            Don't confuse a corporation with shares on the market, which we typically do call "a public company", with "a public entity owned by the public" -- which does not exist.

            Try starting your answer with something you wouldn't use on your wife, kids or friends. "WRONG!!! BZZT!!!" isn't a good introduction.

            E


            • Anonymous Coward, 6 May 2020 @ 11:11pm

              Re: Re: Re: Re: Re:

              There are different types of corporations. Some are private, some are public, some are nontraditional in the US and arranged around a charter rather than a different form of regulatory filing.

              It is a complex and boring area of domestic and international law.


  • Anonymous Coward, 5 May 2020 @ 4:37pm

    A number of things wrong with this

    Data storage, data ownership, and data use are all highly centralized under the control of a few dominant corporate entities on the web, like Facebook, Google, and Amazon.

    This is simply not true, apart from the questionable inclusion of Facebook. Yes, if you post something to Facebook that data is stored on Facebook's servers and they maintain much of the control over that data. You, as the poster, are still free to delete or modify it so not all control is lost. But bigger than that, is anything, anything at all, posted to Facebook worth preserving for future generations?

    As for Google, they merely index data found around the internet. They don't house it. The data owner still has their data wherever they chose to put it and remains in complete control of it. Google's search engine came about because what we had before it, shared cross-linking to other sites, was trash. All Google search does is index all of that data so it is easier to find than by following a poor chain of shared links. While Google may be a behemoth, it is such because they make a ton of money selling ads, which all those data providers put all over their own sites as a way of making money shared with Google. It is not huge because of some imagined data theft and storage.

    Amazon's AWS and related services are nothing more than server space that people rent in order to host their own data. Amazon doesn't own that data at all. And people are free to choose from countless other hosting options and even host their data on a server in their own closet.

    The entire basis of this argument is a farce.

    The web’s fragility in particular presents a big problem for the long-term sustainability of the web: we’re creating datasets that will be important for humanity 1000 years from now, but we aren’t safeguarding that data in a way that is future-proof.

    The only fragility in the web lies in the DNS system, which is actually fairly robust. As for preserving data, well, that's an entirely different problem from the one posed by this article. Data preservation, by definition, means backing data up and storing it in (surprise!) centralized data stores. Copyright law could easily stand in the way of such an effort, as it has for the Internet Archive. Whatever the solution to this problem, data redundancy is inherently and diametrically opposed to data owners retaining full control of their data. It's directly antithetical.

    To build a stronger, more resilient and more private internet, we need to decentralize the web by putting users back in control of their data.

    Again, users are already in control of their data, at least any data that matters. You might argue that Twitter discussions/debates with politicians are important to preserve for posterity but that's easily answered by noting all of the news outlets that cheerfully capture those tweets in articles that will still be around, barring a data backup error, for a very long time.

    And again, data storage is already, as it has always been, fully decentralized. The only thing this article seems to be legitimately advocating for is a different kind of search engine. While that probably has some value, none of the arguments supporting this assumed need are at all convincing or even related to the proposed solution.

    The article reads like "I need a new car because astrophysics is purple." Sure, maybe you do need a new car but wtf?


    • Anonymous Coward, 5 May 2020 @ 4:46pm

      Re: A number of things wrong with this

      Another thought:

      I challenge the author of this article to post it to slashdot. I'm willing to bet that not only will all of the above be repeated there (in far greater detail, both technical and savagely insulting) but the article will also be shredded as shameless self-promotion. If a technical article doesn't survive exposure there it shouldn't be posted elsewhere and the author should probably sit down for some deep self-reflection and a complete rethink of the idea.


    • Molly, 6 May 2020 @ 1:53am

      Re: A number of things wrong with this

      You've got some serious blinders on about the problems the web is facing today. "The only fragility in the web lies in DNS" is sadly far from true - just look at all the regularly broken links as data shifts location, all the tools that become unusable once an owner pushes an update, our inability to track and version changes to data we care about, and the centralized nature of most apps and tools we use. These are serious flaws and brittleness. Using content-addressed data, and using peer-to-peer networks for direct collaboration between devices without central middlemen, helps alleviate that central control and dependence. See Juan Benet talk about this more, and the underlying technology at play here: https://www.youtube.com/watch?v=2RCwZDRwk48&list=PLuhRWgmPaHtSgYLCqDwnhsQV6RxKDrkqb&index=2

      I think you take "Google" at very face value (aka, the search engine) - but you neglect to account for the pervasive suite of centralized tools and services offered (Gmail, Docs, YouTube, Hangouts, Chrome), all of which store their application data (and metadata about your ads profile) on central Google servers. Having worked on tools AT Google, architecting these products to work offline and peer-to-peer is infeasible because of the gravity this centralized model exerts. Better tools can be built on a more resilient and flexible system.


      • Anonymous Coward, 6 May 2020 @ 8:13am

        Re: Re: A number of things wrong with this

        all the regularly broken links as data shifts location

        Existing search engines are pretty good at picking up the new data (less so at forgetting the old). While IPFS might present an alternate means of locating that moved data, it's not likely much faster than existing technology. This is a minor improvement, at best.

        all the tools that become unusable once an owner pushes an update

        This is a problem, for sure, but it's not a problem with the internet, it's a problem with product development. IPFS won't do a damn thing to resolve this. And thanks to greed, nothing else likely will either.

        our inability to track and version changes to data we care about

        This is antithetical to data retention by the owner and therefore not in line with the article author's goals. However, absolutely nothing is stopping you from doing this right now. There is no "inability" here.

        the centralized nature of most apps and tools we use

        This is just point #2 repeated in different words.

        Gmail, Docs and YouTube are all solutions the public wanted as replacements for equivalent tools that are slightly more cumbersome to manage. For example, I still use a non-gmail client and manage my own email servers. I also use LibreOffice instead of Docs despite the crappy UX. I don't really care about the videos (which could easily be hosted on a home server instead of Google's but people would rather try to make some cash from Google's ads).

        You're not describing problems with the internet. The article offers an architecture-level solution to a problem that doesn't exist, and all of the arguments in defense of this idea talk about high-level issues that have nothing to do with the internet itself, only with its users. And many or most of those arguments are just wrong. Good luck changing the users. IPFS is certainly not the solution to that problem.


        • Anonymous Coward, 6 May 2020 @ 8:51am

          Re: Re: Re: A number of things wrong with this

          Existing search engines are pretty good at picking up the new data

          That does not solve the broken link problem. IPFS does, because the data is identified by a hash, and not a location. "Give me the data known as..." works when the data has been moved, while "give me the data at..." fails.
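
A sketch of the difference (SHA-256 hex digests here stand in for IPFS content IDs, which in reality use a multihash-based CID format):

```python
import hashlib

def content_id(data: bytes) -> str:
    # Stand-in for an IPFS CID: derived from the bytes alone.
    return hashlib.sha256(data).hexdigest()

# Two unrelated hosts serving identical bytes produce the identical ID,
# so a hash link keeps working no matter where the data has moved.
bytes_on_host_a = b"the same article, served from anywhere"
bytes_on_host_b = b"the same article, served from anywhere"
assert content_id(bytes_on_host_a) == content_id(bytes_on_host_b)

# The flip side: change even one byte and the identifier changes too,
# which is why post-publication edits need a new link (or a mutable
# naming layer, such as IPNS, on top).
edited = bytes_on_host_a + b"!"
assert content_id(edited) != content_id(bytes_on_host_a)
```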


          • Anonymous Coward, 6 May 2020 @ 11:07am

            Re: Re: Re: Re: A number of things wrong with this

            That does not solve the broken link problem. IPFS does

            If someone is still sharing the data. Who's never run into an unseeded torrent?

            because the data is identified by a hash, and not a location

            Will people be willing to publish under this model? Companies love to surround their pages with navigation bars, related headlines, and other shit not part of the content being requested. Not to mention post-publication editing, which might even be required for libel-related reasons.


            • Anonymous Coward, 6 May 2020 @ 2:25pm

              Re: Re: Re: Re: Re: A number of things wrong with this

              Publishing under this model, and initial searching, will be no different than today. You search, find an article, and use cut and paste to create a link, be that a URL or a hash.


              • Anonymous Coward, 6 May 2020 @ 5:22pm

                Re: Re: Re: Re: Re: Re: A number of things wrong with this

                Publishing under this model, and initial searching will be no different that today.

                If you follow a link to a document identified by hash, it's not going to be able to show you the site's top 10 articles as of that moment or whatever else they want to push. It will be more like a newspaper, where everyone sees the same thing. If there's an ad, it'll be the same for everyone, unless they're using some hybrid model that pulls that via a non-hash protocol.


    • Anonymous Coward, 6 May 2020 @ 9:28am

      Re: A number of things wrong with this

      Again, users are already in control of their data, at least any data that matters.

      That’s just silly. I don’t think anyone is in a position to decide whose data matters. That's the whole point.
      Do you really think users have more control over their data and metadata than Google or Facebook? If either of them censors my content, do I have control over that?


      • Anonymous Coward, 6 May 2020 @ 9:39am

        Re: Re: A number of things wrong with this

        If either of them censor my content, do I have control over that?

        No, but you should still have the content and can post it elsewhere. They are not publishers and do not demand control over your content; they just have the right to not show it on their site.

        Why do you keep on demanding that they display your posters in their window?


        • Anonymous Coward, 6 May 2020 @ 5:25pm

          Re: Re: Re: A number of things wrong with this

          Why do you keep on demanding that they display your posters in their window?

          Where did you see a demand? Things like IPFS will avoid the need for that. They are the "elsewhere" you mentioned.


  • Anonymous Coward, 5 May 2020 @ 4:40pm

    In fact, I want to make a bold prediction: in the next one to two years, we’re going to see every major web-browser shipping with an IPFS peer, by default.

    That is indeed bold. I heard a similar prediction about Tor during the Snowden days. Didn't happen. I don't really get what's special about IPFS. Seems pretty similar to Freenet from 2 decades ago. Despite the claim IPFS is "more private", I see no information on the site about that. It looks kind of like BitTorrent to me, and it's well known that strangers can see what you download over BT.


    • Anonymous Coward, 5 May 2020 @ 4:51pm

      Re:

      There's no specific claim that IPFS is more private here, or elsewhere. IPFS (the protocol) is not more private. The data you add to IPFS can be encrypted or not. The real benefit/claim here is about control IMO.


      • Anonymous Coward, 5 May 2020 @ 8:25pm

        Re: Re:

        The quote is "To build a stronger, more resilient and more private internet, we need to decentralize the web by putting users back in control of their data. … Our team has been working for several years to make this vision of a decentralized web a reality by changing the way that apps, developers, and ordinary internet users make and share data."

        And you're right, they never claim IPFS achieves or even attempts to achieve that goal. Seems misleading for them to repeatedly talk up such goals in the context of IPFS, without saying which ones IPFS does or does not try to achieve.


        • frank87 (profile), 5 May 2020 @ 11:32pm

          Re: Re: Re:

          That's private in contrast to corporate. Not private as in privacy.


          • Anonymous Coward, 6 May 2020 @ 10:07am

            Re: Re: Re: Re:

            That's private in contrast to corporate. Not private as in privacy.

            Since when are there degrees of "private" in that sense? Also, the word "privacy" does appear explicitly:
            "Today’s internet ecosystem … [is] incompatible with data privacy"


        • Anonymous Coward, 6 May 2020 @ 1:58am

          Re: Re: Re:

          You can see more about Textile Threads here (https://docs.textile.io/threads/introduction/) - which handle data encryption on IPFS.


          • Anonymous Coward, 6 May 2020 @ 10:13am

            Re: Re: Re: Re:

            Encryption is only part of privacy. Techdirt.com uses HTTPS, but people can still see I'm accessing the site. Given all the embedded media, they might get a good idea of which stories I'm reading. Timing might reveal which comments are mine.

            Perhaps I'm just dense, but I don't understand from that page how Techdirt could publish stories and accept comments while protecting the privacy of readers and commenters. There's mention of relaying, but not necessarily onion-relaying.


  • Anonymous Coward, 5 May 2020 @ 5:30pm

    if I read this right

    Isn't it just basically a slightly more polished version of Freenet, that's been around for 20 years?


    • Scary Devil Monastery (profile), 6 May 2020 @ 2:16am

      Re: if I read this right

      "Isn't it just basically a slightly more polished version of Freenet, that's been around for 20 years?"

      More or less. A few bells and whistles have been added but essentially it's still the same idea.

      At some point it may become convenient enough to work, but until that point I wouldn't hold my breath. The current internet infrastructure isn't going to support IPFS as a scalable long-term solution until every PC user both uses IPFS and physically owns a part of the solution, rendering long-term storage viable.

      I can only imagine the flood of synchronization traffic trying to compensate for billions of users continually switching out their PCs and changing their hard drives...


      • Anonymous Coward, 6 May 2020 @ 11:15am

        Re: Re: if I read this right

        The current internet infrastructure isn't going to support IPFS as a scalable long-term solution until every PC user both uses IPFS and physically owns a part of the solution, rendering long-term storage viable.

        This isn't really an infrastructure problem. I'm sure this could work with 10% of people, maybe much less, running a server—to the extent it works at all. As you hint, bad algorithms are a still-unsolved problem. Lots of people have come up with stuff that works in labs or in small groups of dedicated users.

        It's worth keeping in mind that this doesn't have to be perfect; it only has to be better than what exists now, which in many ways isn't great. Things are reliable for popular and recent content, whereas the success rate for accessing years-old links is abysmal. Freenet was mostly worse, long ago when I tried. BitTorrent is better for bandwidth efficiency and speed with recent content, but it's worse for privacy, and stuff disappears faster than with the web.


        • Scary Devil Monastery (profile), 7 May 2020 @ 12:56am

          Re: Re: Re: if I read this right

          "I'm sure this could work with 10% of people, maybe much less, running a server—to the extent it works at all. As you hint, bad algorithms are a still-unsolved problem. Lots of people have come up with stuff that works in labs or in small groups of dedicated users."

          10% is a lot unless you've got the sort of ubiquitous penetration of the online population that, say, Microsoft has.

          For this sort of decentralized private network to scale well you need a lot of changes in how the average online user participates in the general network. IPFS needs to be piggybacked on top of multiple widespread applications used by everyone, for instance. Historically that's never worked well until MS managed to integrate Xbox Live with Windows 10.

          Secondly, you need sufficient numbers of people to donate significant amounts of hard drive space - and there we hit the Freenet issue, where the solution simply will not scale. You don't need the storage space sufficient to store all data on the internet - you need multiple times that storage space to ensure redundancy, with individual users filling the role of hard drives in a RAID 6+ array.

          "It's worth keeping in mind that this doesn't have to be perfect, it only has to be better than what exists now, which is many ways isn't great."

          The question being whether we even can realistically shoot for something significantly better until every netizen has become, effectively, a server park all their own. Sure, storage has dropped in price a lot and most people have more capacity than they use...but not enough to make any sort of difference.

          Until that changes significantly, IPFS is simply a new type of Freenet - useful for filesharers perhaps, and for the top layer of media considered "interesting" or new. But not a backup option for the net as a whole.

          As you implied, scaling is what usually buries initiatives like these. In theory it's genius. In practice the world still isn't ready.


  • Cpt Feathersword (profile), 5 May 2020 @ 5:34pm

    How is this different from BitTorrent?


  • Anonymous Coward, 5 May 2020 @ 7:45pm

    Seems more likely there will be an extinction than 1000 years of anything if you understand the trajectory of web architecture.


  • Anonymous Coward, 5 May 2020 @ 8:02pm

    Found while RTFM: pinning services.
    So to ensure persistent availability of important data, you'll pay some data center to keep it around for you. How is this different from today?

    • identicon
      Anonymous Coward, 5 May 2020 @ 8:45pm

      Re:

      you'll pay some data center to keep it around for you. How is this different from today?

      If that server goes down, your computer may automatically be able to locate the data elsewhere (on another server, or on a peer).

      I wonder why they say "your" data. Does anything stop me (or, say, archive.org) from "pinning" someone else's published data?

      • identicon
        Anonymous Coward, 6 May 2020 @ 9:13am

        Re: Re:

        Nope, nothing stopping this. In fact, it's a great feature of IPFS. Collaborative/collective preservation of digital media. If something is important to you, you are able to archive it for others in that sense.
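        For what it's worth, the mechanics behind that are easy to sketch. Below is a toy Python model of a content-addressed store with pinning (the class and method names are invented for illustration; real IPFS uses CIDs and an on-disk repo): a pin is just a promise that your node keeps a given hash through garbage collection, and nothing limits the pin set to hashes you created yourself.

```python
import hashlib

# Toy model of a content-addressed store with pinning.
# (Assumption: a deliberately simplified illustration, not the real IPFS code.)
class BlockStore:
    def __init__(self):
        self.blocks = {}   # hash -> bytes (the local cache)
        self.pins = set()  # hashes this node promises to keep

    def add(self, data: bytes) -> str:
        """Store a block under its own hash and return that hash."""
        h = hashlib.sha256(data).hexdigest()
        self.blocks[h] = data
        return h

    def pin(self, h: str):
        """Pin any known hash -- yours or someone else's published content."""
        self.pins.add(h)

    def gc(self):
        """Garbage-collect every block that nobody pinned."""
        self.blocks = {h: b for h, b in self.blocks.items() if h in self.pins}

store = BlockStore()
h = store.add(b"someone else's published page")
store.pin(h)   # e.g. archive.org choosing to preserve it
store.gc()
assert h in store.blocks  # pinned content survives garbage collection
```

In the real network, that same property is what lets archive.org - or anyone else - volunteer to keep somebody else's published data alive.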

  • icon
    frank87 (profile), 5 May 2020 @ 11:37pm

    Just like Bittorrent

    The experience with distributed data storage is that links get stale. You're lucky if a torrent that has lasted years still has any seeders.

    • icon
      PaulT (profile), 6 May 2020 @ 1:13am

      Re: Just like Bittorrent

      A lot of that has to do with the way people access torrents. Most people are leechers, who grab the file then disconnect once they get what they came for. There's no incentive to stay seeding, and in fact if they're accessing illegal content there's an incentive to seed for as short a time as possible. You see different ratios when accessing legal content such as Linux ISOs.

      However, if the point is not to access individual files, but rather to provide overall data storage for legal purposes, there's no incentive to leech and so people would agree to leave things connected.

      • identicon
        Rekrul, 6 May 2020 @ 9:17am

        Re: Re: Just like Bittorrent

        However, if the point is not to access individual files, but rather to provide overall data storage for legal purposes, there's no incentive to leech and so people would agree to leave things connected.

        Even with 100% legal files, not everyone has the storage capacity to leave everything on their system forever. I filmed a few gameplay videos a while back, but they're sitting on an external drive that isn't connected very often. Many things I have were burned to DVDs and then deleted from my system.

        • icon
          PaulT (profile), 6 May 2020 @ 9:37am

          Re: Re: Re: Just like Bittorrent

          "Even with 100% legal files, not everyone has the storage capacity to leave everything on their system forever."

          Obviously. But, an intelligently distributed system would mean that people dipping in and out would not affect things too much, while most people have a large amount of storage they're not using at all, which only increases with each system they buy.

          The question is really how the data to be distributed is prioritised, and the type of data to be distributed. Obviously, there's a different setup if you're talking about video vs textual data, for example.

          • identicon
            Anonymous Coward, 6 May 2020 @ 12:20pm

            Re: Re: Re: Re: Just like Bittorrent

            But, an intelligently distributed system would mean that people dipping in and out would not affect things too much

            Part of that might be noticing when this rarely-connected drive gets connected, and sharing files during that time. Personally, a lot of the data I get from torrents is still around—just not in its original location or directory layout; and maybe some files are missing, which means I can't seed others files that were sharing blocks with it.

          • icon
            Scary Devil Monastery (profile), 7 May 2020 @ 1:00am

            Re: Re: Re: Re: Just like Bittorrent

            "...But, an intelligently distributed system would mean that people dipping in and out would not affect things too much..."

            Yeah, but as I usually say, turning every netizen's PC into a hard drive node for a RAID 6+ array still means you need enough storage space to store the entirety of the internet multiple times in order for that redundancy to exist.

            And you can guess what happens to network bandwidth when the synchronization efforts alone start the equivalent of DDoSing every router and server on the network.

            • icon
              PaulT (profile), 7 May 2020 @ 1:35am

              Re: Re: Re: Re: Re: Just like Bittorrent

              "still means you need enough storage space to store the entirety of the internet multiple times in order for that redundancy to exist"

              If there's zero planning put into what is stored and how, sure.

              "And you can guess what happens to network bandwidth when the synchronization efforts alone starts the equivalent of DDoSing every router and server on the network"

              So, it would be designed not to do that in order to avoid such an obvious problem?

              Your objections seem to rest on the idea that nobody planning the system has any idea how anything works in reality.

              • icon
                Scary Devil Monastery (profile), 7 May 2020 @ 7:17am

                Re: Re: Re: Re: Re: Re: Just like Bittorrent

                "If there's zero planning put into what is stored and how, sure."

                Well, yes, but it's pretty much a given that there will be zero planning unless you manage to somehow curate the content. At this point I'll just refer you to a variant of Mike's arguments as to why you can't moderate at scale. By the time we have the tech enabling us to collate and sort all the data online, we'll have the tech needed to auto-moderate all of the internet as well. And, I suspect, also the tech needed for real-life Star Trek replicators, given how far off in sci-fi territory that is.

                "So, it would be designed not to do that in order to avoid such an obvious problem?"

                Then you face the unavoidable flip side of avoiding said problem - that if a set of data is taken offline that data won't be available any longer.

                Look, people keep saying "distributed data" and forgetting that said data is still stored - in multiple copies, usually, in multiple locations. "Cloud-based storage" really just means "A whole buttload of server farms storing the same sets of data ten times over".
                To store data offline and have it available at a moment's notice either you have it stored in a place which will always be accessible - and this rules out every private user - or you need to have it in multiple places simultaneously.

                If you have it in multiple places simultaneously then every time anyone changes the "source" copy - and good luck determining which one that'll be after a while - there will be a massive wave of data transfers as the new setup mirrors itself across the network. For every file so affected.

                I dunno about you but for me the idea of the entire damn internet synchronizing itself repeatedly looks like the vision of a million people simultaneously sticking etherkillers into their national network trunks.

                "Your objections seem to rest on the idea that nobody planning the system has any idea how anything works in reality."

                No, my objections rest on the fact that multiple private entities, corporate and individual, have pursued this issue many, MANY times over, starting from old freenet and continuing to the present day. It's a brilliant solution which has always collided with the fact that it doesn't scale well.

                That doesn't mean we shouldn't keep trying and applaud any serious attempt to take this further. But it does mean we should remain aware of what is currently possible and not.

                IPFS is brilliant. A worthy addition to the bittorrent protocol. And we do need more decentralization of the web.
                It will, in itself, do very little for data preservation, however, simply because in the end the data still has to be stored somewhere. And if that somewhere isn't a corporate storage facility or webmotel then that somewhere must be voluntarily donated netizen hard drive storage.

                And there we run into the same issue freenet encountered. It's great if you've got enough people donating storage for the data to be preserved.

                It's just scale. Data compression has limits. No matter how clever you are you can't expect to hold the atlantic ocean in a few thousand drinking glasses. We're literally in LENpeg compression territory here.

                • icon
                  PaulT (profile), 7 May 2020 @ 7:51am

                  Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

                  "Well, yes, but it's pretty much given that there will be zero planning unless you manage to somehow curate the content."

                  No, there are ways to get around that. I can think of quite a few, such as hashing larger files and using that to get around trying to store full copies of every site that uses the file in question. Also, you can be choosy about which sites are preserved; you don't have to plan to store multiple copies of YouTube for this to be viable for a huge number of independent sites.
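                  The dedup idea described above can be sketched in a few lines of Python (a simplified model, not any real system's storage format; the site names and file contents are made up): each file is stored once under its hash, and sites only hold lists of references to those hashes.

```python
import hashlib

# Hash-based deduplication sketch: blocks are keyed by their digest,
# so a file shared by many sites is physically stored only once.
storage = {}      # hash -> file bytes, stored once
site_index = {}   # site -> list of hashes it references

def publish(site: str, files: list) -> None:
    refs = []
    for data in files:
        h = hashlib.sha256(data).hexdigest()
        storage.setdefault(h, data)  # only the first copy is actually stored
        refs.append(h)
    site_index[site] = refs

logo = b"shared 2 MB logo"  # stand-in for a big asset used by many sites
publish("site-a.example", [logo, b"site A text"])
publish("site-b.example", [logo, b"site B text"])

# Two sites, but only three unique blocks: the shared logo is stored once.
assert len(storage) == 3
```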

                  "At this point I'll just refer you to a variant of Mike's arguments as to why you can't moderate at scale."

                  Nothing I'm talking about has anything to do with moderation.

                  "Then you face the unavoidable flip side of avoiding said problem - that if a set of data is taken offline that data won't be available any longer."

                  No, the entire point is to decentralise storage so that doesn't happen.

                  "To store data offline and have it available at a moment's notice"

                  What does that have to do with the subject here?

                  "the idea of the entire damn internet synchronizing itself repeatedly"

                  Again, you're making a major assumption that's not on the table. Why would everything be constantly syncing, rather than periodic checks to ensure that sufficient copies are available? The latter wouldn't necessarily be more overhead than existing heartbeat and other checks that are already prevalent.
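                  A periodic check of that kind might look like the following Python sketch (the node names and the target replica count are invented for illustration): it only transfers blocks that have dropped below the target, rather than resyncing everything on every change.

```python
# Periodic replica-count repair sketch: count holders of each block,
# and copy a block to more nodes only when it is under-replicated.
TARGET_REPLICAS = 3

def repair(nodes: dict, want: int = TARGET_REPLICAS) -> int:
    holders = {}
    for node, blocks in nodes.items():
        for b in blocks:
            holders.setdefault(b, []).append(node)
    transfers = 0
    for block, held_by in holders.items():
        missing = want - len(held_by)
        for node in nodes:
            if missing <= 0:
                break
            if node not in held_by:
                nodes[node].add(block)  # one copy to one extra node
                transfers += 1
                missing -= 1
    return transfers

# Block "x" has 2 copies and the target is 3: exactly one transfer happens,
# not a network-wide resync.
nodes = {"a": {"x"}, "b": {"x"}, "c": set(), "d": set()}
moved = repair(nodes)
assert moved == 1
```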

                  • icon
                    Scary Devil Monastery (profile), 8 May 2020 @ 3:27am

                    Re: Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

                    "No, there are ways to get around that. I can think of quite a few, such a hashing larger files and using that to get around trying to store full copies of every site that uses the file in question."

                    Then you lose data as a result. I can think of several webpages where data considered extremely relevant to various stakeholders would be distributed between the actual plaintext in the HTML source, in assorted downloads linked to through the site, and in older versions of the same page, archived and accessible only through separate links.

                    Let's assume, for a second, that not every host holding data indexes and curates the data they present. Much as it is today, in fact. You'll still be stuck needing to store and retain absolutely everything - or face the likely issue of loss.
                    Sure, you can swing an axe and expect the majority of online information to be on the right side of the 80-20 divide...but that's what we currently have, with most of the data stored with the four Big Ones. I'll concur that wresting that data away from decidedly partial interests is useful but it's not a 100%, 80%, or even 60% solution. It's "better than nothing" and will remain at that state until we see a radical shift in how the average netizen handles and contributes to both their own storage and that of others.

                    "...you don't have to plan to store multiple copies of YouTube for this to be viable for a huge number of independent sites."

                    As the ContentID example shows, what you WILL end up with IS multiple copies. Minor changes in a file, deliberate or not, produce a different hash. Every version of a file produces enough of a difference to not be recognized as a copy of an original by any algorithm currently in use.
                    It's not as if there's something like a universal internet OS ensuring that, as is the case on an individual device, every new version of a file has a unique identifier set to discern which file is a copy of another one. Nor, I hold, would we WANT a system like that.

                    "Nothing I'm talking about has anything to do with moderation."

                    It's the same argument - and the same logic. The ability to properly plan about storing the whole of the internet, without ending up with multiple copies of files which are, essentially, the same content with very minor variations, requires curation at massive scale. The exact same reason why moderation at scale doesn't work well - because you can't program any algorithm precisely enough without turning it into an actual person - who then needs to be 100% objective, to boot. And we're short of actual objective people to do the curation.

                    "No, the entire point is to decentralise storage so that doesn't happen."

                    Meaning multiple redundancy - which means multiple copies of everything you want stored. That's how decentralization works.
                    It's still the old RAID array problem, where the requirement of having access to the data even if some hard drives crash or are unplugged inevitably means you spend several times more storage on keeping redundancies.
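                    The overhead is easy to put rough numbers on (the figures below are illustrative only): naive full replication really does cost multiples of the data size, while RAID-6-style erasure coding buys comparable fault tolerance much more cheaply.

```python
# Rough redundancy-overhead arithmetic (illustrative numbers only).
data_tb = 100.0  # amount of unique data to preserve

# 3x full replication, the naive approach described above:
replication_total = data_tb * 3          # 300 TB raw for 100 TB of data

# RAID-6-style erasure coding, e.g. 10 data shards + 4 parity shards:
k, m = 10, 4
erasure_total = data_tb * (k + m) / k    # 140 TB raw, tolerates 4 lost nodes

assert replication_total == 300.0
assert erasure_total == 140.0
```

Either way, the raw storage still has to come from somewhere, which is the scaling point being argued here.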

                    "What does that have to do with the subject here?"

                    Me mangling my sentences. Should have been "to store data and have it available even when the origin is offline...". Mea Culpa.

                    "Why would everything be constantly syncing, rather than periodic checks to ensure that sufficient copies are available? The latter wouldn't necessarily be more overhead than existing heartbeat and other checks that are already prevalent."

                    Because every time any holder of a file alters anything in it - deliberately or through normal data corruption - it will generate a conflict over which version is to be considered correct and force an update - unless you want to introduce multiple file versions arising spontaneously, in addition to the issue of needing copies to maintain redundancy.
                    It's again a question of scale. Microsoft and other companies have spent god alone knows how much money and effort developing and pushing decentralized storage solutions onto corporate and private consumers. For private consumers this tends to work fine. It's a 1:1 storage solution or just, bluntly put, a virtual extra hard drive.

                    For corporations, despite the overwhelming advantage of usually having a closed-ecology intranet, unique identifiers for every user, usually a known pre-defined platform and application setup, and with full administrative control over the laptops of the individual users, the solution is messy, to say the least, generating no end of maintenance and administration issues.

                    And that's just trying to keep a 1:1 storage while not giving a rat's ass whether 1 or 10000 people retain copies of the same file. The obvious solution - running a shared drive - only works by assuming that every user either has restricted read-only access, or full control over read and write - at which point mayhem happens, even when the drive is shared only among a dozen people.

                    This is not in the end an issue of technology. It's both a technical problem of trying to reconcile two fundamentally opposite logics (standardized control vs individual freedom), and a people problem (people set up, index and flag their data entirely according to their own preferences) that we won't solve easily by simply nerding harder.
                    And every option that would make it just a little easier tends to be one we do not, under any circumstances, want.

                    Namely putting all the end points under centralized control. This is the sort of solution techies usually present as a joke or impossibility - and which people like Bill Barr then run with.

                    After seeing the various attempts at creating this wonderful decentralized environment - early filesharing clients, freenet, bittorrent, etc - and their respective attempts to accomplish full decentralization - what we've ended up with is currently what appears to work. It's not as if this is a new Holy Grail we just haven't pursued enough.

                    So as I said, I still think it's incredibly positive that people keep making efforts. IPFS is a fine enabler and force multiplier for online freedom already as is.

                    And someday in the future we may have an entirely different structure of the online environment which allows it to assist in building an internet where data "storage" is fully fluid and never tied to any individual location. But until that time- and at the minimum we need a few paradigm shifts in individually available storage and network technology and infrastructure before we are at the point where that becomes possible - it will remain an application mainly used by enthusiasts and the politically engaged.

                    Doesn't make it worthless, even if it should turn out that today all it might be good for is to keep long-dead torrent links alive.

                    • icon
                      PaulT (profile), 8 May 2020 @ 7:32am

                      Re: Re: Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

                      "Then you lose data as a result."

                      Why would hashing data and using that to stop creating unnecessary duplication lead to a loss of data? I can understand concerns of versioning and bloat, but I don't see how this method of mitigation causes data loss.

                      "but that's what we currently have, with most of the data stored with the four Big Ones"

                      Except - and this is the entire point - the data is no longer under their control. Even if that's all that's achieved, it's still achieved one of the stated goals.

                      "Meaning multiple redundancy - which means multiple copies of everything you want stored"

                      Yes...? That is what we're discussing here - decentralised storage so that it's not dependent on a single node or provider.

                      "As the ContentID example shows what you WILL end up with ARE going to be multiple copies."

                      Yes, but you end up with far less unnecessary duplication than you would without controlling for that.

                      "Because every time any holder of a file alters anything in it - deliberately or by normal data corruption, it will generate a conflict in which version is to be considered correct and force an update"

                      ...and most files aren't changed after they are created with any kind of regularity, and most will be of relatively trivial sizes. Some websites will present different challenges, but most private webpages really aren't updated all that regularly after initial publication. Versioning would be something to be considered, but you're not talking about millions of copies of the same files, or at least you shouldn't be. Someone would have to do the maths to find the optimal number, but it should be low enough that incremental changes aren't going to cause huge numbers of updates.

                      "The obvious solution - running a shared drive - only works by assuming that every user either has restricted read-only access, or full control over read and write - at which point mayhem happens, even when the drive is shared only among a dozen people."

                      That has nothing to do with the system under discussion. Everything that will be stored is already on the public internet, why would things like access restriction be required? Same with versioning - you need some control over which version is being distributed, but we're talking about distribution of the latest version, not archival backups.

                      "a people problem (people set up, index and flag their data entirely according to their own preferences)"

                      You again seem to be making a lot of fundamental assumptions that have nothing to do with what I'm talking about. I'm talking about distribution of files as they are published, which would require no more user interaction than indexing currently requires. I'm thinking along the lines of rsync - sure, you might need to make some changes at some point, but by and large once the cron script is set up you shouldn't need to do anything to control it from the user end unless your requirements change.

                      "Doesn't make it worthless, even if it should turn out that today all it might be good for is to keep long-dead torrent links alive."

                      There's no fundamental difference between that and what we're discussing. It's all a matter of implementation, which is not a problem unless you keep assuming that it somehow instantly has to cover the entire internet on day one and requires vastly more management than any current file management system. The only reason why torrents go dead is because people stop seeding, which shouldn't be a problem with a system that automates the availability of files. Plus, as I've mentioned before, the main reason torrents go dead is because people don't want to be caught seeding files they shouldn't be once they've leeched what they need, it's not as much of a problem with legal content unless it's severely outdated.

                      • icon
                        Scary Devil Monastery (profile), 11 May 2020 @ 6:42am

                        Re: Re: Re: Re: Re: Re: Re: Re: Re: Re: Just like Bittorrent

                        "Why would hashing data and using that to stop creating unnecessary duplication lead to a loss of data?"

                        Umm...let's go back a bit and try this again; Hashing the data only means you've got a checksum which matches one given file. It will identify that file and that file only.

                        I've got a file. It's a media file called "Ten hours of Paint drying" - or THOPD. It's an old DivX file at 20 GB, 4k resolution. It provides the checksum of "X".
                        My friend streamrips this file but thinks 720p res is good enough, saving a lot of space. The new checksum is "x".
                        I see this happening and think that OK he's got a point, the file's a bit big so I recode it in 2k res and in .264. New checksum is "Xx".

                        Parallel to this development some other dude films the very same spot of drying paint and encodes it in 4k, mp4. Checksum of that one, "Y". Rinse and repeat.

                        Now you have about fifteen different hashes which ALL refer to the exact same content, in various types of encoding and detail. This does not save space. It only means IPFS runs into the exact same phenomenon as can be observed on normal torrent index pages, where you search for a specific file by name and come up with 58 versions, all with a unique torrent pointing their way.

                        So you invariably end up with the problem of either somehow having to curate the hashed content, OR preserving ALL of it. Any measure aimed at curation will fail to some degree and so there'll be loss of data. Preserving all of it means storing effectively identical copies of more or less everything. Even something as simple as a text file which can be coded in .cbr, .pdf, .odt...etc et ad nauseam.
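                        The transcoding point is easy to demonstrate (the toy byte strings below stand in for the 4k and 720p encodes): a cryptographic hash has no notion of "same content", so every re-encode is a brand-new object to the network.

```python
import hashlib

# Re-encoding a file changes every byte, so content addressing
# sees a completely unrelated object. Toy stand-ins for the two encodes:
original = b"frame data at 4k..." * 1000
reencoded = b"frame data at 720p..." * 1000   # same content to a human

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(reencoded).hexdigest()
assert h1 != h2   # no hash-level relationship survives transcoding
```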

                        "Except - and this is the entire point - the data is no longer under their control. "

                        Isn't it? Who owns the storage space? Because one thing I can say right now - it's not "decentralized" if all of it is still in the same drive racks in some Redmond server park.
                        And the storage need isn't covered by private individuals either.

                        "Yes, but you end up with far less unnecessary duplication than you would without controlling for that."

                        Torrent indexers use the exact same method and they still end up with a dozen pages of checksum hashes all pointing to separately encoded versions of one and the same work.

                        "Everything that will be stored is already on the public internet, why would things like access restriction be required?"

                        Case example - storage of file X. Multiple copies created to maintain redundancy. The origin data repository where X resides drops out of the web. For reasons (deliberate or not) several of those copies are altered enough to render the current checksum inoperable.

                        • Register the altered versions as new uniques?
                        • Try to synchronize the changes across all stored copies?
                        • Restore the altered versions to whatever is stored under the original checksum?

                        Any of those options opens a whole new can of worms and headaches. The simple solution is that anything online is to be stored in perpetuity and that altered data is to be considered new data. But you can imagine, I hope, how THAT affects storage needs.

                        "Same with versioning - you need some control over which version is being distributed, but we're talking about distribution of the latest version, not archival backups."

                        Pretty sure this debate has derailed completely from the OP by now, since that illustrated the use of IPFS primarily for archiving purposes. How would you go about curating the latest version when you do not, in fact, have any common denominator telling you that hash sums X and x are one and the same, because it's really just a 4k DivX repackaged as a 2K mkv?

                        The BitTorrent approach retains simplicity and brute-forces the issue by simply assuming that every checksum is a unique identifier. Hence why you can end up with 58 "versions" of a file which, decoded, all turn out to be identical, but until then are considered 58 unique files by every protocol involved.

                        "I'm talking about distribution of files as they are published..."

                        And that is where the "people problem" shows up which leads us right back to my initial assertion;

                        "turning every netizen's PC into a hard drive node for a RAID 6+ array that still means you need enough storage space to store the entirety of the internet multiple times in order for that redundancy to exist."

                        To which you replied;

                        "If there's zero planning put into what is stored and how, sure."

                        I think we're on the same track, then, because distributing files as they are published does mean we get the "zero planning" bit because no way in hell do we find a way to curate which data set is unique and which is not. Leading to versioning across multiple users AND multiplied by the necessary redundancy.

                        And my main doubt here is whether we can manage to make that work at scale when the only available actual storage we have is voluntary donations by individual users.

                        "...assuming that it somehow instantly has to cover the entire internet on day one and requires vastly more management than any current file management system."

                        Most current decentralized file management systems are maintenance-heavy even when they only cover a single corporation in their network, and DO rely on using intermediate server storage solutions. I'm not sure "current" systems are good ballpark comparison points, honestly.

                        For the solution discussed here we are cutting away all of the maintenance, the intermediate storage solutions, and a lot of the infrastructure enabling peer-peer recognition when it comes to synchronization. DHT is admittedly proof of concept that maintenance and synchronization can become non-issues.

                        But proper indexing and storage? That's going to be tricky. We don't have the divination magic required for the first to become something even remotely searchable any better than Tribler's attempts at rendering DHT the backbone of indexing...
                        ...and the storage issue requires, imho, a radical shift in general netizen purchasing behavior. Not everyone wants to, or can, dedicate a separate hard drive to doing their part in archiving the internet.

          • identicon
            Rekrul, 8 May 2020 @ 2:07pm

            Re: Re: Re: Re: Just like Bittorrent

            Obviously. But, an intelligently distributed system would mean that people dipping in and out would not affect things too much,

            Only if the files they have are duplicated elsewhere. As I understand it, the whole system depends on people storing local copies of files. If enough people don't do that, then there aren't sufficient copies online to keep the files available. Unless of course it automatically makes a local copy of every file you access, every web site you look at, every video you view, etc. Which seems like a recipe for disaster when people start inadvertently filling up their hard drives just from daily usage, or they happen to stumble across something illegal and it gets saved to their system and then made available for the rest of the world to access.

            while most people have a large amount of storage they're not using at all, which only increases with each system they buy.

            Except that when most people buy a new system, they just chuck the old one, drives included. Or they take a sledge hammer to them, since CSI and NCIS have taught them that data can NEVER be erased from a hard drive. All of the data that they will have collected just goes in the trash and they start from scratch. Sure, they might backup photos and personal files, but most people don't even know how to do that.

            The question is really how the data to be distributed is prioritised, and the type of data to be distributed. Obviously, there's a different setup if you're talking about video vs textual data, for example.

            There are text files and copies of web sites that I saved on previous systems that are now sitting on external drives or burned to disc. Also, for this system to work, the data has to remain unchanged. What happens when someone edits a file to correct typos or add notes? What if they take a bunch of files on a similar topic and Zip them together for ease of organization? What if they convert them to PDF for printing? I recently discovered that Staples can't print saved web pages, everything has to be PDF, Word or RTF.

              PaulT (profile), 8 May 2020 @ 10:43pm

              Re: Re: Re: Re: Re: Just like Bittorrent

              "Only if the files they have are duplicated elsewhere. "

              Which would be the entire point of such a distributed system, yes. Why would a system that's intended to decentralise the web have single points of failure?

              "If enough people don't do that, then there aren't sufficient copies online to keep the files available"

              Yes, which is why there would be impetus to both reduce the amount of storage required and to encourage sharing.

              "Which seems like a recipe for disaster when people start inadvertently filling up their hard drives just from daily usage, or they happen to stumble across something illegal and it gets saved to their system and then made available for the rest of the world to access."

              It's only a recipe for disaster if it's badly managed and people start getting prosecuted for what's essentially a large cache of sites they've never visited. Even so, I'd suggest the first form of the system be concentrated on things like text and smaller images, expanding from there once the proof of concept has been done.

              "Except that when most people buy a new system, they just chuck the old one, drives included."

              Yes, they do... and they buy them so infrequently that the new system has a massive amount of storage compared to the old one. Apart from a slight bump as people move from SATA to SSD, capacities are going up constantly and exponentially.

              "All of the data that they will have collected just goes in the trash and they start from scratch"

              Yes... and that doesn't matter to a truly distributed system, since you just create another copy from another source. It would be like replacing a drive in a RAID array: the data remains available even as you destroy one of the original drives. You're only in trouble if every node goes away, but a properly designed system should make that vanishingly unlikely.
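A hedged sketch of that RAID comparison: the snippet below models re-replication in a toy distributed store. The node names, the `repair` function, and the replication target are all invented for illustration; this is not how IPFS (or any real system) actually schedules copies.

```python
# Hypothetical sketch of RAID-like re-replication in a distributed store.
# All names and the repair policy are made up for illustration only.

REPLICATION_TARGET = 3  # desired number of copies per piece of content

def repair(holders, departed, all_nodes):
    """Restore the copy count after node `departed` leaves the network.

    holders: dict mapping content hash -> set of node names holding a copy
    """
    for content_hash, nodes in holders.items():
        nodes.discard(departed)
        # As long as one copy survives, clone it onto fresh nodes,
        # much like rebuilding onto a replacement drive in a RAID array.
        while nodes and len(nodes) < REPLICATION_TARGET:
            spare = next((n for n in all_nodes if n not in nodes and n != departed), None)
            if spare is None:
                break  # no spare capacity left in the network
            nodes.add(spare)
    return holders

holders = {"hash1": {"node1", "node2", "node3"}}
repair(holders, "node2", ["node1", "node2", "node3", "node4", "node5"])
print(holders["hash1"])  # three copies again, none of them on node2
```

As the comment says, losing one holder is routine; the data is only gone if every holder disappears before a repair can run.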

              "What if they take a bunch of files on a similar topic and Zip them together for ease of organization? "

              Are those files served on the public web? If not, they're irrelevant to the aim of what we're talking about. If so, they fit within the realms of what we're discussing, and I'm sure it could be accounted for. There are standard Linux tools that let you view the contents of compressed files without extracting them, so you could conceivably omit those archives if the originals are available elsewhere. Password-protected files are a different story, but at that point it's bad management by the site owner, and there's no accounting for that no matter who's serving the data.
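For example (a hedged sketch; the archive name and contents are made up), Python's standard zipfile module can list an archive's members without unpacking anything, much like `unzip -l` on the command line:

```python
import io
import zipfile

# Build a small archive in memory so the example is self-contained.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("notes.txt", "saved web page notes")
    zf.writestr("article.html", "<html>archived article</html>")

# Inspect the members without extracting them.
with zipfile.ZipFile(buf) as zf:
    names = [info.filename for info in zf.infolist()]

print(names)  # ['notes.txt', 'article.html']
```

In principle, an index could use a listing like this to map archive members back to the original files hosted elsewhere, though nothing here is specific to IPFS.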

              "I recently discovered that Staples can't print saved web pages, everything has to be PDF, Word or RTF."

              Probably because web pages don't always format naturally to the printed page, and they waste a lot of paper if printed as-is. But, again, that's irrelevant: if you want to print it, you create the PDF. The fact that the page you're looking at was created using a different type of storage should be irrelevant to your ability to do things in your browser.

    flyinginn (profile), 6 May 2020 @ 4:05am

    Ignoring the technical feasibility issues for a moment, this reminds me of the introduction of the Dewey Decimal system for locating library books. Instead of describing availability by location ("American Farm Tractor" by Randy Leffingwell is on floor 3, stack 14, shelf 2), a book can be located by subject (631.3.x) anywhere that has a copy. The content with IPFS could be anything, but there's always the risk that it will be misfiled. It seems to be the opposite of an IP address, which is content-agnostic and inherently transient.

      Anonymous Coward, 6 May 2020 @ 9:16am

      Re:

      Very insightful. This is a great analogy, and it has been used before to describe IPFS. In fact, though, there is no risk of misfiling, because the content address is the hash of the content itself: a unique fingerprint that cannot feasibly be produced from any other piece of digital data.
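That fingerprint property is easy to demonstrate with a plain SHA-256 hash. Real IPFS content IDs wrap the digest in a multihash and chunk large files, so treat this as a simplified sketch with made-up sample data:

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the content itself (simplified; real IPFS
    CIDs encode the digest as a multihash rather than plain hex)."""
    return hashlib.sha256(data).hexdigest()

original = b"American Farm Tractor, chapter 1"
addr = content_address(original)

# The same bytes always produce the same address...
assert content_address(original) == addr
# ...and even a small change yields a completely different address,
# so a copy can't be "misfiled" under the wrong label undetected.
assert content_address(b"American Farm Tractor, chapter 2") != addr
```

This is why the Dewey analogy breaks down in a good way: the "call number" is computed from the book, not assigned by a librarian who might get it wrong.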

        Anonymous Coward, 6 May 2020 @ 12:22pm

        Re: Re:

        "This is a great analogy"

        I like it, but I do have to wonder whether people these days are more likely to know about the Dewey system or cryptographic hashes.

    Anonymous Coward, 6 May 2020 @ 4:54am

    Re: Being Productive in the Lockdown with Talento.

    Sorry, this is not your private advertising platform.

    Anonymous Coward, 6 May 2020 @ 7:51am

    Don't Mistake Foolishness for Fearlessness

    "...bold prediction..."

    As someone who's been contributing bandwidth and storage volume to Freenet for over a decade, I would read "bold" as "silly in this context."

    JJ, 6 May 2020 @ 8:19am

    Worse privacy and security than the regular web?

    The IPFS website is paragraph after paragraph about how cool and wonderful and awesome IPFS is, and also has lots of low-level technical info, but makes very little attempt to answer questions that informed readers will almost certainly have: How does this system compare to other similar systems? Why is it likely to succeed where others have failed? How does it plan to deal with the threats that are likely to arise?

    I've found that when a project isn't interested in answering those questions clearly, thoughtfully, and openly, it's because it doesn't have good answers. Frank, critical discussion of IPFS's pros, cons, threats, and solutions needs to be front and center in their communication.

    It looks like, from a privacy and security point of view, IPFS is much like BitTorrent, which means that any peer can see what data you're accessing, and also what data you're "seeding" or hosting. It becomes trivial for governments and corporations to spy on who is accessing and hosting what content. Since IPFS only works if users agree to publicly host and share content, anything controversial will be dangerous to access, and even more dangerous to provide access to.

    IPFS does have room for adding Tor-like anonymity layers on top of it, but it seems like that would destroy most of the benefits of using the system.

    Worse still, the system does not seem to attempt to guarantee a certain level of redundancy or availability for any particular data. Every user hosts only the data they choose, for only as long as they choose. (Compare this to existing distributed blockchain-based file storage systems, which are designed to guarantee file availability.)

    So, the content that is most likely to be preserved is the content that is most popular and least controversial - i.e. content that is pretty likely to be preserved anyway, and very likely the content that is least important to preserve.

      Rocky, 6 May 2020 @ 9:54am

      Re: Worse privacy and security than the regular web?

      "Worse still, the system does not seem to attempt to guarantee a certain level of redundancy or availability for any particular data. Every user hosts only the data they choose, for only as long as they choose."

      We already have CDNs today; it would be quite easy for them to add IPFS support, which would definitely alleviate any redundancy problems.

    Rekrul, 6 May 2020 @ 9:25am

    eDonkey2000 had a system like this: you pointed it at a directory and it shared everything in there, with files being accessed by their hash and not just their name. Availability of files was spotty at best and has diminished over time. BitTorrent allows users to share individual files in a similar way, but any torrent more than a few months old is almost certainly dead by now.

    I agree that having central points of failure for data is a bad idea, but depending on users keeping that data available for others to access isn't any better. Only the most popular files will be kept online and anything less popular or obscure will disappear quickly. People will delete some files simply because they need to free up space on their drive.

    How is that any better than what we have now?

      Anonymous Coward, 6 May 2020 @ 12:26pm

      Re:

      "How is that any better than what we have now?"

      As noted elsewhere, it would make archiving easier.

      Content disappears today when whoever originally posted it stops caring. Maybe archive.org has a copy, or maybe it has a copy of something totally different that once used that URL. With IPFS, archive.org or anyone else could run the mirror; people would be able to find it automatically and know they're getting the correct content. If this works.
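"Know they're getting the correct content" is the crucial property: because the address is derived from the bytes themselves, a client can verify any mirror's copy locally. A minimal sketch, using plain SHA-256 as a stand-in for real IPFS content IDs, with made-up page data:

```python
import hashlib

def verify(address: str, data: bytes) -> bool:
    """Accept a mirror's copy only if it hashes to the requested address."""
    return hashlib.sha256(data).hexdigest() == address

page = b"<html>the original article</html>"
address = hashlib.sha256(page).hexdigest()  # the link people share

# An honest mirror's copy verifies; a tampered or stale one does not.
assert verify(address, page)
assert not verify(address, b"<html>something totally different</html>")
```

Contrast this with a URL, where the name says nothing about the bytes: a dead or repurposed link can serve anything, and the reader has no way to tell.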

    Ehud Gavron (profile), 6 May 2020 @ 3:08pm

    Interplanetary Pot Filled Smoke

    Great undefined buzzwords that all feel good; can't beat that.

    Interplanetary Pot Filled Smoke sounds great. Hope it works out well for

    • the people who give you money
    • the money you take from them

    Maybe... just maybe... if you're REALLY serious about a content-based search that has nothing to do with "servers" (oh my aching gasping laughing chest) you should hire some people who are experts at this... oh wait... you did? They told you this wasn't viable? Oh, sorry.

    /backs out of room slowly.

    E

    Hankon, 10 May 2020 @ 3:41am

    If you're looking for cybersecurity, then join Utopia. This is a reliable and secure browser where everything remains secret!


