Archivists Grapple With Problems Of Preserving Recent Culture Held On Tape Cassettes And Floppy Drives

from the digital-archaeology dept

Most Techdirt readers probably surround themselves with the latest technology. But there's a slightly unusual class of professionals who are only now beginning to grapple with things like CP/M, 8-inch floppy disk drives and the Apple Lisa. These are the archivists, whose job is preserving cultural artifacts from all periods of history. That includes the recent past, whose technologies now seem paradoxically so strange and distant. The real-life consequences of that growing chasm between today's digital technologies and those that were commonplace 10, 20 or 30 years ago are made evident in an article published by the Guardian last week:

In the belly of a former whisky store in the inner Melbourne suburb of Brunswick lies a vast and varied collection of artefacts that feminist scholars can't wait to get their hands on.

Nearly 500 boxes in this dark, temperature-controlled warehouse hold a lifetime of handwritten letters, browning manuscripts and newspaper clippings.

But there are more modern treasures too: floppy disks containing an unpublished book about Margaret Thatcher; two computers, a Mac Powerbook G4 and iMac G5; and voicemail recordings about dinner plans in 1976.

These are all part of the archives of the well-known Australian writer Germaine Greer. According to the article, Greer has been hoarding personal documents and artifacts from the 1950s to the present day, which means they are in both analog and digital forms:

Greer's archive includes floppy disks, tape cassettes and CD-roms, once cutting-edge technologies that are now obsolete. They are vulnerable to decay and disintegration, leftovers from the unrelenting tide of technological advancement. They will last mere decades, unlike the paper records, which could survive for hundreds of years.

It is an irony of these formerly high-tech holdings that they are far less durable than old-fashioned paper-based systems. And researchers studying them face problems of compatibility that simply don't arise with paper. This is a major issue that is only now being faced, as cultural figures of Greer's generation pass on their archives to universities and libraries, who must start to grapple with the core tasks of deciphering and preserving them.

The good news is that once they have been decoded, they can be transferred to other media, and in more open formats that will be easier to access in the years to come. But that still leaves the problem of how to store all these archives in a way that will stand the test of time. Perhaps they will be encoded as data held on the ultimate storage medium, DNA. Or maybe it would just be easier to print the lot out on paper.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+


Reader Comments



  • Lawrence D’Oliveiro, 12 Aug 2016 @ 12:43am

    Archival Is An Active Process

    So you decode the files (hopefully they’re in open, documented formats), write reusable open-source tools to deal with them, then store the whole lot on current-vintage servers. Then as you upgrade your hardware, you keep copying everything onto the new machines, and maintain the software so it will keep running as well.

    Archiving stuff doesn’t mean leaving it on shelves in basements. That was never true even for paper documents, or papyrus, or clay tablets, or anything else. The digital world is just more of the same.

    And if you publish the software and the documents, then others can make their own copies, adding to the redundancy.
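A minimal sketch of the copy-then-verify step in that kind of migration cycle, in Python. The names `sha256_of` and `migrate` are hypothetical, and using SHA-256 as the integrity check is an assumption; the comment itself does not specify any particular method.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source_dir: Path, target_dir: Path) -> dict[str, str]:
    """Copy every file under source_dir to target_dir and verify each copy.

    Returns a manifest mapping relative paths to checksums (hypothetical
    format). Raises IOError if a copy does not match its source.
    """
    manifest = {}
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and timestamps
        before, after = sha256_of(source), sha256_of(target)
        if before != after:
            raise IOError(f"checksum mismatch for {relative}")
        manifest[relative.as_posix()] = after
    return manifest
```

Keeping the returned manifest alongside the archive would let the next migration check the files against the checksums recorded at the previous one, so silent corruption between copies has a chance of being noticed.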

    • PaulT, 12 Aug 2016 @ 1:26am

      Re: Archival Is An Active Process

      "And if you publish the software and the documents, then"

      ...someone sues to get the archive shut down because some obscure part of the release contained copyrighted material without the archivists' knowledge.

      Sadly, corrected for accuracy.

      • Daydream, 12 Aug 2016 @ 3:18am

        Re: Re: Archival Is An Active Process

        Do we have to bring up copyright now?

        Well, if you have to, why not point out how the takedowns of Megaupload and KickassTorrents prove that copywrong holders are willing to destroy digital archives if it means making a few cents more money?

        And that because pirated works are, almost by definition, widely distributed, whereas innocent and legitimate files frequently aren't, wrongful takedowns and all site seizures are guaranteed to do far more harm than good?

        • PaulT, 12 Aug 2016 @ 3:45am

          Re: Re: Re: Archival Is An Active Process

          "Do we have to bring up copyright now?"

          Yes, because it's a major issue with archiving, especially with digital media where the only known copies are DRM infected, and especially if you start redistributing the content. By definition, anything on digital media will still be under copyright today, orphaned or not, and any attack from a corporate copyright holder places the whole archive at risk.

          "Well, if you have to, why not point out how the takedowns of Megaupload and KickassTorrents prove that copywrong holders are willing to destroy digital archives if it means making a few cents more money?"

          Because the purpose of those sites is not to archive? I understand the sentiment, but there's a huge difference in scope and intent.

          "And that because pirated works are, almost by definition, widely distributed, whereas innocent and legitimate files frequently aren't"

          Citation? Popular works are distributed widely, pirated or not. Obscure titles are generally not, whether they're pirated or not. It's to do with how many people wish to obtain them, not whether they're pirated. Archiving is about preserving those works that are not popular or easily obtained.

          "wrongful takedowns and all site seizures are guaranteed to do far more harm than good?"

          This, however, is correct. Avoidable collateral damage is always wrong, and taking entire sites down for a percentage of infringing content is also unacceptable. Which is why we need to be aware that legitimate archives are going to be a target for people who worship the broken copyright system.

  • Boat-face McBoaty, 12 Aug 2016 @ 1:25am

    Germaine Greer

    An unpublished book about Margaret Thatcher? I do hope that one hits the shelves.

    • Anonymous Coward, 12 Aug 2016 @ 3:22am

      Re: Germaine Greer

      Germaine Greer ain't germane any more. I don't reckon any of her archived material will be significant to society. Since she made a name for herself, she became a pompous ass - spent too much time with the poms.

  • Peter, 12 Aug 2016 @ 2:29am

    The New Alexandrians

    Google's Book project has been compared to the Library of Alexandria, an attempt by the ancient Greeks to collect all the world's knowledge in one place. With the difference that Google books can be accessed by anyone, anywhere. And it is as futureproof as Google's computer network.

    As for more personal communication: Between Facebook, NSA and Gmail, it is probably more difficult to delete the more embarrassing parts than to retrieve what's interesting ....

  • Chris Laarman, 12 Aug 2016 @ 3:18am

    digitize, but keep the source

    I think that the primary act of archiving is digitizing data at all, with file formats as a secondary choice. But that concerns the content of the data (and some metadata).
    It may as well be useful to archive the original media.

    Two examples come to my mind.
    - My home town of Amsterdam (NL) welcomes masses of tourists who want to see the Night Watch painting by Rembrandt, even though it can be scrutinized on-line.
    - Think of button-badges with slogans. Their messages may be simple and easily digitized, but that would not convey their full meaning to posterity. A movie showing protest rallies would, but having just the movie would not show the actual thing. And posterity may draw interesting conclusions from the ledgers of the badge-making industry through time.

    In my part of the world, many things get archived, often with public money. I do hope that documenting our history won't fall victim to political decisions.

    • Anonymous Coward, 12 Aug 2016 @ 3:47am

      Re: digitize, but keep the source

      The problem being discussed here is dealing with archiving when the source is digital. This runs into all sorts of problems due to no longer supported media and file formats. So the problem is one of where the source format is no longer supported, and it is necessary to resurrect old hardware, and find copies of old programs just to get the files into some usable format on more modern hardware.

  • TRX, 12 Aug 2016 @ 4:46am

    In my experience, HD (1.44MB) floppies have a lifespan of about ten years before the magnetic domains blur until they can't be read. About the same for CD-ROMs - the kind you can write on your PC. I have 25-year-old commercial music CDs that still play fine, but they use a different technology.

    Various backup tape formats, less than 5 years. VHS video tapes start looking cartoony after 10 years. Cassette tapes start going bad at around the same age, even if not played. (most cassettes died from wear, not age)

    The lifespan of a recording medium appears to be inversely proportional to its density. Chipped stone, now *that* is permanent. Baked clay isn't as good. That newfangled "paper" stuff, who'd want that?

    The ringers are the various drum or disc audio formats; Edison cylinders and records. As long as you don't play them they'll last almost forever. And near the end of the LP era, there were players that used a laser instead of a stylus, so the grooves wouldn't wear. And they're not at all bad for density; that's why the Compact Disc format needed 700MB to store an LP at a reasonably high sample rate.
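For reference, the capacity arithmetic behind a Compact Disc can be worked through in a few lines of Python. The figures used (44.1 kHz sampling, 16-bit samples, two channels, 75 sectors per second, 2,048 user-data bytes per data sector) are the standard CD parameters rather than anything stated in the comment, and the function names are purely illustrative.

```python
SAMPLE_RATE_HZ = 44_100         # CD audio sampling rate
BYTES_PER_SAMPLE = 2            # 16-bit samples
CHANNELS = 2                    # stereo
SECTORS_PER_SECOND = 75         # CD sectors read per second of playing time
DATA_BYTES_PER_SECTOR = 2_048   # user data per sector on a data (Mode 1) disc


def audio_bytes(minutes: int) -> int:
    """Raw audio payload for the given playing time."""
    return minutes * 60 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def data_bytes(minutes: int) -> int:
    """User-data capacity of a data disc with the same playing time."""
    return minutes * 60 * SECTORS_PER_SECOND * DATA_BYTES_PER_SECTOR


print(audio_bytes(74))           # 783216000 bytes of audio on a 74-minute disc
print(data_bytes(80) / 2**20)    # 703.125, the familiar "700 MB" of an 80-minute CD-R
```

The data figure is lower than the audio figure for the same playing time because data sectors give up part of each sector to extra error correction.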

    • Anonymous Coward, 12 Aug 2016 @ 5:34am

      Re:

      I like how you try to sound like an expert, when you're obviously not. I've been hearing this "10 year" nonsense about floppies and CD-Rs for, well, MORE than 10 years. I'm kind of an amateur archivist. I love old computers. I've got 100s of 40 year old floppies and 20 year old CD-Rs. Pretty much all of them still play today.

      I realize that is anecdotal, but that's a better data point than your parroted nonsense.

      • Anonymous Coward, 12 Aug 2016 @ 5:52am

        Re: Re:

        The lifespan of CD-Rs, and to a lesser extent floppies, depends on such factors as climate, especially humidity for CDs. The idea of copying them onto fresh media every 10 years is a recommendation to maximize the chances of the files surviving for long periods. It is a bit late to attempt the copy when the media has developed unrecoverable errors due to degradation.

        • John Fenderson, 12 Aug 2016 @ 3:25pm

          Re: Re: Re:

          It also depends on when the floppies were made. Earlier floppies were of substantially higher quality than later floppies, and tend to last multiple decades unless abused.

        • Anonymous Coward, 13 Aug 2016 @ 6:57am

          Re: Re: Re:

          > The idea of copying them onto fresh media every 10 years is a recommendation to maximize the chances of the files surviving for long periods.

          Also, it starts looking silly to keep around 32 CDs when you can replace them with a single BD-R. But we're overdue for some new backup media. In theory there are 119 GB discs, but I've never been able to find them; and the 46 GB discs cost more per byte than the normal 23 GB ones.

          Soon, hard drives are likely to be the only reasonable option. But they might be running for years before they fill up, and in that time are vulnerable to disk failure, ransomware, etc. (unlike discs, which could be damaged by malware but not rewritten--and only the amount of data on the disc that's in there at the time).

          How well do HDDs work for long-term storage? I've rarely tried using 30-year-old drives, though last time I did they were fine. I suppose any data I cared about would have been copied to a new drive before 10 years, because keeping a 10-year-old drive spinning would be silly when newer drives would have 30X capacity. But archivists have to deal with cases where the data owner didn't care, forgot about it in an attic, etc.; and not everyone can afford to replace hardware before it dies.

      • PaulT, 12 Aug 2016 @ 5:53am

        Re: Re:

        Me too, I've retrieved old media plenty of times and found they work OK.

        It's all about storage, ultimately. Magnetic media can degrade quickly, but may last for a long time if taken care of properly. I certainly have floppies from the late 80s that still work. Ditto cassettes and VHS tapes, although I did have to dispose of a few due to chewing or mould issues. I've also seen CDs and DVDs I've had to throw out due to a manufacturing flaw, even if stored properly. But the majority of my collection seems to work normally, at least the last time I managed to check.

        His comment about not using them is interesting, though, as it's rather true. Keep using something that depends on physical contact, whether it's a piano roll, vinyl or tape, and you run the risk of it getting damaged or simply eroded. Archivists won't let people read very old books because acids and other chemicals present in human skin will damage the paper.

        But, ultimately, it's all about the content. It's great to preserve the original media, but it's also important to retrieve the content lest the original become unusable. That probably *will* happen for most modern storage media, although the timescales some claim do seem rather underestimated.

        • Anonymous Coward, 12 Aug 2016 @ 6:35am

          Re: Re: Re:

          You are not thinking like an archivist, who sets the time before copying to new media to be less than that required for the first failures to start to show up in a cohort of tapes/disks etc. To start copying after the failures start to show up is to lose some of the archives.

      • TRX, 12 Aug 2016 @ 10:28am

        Re: Re:

        Let's see... a 40-year-old floppy would be from 1976.

        I bow in respect for your greatness! And the hundreds of 1.2 and 1.44Mb floppies my floppy drives ground on, trying to recover *anything*, were obviously figments of my imagination.

      • Lawrence D’Oliveiro, 12 Aug 2016 @ 3:24pm

        Re: Pretty much all of them still play today.

        You were lucky.

        When I got my first CD writer, back around 1999 or so, one of the first things I did was make backups of all my floppy disks. In particular, I had been carefully maintaining two separate copies of the floppies containing all the programs I had written.

        And guess what? There was one disk where both copies had bad sectors. Luckily not in the same files. But I came this close to losing my only copies of those files.

        Floppies have never been really reliable. (Why do you think I was maintaining multiple copies?) I was happy to see the back of them.

    • Rekrul, 12 Aug 2016 @ 7:53am

      Re:

      In my experience, HD (1.44Mb) floppies have a lifespan of about ten years before the magnetic domains blur until they can't be read. About the same for CD-ROMs - the kind you can write on your PC.

      For what it's worth, I just tested one of the oldest CDs I burned from 7/16/03 and every file tested 100%. My discs are kept in the round "cake" box containers that they originally came in. Beyond that I don't do anything special. I live in Connecticut and it gets pretty hot and humid here during the summer. I used to use air conditioning for much of the summer, but since our electricity rates went from OK to ridiculous, I now just use fans except for the very hottest days.

  • Vidiot, 12 Aug 2016 @ 5:48am

    Assigning value

    One of the core issues here is a variant on the fundamental question, "What's worth preserving?"

    For most of us, the answer would be "everything". I get a kick out of reading mundane, slice-of-life moments from PDF's of old regional newspapers, probably more than from viewing a digitized image of the Magna Carta. The value we assign to preservation, i.e., archiving, is a floating quantity.

    But, as always, perceived value needs to be matched by monetary value. Even the monks who hand-duplicated ancient manuscripts had to be fed and sheltered... there's always a cost. Librarians and archivists working primarily with paper archives faced this, too, but it only occurred at long intervals; today, our proliferation of digital formats means that the archivist's job is nearly continuous. As soon as a collection has been fully migrated from one fading medium to the next, greatest platform, you can bet that the process will begin again, as new technology becomes old. Preservation cycles formerly measured in centuries are now measured, at best, in decades.

    And someone needs to pay. Continuously.

    As a result, archiving efforts for the biggest, most prestigious collections are funded, because we can all agree on that question of value. Not so much, though, for media of secondary interest; those are likely to be ignored until hardware vanishes, file formats disappear or magnetic coercivity fades into oblivion. Sometimes, content's best hope is that it will drop below that secondary threshold, into the realm of "quirky ephemera", where oddballs like me might step in and volunteer to migrate the media.

    So maybe that's the next great role for the world's underused, underappreciated network of public libraries -- archiving and preservation of mid-value content. Makerspaces and 3D printers are nice, but professional librarians all have advanced degrees in the fields that would make them indispensable to this effort.

  • Ninja, 12 Aug 2016 @ 7:18am

    Physical DRM

    There, the perfect DRM!

    I suggest the MAFIAA releases their crap.. Ahem, awesome works in tapes or 5 in disks. Nobody will ever pirate anything in the future!

  • Rekrul, 12 Aug 2016 @ 7:56am

    If you think preserving culture on old cassettes and floppies is hard, wait until someone tries to preserve today's current digital only, DRM locked culture. How do you preserve a library of games that only exist in digital form, sent directly from the company servers to a locked down game console?

    • That One Guy, 12 Aug 2016 @ 11:18am

      Pirates: Archivists of the modern age

      But wait, it gets worse, because archiving and preserving DRM infected content requires breaking the DRM, and with the law as it stands now doing so opens you up to legal action, since even actions that are perfectly legal (archiving, creating backups) become illegal if they require breaking or bypassing DRM.

      As such you either have archivists/historians risking legal action to preserve something, pirates taking up the role of archivists, or risk the content being lost entirely.

  • Groaker, 12 Aug 2016 @ 8:33am

    This problem has been recognized for more than 20 years. It is not solely in the domain of user devices, but professional data centers as well. First of course is the issue of bit rot, followed by loss of data structures. But most important is the loss of devices to read the data on (tape drives, disk platters, card readers, etc.) and of course the software with which to read it.

    The Y2K frenzy caused a lot of hardware and software to be replaced, but much of the data wasn't transferred. And now if you try to find a programmer who is familiar with, say, BDAM files, you will be lucky if the individual doesn't suffer from dementia. Never mind the hardware to run it on.

    What is on personal PCs may be significant, but the data in large centers dwarfs it in volume and value.

  • bob, 12 Aug 2016 @ 11:44am

    digital records

    The fact that digital doesn't last as long as paper is a major reason why the world will never have a truly digital-only office. There will always be something written down to preserve it.

  • David, 12 Aug 2016 @ 5:02pm

    Sorry, this has been known for decades

    This is an acknowledged issue of long standing in the computer business. So, the only irony I see is that it is once again mentioned as an ah-ha! moment. Haalp, we might lose these dinner plans.

    If you want to archive your work, collections or whatever then there are known issues that will need to be addressed. It takes time, people with time, dedication and often some actual real world cash.

    Good luck, but leave off the irony for rehashed news.

  • John85851, 16 Aug 2016 @ 10:49am

    What's the file format

    I wonder what file format the documents on the computers are using? What happens if they're in VisiCalc '82 format that's unreadable simply because no one's needed a VisiCalc emulator for 25 years? Or are there plenty of smart people who can write an emulator? :)
    And is there any chance that the files could be in an unknown format that can't be emulated or read?

  • walter carroll, 8 Nov 2016 @ 10:49am

    That makes sense, but it does not really explain how they are doing it. It just says we will not keep as many copies of your older images as we do of your most recent images. However we will keep a copy on a device or tape (?) somewhere if we need to get it....
