Archivists Grapple With Problems Of Preserving Recent Culture Held On Tape Cassettes And Floppy Drives

Most Techdirt readers probably surround themselves with the latest technology. But there’s a slightly unusual class of professionals who are only now beginning to grapple with things like CP/M, 8-inch floppy disk drives and the Apple Lisa. These are the archivists, whose job is preserving cultural artifacts from all periods of history. That includes the recent past, whose technologies now seem paradoxically so strange and distant. The real-life consequences of that growing chasm between today’s digital technologies, and those that were commonplace 10, 20 or 30 years ago, are made evident in an article published by the Guardian last week:

In the belly of a former whisky store in the inner Melbourne suburb of Brunswick lies a vast and varied collection of artefacts that feminist scholars can’t wait to get their hands on.

Nearly 500 boxes in this dark, temperature-controlled warehouse hold a lifetime of handwritten letters, browning manuscripts and newspaper clippings.

But there are more modern treasures too: floppy disks containing an unpublished book about Margaret Thatcher; two computers, a Mac Powerbook G4 and iMac G5; and voicemail recordings about dinner plans in 1976.

These are all part of the archives of the well-known Australian writer Germaine Greer. According to the article, Greer has been hoarding personal documents and artifacts from the 1950s to the present day, which means they are in both analog and digital forms:

Greer’s archive includes floppy disks, tape cassettes and CD-roms, once cutting-edge technologies that are now obsolete. They are vulnerable to decay and disintegration, leftovers from the unrelenting tide of technological advancement. They will last mere decades, unlike the paper records, which could survive for hundreds of years.

It is an irony of these formerly high-tech holdings that they are far less durable than old-fashioned paper-based systems. And researchers studying them face problems of compatibility that simply don’t arise with paper. This is a major issue that is only now being faced, as cultural figures of Greer’s generation pass on their archives to universities and libraries, who must start to grapple with the core tasks of deciphering and preserving them.

The good news is that once they have been decoded, they can be transferred to other media, and in more open formats that will be easier to access in the years to come. But that still leaves the problem of how to store all these archives in a way that will stand the test of time. Perhaps they will be encoded as data held on the ultimate storage medium, DNA. Or maybe it would just be easier to print the lot out on paper.

Comments on "Archivists Grapple With Problems Of Preserving Recent Culture Held On Tape Cassettes And Floppy Drives"

Lawrence D’Oliveiro says:

Archival Is An Active Process

So you decode the files (hopefully they’re in open, documented formats), write reusable open-source tools to deal with them, then store the whole lot on current-vintage servers. Then as you upgrade your hardware, you keep copying everything onto the new machines, and maintain the software so it will keep running as well.

Archiving stuff doesn’t mean leaving it on shelves in basements. That was never true even for paper documents, or papyrus, or clay tablets, or anything else. The digital world is just more of the same.

And if you publish the software and the documents, then others can make their own copies, adding to the redundancy.
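As a rough illustration of the copy-forward routine described above, here is a minimal sketch of a fixity check: record a checksum for every file before a migration, then confirm that the copy on the new machine still matches. The function names (`build_manifest`, `verify_copy`) and the choice of SHA-256 are assumptions made for this example, not something the comment specifies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or altered in the copy."""
    problems = []
    for relative, expected in manifest.items():
        candidate = copy_root / relative
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(relative)
    return problems
```

Storing the manifest alongside the archive would let anyone who makes their own copy run the same check, which is what makes the extra redundancy trustworthy.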

Daydream says:

Re: Re: Archival Is An Active Process

Do we have to bring up copyright now?

Well, if you have to, why not point out how the takedowns of Megaupload and KickassTorrents prove that copywrong holders are willing to destroy digital archives if it means making a few cents more money?

And that because pirated works are, almost by definition, widely distributed, whereas innocent and legitimate files frequently aren’t, wrongful takedowns and all site seizures are guaranteed to do far more harm than good?

PaulT (profile) says:

Re: Re: Re: Archival Is An Active Process

“Do we have to bring up copyright now?”

Yes, because it’s a major issue with archiving, especially with digital media where the only known copies are DRM infected, and especially if you start redistributing the content. By definition, anything on digital media will still be under copyright today, orphaned or not, and any attack from a corporate copyright holder places the whole archive at risk.

“Well, if you have to, why not point out how the takedowns of Megaupload and KickassTorrents prove that copywrong holders are willing to destroy digital archives if it means making a few cents more money?”

Because the purpose of those sites is not to archive? I understand the sentiment, but there’s a huge difference in scope and intent.

“And that because pirated works are, almost by definition, widely distributed, whereas innocent and legitimate files frequently aren’t”

Citation? Popular works are distributed widely, pirated or not. Obscure titles are generally not, whether they’re pirated or not. It’s to do with how many people wish to obtain them, not whether they’re pirated. Archiving is about preserving those works that are not popular or easily obtained.

“wrongful takedowns and all site seizures are guaranteed to do far more harm than good?”

This, however, is correct. Avoidable collateral damage is always wrong, and taking entire sites down for a percentage of infringing content is also unacceptable. Which is why we need to be aware that legitimate archives are going to be a target from people who worship the broken copyright system.

Peter says:

The New Alexandrians

Google’s Book project has been compared to the Library of Alexandria, an attempt by the ancient Greeks to collect all the world’s knowledge in one place. With the difference that Google books can be accessed by anyone, anywhere. And it is as futureproof as Google’s computer network.

As for more personal communication: Between Facebook, NSA and Gmail, it is probably more difficult to delete the more embarrassing parts than to retrieve what’s interesting ….

Chris Laarman (profile) says:

digitize, but keep the source

I think that the primary act of archiving is digitizing data at all, with file formats as a secondary choice. But that concerns the content of the data (and some metadata).
It may also be useful to archive the original media.

Two examples come to my mind.
– My home town of Amsterdam (NL) welcomes masses of tourists who want to see the Night Watch painting by Rembrandt, even though it can be scrutinized on-line.
– Think of button-badges with slogans. Their messages may be simple and easily digitized, but that would not convey their full meaning to posterity. A movie showing protest rallies would, but having just the movie would not show the actual thing. And posterity may draw interesting conclusions from the ledgers of the badge-making industry through time.

In my part of the world, many things get archived, often with public money. I do hope that documenting our history won’t fall victim to political decisions.

Anonymous Coward says:

Re: digitize, but keep the source

The problem being discussed here is dealing with archiving when the source is digital. This runs into all sorts of problems due to no longer supported media and file formats. So the problem is one of where the source format is no longer supported, and it is necessary to resurrect old hardware, and find copies of old programs just to get the files into some usable format on more modern hardware.

TRX (profile) says:

In my experience, HD (1.44MB) floppies have a lifespan of about ten years before the magnetic domains blur until they can’t be read. About the same for CD-ROMs – the kind you can write on your PC. I have 25-year-old commercial music CDs that still play fine, but they use a different technology.

Various backup tape formats, less than 5 years. VHS video tapes start looking cartoony after 10 years. Cassette tapes start going bad at around the same age, even if not played. (most cassettes died from wear, not age)

The lifespan of a recording medium appears to be inversely proportional to its density. Chipped stone, now *that* is permanent. Baked clay isn’t as good. That newfangled “paper” stuff, who’d want that?

The ringers are the various drum or disc audio formats; Edison cylinders and records. As long as you don’t play them they’ll last almost forever. And near the end of the LP era, there were players that used a laser instead of a stylus, so the grooves wouldn’t wear. And they’re not at all bad for density; that’s why the Compact Disc format needed 700MB to store an LP at a reasonably high sample rate.

Anonymous Coward says:

Re: Re:

I like how you try to sound like an expert, when you’re obviously not. I’ve been hearing this “10 year” nonsense about floppies and CD-Rs for, well, MORE than 10 years. I’m kind of an amateur archivist. I love old computers. I’ve got 100s of 40 year old floppies and 20 year old CD-Rs. Pretty much all of them still play today.

I realize that is anecdotal, but that’s a better data point than your parroted nonsense.

Anonymous Coward says:

Re: Re: Re:

The lifespan of CD-Rs, and to a lesser extent floppies, depends on such factors as climate, especially humidity for CDs. The idea of copying them onto fresh media every 10 years is a recommendation to maximize the chances of the files surviving for long periods. It is a bit late to attempt the copy when the media has developed unrecoverable errors due to degradation.

Anonymous Coward says:

Re: Re: Re: Re:

“The idea of copying them onto fresh media every 10 years is a recommendation to maximize the chances of the files surviving for long periods.”

Also, it starts looking silly to keep around 32 CDs when you can replace them with a single BD-R. But we’re overdue for some new backup media. In theory there are 119 GB discs, but I’ve never been able to find them; and the 46 GB discs cost more per byte than the normal 23 GB ones.

Soon, hard drives are likely to be the only reasonable option. But they might be running for years before they fill up, and in that time are vulnerable to disk failure, ransomware, etc. (unlike write-once discs, which malware could damage but not rewrite, and then only the one disc that is in the drive at the time).

How well do HDDs work for long-term storage? I’ve rarely tried using 30-year-old drives, though last time I did they were fine. I suppose any data I cared about would have been copied to a new drive before 10 years, because keeping a 10-year-old drive spinning would be silly when newer drives would have 30X capacity. But archivists have to deal with cases where the data owner didn’t care, forgot about it in an attic, etc.; and not everyone can afford to replace hardware before it dies.
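The disc sizes quoted in this comment (23, 46 and 119 GB) look like the nominal 25, 50 and 128 GB Blu-ray capacities expressed in binary units. The small sketch below checks that reading and the "32 CDs per disc" figure. It assumes a "700 MB" CD-R holds about 700 MiB, and the names `to_gib` and `cds_per_disc` are invented for the example.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte (GiB)

# Assumption: nominal decimal capacities of single-layer, dual-layer and BDXL discs.
NOMINAL_BYTES = {
    "BD-R": 25 * 10 ** 9,
    "BD-R DL": 50 * 10 ** 9,
    "BD-R XL": 128 * 10 ** 9,
}

# Assumption: a "700 MB" CD-R stores roughly 700 MiB of data.
CD_BYTES = 700 * 2 ** 20


def to_gib(size_in_bytes: int) -> float:
    """Convert a byte count to binary gigabytes."""
    return size_in_bytes / GIB


def cds_per_disc(disc_bytes: int) -> int:
    """How many full CD-Rs fit on a disc, ignoring filesystem overhead."""
    return disc_bytes // CD_BYTES


if __name__ == "__main__":
    for name, size in NOMINAL_BYTES.items():
        print(f"{name}: {to_gib(size):.1f} GiB, room for {cds_per_disc(size)} CDs")
```

Under these assumptions a single-layer disc has room for slightly more than 32 CDs, so the figure in the comment is a comfortable round number rather than a hard limit.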

PaulT (profile) says:

Re: Re: Re:

Me too, I’ve retrieved old media plenty of times and found they work OK.

It’s all about storage, ultimately. Magnetic media can degrade quickly, but may last for a long time if taken care of properly. I certainly have floppies from the late 80s that still work. Ditto cassettes and VHS tapes, although I did have to dispose of a few due to chewing or mould issues. I’ve also seen CDs and DVDs I’ve had to throw out due to a manufacturing flaw, even if stored properly. But the majority of my collection seems to work normally, at least the last time I managed to check.

His comment about not using them is interesting, though, as it’s rather true. Keep using something that depends on physical contact, whether it’s a piano roll, vinyl or tape, and you run the risk of it getting damaged or simply eroded. Archivists won’t let people read very old books because acids and other chemicals present in human skin will damage the paper.

But, ultimately, it’s all about the content. It’s great to preserve the original media, but it’s also important to retrieve the content lest the original become unusable. That probably will happen for most modern storage media, although the timescales some claim do seem rather underestimated.

Lawrence D’Oliveiro says:

Re: Re: Pretty much all of them still play today.

You were lucky.

When I got my first CD writer, back around 1999 or so, one of the first things I did was make backups of all my floppy disks. In particular, I had been carefully maintaining two separate copies of the floppies containing all the programs I had written.

And guess what? There was one disk where both copies had bad sectors. Luckily not in the same files. But I came this close to losing my only copies of those files.

Floppies have never been really reliable. (Why do you think I was maintaining multiple copies?) I was happy to see the back of them.
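The near miss described above suggests one way redundant copies can be combined. The sketch below assumes the two copies are sector-for-sector identical disk images and that the imaging tool reports which sector numbers failed to read on each; both assumptions go beyond what the comment says, and `merge_images` is a hypothetical helper.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors, as on common PC floppies


def merge_images(image_a: bytes, bad_a: set[int],
                 image_b: bytes, bad_b: set[int]) -> tuple[bytes, list[int]]:
    """Combine two reads of the same disk, sector by sector.

    bad_a and bad_b hold the sector numbers that failed to read from each copy.
    Returns the merged image and the sectors that were unreadable on both.
    """
    if len(image_a) != len(image_b) or len(image_a) % SECTOR_SIZE:
        raise ValueError("images must be the same whole number of sectors")

    merged = bytearray()
    lost = []
    for sector in range(len(image_a) // SECTOR_SIZE):
        start = sector * SECTOR_SIZE
        end = start + SECTOR_SIZE
        if sector not in bad_a:
            merged += image_a[start:end]
        elif sector not in bad_b:
            merged += image_b[start:end]
        else:
            # Neither copy has this sector: zero-fill it and report the loss.
            merged += bytes(SECTOR_SIZE)
            lost.append(sector)
    return bytes(merged), lost
```

If the returned `lost` list is empty, every sector was readable on at least one copy, which is the lucky outcome the comment describes.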

Rekrul says:

Re: Re:

“In my experience, HD (1.44MB) floppies have a lifespan of about ten years before the magnetic domains blur until they can’t be read. About the same for CD-ROMs – the kind you can write on your PC.”

For what it’s worth, I just tested one of the oldest CDs I burned from 7/16/03 and every file tested 100%. My discs are kept in the round “cake” box containers that they originally came in. Beyond that I don’t do anything special. I live in Connecticut and it gets pretty hot and humid here during the summer. I used to use air conditioning for much of the summer, but since our electricity rates went from OK to ridiculous, I now just use fans except for the very hottest days.

Vidiot (profile) says:

Assigning value

One of the core issues here is a variant on the fundamental question, “What’s worth preserving?”

For most of us, the answer would be “everything”. I get a kick out of reading mundane, slice-of-life moments from PDF’s of old regional newspapers, probably more than from viewing a digitized image of the Magna Carta. The value we assign to preservation, i.e., archiving, is a floating quantity.

But, as always, perceived value needs to be matched by monetary value. Even the monks who hand-duplicated ancient manuscripts had to be fed and sheltered… there’s always a cost. Librarians and archivists working primarily with paper archives faced this, too, but it only occurred at long intervals; today, our proliferation of digital formats means that the archivist’s job is nearly continuous. As soon as a collection has been fully migrated from one fading medium to the next, greatest platform, you can bet that the process will begin again, as new technology becomes old. Preservation cycles formerly measured in centuries are now measured, at best, in decades.

And someone needs to pay. Continuously.

As a result, archiving efforts for the biggest, most prestigious collections are funded, because we can all agree on that question of value. Not so much, though, for media of secondary interest; those are likely to be ignored until hardware vanishes, file formats disappear or magnetic coercivity fades into oblivion. Sometimes, content’s best hope is that it will drop below that secondary threshold, into the realm of “quirky ephemera”, where oddballs like me might step in and volunteer to migrate the media.

So maybe that’s the next great role for the world’s underused, underappreciated network of public libraries — archiving and preservation of mid-value content. Makerspaces and 3D printers are nice, but professional librarians all have advanced degrees in the fields that would make them indispensable to this effort.

That One Guy (profile) says:

Re: Pirates: Archivists of the modern age

But wait, it gets worse, because archiving and preserving DRM infected content requires breaking the DRM, and with the law as it stands now doing so opens you up to legal action, since even actions that are perfectly legal (archiving, creating backups) become illegal if they require breaking or bypassing DRM.

As such you either have archivists/historians risking legal action to preserve something, pirates taking up the role of archivists, or risk the content being lost entirely.

Groaker (profile) says:

This problem has been recognized for more than 20 years. It is not solely in the domain of user devices, but professional data centers as well. First of course is the issue of bit rot, followed by loss of data structures. But most important is the loss of devices to read the data on (tape drives, disk platters, card readers, etc.) and of course the software with which to read it.

The Y2K frenzy caused a lot of hardware and software to be replaced, but much of the data wasn’t transferred. And now if you try to find a programmer who is familiar with, say, BDAM files, you will be lucky if the individual doesn’t suffer from dementia. Never mind the hardware to run it on.

What is on personal PCs may be significant, but the data in large centers dwarfs it in volume and value.

David (profile) says:

Sorry, this has been known for decades

This is an acknowledged issue of long standing in the computer business. So, the only irony I see is that it is once again mentioned as an ah-ha! moment. Haalp, we might lose these dinner plans.

If you want to archive your work, collections or whatever then there are known issues that will need to be addressed. It takes time, people with time, dedication and often some actual real world cash.

Good luck, but leave off the irony for rehashed news.

John85851 (profile) says:

What's the file format

I wonder what file format the documents on the computers are using? What happens if they’re in VisiCalc ’82 format that’s unreadable simply because no one’s needed a VisiCalc emulator for 25 years? Or are there plenty of smart people who can write an emulator? 🙂
And is there any chance that the files could be in an unknown format that can’t be emulated or read?
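One common first step when facing a file of unknown format is to compare its opening bytes against known signatures ("magic numbers"). The sketch below is only illustrative: it lists a handful of widely documented modern signatures, it does not include one for VisiCalc, and the names `SIGNATURES`, `identify` and `identify_file` are made up for the example.

```python
# A few well-known signatures; a real identifier would need a far larger table.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Guess a format from the first bytes of a file, or return 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read just enough of a file to compare against the signature table."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A result of "unknown" only means the file matched nothing in this short table; many older formats have no distinctive header at all, which is part of why identification can be hard.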

