Archivists Grapple With Problems Of Preserving Recent Culture Held On Tape Cassettes And Floppy Drives
from the digital-archaeology dept
Most Techdirt readers probably surround themselves with the latest technology. But there’s a slightly unusual class of professionals who are only now beginning to grapple with things like CP/M, 8-inch floppy disk drives and the Apple Lisa. These are the archivists, whose job is preserving cultural artifacts from all periods of history. That includes the recent past, whose technologies now seem paradoxically so strange and distant. The real-life consequences of that growing chasm between today’s digital technologies, and those that were commonplace 10, 20 or 30 years ago, are made evident in an article published by the Guardian last week:
In the belly of a former whisky store in the inner Melbourne suburb of Brunswick lies a vast and varied collection of artefacts that feminist scholars can’t wait to get their hands on.
Nearly 500 boxes in this dark, temperature-controlled warehouse hold a lifetime of handwritten letters, browning manuscripts and newspaper clippings.
But there are more modern treasures too: floppy disks containing an unpublished book about Margaret Thatcher; two computers, a Mac Powerbook G4 and iMac G5; and voicemail recordings about dinner plans in 1976.
These are all part of the archives of the well-known Australian writer Germaine Greer. According to the article, Greer has been hoarding personal documents and artifacts from the 1950s to the present day, which means they are in both analog and digital forms:
Greer’s archive includes floppy disks, tape cassettes and CD-roms, once cutting-edge technologies that are now obsolete. They are vulnerable to decay and disintegration, leftovers from the unrelenting tide of technological advancement. They will last mere decades, unlike the paper records, which could survive for hundreds of years.
It is an irony of these formerly high-tech holdings that they are far less durable than old-fashioned paper-based systems. And researchers studying them face problems of compatibility that simply don’t arise with paper. This is a major issue that is only now being faced, as cultural figures of Greer’s generation pass on their archives to universities and libraries, who must start to grapple with the core tasks of deciphering and preserving them.
The good news is that once they have been decoded, they can be transferred to other media, and in more open formats that will be easier to access in the years to come. But that still leaves the problem of how to store all these archives in a way that will stand the test of time. Perhaps they will be encoded as data held on the ultimate storage medium, DNA. Or maybe it would just be easier to print the lot out on paper.
Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+
Filed Under: archivists, cassette tapes, deteriorating media, digital archives, floppy drives, obsolete formats, old media, preservation
Comments on “Archivists Grapple With Problems Of Preserving Recent Culture Held On Tape Cassettes And Floppy Drives”
Archival Is An Active Process
So you decode the files (hopefully they’re in open, documented formats), write reusable open-source tools to deal with them, then store the whole lot on current-vintage servers. Then as you upgrade your hardware, you keep copying everything onto the new machines, and maintain the software so it will keep running as well.
Archiving stuff doesn’t mean leaving it on shelves in basements. That was never true even for paper documents, or papyrus, or clay tablets, or anything else. The digital world is just more of the same.
And if you publish the software and the documents, then others can make their own copies, adding to the redundancy.
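A minimal sketch of that copy-forward step, assuming the archive is a plain directory tree and that SHA-256 is an acceptable fixity check. Neither assumption comes from the comment above, and the names `sha256_of` and `migrate` are hypothetical:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source_root: Path, target_root: Path) -> dict[str, str]:
    """Copy every file under source_root to target_root and verify each copy.

    Returns a manifest mapping relative paths to checksums, and raises
    OSError if any copy does not match its source.
    """
    manifest = {}
    for source in sorted(p for p in source_root.rglob("*") if p.is_file()):
        relative = source.relative_to(source_root)
        target = target_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies data and timestamps
        expected = sha256_of(source)
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch after copying {relative}")
        manifest[relative.as_posix()] = expected
    return manifest
```

Keeping the returned manifest alongside the files means the next migration can check the old copies against it before trusting them as a source.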
Re: Archival Is An Active Process
“And if you publish the software and the documents, then”
…someone sues to get the archive shut down because some obscure part of the release contained copyrighted material without the archivists’ knowledge.
Sadly, corrected for accuracy.
Re: Re: Archival Is An Active Process
Do we have to bring up copyright now?
Well, if you have to, why not point out how the takedowns of Megaupload and KickassTorrents prove that copywrong holders are willing to destroy digital archives if it means making a few cents more money?
And that because pirated works are, almost by definition, widely distributed, whereas innocent and legitimate files frequently aren’t, wrongful takedowns and all site seizures are guaranteed to do far more harm than good?
Re: Re: Re: Archival Is An Active Process
“Do we have to bring up copyright now?”
Yes, because it’s a major issue with archiving, especially with digital media where the only known copies are DRM infected, and especially if you start redistributing the content. By definition, anything on digital media will still be under copyright today, orphaned or not, and any attack from a corporate copyright holder places the whole archive at risk.
“Well, if you have to, why not point out how the takedowns of Megaupload and KickassTorrents prove that copywrong holders are willing to destroy digital archives if it means making a few cents more money?”
Because the purpose of those sites is not to archive? I understand the sentiment, but there’s a huge difference in scope and intent.
“And that because pirated works are, almost by definition, widely distributed, whereas innocent and legitimate files frequently aren’t”
Citation? Popular works are distributed widely, pirated or not. Obscure titles are generally not, whether they’re pirated or not. It’s to do with how many people wish to obtain them, not whether they’re pirated. Archiving is about preserving those works that are not popular or easily obtained.
“wrongful takedowns and all site seizures are guaranteed to do far more harm than good?”
This, however, is correct. Avoidable collateral damage is always wrong, and taking entire sites down for a percentage of infringing content is also unacceptable. Which is why we need to be aware that legitimate archives are going to be a target for people who worship the broken copyright system.
Germaine Greer
An unpublished book about Margaret Thatcher? I do hope that one hits the shelves.
Re: Germaine Greer
Germaine Greer ain’t germane any more. I don’t reckon any of her archived material will be significant to society. Since she made a name for herself, she became a pompous ass – spent too much time with the poms.
The New Alexandrians
Google’s Book project has been compared to the Library of Alexandria, an attempt by the ancient Greeks to collect all the world’s knowledge in one place. With the difference that Google books can be accessed by anyone, anywhere. And it is as futureproof as Google’s computer network.
As for more personal communication: Between Facebook, NSA and Gmail, it is probably more difficult to delete the more embarrassing parts than to retrieve what’s interesting ….
digitize, but keep the source
I think that the primary act of archiving is digitizing data at all, with file formats as a secondary choice. But that concerns the content of the data (and some metadata).
It may as well be useful to archive the original media.
Two examples come to my mind.
– My home town of Amsterdam (NL) welcomes masses of tourists who want to see the Night Watch painting by Rembrandt, even though it can be scrutinized on-line.
– Think of button-badges with slogans. Their messages may be simple and easily digitized, but that would not convey their full meaning to posterity. A movie showing protest rallies would, but having just the movie would not show the actual thing. And posterity may draw interesting conclusions from the ledgers of the badge-making industry through time.
In my part of the world, many things get archived, often with public money. I do hope that documenting our history won’t fall victim to political decisions.
Re: digitize, but keep the source
The problem being discussed here is dealing with archiving when the source is digital. This runs into all sorts of problems due to no longer supported media and file formats. So the problem is one of where the source format is no longer supported, and it is necessary to resurrect old hardware, and find copies of old programs just to get the files into some usable format on more modern hardware.
In my experience, HD (1.44 MB) floppies have a lifespan of about ten years before the magnetic domains blur until they can’t be read. About the same for CD-Rs, the kind you can write on your PC. I have 25-year-old commercial music CDs that still play fine, but they use a different technology.
Various backup tape formats, less than 5 years. VHS video tapes start looking cartoony after 10 years. Cassette tapes start going bad at around the same age, even if not played. (most cassettes died from wear, not age)
The lifespan of a recording medium appears to be inversely proportional to its density. Chipped stone, now *that* is permanent. Baked clay isn’t as good. That newfangled “paper” stuff, who’d want that?
The ringers are the various drum or disc audio formats; Edison cylinders and records. As long as you don’t play them they’ll last almost forever. And near the end of the LP era, there were players that used a laser instead of a stylus, so the grooves wouldn’t wear. And they’re not at all bad for density; that’s why the Compact Disc format needed 700 MB to store an LP at a reasonably high sample rate.
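For a rough sense of the numbers behind that density claim, here is a back-of-the-envelope sketch using the standard CD audio parameters (44.1 kHz, 16-bit, stereo). The 45-minute LP length is an assumption for illustration, and `cd_audio_bytes` is a hypothetical helper name:

```python
SAMPLE_RATE_HZ = 44_100  # CD audio samples per second, per channel
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 2             # stereo


def cd_audio_bytes(minutes: float) -> int:
    """Size in bytes of uncompressed CD-quality audio of the given length."""
    return int(minutes * 60 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


# 45 minutes is an assumed LP length; 74 minutes is a full standard disc.
for label, minutes in [("45-minute LP", 45), ("74-minute disc", 74)]:
    print(f"{label}: {cd_audio_bytes(minutes) / 1e6:.0f} MB")
# prints "45-minute LP: 476 MB" then "74-minute disc: 783 MB"
```

The familiar 700 MB figure is normally quoted for data discs, which give up part of every sector to extra error correction, so it sits below the raw audio capacity.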
Re: Re:
I like how you try to sound like an expert, when you’re obviously not. I’ve been hearing this “10 year” nonsense about floppies and CD-Rs for, well, MORE than 10 years. I’m kind of an amateur archivist. I love old computers. I’ve got 100s of 40 year old floppies and 20 year old CD-Rs. Pretty much all of them still play today.
I realize that is anecdotal, but that’s a better data point than your parroted nonsense.
Re: Re: Re:
The lifespan of CD-Rs, and to a lesser extent floppies, depends on factors such as climate, especially humidity for CDs. The idea of copying them onto fresh media every 10 years is a recommendation to maximize the chances of the files surviving for long periods. It is a bit late to attempt the copy when the media has developed unrecoverable errors due to degradation.
Re: Re: Re: Re:
It also depends on when the floppies were made. Earlier floppies were of substantially higher quality than later floppies, and tend to last multiple decades unless abused.
Re: Re: Re:
Me too, I’ve retrieved old media plenty of times and found they work OK.
It’s all about storage, ultimately. Magnetic media can degrade quickly, but may last for a long time if taken care of properly. I certainly have floppies from the late 80s that still work. Ditto cassettes and VHS tapes, although I did have to dispose of a few due to chewing or mould issues. I’ve also seen CDs and DVDs I’ve had to throw out due to a manufacturing flaw, even if stored properly. But the majority of my collection seems to work normally, at least the last time I managed to check.
His comment about not using them is interesting, though, as it’s rather true. Keep using something that depends on physical contact, whether it’s a piano roll, vinyl or tape, and you run the risk of it getting damaged or simply eroded. Archivists won’t let people read very old books because acids and other chemicals present in human skin will damage the paper.
But, ultimately, it’s all about the content. It’s great to preserve the original media, but it’s also important to retrieve the content lest the original become unusable. That probably will happen for most modern storage media, although the timescales some claim do seem rather underestimated.
Re: Re: Re: Re:
You are not thinking like an archivist, who sets the interval before copying to new media to be shorter than the time it takes for the first failures to show up in a cohort of tapes, disks, etc. To start copying only after the failures show up is to lose some of the archives.
Re: Re: Re:
Let’s see… a 40-year-old floppy would be from 1976.
I bow in respect for your greatness! And the hundreds of 1.2 and 1.44 MB floppies my floppy drives ground on, trying to recover anything, were obviously figments of my imagination.
Re: Re: Pretty much all of them still play today.
You were lucky.
When I got my first CD writer, back around 1999 or so, one of the first things I did was make backups of all my floppy disks. In particular, I had been carefully maintaining two separate copies of the floppies containing all the programs I had written.
And guess what? There was one disk where both copies had bad sectors. Luckily not in the same files. But I came this close to losing my only copies of those files.
Floppies have never been really reliable. (Why do you think I was maintaining multiple copies?) I was happy to see the back of them.
Re: Re:
For what it’s worth, I just tested one of the oldest CDs I burned from 7/16/03 and every file tested 100%. My discs are kept in the round “cake” box containers that they originally came in. Beyond that I don’t do anything special. I live in Connecticut and it gets pretty hot and humid here during the summer. I used to use air conditioning for much of the summer, but since our electricity rates went from OK to ridiculous, I now just use fans except for the very hottest days.
Assigning value
One of the core issues here is a variant on the fundamental question, “What’s worth preserving?”
For most of us, the answer would be “everything”. I get a kick out of reading mundane, slice-of-life moments from PDFs of old regional newspapers, probably more than from viewing a digitized image of the Magna Carta. The value we assign to preservation, i.e., archiving, is a floating quantity.
But, as always, perceived value needs to be matched by monetary value. Even the monks who hand-duplicated ancient manuscripts had to be fed and sheltered… there’s always a cost. Librarians and archivists working primarily with paper archives faced this, too, but it only occurred at long intervals; today, our proliferation of digital formats means that the archivist’s job is nearly continuous. As soon as a collection has been fully migrated from one fading medium to the next, greatest platform, you can bet that the process will begin again, as new technology becomes old. Preservation cycles formerly measured in centuries are now measured, at best, in decades.
And someone needs to pay. Continuously.
As a result, archiving efforts for the biggest, most prestigious collections are funded, because we can all agree on that question of value. Not so much, though, for media of secondary interest; those are likely to be ignored until hardware vanishes, file formats disappear or magnetic coercivity fades into oblivion. Sometimes, content’s best hope is that it will drop below that secondary threshold, into the realm of “quirky ephemera”, where oddballs like me might step in and volunteer to migrate the media.
So maybe that’s the next great role for the world’s underused, underappreciated network of public libraries: archiving and preservation of mid-value content. Makerspaces and 3D printers are nice, but professional librarians all have advanced degrees in the fields that would make them indispensable to this effort.
Physical DRM
There, the perfect DRM!
I suggest the MAFIAA release their crap… ahem, awesome works on tapes or 5-inch disks. Nobody will ever pirate anything in the future!
Pirates: Archivists of the modern age
If you think preserving culture on old cassettes and floppies is hard, wait until someone tries to preserve today’s digital-only, DRM-locked culture. How do you preserve a library of games that only exist in digital form, sent directly from the company servers to a locked-down game console?
Re: Pirates: Archivists of the modern age
But wait, it gets worse, because archiving and preserving DRM-infected content requires breaking the DRM, and with the law as it stands now, doing so opens you up to legal action, since even actions that are perfectly legal (archiving, creating backups) become illegal if they require breaking or bypassing DRM.
As such you either have archivists/historians risking legal action to preserve something, pirates taking up the role of archivists, or risk the content being lost entirely.
This problem has been recognized for more than 20 years. It is not solely in the domain of user devices, but professional data centers as well. First of course is the issue of bit rot, followed by loss of data structures. But most important is the loss of devices to read the data on (tape drives, disk platters, card readers, etc.) and of course the software with which to read it.
The Y2K frenzy caused a lot of hardware and software to be replaced, but much of the data wasn’t transferred. And now if you try to find a programmer who is familiar with, say, BDAM files, you will be lucky if the individual doesn’t suffer from dementia. Never mind the hardware to run it on.
What is on personal PCs may be significant, but the data in large centers dwarfs it in volume and value.
digital records
The fact that digital doesn’t last as long as paper is a major reason why the world will never have a truly digital-only office. There will always be something written down to preserve it.
Sorry, this has been known for decades
This is an acknowledged issue of long standing in the computer business. So the only irony I see is that it is once again presented as an “ah-ha!” moment. Haalp, we might lose these dinner plans.
If you want to archive your work, collections or whatever then there are known issues that will need to be addressed. It takes time, people with time, dedication and often some actual real world cash.
Good luck, but leave off the irony for rehashed news.
What's the file format
I wonder what file format the documents on the computers are using? What happens if they’re in VisiCalc ’82 format that’s unreadable simply because no one’s needed a VisiCalc emulator for 25 years? Or are there plenty of smart people who can write an emulator? 🙂
And is there any chance that the files could be in an unknown format that can’t be emulated or read?
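One common first step with a file of unknown origin is to look at its leading bytes for a known signature. The sketch below assumes only a handful of well-known signatures, which is nowhere near what a real archive would need, and `guess_format` is a hypothetical helper name. Many older formats carry no signature at all, which is exactly where the hard cases start:

```python
from typing import Optional

# A few well-known file signatures ("magic numbers"), for illustration only.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}


def guess_format(data: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None  # unknown: needs manual inspection or format documentation
```

A `None` result does not mean the file is unreadable, only that this small table does not recognise it.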
That makes sense, but it does not really explain how they are doing it. It just says we will not keep as many copies of your older images as we do of your most recent images. However we will keep a copy on a device or tape (?) somewhere if we need to get it….