Will Digital Archiving Difficulties Wipe Out Important Elements Of Our History?

Over the years, we’ve had quite a few posts about the risks of data extinction. That is, as more and more of our important data goes digital, there’s a bigger risk that it could disappear. At the very least, it’s easy for digitally stored data to become corrupted. Even if there are backups, it’s possible that multiple copies could become corrupted. A bigger concern, though, is the applications necessary to read the data. Even if you can store the data perfectly forever, it’s meaningless without the right applications. Matt Sullivan writes in with yet another article on the topic, this time from Popular Mechanics, that suggests we could be facing a “digital ice age” as plenty of data from this era of history is lost to bad archiving capabilities.

Of course, there are some people working on solutions. A few years ago, we wrote about Dan Bricklin’s idea that we need “social infrastructure software” that is designed to last for many years to deal with exactly this issue. That only works, though, if such software exists and people use it. The Popular Mechanics article notes that the National Archives is working on a big system to deal with just this issue, though when we last wrote about the system it sounded full of potential problems, and the latest details are not that reassuring. Basically, they’re spending over $300 million to have Lockheed Martin build a system that will translate more than 4500 different document types into flexible formats, like XML. However, it seems quite likely that important data or metadata will get lost in the process. Others are suggesting that such a plan is dangerous, and that they’d be much better off focusing on emulation techniques. But again, that seems to get awfully cumbersome awfully fast, and that doesn’t even touch on the copyright issues associated with such a project. In the meantime, some are arguing that the entire problem of data extinction is overblown, saying that important data gets updated as systems change, and there will always be some way to go back and get other data if necessary.

Comments on “Will Digital Archiving Difficulties Wipe Out Important Elements Of Our History?”

Jezsik says:

"...there will always be some way..."?

I seem to recall a story about how some NASA researchers are concerned that the data collected by some older probes is already lost because the machines that can read the tapes no longer exist. We can’t use new analytical techniques to re-examine the old data – much to our loss.

I tell ya, the Babylonians were on to something with those clay tablets.

Rico J. Halo (user link) says:

another potential problem

I wonder if important data might be “lost” not due to corruption or loss of the hardware to read it, but just because there’s such a constant avalanche of new data that it gets buried. I think the bigger problem is going to be keeping track of the huge amounts of new data. There’s so much that no indexing system can possibly keep up with it all. I am already seeing clients with the problem not of “lost” data per se but of misplaced data, because they don’t have a suitable indexing system.


Adam (user link) says:

The answer is CDs

Why not just put everything into ASCII text and store it on CDs!

CDs are supposed to last like 200 years, right? Much better than floppy disks! (And yes, I’m being cynical here!)

There are really two questions of concern here though, not one.

1. What format to store the data in?
2. What medium to store the data on?

I think the 2nd question is far more pressing than the first.

Watching the evolution of computing over the years (especially the last 20), it is reasonable to say that we can and likely will continue to have the ability to read data in old formats. Even if worst comes to worst, someone just needs to write a software/application bridge to import or convert the data.

The real problem is what to store it on. There are lots of real-world examples: data backed up on tape, where the readers are hard to come by; floppy disks that are now unreadable because their magnetic coating has lost its magnetization.

CDs & DVDs… I hate them. I have NEVER liked the format, and thanks to HD-DVD & Blu-Ray, it looks like this crappy medium is with us for a bit longer. The medium is fragile: a few scratches or too much sun and you can turn the disc into a coaster, etc. etc.

There is no perfect medium yet. I much prefer solid state flash memory, but they’re just getting under way now, so the formats are rapidly changing (Compact Flash, SD, SD-mini, etc.). And their prices are still too high and their storage capacity is still too limited. In a few years though, these devices should come down in price and increase in capacity so that they can match or exceed the storage of a new DVD (Blu-ray / HD), and the price will hopefully be less than $1 per gig. I don’t know what the life cycle of these is, but in about 3 years we should start seeing notebook manufacturers moving away from IDE-based drives and using these as secondary or alternative drives.


slide23 says:

Much ado about NOTHING

I architected a digital library that focuses on historical documents. Archivists and librarians have forgotten more about how to keep historical documents safe than all of the Chicken Littles put together. The issue is completely overblown.

For starters, CD-ROM is barely accepted as an archival format by archivists. Second, the plans that go into three-nines business continuity are often the same kinds of plans that are put into place for archiving: rotate formats, keep archival hard copies, store redundant digital copies far apart, etc.

But really, this all boils down to doing a good job. The same person that would do a bad job archiving their digital documents would probably do an equally bad job archiving their “rlspc” documents. Either way, documents are just not safe in their hands.

And how many of us have ever experienced irretrievable digital documents because the application no longer existed? I can see data corruption keeping you from opening a particular file, but you did your job badly if the information in that file was lost because you did not back it up properly.
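
As a rough illustration of how redundant copies help against silent corruption, here is a minimal Python sketch. It is not something described in the article or the comments. It hashes each copy of a file with SHA-256 and flags any copy that disagrees with the majority. The function names `sha256_of` and `check_copies`, and the majority-vote rule itself, are assumptions made for this example.

```python
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(paths):
    """Compare redundant copies of one file (hypothetical helper).

    Returns (majority_digest, suspect_paths). If no digest is shared by
    more than half of the copies, returns (None, all_paths), because
    there is no way to tell which copy is the good one.
    """
    if not paths:
        raise ValueError("need at least one copy to check")
    digests = {str(p): sha256_of(p) for p in paths}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(digests) / 2:
        return None, sorted(digests)
    suspects = sorted(p for p, d in digests.items() if d != majority_digest)
    return majority_digest, suspects
```

With three copies stored in different places, one damaged copy shows up in the suspect list and can be restored from either of the other two. With only two copies that disagree, the check can say something is wrong but not which copy to trust, which is one reason to keep more than one backup.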

jd says:

Another license monopoly

Oh yeah, let’s put it in Adobe format and let them bend us over like Microsoft is doing. Microsoft software already costs more than the hardware, and now we want to give Adobe more bargaining power to force us to pay them and to manage “licensing” issues like Microsoft, where it is OK to copy and use so long as you buy their hardware?? And talking about updates that hose a system… Adobe has done a good job of making sure I can only use their $500 software on ONE of my computers, and it has a freshly formatted HD thanks to an Adobe update that I finally said OK to just to get it off my screen.

It is only a matter of time until we move to a subscription service for software, and then we can revert to older versions as part of our monthly payment. The same probably holds true for storing your data… I would pay a nominal fee to guarantee my data is backed up frequently. The problem is still how to back up all your data, since the Microsoft tools to search and copy all data on a computer are still useless in my experience. It probably will not be long until Microsoft tells me that I don’t own the copyright to my data because my certificates are not valid, or some other BS they are trying to force down our throats to get vendors to pay them to be their protectors. It is like hiring the wolf to watch over the sheep.

slide23 says:

Re: Another license monopoly

Oh, shut up. Big bad Adobe is going to lock down the PDF standard and suddenly we won’t be able to read our PDFs? Please stop engaging in gluteal dialectic. When you are going on an anti-corporate rant, it helps to base your viewpoint somewhere near reality.

In case you were not aware, PDF has become an ISO standard. There is even an archival-specific version of PDF (PDF/A-1, based on PDF 1.4, ISO 19005-1) that specifically disables particular features that just MIGHT not work in future applications.

And if you felt that Adobe had you grabbing ankles, perhaps you should run a search for other PDF utilities.

chris (profile) says:

look at previous civilizations

if a civilization (digital or otherwise) was able to preserve its history, there would be no need for archaeologists. the presence of the field of archaeology proves that all civilizations eventually fall and all but the tiniest of bits of their histories fall with them.

the moore’s-law-style advance of storage technology *should* mean that we can cheaply store multiple copies of everything… and yet artificial software death, closed formats, copyrights, DRM, and the like pretty much guarantee that our digital history will be lost to the next century, and possibly even to the next generation.

even if you could maintain all that was written, it is impossible to archive the semantics of what was written. you cannot archive the context of a work, its true meaning. look at the constitution: even though it was written in english, the language used is vastly different from what we use today… and it is interpreted differently than it was when it was written… and the document is merely two centuries old. compare that to the bible, which is far older. look at how it was interpreted by the romans, then by the puritans, and compare that to the way it is interpreted today.

spoon!?!?! says:

Too much ado about Adobe

It’s by far not the only PDF reader around, and OpenOffice supports one-click PDF creation. There IS competition, here. And still, I doubt Adobe would (or even could) lock the format down if the masses turned to them and shouted “f*ck you!”…

And I gotta agree with Rico @ 7. An exponential growth in data only compounds the problem. One aspect we should be focusing on is what data isn’t important enough to keep. We can’t just keep it all forever.

My solution: RAID 0+1’s for all!

PhysicsGuy says:

Is it possible to store data and not have it become corrupted in some way? The most complex form of data storage, the human brain, has an absurd number of problems maintaining proper information. The reasons why (regarding the emotional input of the data) are obvious. However, even in our old system of writing everything down, the data still gets “corrupted” by our interpretation of the written material. I side with the skeptics who say this problem is overblown. Let the important data be updated with the advance of technology. If any important data remains in an old format, I’ll guarantee you can find an old reader of said format and then engineer it to work within whatever current system we possess.

yarrdape6 says:


The problems faced by digitally encoding data are not new problems. They are simply an extension of the problems faced by traditional means of historical data storage. Nothing lasts forever. I do think, though, that data has a better chance digitally than otherwise, for the same reason that people recommend you not sell old HDs: data may become corrupted, but you can bring back a lot of data that has been partially deleted. Plus, digital storage offers an almost unlimited backup capacity. Redundant systems are cheap enough that if the data is important enough, we can save it indefinitely.

Thank god for Firefox spell check.

Anonymous Coward says:

Nobody is asking the real question here: exactly what IS important? We’re currently being crushed under a mountain of useless information. Are my personal pictures, notes, blogs, etc. really important for future generations?

Someone mentioned it above. The stuff that’s actually important will survive. A lot of the little stuff will die. This is not a worse situation than a thousand years ago. The important stuff always survives.

