Sunday, August 7, 2011

When Data Disappears

by Kari Kraus

Last spring, the Harry Ransom Center at the University of Texas acquired the papers of Bruce Sterling, a renowned science fiction writer and futurist. But not a single floppy disk or CD-ROM was included among his notes and manuscripts. When pressed to explain why, the prophet of high-tech said digital preservation was doomed to fail. “There are forms of media which are just inherently unstable,” he said, “and the attempt to stabilize them is like the attempt to go out and stabilize the corkboard at the laundromat.”

Mr. Sterling has a point: for all its many promises, digital storage is perishable, perhaps even more so than paper. Disks corrode, bits “rot” and hardware becomes obsolete.

But that doesn’t mean digital preservation is pointless: if we’re going to save even a fraction of the trillions of bits of data churned out every year, we can’t think of digital preservation in the same way we do paper preservation. We have to stop thinking about how to save data only after it’s no longer needed, as when an author donates her papers to an archive. Instead, we must look for ways to continuously maintain and improve it. In other words, we must stop preserving digital material and start curating it.

At first glance, digital preservation seems to promise everything: nearly unlimited storage, ease of access and virtually no cost in making copies. But the practical lessons of digital preservation contradict the notion that bits are eternal. Consider those 5 1/4-inch floppies stockpiled in your basement. When you saved that unpublished manuscript on them, you figured it would be accessible forever. But when was the last time you saw a floppy drive?

And even if you could find the right drive, there’s a good chance the disk’s magnetic properties will have decayed beyond readability. The same goes, generally speaking, for CD-ROMs, DVDs and portable drives.

Even the software needed to read the bits may prove elusive. Like Egyptian hieroglyphs, whose code was indecipherable until the rediscovery of the Rosetta Stone, the string of 1s and 0s on a floppy is meaningless in the absence of a set of computer instructions for translating it. If you don’t have a copy of WordPerfect 2 around, you’re out of luck. No wonder preservationists often wax ominous about the “digital dark ages.”

Of course, there’s always the option of migrating data from old to new media. But migration isn’t as simple as copying files — it’s more like translating from Japanese to Hungarian. Information is invariably lost; do it enough times and the result will be like the garbled message at the end of a game of telephone.

Another option is emulation, in which a software program impersonates a retro hardware environment; essentially, an emulator temporarily “downgrades” a modern computer to act like an old one. But over time, emulation becomes unwieldy: because the host systems for which emulators are designed will themselves become obsolete, emulators must eventually be moved to new computer platforms — emulators to run emulators, ad infinitum.

Nor is the problem just with the medium. We generate over 1.8 zettabytes of digital information a year. By some estimates, that’s nearly 30 million times the amount of information contained in all the books ever published. Even if we had perfectly stable storage, could we ever have enough to preserve everything?

The short answer is no — but only because we’re trying to replicate the practices used for decades to maintain paper archives. In this model, preservation begins only after a record is past its use. With data, intervention needs to happen earlier, ideally at an object’s creation. And tough decisions need to be made, early on, regarding what needs to be saved. We must replace digital preservation with digital curation.
