Tuesday, February 27, 2024

The Quest For a DNA Data Drive

How much thought do you give to where you keep your bits? Every day we produce more data, including emails, texts, photos, and social media posts. Though much of this content is forgettable, every day we implicitly decide not to get rid of that data. We keep it somewhere, be it on a phone, on a computer’s hard drive, or in the cloud, where it is eventually archived, in most cases on magnetic tape. Consider further the many varied devices and sensors now streaming data onto the Web, and the cars, airplanes, and other vehicles that store trip data for later use. All those billions of things on the Internet of Things produce data, and all that information also needs to be stored somewhere.

Data is piling up exponentially, and the rate of information production is increasing faster than the storage density of tape, which will only be able to keep up with the deluge of data for a few more years. The research firm Gartner predicts that by 2030, the shortfall in enterprise storage capacity alone could amount to nearly two-thirds of demand, or about 20 million petabytes. If we continue down our current path, in coming decades we would need not only exponentially more magnetic tape, disk drives, and flash memory, but exponentially more factories to produce these storage media, and exponentially more data centers and warehouses to store them. Even if this is technically feasible, it’s economically implausible.

Fortunately, we have access to an information storage technology that is cheap, readily available, and stable at room temperature for millennia: DNA, the material of genes. In a few years your hard drive may be full of such squishy stuff.

Storing information in DNA is not a complicated concept. Decades ago, humans learned to sequence and synthesize DNA, that is, to read and write it. Each position in a single strand of DNA holds one of four nucleotide bases, represented as A, T, G, and C. In principle, each position in the DNA strand could be used to store two bits (A could represent 00, T could be 01, and so on), but in practice, information is generally stored at an effective one bit (a 0 or a 1) per base.
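The two-bits-per-base idea can be sketched in a few lines of Python. This is only an illustration of the in-principle mapping: the article gives A = 00 and T = 01, while the assignments G = 10 and C = 11 and the function names `encode` and `decode` are assumptions made here. It also ignores whatever overhead brings practical schemes down to roughly one bit per base.

```python
# Illustrative mapping only. A=00 and T=01 come from the text;
# G=10 and C=11 are assumed for the sake of the example.
BASE_FOR_BITS = {"00": "A", "01": "T", "10": "G", "11": "C"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a base sequence, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a base sequence back into the original bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # TAGATGGT
print(decode(strand))  # b'Hi'
```

Each byte becomes four bases at this theoretical maximum; at an effective one bit per base, the same byte would need about eight.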

Moreover, DNA exceeds by many times the storage density of magnetic tape or solid-state media. It has been calculated that all the information on the Internet—which one estimate puts at about 120 zettabytes—could be stored in a volume of DNA about the size of a sugar cube, or approximately a cubic centimeter. Achieving that density is theoretically possible, but we could get by with a much lower storage density. An effective storage density of “one Internet per 1,000 cubic meters” would still result in something considerably smaller than a single data center housing tape today.
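To make the gap between those two densities concrete, here is a rough back-of-the-envelope calculation. It uses only the figures quoted above (120 zettabytes, one cubic centimeter, 1,000 cubic meters) and decimal units; the variable names are purely illustrative.

```python
# Figures from the text; decimal (SI) units assumed throughout.
INTERNET_BYTES = 120 * 10**21      # about 120 zettabytes
CM3_PER_M3 = 100**3                # 1 cubic meter = 1,000,000 cubic centimeters

sugar_cube_cm3 = 1                 # the theoretical "sugar cube" volume
relaxed_cm3 = 1_000 * CM3_PER_M3   # "one Internet per 1,000 cubic meters"

# How much less dense the relaxed target is than the theoretical one.
density_ratio = relaxed_cm3 // sugar_cube_cm3

# Bytes stored per cubic centimeter at the relaxed density.
relaxed_bytes_per_cm3 = INTERNET_BYTES // relaxed_cm3

print(f"Relaxed target is {density_ratio:,} times less dense")
print(f"Relaxed density: {relaxed_bytes_per_cm3 // 10**12} TB per cubic centimeter")
```

In other words, the relaxed target gives up a factor of a billion in density and still works out to about 120 terabytes per cubic centimeter.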

Most examples of DNA data storage to date rely on chemically synthesizing short stretches of DNA, up to 200 or so bases. Standard chemical synthesis methods are adequate for demonstration projects, and perhaps early commercial efforts, that store modest amounts of music, images, text, and video, up to perhaps hundreds of gigabytes. However, as the technology matures, we will need to switch from chemical synthesis to a much more elegant, scalable, and sustainable solution: a semiconductor chip that uses enzymes to write these sequences.

After the data has been written into the DNA, the molecule must be kept safe somewhere. Published examples include drying small spots of DNA on glass or paper, encasing the DNA in sugar or silica particles, or just putting it in a test tube. Reading can be accomplished with any number of commercial sequencing technologies.

Organizations around the world are already taking the first steps toward building a DNA drive that can both write and read DNA data. I’ve participated in this effort via a collaboration between Microsoft and the Molecular Information Systems Lab of the Paul G. Allen School of Computer Science and Engineering at the University of Washington. We’ve made considerable progress already, and we can see the way forward.

How bad is the data storage problem?

First, let’s look at the current state of storage. As mentioned, magnetic tape storage has a scaling problem. Making matters worse, tape degrades quickly compared to the time scale on which we want to store information. To last longer than a decade, tape must be carefully stored at cool temperatures and low humidity, which typically means the continuous use of energy for air conditioning. And even when stored carefully, tape needs to be replaced periodically, so we need more tape not just for all the new data but to replace the tape storing the old data.

To be sure, the storage density of magnetic tape has been increasing for decades, a trend that will help keep our heads above the data flood for a while longer. But current practices are building fragility into the storage ecosystem. Backward compatibility is often guaranteed for only a generation or two of the hardware used to read that media, which could be just a few years, requiring the active maintenance of aging hardware or ongoing data migration. So all the data we have already stored digitally is at risk of being lost to technological obsolescence.

[Figure: How DNA data storage works]

The discussion thus far has assumed that we’ll want to keep all the data we produce, and that we’ll pay to do so. We should entertain the counterhypothesis: that we will instead engage in systematic forgetting on a global scale. This voluntary amnesia might be accomplished by not collecting as much data about the world or by not saving all the data we collect, perhaps only keeping derivative calculations and conclusions. Or maybe not every person or organization will have the same access to storage. If it becomes a limited resource, data storage could become a strategic technology that enables a company, or a country, to capture and process all the data it desires, while competitors suffer a storage deficit. But as yet, there’s no sign that producers of data are willing to lose any of it.

If we are to avoid either accidental or intentional forgetting, we need to come up with a fundamentally different solution for storing data, one with the potential for exponential improvements far beyond those expected for tape. DNA is by far the most sophisticated, stable, and dense information-storage technology humans have ever come across or invented. Readable genomic DNA has been recovered after having been frozen in the tundra for 2 million years. DNA is an intrinsic part of life on this planet. As best we can tell, nucleic acid–based genetic information storage has persisted on Earth for at least 3 billion years, giving it an unassailable advantage as a backward- and forward-compatible data storage medium. (...)

There is global interest in creating a DNA drive. The members of the DNA Data Storage Alliance, founded in 2020, come from universities, companies of all sizes, and government labs from around the world. Funding agencies in the United States, Europe, and Asia are investing in the technology stack required to field commercially relevant devices. Potential customers as diverse as film studios, the U.S. National Archives, and Boeing have expressed interest in long-term data storage in DNA.

Archival storage might be the first market to emerge, given that it involves writing once with only infrequent reading, and yet also demands stability over many decades, if not centuries. Storing information in DNA for that time span is easily achievable. The challenging part is learning how to get the information into, and back out of, the molecule in an economically viable way. (...)

The University of Washington and Microsoft team, collaborating with the enzymatic synthesis company Ansa Biotechnologies, recently took the first step toward this device. Using our high-density chip, we successfully demonstrated electrochemical control of single-base enzymatic additions. The project is now paused while the team evaluates possible next steps. Nevertheless, even if this effort is not resumed, someone will make the technology work. The path is relatively clear; building a commercially relevant DNA drive is simply a matter of time and money.

by Rob Carlson, IEEE Spectrum |  Read more:
Images: Edmon De Haro; Chris Philpot
[ed. In other emerging and probably not too distant tech, see also: Smartphone Screens Are About to Become Speakers (IEEE).]