August 21, 2012

DNA to Carry New Data Burden

Data scientists have been interested in storing information biologically for some time. After all, DNA was encoding the instructions for proteins and enzymes long before humans started recording bits of information on magnetic media.

Harvard scientists from the Wyss Institute have taken a significant step forward, encoding 700 terabytes of data such that a single gram of DNA would carry almost a zettabyte of information, beating the previous DNA information density record by a factor of a thousand.

George Church and Sriram Kosuri have found a way to store data as DNA in an extremely efficient manner. In essence, each base encodes a single 0 or 1 (adenine or cytosine stands for zero; guanine or thymine stands for one), except for a 19-bit sequence at the start of each strand, which acts somewhat like the start codon that marks the beginning of a gene's protein-coding sequence.
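Because two different bases stand for each bit value, an encoder has some freedom in which base it writes. The sketch below is a minimal illustration under stated assumptions: the table pairs A or C with 0 and G or T with 1, and the tie-break rule (pick whichever candidate differs from the previous base, so no base ever repeats back to back) is a hypothetical stand-in, not the researchers' actual selection rules. The names `encode_bits` and `decode_bases` are illustrative.

```python
# Assumed one-bit-per-base table: each bit value has two candidate bases.
BASES_FOR_BIT = {0: ("A", "C"), 1: ("G", "T")}
# Reverse lookup: either base of a pair decodes to the same bit.
BIT_FOR_BASE = {base: bit for bit, pair in BASES_FOR_BIT.items() for base in pair}


def encode_bits(bits: list[int]) -> str:
    """Map each bit to one base, choosing between the two candidates."""
    strand: list[str] = []
    for bit in bits:
        first, second = BASES_FOR_BIT[bit]
        # Hypothetical tie-break: use the second candidate only when the first
        # would repeat the previous base, so adjacent bases always differ.
        base = second if strand and strand[-1] == first else first
        strand.append(base)
    return "".join(strand)


def decode_bases(strand: str) -> list[int]:
    """Decoding is a plain table lookup; the choice of base carries no data."""
    return [BIT_FOR_BASE[base] for base in strand]


print(encode_bits([0, 0, 1, 1, 0]))  # ACGTA
print(decode_bases("ACGTA"))         # [0, 0, 1, 1, 0]
```

Spending a whole base on one bit is less dense than the theoretical two bits per base, but the spare choice lets an encoder steer away from sequences that are hard to synthesize or sequence, such as long runs of a single base.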

Each strand carries 96 bits, or 12 bytes, of data after its 19-bit start sequence, so 96 of every 115 encoded bits, or about 83%, hold actual information. However, DNA can only be used this efficiently if it isn't part of a living organism.
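The strand layout and the 83% figure can be checked with a short sketch. It assumes each record is simply a 19-bit address followed by a 96-bit payload, written here as a string of "0"/"1" characters before any conversion to bases; the function names and the zero padding of the final block are hypothetical choices for illustration, and real strands also carry extra sequence for lab handling that this ignores.

```python
ADDRESS_BITS = 19                # per-strand start sequence / address
PAYLOAD_BITS = 96                # data carried by each strand
PAYLOAD_BYTES = PAYLOAD_BITS // 8  # 12 bytes


def data_fraction() -> float:
    """Share of each 115-bit record that holds data rather than address."""
    return PAYLOAD_BITS / (ADDRESS_BITS + PAYLOAD_BITS)


def to_records(data: bytes) -> list[str]:
    """Split data into 12-byte blocks, each prefixed with its 19-bit address."""
    blocks = [data[i:i + PAYLOAD_BYTES] for i in range(0, len(data), PAYLOAD_BYTES)]
    if len(blocks) > (1 << ADDRESS_BITS):
        raise ValueError("too many blocks for a 19-bit address")
    records = []
    for address, block in enumerate(blocks):
        # Hypothetical zero padding so the last block is also 96 bits long.
        block = block.ljust(PAYLOAD_BYTES, b"\x00")
        payload = "".join(f"{byte:08b}" for byte in block)
        records.append(f"{address:0{ADDRESS_BITS}b}" + payload)
    return records


print(round(data_fraction(), 3))  # 0.835
print(len(to_records(b"hello world, DNA!")))  # 2
```

With 19 address bits, a single set of strands can label 2^19 = 524,288 blocks, or 6,291,456 bytes (about 6.3 MB) at 12 bytes per block.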

“In an organism,” Church said, “your message is a tiny fraction of the whole cell, so there’s a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn’t earn its keep, if it isn’t evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it.”

To test their technique, they encoded a book Church had written and stored 70 billion copies of it, reaching a reported density of 5.5 petabits per cubic millimeter of DNA.

It should be noted that this is purely a storage enterprise. While retrieval is relatively simple (the 19-bit start sequence also serves as an address of sorts), getting the stored data to interact while in biological form is impractical. After all, DNA's primary function is to encode the instructions for building proteins. Perhaps the data-carrying DNA could code for some fascinating new proteins that would tell researchers something interesting, but this seems doubtful.
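Because every strand carries its own address, sequenced strands can come back in any order and still be put together again. The sketch below assumes the same hypothetical record layout as a bit string (19-bit address, then 96-bit payload) and the original file length known in advance; keeping the first read of each address is a simplifying assumption, since a practical decoder would more likely compare the many copies to correct read errors.

```python
ADDRESS_BITS = 19  # address length described in the article
PAYLOAD_BITS = 96  # 12-byte data block per strand


def reassemble(records: list[str], length: int) -> bytes:
    """Rebuild the original bytes from address-tagged records in any order."""
    blocks: dict[int, str] = {}
    for record in records:
        address = int(record[:ADDRESS_BITS], 2)
        payload = record[ADDRESS_BITS:ADDRESS_BITS + PAYLOAD_BITS]
        # Duplicate reads of the same address are expected; keep the first.
        blocks.setdefault(address, payload)
    if not blocks:
        return b""
    missing = [a for a in range(max(blocks) + 1) if a not in blocks]
    if missing:
        raise ValueError(f"no reads recovered for addresses {missing}")
    # Addresses are now exactly 0..max, so concatenate payloads in order.
    bits = "".join(blocks[a] for a in range(len(blocks)))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data[:length]  # drop padding on the final block
```

Sorting by address is what makes retrieval straightforward; nothing in this scheme lets the stored data compute or interact while it sits in the DNA.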

Either way, DNA could become both the cheapest and least massive storage medium on Earth, storing in a fraction of a gram what reportedly takes 151 kilograms of hard drives. Data warehouses also need massive, energy-intensive air-conditioning systems to stay at an acceptable temperature; DNA has no such requirement and can, in theory, be preserved indefinitely.

The possibilities here are endless. In a piece here last week, NetApp's Dave Einstein argued that by 2020 the amount of data will far outgrow our ability to store it. Storing data as DNA could go a long way toward solving that problem.

That article mentions placing cameras in every nook and cranny of the world to record everything, leading to an unfortunate Orwellian state. But there are more optimistic and exciting possibilities. For example, physicists could store entire models of the universe in DNA. Or geneticists could break the fourth wall by storing their DNA research in actual DNA.

According to Kosuri, the entire world’s information could be stored in four grams of DNA. If this technology is viable and catches on, a tiny Petri dish in a biologist’s laboratory could end up containing all of the information gleaned by the astrophysics department next door.
