Only a Fraction of 160-Zettabyte ‘Datasphere’ to Be Stored
Think you’ve seen big data? Think again. The amount of data created worldwide will skyrocket from 16 zettabytes this year to more than 160 zettabytes by 2025, according to a new IDC report sponsored by Seagate. Luckily for the rest of us, only about 15% will be stored.
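To put that tenfold jump in perspective, the two endpoints imply a compound annual growth rate. The sketch below is a minimal back-of-the-envelope calculation. It assumes the 16 ZB figure applies to 2017 and the 160 ZB figure to 2025, which is the same eight-year span the article uses. The function name `implied_cagr` is our own label, not anything from IDC.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures as reported: 16 ZB now, ~160 ZB by 2025 (assumed to be 8 years apart).
rate = implied_cagr(16, 160, 8)
print(f"Implied growth: {rate:.1%} per year")  # Implied growth: 33.4% per year
```

In other words, a tenfold increase over eight years means the datasphere grows by roughly a third every year.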
The data management challenges that we face now are nothing compared to the challenges we will face over the next decade, says Tim Bucher, the vice president of Seagate’s consumer product division, who recently briefed Datanami on the IDC white paper, titled “Data Age 2025: The Evolution of Data to Life-Critical.”
“It really poses a lot of challenges for CIOs and heads of IT and enterprises in general,” Bucher says. “How do you deal with all that data? What are you going to do? How do you secure it? How do you manage? And what do you know is valuable data? Because not all data is created equal.
“I think that’s something we as an industry really need to grapple with over the next few years and beyond,” he continues. “It’s an exciting time as this datasphere is exploding. Companies like Seagate and others who have data storage products really need to work together to help enterprises and consumers alike with what’s coming.”
Cat videos obviously will not explain the 10-fold increase in data volumes over the next eight years. What we today call the Internet of Things (IoT), the web of connected devices that in the future will probably just be called “normal life,” will be responsible for most of the run-up.
In fact, the IoT will account for 95% of this data, according to IDC, which predicts that, by 2025, the average person will interact with connected devices nearly 4,800 times per day, or about once every 18 seconds.
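The “once every 18 seconds” figure follows directly from spreading 4,800 interactions across a 24-hour day. The sketch below checks that arithmetic. It assumes the interactions are evenly spaced around the clock, which is a simplification. The helper name `seconds_between_interactions` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_between_interactions(interactions_per_day: float) -> float:
    """Average gap between interactions, assuming they are spread evenly over 24 hours."""
    return SECONDS_PER_DAY / interactions_per_day


print(seconds_between_interactions(4800))  # 18.0
```

The even-spacing assumption matters. If the same 4,800 interactions were packed into a hypothetical 16 waking hours, the average gap would shrink to 12 seconds.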
We’ll be much more connected to embedded devices in 2025 than we are today, according to the authors of the IDC report, David Reinsel, John Gantz, and John Rydning. Whereas most of our digital interactions occur with PCs, servers, tablets and smartphones today, in the future, we’ll interact more with wearables, security cameras, smart meters, fueling stations, medical implants, building automation, vending machines, digital signage, and toys, the group says.
If the idea of a digital assistant like Microsoft Cortana or IBM Watson communicating with your implanted glucose meter sounds gross to you today, take solace in the fact that, by 2025, it will seem perfectly natural to most people.
“Much of this interaction will fade into the background as intelligent assistants like the Amazon Echo and intelligence built into cars become part of the environment with which consumers habitually interact,” IDC says in the report.
Storage Holds Steady
The volume of machine-generated data will be so vast that it will be impossible to store it all. In fact, IDC estimates that we’ll store only about 15% of the total amount of data we generate.
What’s interesting about that figure is that it’s actually about the same fraction that we now store, Bucher says. “It’s about the same, percentage-wise,” he says. “But if your denominator is growing, your numerator is going to have trouble keeping up.”
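Bucher’s numerator-and-denominator point can be made concrete. The sketch below is minimal and assumes the 15% stored fraction applies to both the 16 ZB and 160 ZB totals. IDC gives the 15% figure for 2025, and Bucher says today’s share is about the same. The function name `stored_and_discarded` is hypothetical.

```python
def stored_and_discarded(created_zb: float, stored_fraction: float) -> tuple[float, float]:
    """Split a year's data volume into the part that is stored and the part that is not."""
    stored = created_zb * stored_fraction
    return stored, created_zb - stored


# Assumed: ~15% stored both today (16 ZB created) and in 2025 (160 ZB created).
for label, created in [("today", 16), ("2025", 160)]:
    stored, not_stored = stored_and_discarded(created, 0.15)
    print(f"{label}: {stored:.1f} ZB stored, {not_stored:.1f} ZB not stored")
# today: 2.4 ZB stored, 13.6 ZB not stored
# 2025: 24.0 ZB stored, 136.0 ZB not stored
```

Holding the percentage flat still requires stored capacity to grow tenfold, from about 2.4 ZB to 24 ZB. Over the same period, the volume that goes unstored balloons from about 13.6 ZB to 136 ZB.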
Hard disk drives (HDDs) will still account for the bulk of our data storage needs in the future, according to IDC, which predicts HDD shipments will account for 50% of all storage shipments by 2025, down from about 70% today.
Flash will gradually eat into HDDs’ lead and account for about 40% of the byte capacity shipped, the analyst group predicts, while tape’s share will gradually drop but won’t go away entirely, accounting for around 7% in eight years. Optical storage, which today is about equal to tape in terms of bytes shipped, will shrink to almost nothing, while DRAM storage will account for perhaps 1% or 2% of shipped storage capacity, according to the report.
Before you go out and gobble up shares of Seagate and its main storage competitors, Western Digital and Toshiba, in anticipation of a run on storage stocks, consider this: While Seagate’s solid-state and spinning disk drive shipments are essentially keeping up with the growth of data from a total byte perspective, the number of units shipped is holding steady, more or less.
“I’m not saying that [data growth] translates to more volume of HDD or SSDs, because each of those devices is growing in terms of how many terabytes you can put on one single portable drive,” Bucher says. “Two years ago the biggest portable drive was barely 2TB and today we have a 5TB drive you just plug into a USB port.”
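Unit shipments can stay flat while bytes shipped soar because of simple division: drives shipped equal total capacity shipped divided by capacity per drive. The sketch below illustrates the relationship. The total-capacity numbers are hypothetical inputs, not IDC or Seagate figures. Only the 2TB-to-5TB jump comes from Bucher. The function name `units_needed` is also hypothetical.

```python
def units_needed(total_capacity_tb: float, capacity_per_drive_tb: float) -> float:
    """Number of drives required to ship a given total capacity."""
    return total_capacity_tb / capacity_per_drive_tb


# Hypothetical: total capacity shipped grows 2.5x, the same factor as the
# 2TB -> 5TB per-drive jump Bucher describes.
before = units_needed(1_000_000, 2)
after = units_needed(2_500_000, 5)
print(before, after)  # 500000.0 500000.0

# Per-drive capacity grew 2.5x in two years, roughly 58% per year.
print(f"{(5 / 2) ** (1 / 2) - 1:.0%}")  # 58%
```

When per-drive capacity grows as fast as total demand, the unit count does not move. Only when data growth outpaces density gains would unit shipments need to rise.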
Increasing areal density will be key to keeping the data deluge from getting out of hand (or perhaps more accurately, to keeping it from getting much, much worse). Seagate’s research division has been working to increase the areal density in its products with its heat-assisted magnetic recording (HAMR) technology, which is expected to help the company deliver drives with upwards of 30TB of capacity in the future, Bucher says.
All that machine-generated data in the datasphere will represent fertile ground for analysis through machine learning algorithms. In fact, ML won’t just be a “nice-to-have” feature for analyzing big data – it will become an absolute requirement.
“If you just look at the portion that’s going to be stored, we have to rely more on machine learning and artificial intelligence to basically help us collectively deal with this digital data deluge,” Bucher says. “That’s something that I think all companies need to take more seriously as we move forward here.”
Data processing on the edge will need to improve if we’re going to keep up with the data deluge, Bucher says. Today’s smartest cars, for example, use powerful GPUs to continually process data collected from cameras and other sensors in real time, thereby eliminating the need to write any of it to permanent storage.
“Even though there might be a gigabyte of data generated on my car every time I drive, none of it is stored,” Bucher says. “It’s processed in real time, which is what you do with machine learning or general AI technology.”
The Wildcard: Security
IDC says 90% of the global datasphere will require some level of security by 2025, including the data that’s flowing from machines and sensors in real time. However, less than half of that data will actually be secured, the analyst group predicts.
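Applied to the 160 ZB projection, IDC’s percentages translate into absolute volumes. The sketch below is a minimal calculation. It treats “less than half” as an upper bound of exactly 50%, so the secured figure is a ceiling, not an estimate. The constant names are our own.

```python
DATASPHERE_2025_ZB = 160   # IDC's projection for 2025, as reported
REQUIRES_SECURITY = 0.90   # share of the datasphere needing some level of security
SECURED_OF_THAT = 0.5      # "less than half" -> used here only as an upper bound

needs_security = DATASPHERE_2025_ZB * REQUIRES_SECURITY
secured_upper_bound = needs_security * SECURED_OF_THAT

print(f"{needs_security:.0f} ZB needs security; under {secured_upper_bound:.0f} ZB secured")
# 144 ZB needs security; under 72 ZB secured
print(f"Secured share of the whole datasphere: under {REQUIRES_SECURITY * SECURED_OF_THAT:.0%}")
# Secured share of the whole datasphere: under 45%
```

Put another way, at least 72 ZB of data that needs protection would go without it.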
“That’s going to be one of the biggest challenges, is how do we deal with security,” Bucher says. “You already see this today. And if the datasphere and world of IoT keeps growing, we are probably going to find it more and more challenging.”
Less data requires security today than will in the future, according to IDC. While security driven by privacy concerns will fluctuate up and down over the next eight years, the analyst firm sees a steady drumbeat of data security requirements driven by other reasons, including compliance, protection of confidential data, and data that needs to be in “lockdown.”
IDC’s “Data Age 2025” white paper is available for download as a PDF.