December 19, 2017

2018 Predictions: Opening the Big Data Floodgates

The pace of innovation in big data today is exceeded only by the actual volume of data generated. So how will the industry respond to the opportunities and challenges posed by huge pipes of data that will be opened up in 2018? We turn to technology leaders to find out how we can prepare.

We kick off our 2018 predictions with some warnings about the computer architectures that we use to manage data and turn it into useful information. We may have been doing it wrong, because VoltDB CEO David Flowers is forecasting the “death of Hadoop.”

“Hadoop is finding its place in enterprise, mostly for storing static data, but the hype that once surrounded the technology is certainly waning. The Strata Data Conference is now more focused on data science and AI and the Hadoop Summit has evolved into the DataWorks Summit. In addition, Hadoop pioneers Cloudera, Hortonworks and MapR have really scaled back the Hadoop-centric messaging,” Flowers writes.

Similarly, Kinetica CTO and Co-founder Nima Negahban sees 2018 marking the beginning of the end for the traditional data warehouse, which predominantly rides atop column-oriented relational databases.

Will Hadoop, data warehouses, and other data repositories be disrupted in 2018? (mw2st/Shutterstock)

“As the volume, velocity and variety of data being generated continues to grow, and the requirements to manage and analyze this data continue to grow at a furious pace as well, the traditional data warehouse is increasingly struggling with managing this data and analysis. While in-memory databases have helped alleviate the problem to some extent by providing better performance, data analytics workloads continue to be more and more compute-bound,” Negahban writes.

“These workloads can be up to 100x faster leveraging the latest advanced processors like GPUs, however this means a near complete re-write of the traditional data warehouse,” he continues. “In 2018, enterprises will start to seriously re-think their traditional data warehousing approach and look at moving to next-generation databases either leveraging memory or advanced processors architectures (GPU, SIMD) or both.”

And anything you run on premises can probably run better in the cloud, according to David Hsieh, senior vice president of marketing for Qubole, who says “on-premise big data implementations will become obsolete.”

“Many companies have already begun the transition to cloud-first operations, and in the next few years a growing number of these companies will make the full transition to cloud-only operations. On-premises data implementations will disappear from the mainstream enterprise as businesses realize the cost, flexibility and agility benefits of cloud, and new security capabilities eliminate the last barriers to cloud adoption,” Hsieh writes.

The edge will be front and center in many business plans in 2018

Think cloud-first is cutting edge? Try “edge-first,” says Mark Barrenechea, the CEO and CTO of OpenText. “As the number of devices requiring immediate or high-volume data processing continues to increase, edge computing will push the cloud to the sidelines, where it will act as a supporting technology. Together, the cloud and edge computing will offer the benefits of agility and savings, while providing the infrastructure we need to support the ever-expanding IoT universe,” Barrenechea writes.

Regardless of whether your data lake runs on the ground, lives in the cloud, or sits on the edge, it will need to deliver value or it’s just a waste of time and money, says Ken Hoang, vice president of strategy and alliances at Alation.

“The new dumping ground of data — data lakes — has gone through experimental deployments over the last few years, and will start to be shut down unless they prove that they can deliver value. The hallmark for a successful data lake will be having an enterprise catalog that brings information discovery, AI and information stewarding together to deliver new insights to the business,” Hoang writes.

Big data today is full of roll-your-own types, technology lovers who are up for the challenge of cobbling together their own systems. But in 2018, the pendulum will swing back towards vendors that can solve multiple challenges and deliver on opportunities with a unified product stack, according to Pete Schlampp, vice president of Workday Analytics (formerly Platfora).

“In 2018, we’ll see more and more organizations leaning on technology platforms that provide a comprehensive view of their data for richer insights and greater business agility. Whether they are seeking a single repository encompassing a company’s people and financial information, or a system that can combine data from various sources with in-house data for advanced analytics, companies will increasingly eliminate disparate technologies that contribute to siloed data in order to fully harness the potential of their data,” Schlampp writes.

Data fabrics are just getting started (agsandrew/Shutterstock)

The capability to knit multiple data flows into a coherent whole (a data fabric, say) will be an in-demand skill in 2018, predicts Ted Dunning, the chief application architect for MapR Technologies.

“This coming year, we will see more and more businesses treat computation in terms of data flows rather than data that is just processed and landed in a database. These data flows can capture key business events and mirror business structure. A unified data fabric that breaks down silos to give comprehensive access to multiple kinds of computation and data from many sources creating a foundation for building these large-scale, flow-based systems. Databases will become the natural partner and complement of a dataflow. The emerging trend is to have a data fabric that provides data-in-motion and data-at-rest needed for multi-cloud computation provided by things like Kubernetes,” Dunning writes.

What if there was a technology that let companies easily connect seemingly unconnected dots? There is, and it will be huge in 2018, according to TigerGraph CEO Yu Xu.

Graph databases are poised for a breakout year

“We will continue to see the adoption of real-time big graphs by companies with colossal amounts of data. Real-time big graphs incorporate hundreds of billions to trillions of graph elements (vertices or edges, equivalently entities or relationships). Today, real-time big graphs are already in use by some of the world’s leading organizations, including Alipay, VISA, SoftBank, State Grid Corporation of China, and Elementum…Enterprises no longer need to struggle with slow data loading or slow query performance, and can reap insights into their big data for a unified view into their businesses,” Xu writes.

Developers today switch databases as if they’re going out of style, sometimes using multiple databases within the same application. Getting a handle on that database sprawl will be critical to managing big data, predicts Couchbase SVP of Engineering and CTO Ravi Mayuram.

“One-trick technology solutions that solve singular customer problems will begin to peel away. To maintain a lasting business strategy, companies need to become a true partner for continual innovation rather than point solutions that fill niche issues. The cost of integrating numerous solutions to a platform will not be worth the complexity and headache, and the businesses that provide one platform that fills multiple customer needs will thrive. Organizations need to adapt to customer expectations, and having an agile approach to technology will be the key differentiator,” Mayuram writes.

Forget big data – 2018 will be the year of smart data, says Sandy Steier, CEO of 1010data.

Streaming analytics will continue to grow in 2018, experts predict

“As businesses analyze more and more data, it’s apparent that big data is noisy data. Extracting insight is like looking for a needle in a haystack – requiring expensive data scientists to expend intense and laborious effort. But no doubt the insights are there and they’re valuable. In 2018, the advantage will go to the businesses that develop rapid, repeatable processes to extract signal from noise, quickly evaluating new data sets for their ability to deliver valuable insights and efficiently turning big data into smart data that business users and business-focused analysts can utilize in their everyday decision-making.”

The velocity of big data flows can cause real problems – especially when the volume is growing too. To get on top of these twin challenges, companies will ramp up their adoption of real-time data analytics, according to Kostas Tzoumas of data Artisans, the company behind Apache Flink.

“Stream processing technologies will become mainstream in the enterprise by the end of 2018, moving beyond technology companies. At data Artisans, we are seeing strong adoption from large organizations in financial services, telecommunications, manufacturing and other industries. The adoption is accelerating as well and surpassing our expectations. Backing this up are analyst predictions that the streaming data applications market will reach more than $13 billion by 2021,” Tzoumas writes.

When data comes home to roost, it needs a comfy pad. For years, this job fell to hard disk drives (HDDs), but they’ve gotten a ton of bad press lately, and been labeled obsolete in an age of solid state drives (SSDs). Coming to HDD’s rescue is Peter Godman, the co-founder and CTO of Qumulo.

Tape is dead! Long live HDDs! (Full_chok/Shutterstock)

“SSD won’t be cheaper than HDD. In 2016, every all-flash vendor in the world claimed that SSD was now cheaper than HDD, based on two nonsensical claims that (1) all data is compressible and duplicated and (2) compression and dedupe don’t apply to HDDs. Western Digital’s MAMR announcement makes it clear that the ratio of NAND flash capacity cost to HDD capacity cost will remain close to 10x for years to come.”

Any discussion about storage technology going obsolete is incomplete without a mention of tape, according to David Friend, the CEO and co-founder of Wasabi Technologies, who may have a stockpile of old tapes sitting in his basement.

“Many are aware that video is by far the biggest data type on the internet by volume. What many aren’t aware of, however, is how the media and entertainment industry has been storing their own video content in the form of tapes, with tens of thousands of old shows, newscasts, unreleased feature films, etc. sitting in Hollywood basements and warehouses. Assets that are sitting in dead storage are hard to monetize. Moving these assets back into hot storage allows them to be marketed through all the new streaming channels that are available today. In 2018, the media and entertainment industry will embrace ‘de-archiving’ and create new revenue streams from old content,” Friend writes.

Betting on technology can be like betting on horses. Do your homework, and it can pay off big. When it comes to big data, you can bet that Anand Venugopal, the AVP and Head of Product of StreamAnalytix for Impetus Technologies, is putting his money on Apache Spark.

Picking a big data winner can be a crapshoot if you just focus on tech (Olga_i)

“Apache Spark will continue its rise and dominate widely as a de-facto big data processing engine that will be used for both traditional Ingest and ETL functionality to load the data lake and also for machine learning training and scoring jobs. Initial users will increase their Spark usage across a wider range of use-cases while other companies not using Spark will start to deploy it. Penetration levels will equal and could even surpass Hadoop adoption due to cloud-based approaches and non-Hadoop usage of Apache Spark.”

While today’s big data technology is certainly powerful, and getting more powerful by the day, one shouldn’t fall into the trap of investing in technology for technology’s sake, says Kunal Agarwal, the CEO of Unravel Data.

“In the past, people were focused on learning the various big data technologies: Hadoop, Spark, Kafka, Cassandra, etc. It took time for users to understand, differentiate, and ultimately deploy them. There was a lot of debate and plenty of hype. Now that organizations have cut through the noise and figured all that out, they’re concerned about actually putting their data to use,” Agarwal writes.

Necessity is the mother of all invention, it’s been said. That’s why today’s big data innovation is happening at customer shops, not in vendors’ labs, according to DataTorrent CEO Guy Churchward.

“Technology innovation is happening at a dizzying pace with myriad technology companies, large and small, taking credit for these innovations. However, it’s really the customers who are the driving force behind many of the game-changing solutions that are developed. In our world, big data analytics, customers know what outcomes they need to be more competitive and, in turn, deliver value to their customers. The problem is, more often than not, they can’t find what they need and, as a result, innovation is happening at the customer site, creating their own data science recipes and exploring advanced development utilizing building blocks the industry enjoys.”

That concludes our first batch of 2018 predictions. Stay tuned for more to come.