Cloud Looms Large for Big Data in 2020
If you’re involved with big data in 2020, then it will be hard to avoid the cloud, which has become the de facto standard platform for storing and processing vast amounts of data. The cloud will change quickly this year, as cloud giants battle for supremacy. Successfully navigating these dynamics in the cloud could mean the difference between celebrating a big data victory and cleaning up a digital mess.
The modern cloud stack relies on Kubernetes for container orchestration. Expect progress to be made in getting big data and AI workloads enabled on Kubernetes this year, or what Haoyuan “HY” Li, founder and CTO of Alluxio, calls “Kubernetifying” the analytics stack.
“While containers and Kubernetes work exceptionally well for stateless applications like Web servers and self-contained databases, we haven’t seen a ton of container usage when it comes to advanced analytics and AI,” Li says. “In 2020, we’ll see a shift to AI and analytic workloads becoming more mainstream in Kubernetes land. ‘Kubernetifying’ the analytics stack will mean solving for data sharing and elasticity by moving data from remote data silos into K8s clusters for tighter data locality.”
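The data-locality pattern Li describes can be sketched as a Kubernetes manifest in which an init container stages data from a remote silo into a node-local volume, so the analytics container reads it with local I/O. This is a hypothetical illustration; the image names and paths are placeholders, not any vendor's actual deployment.

```yaml
# Hypothetical sketch: stage remote data into a node-local volume
# so the analytics workload gets tighter data locality.
apiVersion: v1
kind: Pod
metadata:
  name: analytics-job
spec:
  initContainers:
    - name: stage-data
      image: example/data-stager:latest       # placeholder: copies from a remote silo
      command: ["sh", "-c", "cp -r /remote-silo/dataset /cache/"]
      volumeMounts:
        - name: local-cache
          mountPath: /cache
  containers:
    - name: analytics
      image: example/analytics-engine:latest  # placeholder analytics engine
      volumeMounts:
        - name: local-cache
          mountPath: /data
  volumes:
    - name: local-cache
      emptyDir: {}                            # node-local scratch space
```

In practice a caching layer (such as the one Alluxio builds) would replace the naive copy step, but the shape is the same: data moves into the cluster, and compute lands next to it.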
At this point, we have reached a critical mass of cloud-native analytics apps, and the cloud will reign supreme in 2020 because of it, say two Information Builders vice presidents, Eric Raab and Kabir Choudry.
“Organizations using BI and analytics tools fall into one of three groups today: those currently operating in the cloud, those migrating to the cloud, and those discussing a migration to the cloud,” Raab and Choudry say. “While they may have been held back in the past by concerns over whether their platform’s architecture is designed to integrate with and exploit a cloud ecosystem, there are now proven solutions that are purpose-built for cloud-based operation. 2020 will see the floodgates open with organizations moving to the cloud to take advantage of the usability, scalability and flexibility of cloud-native solutions.”
Enterprises will become cloud-first in the deployment of all new analytic workloads, predicts Brian Wood, director of cloud marketing at Teradata.
“IT departments will be expected to default to the public cloud to support any business initiative not considered mere capacity expansion for existing infrastructure,” he says. “‘Use it or lose it’ bulk purchase agreements with public cloud vendors will spur enterprise IT departments to blindly prefer cloud deployment location over solution fit, much to their leaders’ eventual regret. The frenzy of meeting short-term budget objectives will trump the measured wisdom of considered planning and strategic investment.”
Public clouds have gotten most of the press. But in 2020, we’ll see the re-emergence of private clouds, predicts Jon Toor, chief marketing officer of Cloudian.
“Organizations with large-scale storage needs—such as those in healthcare, scientific research, and media and entertainment—face unique challenges in managing capacity-intensive workloads that can reach tens of petabytes,” Toor says. “Private clouds address these challenges by providing the scale and flexibility benefits of public clouds along with the performance, access, security, and control advantages of on-premises storage.”
The simplicity and flexibility of the cloud are big pluses. But the cloud’s utility-based pricing isn’t a great fit for all workloads. One of the square pegs in the cloud’s round hole is the current round of digital transformation initiatives, or DX 2.0, says Monte Zweben, CEO of Splice Machine.
“The ‘Cloud Disillusionment’ blossoms because the meter is always running,” he says. “Companies that rushed to the cloud finish their first phase of projects and realize that they have the same applications they had running before that do not take advantage of new data sources to make them supercharged with AI. In fact, their operating expenses actually have increased because the savings in human operators were completely overwhelmed by the cost of the cloud compute resources for applications that are always on. Ouch. These resources were capitalized before on-premise but now hit the P&L.”
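Zweben's "the meter is always running" point is easy to quantify with back-of-the-envelope arithmetic. The figures below are illustrative assumptions, not actual pricing from any vendor:

```python
# Illustrative comparison: always-on cloud opex vs. capitalized hardware.
# All dollar figures are hypothetical assumptions for the arithmetic.

HOURS_PER_MONTH = 730              # average hours in a month
MONTHS = 36                        # three-year horizon

cloud_rate_per_hour = 4.00         # assumed hourly rate for a large instance
cloud_total = cloud_rate_per_hour * HOURS_PER_MONTH * MONTHS

onprem_capex = 60_000              # assumed up-front server purchase
onprem_opex_per_month = 500        # assumed power/cooling/support
onprem_total = onprem_capex + onprem_opex_per_month * MONTHS

print(f"Always-on cloud, 3 years: ${cloud_total:,.0f}")
print(f"On-premise,      3 years: ${onprem_total:,.0f}")
```

Under these assumptions the always-on cloud bill lands on the P&L every month and overtakes the capitalized purchase well before the three-year mark, which is exactly the disillusionment Zweben predicts for applications that never idle.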
The rise of hybrid computing means the traditional data center will disappear, says Yinghua Qin, senior manager of software engineering for Quest Software’s Foglight R&D team.
“Over the next several years, we can expect to see the traditional data center disappear as cloud services, IoT, and other innovations limit the advantages that traditional on-premise data centers can offer,” Qin says. “Computing workloads will need to be located based on business needs rather than physical locations and as a result, companies will begin to move to the hybrid cloud in order to provide a more flexible infrastructure.”
Hybrid environments are growing as companies seek ways to combine on-prem and cloud-based resources. Figuring out how to manage a mix of cloud and on-prem resources won’t be easy, but emerging management planes will help, says Sean Roberts, general manager of public cloud at Ensono.
“The major cloud players are starting to recognize that they can’t own all the workloads and most companies will have a multi-cloud strategy,” Roberts says. “So a new front has opened in the war: controlling the management plane. Think Microsoft’s announcement of its Azure Arc – a set of technologies designed to bring Azure services and management to any infrastructure, enabling Microsoft cloud clients to manage resources across AWS and Google Cloud.”
The public clouds will look alike to a large segment of the population, which will make it harder for the cloud platforms to differentiate themselves. In response, clouds will go on-prem, says Sazzala Reddy, Datrium CTO and co-founder.
“Cloud agnosticism will seize the day,” Reddy says. “Cloud vendors are realizing that many enterprise customers aren’t ready to completely move to the cloud just yet. Everyone says they want cloud, but it remains a challenge for companies to make that shift. In response, cloud vendors will ship on-prem products that give large enterprises the on-prem experience with a slow migration to the cloud – AWS Outposts, for example. However, this is all temporary, because the goal of Outposts is to provide the easy bridge to their particular cloud.”
Cloud data warehouses like AWS Redshift, Snowflake, and Google BigQuery are growing quickly. But will the momentum continue in 2020? Tomer Shiran, co-founder and CEO of Dremio, has his doubts.
“Given the tremendous cost and complexity associated with traditional on-premise data warehouses, it wasn’t surprising that a new generation of cloud-native enterprise data warehouse emerged,” Shiran says. “But savvy enterprises have figured out that cloud data warehouses are just a better implementation of a legacy architecture, and so they’re avoiding the detour and moving directly to a next-generation architecture built around cloud data lakes.”
In this new architecture, data doesn’t get moved or copied, there is no data warehouse, and no associated ETL, cubes, or other workarounds, Shiran says. “We predict 75% of the global 2000 will be in production or in pilot with a cloud data lake in 2020, using multiple best-of-breed engines for different use cases across data science, data pipelines, BI, and interactive/ad-hoc analysis,” he continues.
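The data-lake pattern Shiran describes can be sketched in a few lines: files live in shared object storage (simulated here with a local directory standing in for an S3 bucket), and multiple engines query them in place, with no copy into a warehouse and no ETL step. The file layout and field names are illustrative assumptions:

```python
# Minimal sketch of the cloud data lake pattern: shared files,
# queried in place by independent consumers. Layout is hypothetical.
import csv
import tempfile
from pathlib import Path

lake = Path(tempfile.mkdtemp()) / "events"   # stands in for s3://bucket/events
for region in ("us", "eu"):
    part = lake / f"region={region}"         # partition directory per region
    part.mkdir(parents=True)
    with open(part / "part-0000.csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerows([["user", "amount"], ["a", "10"], ["b", "20"]])

# "Engine" 1: an ad-hoc scan summing amounts across all partitions.
total = sum(
    int(row["amount"])
    for path in lake.glob("region=*/part-*.csv")
    for row in csv.DictReader(open(path))
)
print(total)

# "Engine" 2: a different consumer filters the same files independently.
us_rows = [
    row
    for path in lake.glob("region=us/*.csv")
    for row in csv.DictReader(open(path))
]
print(len(us_rows))
```

A real deployment would use Parquet files on object storage with engines like Spark, Dremio, or Presto in place of these toy readers, but the architectural claim is the same: the data stays put, and many engines come to it.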
But Rob Woollen, the co-founder and CEO of Sigma Computing, doesn’t see a slowdown in cloud data warehouses at all. In fact, he sees cloud data warehouses (CDWs) continuing to gain traction.
“CDWs have taken off for a number of reasons,” Woollen says. “Scalability, flexibility, lower costs, and connectivity. Many people now see a data warehouse in the cloud as more secure than an on-premises system. Their reasoning is that because cloud data warehouse providers’ entire business models rely on data security and encryption, they may be better at it than you are. These companies invest heavily in security technology and dedicate entire departments to the protection of your data. CDWs can even ease the burden of compliance. By storing all data in one place, organizations don’t have to deal with the complexity of searching various discrete business systems and data stores to locate the relevant data.”
Did we mention Kubernetes? Kubernetes, or K8s, is a key technology that enables much of the flexibility of the cloud stack. Unravel Data CEO Kunal Agarwal doesn’t see the Kubernetes trend waning any time soon.
“Kubernetes recently surpassed Docker as the most talked about container technology,” Agarwal says. “In the future, every data technology will run on Kubernetes. We may not quite get there in 2020, but Kubernetes will continue to see rising adoption as more major vendors base their flagship platforms on it. There are still some kinks to be ironed out, such as issues with persistent storage, but those are currently being addressed with initiatives like BlueK8s. The entire big data community is behind Kubernetes, and its continued domination is assured.”
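The persistent-storage "kinks" Agarwal mentions are typically handled today with StatefulSets and PersistentVolumeClaims, which give each replica of a data service its own durable volume. A minimal sketch follows; the names, image, and sizes are placeholders, not a reference deployment:

```yaml
# Minimal sketch: a StatefulSet with volumeClaimTemplates gives each
# replica a durable volume that survives pod restarts and rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: analytics-store
spec:
  serviceName: analytics-store
  replicas: 3
  selector:
    matchLabels:
      app: analytics-store
  template:
    metadata:
      labels:
        app: analytics-store
    spec:
      containers:
        - name: store
          image: example/analytics-store:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/store
  volumeClaimTemplates:             # one PersistentVolumeClaim per replica
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```

This built-in machinery covers many stateful workloads; efforts like BlueK8s aim to close the remaining gaps for distributed big data stacks specifically.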
The intersection of cloud and open source provides fertile ground for developers, says Emil Eifrem, CEO and co-founder of Neo4j.
“The most well-positioned technology companies will be those who have a B2D (business to developer), developer/practitioner-led approach and deliver a SaaS offering ready to support practitioners as they move to the cloud,” Eifrem says. “These tend to be Open Source Software (OSS) companies who have built trust with the developer community over time, understand the evolving needs of developers, and are well versed with a classic enterprise software selling motion….OSS companies with a SaaS offering are extremely well-positioned to capture the entire spectrum of the market from individuals who expect to pay nothing, up to classic six-figure enterprise deals, both on-premise and in the cloud.”
The emerging enterprise computing platform is a combination of public cloud and on-prem resources, says Jeff Clarke, COO of Dell Technologies.
“The idea that public and private clouds can and will co-exist becomes a clear reality in 2020,” Clarke says. “Multi-cloud IT strategies supported by hybrid cloud architectures will play a key role in ensuring organizations have better data management and visibility, while also ensuring that their data remains accessible and secure…But private clouds won’t simply exist within the heart of the data center. As 5G and edge deployments continue to roll out, private hybrid clouds will exist at the edge to ensure the real-time visibility and management of data everywhere it lives. That means organizations will expect more of their cloud and service providers to ensure they can support their hybrid cloud demands across all environments.”
The public cloud might be great for developing AI, but production AI has requirements that favor on-prem resources, says Curtis Anderson, a software architect at Panasas.
“As AI projects graduate from exploratory deployments to full production, organizations will find they will need to leave the public clouds for less costly on-premises solutions, which in turn will fund and produce a boom in HPC infrastructure build-out,” Anderson predicts. “However, public clouds will have a large influence on the next generation of on-premise infrastructure…The industry has become used to the extreme flexibility and simplicity of management that public clouds provide, and therefore organizations will want to retain those characteristics in their on-premise solutions, at lower cost.”
When you throw emerging data regulations into the mix, the scales tip in favor of on-prem, argues Wilson Pang, CTO of Appen.
“As more organizations experiment with more data for their AI initiatives, security and ethical use of AI will become more and more important,” Pang says. “Chief among the concerns in this arena are data leaks, especially with personally identifiable information (PII), and new product ideas and proprietary information. These concerns should lead to more on-premises solutions for enabling AI creation, including solutions for data annotation and leveraging a diversified crowd securely.”
Get ready for disruption on the edge, says NetApp Chief Strategy Officer Atish Gude. “In preparation for the widespread emergence of 5G, lower-cost sensors and maturing AI applications will be leveraged to build compute-intensive edge environments, laying the groundwork for high bandwidth, low latency AI-driven IoT environments with the potential for huge innovation – and disruption,” he says.