Cloudera Delivers Private Cloud Amid Public Speculation of Sale
Cloudera today announced that it has begun a limited tech preview for the on-premises version of its enterprise data lake, called Cloudera Data Platform (CDP) Private Cloud, which is its first product to support Kubernetes outside of a public cloud environment. Meanwhile, the California-based company chose not to respond to a published report that it has discussed a possible sale with private equity firms.
CDP Private Cloud is the culmination of nearly two years of work to not only merge the best components of the Cloudera and Hortonworks Hadoop distributions, but also to re-architect the resulting data platform to run in a containerized environment managed by Kubernetes. Kubernetes has become the de facto standard resource scheduler for managing distributed computing clusters across the IT industry, and here it takes the place of the YARN scheduler that has been used in Hadoop since 2014.
The company kicked off its cloud strategy last September when it delivered two versions of CDP, called Cloudera Data Warehouse (DWX) and Cloudera Machine Learning (MLX), that were designed to run specific big data workloads on Amazon Web Services. These next-gen offerings not only run atop Kubernetes, but they also use the AWS S3 object storage system instead of the Hadoop Distributed File System (HDFS) that has traditionally been the digital heart of Hadoop clusters. They also form the first two “experiences” that Cloudera is supporting on CDP Private Cloud (more on that later).
At the same time, Cloudera launched a general-purpose version of its platform that was designed to run legacy workloads on public clouds. Called Cloudera Data Hub, it retains YARN and HDFS and focuses on bringing existing Spark and MapReduce applications to AWS. The company followed that up this spring with the general availability of the CDP experiences and Data Hub on Microsoft Azure, utilizing Azure’s Kubernetes environment and the S3-compatible Azure Data Lake Storage (ADLS) data store. The company is also in the midst of adding support for Google Cloud and its respective K8s resource scheduler and object store.
Atop those three cloud apps (DWX, MLX, and Data Hub), Cloudera delivered two additional products that are critical to its hybrid and multi-cloud strategy. The first was Shared Data Experience (SDX), which provides security, governance, and lineage for data stored across all Cloudera solutions, as well as the ability to synchronize or push data from one location to another. The second was Control Plane, which functions as a “single pane of glass” for administrators to spin up and spin down clusters in the cloud, on-prem, and hybrid scenarios, as well as “burst” out to the cloud when greater processing resources are needed. Today, Cloudera calls that product its CDP Management Console.
Meanwhile, Cloudera also pursued an on-premises version of its products. Many of Cloudera’s customers are quite happy running on-prem, have no desire to move to the public cloud, and perhaps are even required by regulation to keep their data within their own four walls, according to Cloudera Chief Architect Doug Cutting.
“There’s a lot of folks who don’t want to get locked into a single cloud vendor, who want to run things on premises, who have large enough systems or have legal requirements where they can’t put things in a public cloud,” Cutting told Datanami in April 2019.
The on-prem lineup got going in October 2019, when Cloudera rolled out another product called CDP Data Center. This product was a superset of the legacy Cloudera Distribution of Hadoop (CDH) and Hortonworks Data Platform (HDP) offerings, and featured a full YARN/HDFS stack. It was designed to run in a traditional manner (on bare metal servers, with data locality and triple replication among data nodes) as opposed to running on containers and relying on an object store’s erasure coding routines for data protection. It delivered the full breadth of query engines and capabilities that previously were grouped under the Hadoop banner, including Spark, Hive, HBase, Impala, and the rest of the zoo.
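The data-protection trade-off between triple replication and erasure coding can be put in rough numbers. The sketch below is illustrative only: the Reed-Solomon layout of six data units plus three parity units and the 100 TB data set are assumptions chosen for the example, not figures from Cloudera, and the function names are hypothetical.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data under n-way replication."""
    return float(replicas)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data under a (data, parity) erasure code."""
    return (data_units + parity_units) / data_units


def raw_capacity_needed(user_tb: float, overhead: float) -> float:
    """Raw capacity in TB needed to hold user_tb of user data."""
    return user_tb * overhead


if __name__ == "__main__":
    user_tb = 100.0  # assumed data set size, for illustration only

    # Triple replication, as in a traditional HDFS deployment.
    triple = replication_overhead(3)

    # Assumed Reed-Solomon layout: 6 data units + 3 parity units.
    rs_6_3 = erasure_coding_overhead(6, 3)

    print(f"3x replication: {raw_capacity_needed(user_tb, triple):.0f} TB raw")
    print(f"RS(6,3) coding: {raw_capacity_needed(user_tb, rs_6_3):.0f} TB raw")
```

Under these assumptions, erasure coding needs half the raw capacity of triple replication, at the cost of extra computation and network traffic when lost data has to be rebuilt.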
The company is doing its best to forget those old Hadoop days, however, and the zoo animals, while still there, have been taken off display. Today, Cloudera is focused on transforming itself into a nimble provider of a data platform that can run either on the public cloud or on-prem, or even a mixture of both, which the company views as its secret weapon in the fight against public cloud domination. And the forthcoming launch of CDP Private Cloud – which is built on top of CDP Data Center and expected to become generally available later this summer – is a critical piece of that story.
CDP Private Cloud is based on CDP Data Center and relies on that product for much of its underlying data management. But instead of providing the full gamut of Hadoop zoo animals (in addition to Apache Kafka, which gained first-class billing in CDP Data Center), CDP Private Cloud provides cloud-like experiences via the DWX and MLX offerings.
Because CDP Data Center is a prerequisite for CDP Private Cloud, customers can still run on bare metal if they like. But the whole point of CDP Private Cloud is to deliver those easy-to-adopt experiences rather than dealing with the heaviness that was Hadoop. Kubernetes is a core enabler of that, and CDP Private Cloud ships with Red Hat’s OpenShift K8s distribution, although customers can run other ones if they like. Running on Kubernetes gives the customer the power to spin up and spin down CDP resources as needed, just as it’s done in the public cloud.
“The value of CDP Private Cloud is that it offers the full power of all the components and engines available from Cloudera in a deployment that is easy to use and manage,” says Sushant Rao, Cloudera senior director of product marketing. “Aside from projects that have been deprecated due to lack of community involvement (e.g. Apache Storm), all components in CDH/HDP are available in CDP Private Cloud, either underpinning one of the experiences or through the bare metal workloads.”
With the CDP apps generally available in two out of the three public clouds and available soon in an on-prem delivery mode, Cloudera has followed through on its commitment to build a data platform that gives enterprises the flexibility to run their big data workloads wherever they like – and to move them if they want to.
“The launch today of CDP Private Cloud is the culmination of the vision for an enterprise data cloud that allows businesses to navigate complex data processes across multiple clouds, manage data governance, and enable multi-function analytics, regardless of where the data resides,” Cloudera marketing chief Mick Hollison says in a press release.
There are still some unanswered questions about CDP Private Cloud, however. First, while Kubernetes gets top billing as the primary resource scheduler, YARN will also exist on the platform as part of the CDP Data Center “base cluster,” the company says. So it will be interesting to see how Kubernetes and YARN co-exist.
Cloudera says it’s still evaluating the Kubernetes-YARN co-existence as part of its roadmap. “Running a container scheduler on top of another container scheduler is inefficient,” Rao says. “However, Kubernetes does not have all the scheduling and resource management functionality of YARN. Our preferred approach to this problem is to enhance the Kubernetes scheduler through projects like Apache YuniKorn. But even then, some applications may still take longer to port natively to Kubernetes, so we are evaluating whether a stacked scheduler approach makes sense for those.”
Storage is also a bit of an open question for Cloudera. The company has made some moves to support object storage through Project Ozone, which incorporates a key-value storage system atop HDFS and will support large numbers of smaller files much more efficiently than HDFS, which really prefers big honking files. But it’s unclear if this six-year-old project will give customers the full cloud-like experience and S3 API that the market is asking for.
Amidst all this, there are fresh questions about Cloudera’s future. On Tuesday, Bloomberg reported that the company is working with a financial advisor and has held discussions with potential buyers, including private equity firms. The publication cited people familiar with the matter but did not name them.
Following the report, Cloudera’s stock, which is traded on the New York Stock Exchange under the ticker symbol CLDR, increased by 20% in value. A Cloudera spokesperson told Datanami that the company does not comment on rumors and speculation.
Cloudera has undergone quite a bit of drama over the past year, including the ouster of its CEO Tom Reilly and co-founder Mike Olson a year ago following disappointing financial results. Rob Bearden, who formerly was the CEO of Hortonworks and still had a seat on Cloudera’s board, was brought back to the executive suite in early 2020 to replace Reilly as CEO.