Cloudera’s Vision for Cloud Coming Into Focus
Cloudera today unveiled a host of new cloud-based offerings — including Cloudera Altus Shared Data Experience (SDX), a cloud-based machine learning offering, and a cloud-based SQL data warehouse offering — that get it one step closer to meeting its vision for the type of secure yet flexible, cloud-based data processing capabilities that its clients demand.
The cloud era for Cloudera is here. The company says 20% of its customers are already running in the public cloud, a figure that it expects to grow to 60% within a few years thanks to its various Cloudera Altus offerings that run on Amazon Web Services, Microsoft Azure, and (eventually) Google Cloud Platform.
The company has four specific software packages that it wants to get on the cloud, running in a managed, platform as a service (PaaS) manner, as opposed to infrastructure as a service (IaaS), where the management is left to the customer. The first was Altus Data Engineering, which is focused on data ingest and transformation tasks, and includes Spark, Hive, Hive on Spark, and MapReduce2 engines. The second was Altus Analytic DB, which it unveiled last fall to provide an Impala-based SQL data warehouse.
Today, Cloudera announced the beginning of the beta for Altus Data Science, which will bring R- and Python-based machine learning workloads to a PaaS environment. The fourth package will be an operational database that features HBase, which the company expects to unveil later this year.
Getting all four of these SKUs working together and sharing data across four major deployment options (IaaS public clouds, bare metal public cloud, private cloud, and on-prem) in a secure and governed manner is a major engineering effort, which is why Cloudera is taking a phased approach to the roll-out of its so-called “four-by-four-by-one” strategy. SDX is a critical part of this voyage to the cloud.
SDX, which was first unveiled at the Strata Data Conference in New York City last fall, serves as a common metadata framework that allows various processing engines in Cloudera’s platform to share information that’s important for providing security, governance, and workload management.
Up to this point, SDX has been available only to on-premises customers who manage workloads using Cloudera Director. In that role, SDX helped to provide a standard way to take data from, say, an ETL process running in Spark and make it available for a BI workload in Impala.
With Altus SDX, Cloudera is now giving its Altus cloud customers the same capability to streamline and standardize metadata management across the different workloads in their Cloudera clusters. It’s all about giving customers the capabilities they need to make the most of their data, says Mala Ramakrishnan, who works in product management and marketing for Cloudera.
“It’s standardization, it’s saving costs, it’s saving resources and time,” Ramakrishnan tells Datanami. “It gives you the flexibility to spin up and spin down clusters reliably in order to execute the workloads that you need. Without it what would happen is each time you’d actually have to sit down and configure it. It could be error prone if you have to do it manually each time.”
Sharing data across different data processing engines wasn’t important when there was only a single cloud offering, Altus Data Engineering. But as Cloudera offers more pre-configured and managed cloud offerings, SDX will serve as the common metadata substrate for sharing data and cutting across silos, says David Tishgart, who works in product marketing at Cloudera.
“SDX wasn’t necessarily important when we announced Cloudera Altus initially because Altus was single function,” he says. “Now we’ve announced Altus Analytic DB beta at the end of year. We’re talking about Altus Data Science coming soon. So Altus is now a significantly more robust platform and multi-function and there it needs something like SDX to deliver that shared data experience and allow customer[s] to build more complex workloads and develop applications that really help the business.”
Big data applications like recommendation engines or next-best-offer systems will typically touch multiple engines and pull data from multiple places. “All these workloads that require multiple functions running together in a seamless environment, which are very difficult to create on a cloud today that has siloed analytic applications,” Tishgart says.
The majority of Cloudera customers are either in the cloud or testing the cloud waters at the moment, and Cloudera is evolving its cloud strategy to be ready to support its customers. “Cloud is the future,” Ramakrishnan says. “We do expect that about 60% of our base is going to move to the cloud in another few years and so we’re planning toward that.”
SDX plays a critical role in supporting Cloudera’s four-by-four-by-one strategy. “Most of [our customers] are going to be migrating to the public cloud over the next few years. But it’s not like you wake up one day and everything’s all on AWS or Azure,” Ramakrishnan continues. “While we’re going through the process, we have a unified framework that will be able to solve all the data management needs and be the data management platform [of] choice, whether you’re on the private cloud, public cloud, bare metal or the fourth option – choosing to host your own products in an IaaS and manage it yourself.”