What Cloudera Did At Strata + Hadoop World This Week
This week’s Strata + Hadoop World conference in New York City was expected to draw more than 7,000 attendees, making it the biggest big data conference on the planet. It’s also a showcase for Cloudera, which sponsors the show along with O’Reilly.
Cloudera made a range of announcements at the show this week. Much of the new technology the company plans to ship, such as Apache Kudu 1.0 and Apache Spark 2.0, will reach customers in about three weeks with the general availability (GA) release of CDH 5.9, Cloudera’s Distribution of Hadoop. Kudu is Cloudera’s new data storage and processing engine for real-time analytics, while Spark is the hugely popular general-purpose data processing engine.
CDH 5.9 will bring several other enhancements, including new cloud capabilities that the company unveiled this week. Charles Zedlewski, vice president of products for Cloudera, tells Datanami that 20% of Cloudera’s customers run in the cloud, and that cloud deployments are growing quickly.
“We’ve been supporting customers on public cloud infrastructure for five years now, but the growth in the last two years has been much more pronounced,” he says. “We’re trying to make the experience much more agile and flexible.”
To that end, Cloudera is introducing several enhancements in CDH 5.9 that will make it easier for customers to run Hadoop in the cloud. For starters, it’s simplifying Hadoop deployments and the many configuration decisions that go into them.
According to Zedlewski, CDH 5.9 users will be able to deploy a 100-node Hadoop cluster in under nine minutes. The company achieves that speed-up by pre-configuring Hadoop and making configuration decisions on behalf of customers.
“In a fixed infrastructure context, nobody cares if it takes two hours to install something,” Zedlewski says. “But in a cloud context, you care on many accounts, because it slows the whole thing. [You want to] create it, throw it away, make it fast.”
Cloudera is also improving how Impala runs on Amazon Web Services, the undisputed giant in cloud computing. The company has integrated Impala with Amazon’s S3 object storage service, enabling Impala to read data directly from S3 and not just from HDFS.
Letting Impala (as of CDH 5.8) query S3 directly saves time and money for AWS customers, Zedlewski says. “We needed to optimize the performance so we can get to the same level of performance they expect when they run a parallel query-engine on fixed high-end hardware.”
Cloudera is also making it easier for customers to move CDH among cloud providers. The company announced that it now supports CDH on Microsoft Azure, which means its Hadoop distribution is supported on AWS, Google Cloud Platform, and Azure.
Lastly, the company is helping customers save money in the cloud by introducing “per sip” pricing. Instead of buying an annual subscription, customers can pay for cloud-based implementations of CDH by the hour.
“You only pay Cloudera for what you use,” Zedlewski says. “If you want an annual subscription, that’s great. If you want to pay by the drink we can bill all the way down to the one-hour level on all the major cloud providers.”
The company also announced that its cybersecurity software effort, called Open Network Insight (ONI), is now a full-fledged project at the Apache Software Foundation called Apache Spot (incubating).
As Cloudera CEO Tom Reilly told Datanami earlier this year, ONI (now Apache Spot) is all about creating a data standard that vendors and customers can use to improve their cybersecurity posture. “ONI was designed as a forward-looking tool that uses machine learning algorithms to detect emerging threats,” Reilly said.
If you’ve ever heard Mike Olson talk, you’ve surely noticed the Cloudera co-founder and chief strategy officer’s devotion to using big data for the public good. The company is now doubling down on big data’s potential to advance public health with its Precision Medicine Advisory Council, which is designed to bolster President Barack Obama’s Precision Medicine Initiative (PMI).
According to Cloudera, the new council will help the company evaluate grant applications from university labs that are requesting software, and will also provide training in analyzing and managing precision medicine data.
“Cloudera knows big data, and our customers, partners, and colleagues have deep expertise in biology, bioinformatics, and genomics, combined with technology for research and patient-centered applications. Our new Advisory Council will help us ensure that grantees under the Cloudera Precision Medicine program are advancing the state of the art in healthcare with big data.”