November 13, 2013

Oracle Expands Use of Cloudera Hadoop in Big Data Kit

Alex Woodie

Oracle is often heralded as the biggest purveyor of “legacy” IT gear that inevitably will be replaced in this brave new big data world. But the reality is a little more complex, and that became evident yesterday when Oracle announced that it’s now preloading the entire Cloudera stack on its latest Big Data Appliance, dubbed X4-2.

Oracle has had a relationship with Cloudera for close to two years. The partnership goes back to January 2012, when the companies announced that the Oracle Big Data Appliance would include the core Cloudera Distribution for Hadoop (CDH) offering, along with Cloudera Manager.

The companies cemented that partnership a bit more this week when Oracle announced that the entire Cloudera Hadoop stack will be preloaded on the latest X64-based appliances. So in addition to the core CDH distribution (HDFS and MapReduce) and Cloudera Manager, Oracle is now shipping Cloudera Impala, Cloudera Search, HBase, Cloudera Backup and Disaster Recovery, and Cloudera Navigator.

Oracle has obviously found a Hadoop distributor that it likes, and is sticking with it. Cloudera has jumped out to an early lead in the Hadoop space, according to IDC, which found that about 25 percent of organizations that have deployed Hadoop are using CDH.

CDH joins other key components in the Big Data Appliance, including Oracle NoSQL Database and Oracle Big Data Connectors. The Big Data Connectors handle much of the ETL-related work involved with prepping data for processing in Hadoop, and they enable Oracle’s R distribution, SQL, and XQuery processing engines to be used against data residing in the appliance.

Oracle also bolstered its XQuery for Hadoop technology. This technology, which came from the Oracle database side of the house, will allow users to query XML and JSON files stored in HDFS. Big Data Appliance customers will also find a better R Connector for Hadoop that provides more statistical algorithms.

The Big Data Connectors are key to Oracle’s strategy of positioning its Oracle Exadata appliance as an adjunct data processing engine for more SQL-related workloads. Thanks to speedy InfiniBand interconnects, data can be moved between the Big Data Appliance and Exadata systems at speeds up to 15TB per hour.

On the hardware front, Oracle has increased the total storage capacity in its Big Data Appliance to 864TB, a 33 percent increase over the previous X3-2 appliance, largely due to the use of 4TB SAS disks instead of 3TB disks. Oracle has also upgraded the compute nodes to use Intel Xeon E5-2650 V2 processors. Each X4-2 rack can contain up to 18 individual compute and storage nodes, in one-third increments. According to Oracle’s data sheet for the product, each node has two eight-core Xeon processors, 64GB of memory (expandable to 512GB), 12 disk drives, two QDR InfiniBand ports, and four 10Gbit Ethernet ports.
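
The capacity figures can be checked with some quick arithmetic. The sketch below is only an illustration, not anything taken from Oracle’s data sheet: it assumes a full rack of 18 nodes with 12 drives each, and it assumes the previous appliance used the same node and drive counts with 3TB disks. The function name is made up for the example.

```python
# Back-of-the-envelope check of the raw capacity figures quoted in the article.
# Assumption: a full rack holds 18 nodes with 12 drives each, and the previous
# appliance used the same layout with smaller disks.
NODES_PER_RACK = 18
DISKS_PER_NODE = 12


def raw_capacity_tb(disk_tb, nodes=NODES_PER_RACK, disks=DISKS_PER_NODE):
    """Raw (unreplicated) capacity of a rack in terabytes."""
    return nodes * disks * disk_tb


new_tb = raw_capacity_tb(4)  # 4TB disks in the X4-2
old_tb = raw_capacity_tb(3)  # 3TB disks in the previous appliance
increase_pct = (new_tb - old_tb) / old_tb * 100

print(f"new rack: {new_tb}TB, old rack: {old_tb}TB, increase: {increase_pct:.0f}%")
```

Under those assumptions, the numbers line up with the 864TB and 33 percent figures. Keep in mind this is raw capacity; HDFS typically stores multiple replicas of each block, so usable capacity will be lower.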

Oracle also announced that it’s joined the Apache Sentry project as a co-founder. Sentry is an Apache incubator project started by Cloudera that’s aimed at providing role-based access to Hadoop resources. Oracle is utilizing Sentry in its Big Data Appliance, along with LDAP-based authorization and support for Kerberos authentication.

It’s not the first time Oracle has taken steps to improve the security of Hadoop. In September, the company announced that it hooked its Audit Vault and Database Firewall software up to Hadoop running on its appliance, thereby giving administrators alerts whenever suspicious or unauthorized activities occur in Hadoop.

Oracle Big Data Appliance starts at around half a million dollars, which the company claims is about 40 percent less than if an organization tried to build a Hadoop environment itself.

Datanami