Oracle Aims to Break Big Data Silos with SQL
Oracle today unveiled Big Data SQL, a new technology it’s adding to its Big Data Appliance offering that allows users to get at data in Hadoop, NoSQL, and Oracle relational data stores with a single SQL query. It’s Big Red’s take on the logical data warehouse concept first espoused by Gartner, and will only work with Oracle’s products when it ships later this summer.
Big data technologies are making this an exciting time to be in the IT business, akin to the launch of the mainframe in the 1960s, the rise of minicomputers and the relational database in the 1970s and '80s, the distributed computing revolution unfurled by PCs in the 1990s, and then the Internet's lurch back toward centralization of data processing in the 2000s.
With the 2010s' rise of social media, Hadoop, NoSQL databases, and scale-out relational databases, we're in the middle of another period of accelerated technology evolution, or what Stephen Jay Gould might call "punctuated equilibrium" if IT were a living species. This game is not for the faint of heart, particularly if you're a vendor whose business is being impacted.
"The marketplace is in a bit of a tizzy trying to understand, what do I do now? Do I unload my data warehouse into Hadoop? Can I build a data warehouse in Hadoop? What do I do about data standardization?" says Neil Mendelson, senior VP of product management for Oracle. "A lot of these questions are a function of the vendors essentially trying to push you to one approach or another based on their own inadequacies."
Instead of ripping out an established Oracle-based data warehouse and replacing it with Hadoop running Hive, Impala, or any number of other manifestations of SQL on the great yellow pachyderm (an outcome Oracle is deathly afraid of, by the way), Oracle wants to give customers a third option that allows Hadoop (and, to a lesser extent, NoSQL databases) to peacefully co-exist with Oracle data warehouses.
That, essentially, is what Oracle is introducing with Big Data SQL, which runs on its converged Big Data Appliance offering. Big Data SQL extends the Oracle Database 12c concept of an external table to Hadoop and NoSQL, enabling customers to write one SQL query over data that lives in Hadoop, in NoSQL databases, and in Oracle's database, says Dan McClary, Oracle's product manager for Big Data SQL.
"To be specific, that means a user can write a query that joins, in that single query, data from Oracle, data from Hadoop, and data from a NoSQL source, and it works and feels just exactly like an external table in Oracle," he tells Datanami. "We're making this much more performant than something like, say, Apache Hive, because we're leveraging some innovation in Smart Scan surrounding the Exadata product, to minimize data movement, but at the same time maximize security. So at the same time you can have remarkably fast queries over all data, with all the security and certainty of Oracle database."
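A hypothetical sketch of what such a query might look like, based on McClary's description of treating Hadoop data as an Oracle external table. The access driver name, directory, and all table and column names here are assumptions for illustration, not confirmed syntax from Oracle:

```sql
-- Sketch: expose data sitting in Hadoop as an Oracle external table,
-- then join it against an ordinary relational table in one statement.
-- ORACLE_HIVE, default_dir, and all identifiers are illustrative assumptions.
CREATE TABLE web_clicks (
  user_id   NUMBER,
  page_url  VARCHAR2(4000),
  click_ts  TIMESTAMP
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE            -- access driver reading from the Hadoop cluster
  DEFAULT DIRECTORY default_dir
);

-- One SQL query spanning the warehouse and Hadoop:
SELECT c.cust_name, COUNT(*) AS clicks
FROM   customers  c                               -- regular Oracle table
JOIN   web_clicks w ON w.user_id = c.cust_id      -- external (Hadoop) table
GROUP  BY c.cust_name;
```

The point of the external-table abstraction is that the second query reads like any other join; the optimizer, not the user, worries about where the bytes actually live.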
There are some caveats to Big Data SQL. On the Hadoop side, it only works with the OEMed version of Cloudera's CDH that Oracle sells for the Big Data Appliance. So if you're interested in hooking your Oracle data warehouse up to a Hadoop cluster running stock Apache Hadoop or distributions from Hortonworks, MapR Technologies, or Pivotal, you are out of luck.
On the NoSQL side, Big Data SQL will initially work only with Oracle's own NoSQL offering. Oracle does have plans to support other NoSQL databases, notably MongoDB, Cassandra (which is backed by DataStax), and HBase; more broadly, any NoSQL database that supports the storage handler API should eventually be accessible from Big Data SQL.
Lastly, on the data warehouse side, Big Data SQL works only with data warehouses built on Oracle Database 12c, not, say, Teradata, Pivotal's Greenplum, or HP's Vertica. For all the talk of democratizing access to big data sources, this is still very much an Oracle-centric solution.
Oracle is doing some nifty work on the Hadoop side, albeit only within Hadoop running on the Big Data Appliance. As McClary explains, the use of Exadata Smart Scan technology can keep data movement between Hadoop and the data warehouse to a minimum.
"We'll read that data local to Hadoop and then transmit back to the database only the relevant rows and columns," he says. "So if we're talking about JSON or Avro data, that could be poly-structured and have nested fields. If I'm only interested in a couple of fields, and only in a case where a particular value is within a range, only that data will be read and sent back to the database, so you're really getting a massive minimizing of data movement."
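The pushdown McClary describes can be sketched as a query like the following, where the projection (two fields) and the range predicate are evaluated locally on the Hadoop nodes, so only matching rows and the requested columns travel back to the database. The table and column names are assumptions for illustration:

```sql
-- Sketch of projection and predicate pushdown: sensor_logs is assumed
-- to be an external table over JSON/Avro files stored in Hadoop.
-- Smart Scan would evaluate the WHERE clause on the Hadoop side, so
-- only sensor_id/reading pairs in range cross the wire.
SELECT j.sensor_id, j.reading
FROM   sensor_logs j
WHERE  j.reading BETWEEN 90 AND 110;
```

Without pushdown, a query like this would ship every row (and every nested field) to the database before filtering; with it, the Hadoop cluster does the filtering and the network carries only the answer.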
This approach allows Big Data Appliance users to get the flexibility of schema on read without giving up the “full verbosity of Oracle SQL,” McClary says.
The new option is slated to be available in late August or early September. Oracle is hosting a webinar about the new Big Data SQL option later this morning. You can see all of its big data products at www.oracle.com/bigdata/index.html.