July 18, 2017

IBM Bolsters Spark Ties with Latest SQL Engine


IBM is extending its commitment to Apache Spark as a key component of in-memory analytics with the latest release of its SQL engine for Hadoop.

The new version of IBM Big SQL, released last week, also solidifies the company’s joint distribution deal with Hortonworks, announced last month, which includes Hortonworks’ Hadoop and stream-processing distributions.

IBM (NYSE: IBM) said version 5.0 of its SQL platform targets enterprise requirements for data lakes by integrating Spark 2.1 on the Hortonworks Data Platform, the company’s Hadoop distribution. It also connects with Hortonworks DataFlow, the stream-processing platform.

IBM’s SQL approach emphasizes access to data across Hadoop and relational databases hosted on-premises, in the cloud or in hybrid deployments. The company also touted a “fluid query” capability designed to enhance its ability to virtualize access to data warehouses.

Stressing analytics capabilities, the new version also is positioned as the only SQL engine for Hadoop that leverages Apache Hive, Apache HBase and Spark concurrently.

Performance upgrades include faster processing than previous versions of Spark along with compatibility with Oracle (NYSE: ORCL) databases. Oracle support includes SQL database dialects, meaning many applications written against Oracle can run on Big SQL without modification.

The new version also supports IBM Open Power servers designed to run SQL and Spark on a range of Linux-based servers, the company said.

Indeed, Hortonworks (NASDAQ: HDP) is perhaps the only Hadoop distributor supporting IBM’s Power8 server, which runs “little endian” Linux as well as AIX and IBM i. The deal with IBM leaves Hortonworks as the main Hadoop and stream processing platform provider for IBM, allowing the larger company to focus on data science and machine learning software apps, such as the Data Science Experience and its PowerAI offering.

The new version of Big SQL also includes enterprise features such as SQL compliance, security and workload management. IBM touts those and other performance improvements as demonstrating greater functionality than other SQL-on-Hadoop engines. The SQL engine also reflects IBM’s ongoing efforts to provide faster, more secure access to, and analytics on, data stored on-premises as well as in the cloud.

Meanwhile, Big SQL strengthens IBM’s ongoing commitment to Spark.

“Spark has got a great federated data approach so you don’t have to be moving your data around,” notes Mike Desens, vice president of IBM’s Z Systems unit. “Keep your [data] lakes where they are and bring your value with wherever the data gravity is.”

Depending on how analytics applications are structured, Desens added: “If they can be leveraging Spark, you can have data staying on the mainframe and staying secure.” That transactional data about clients can be combined in an analytics engine “with data that’s potentially in a public cloud,” he explained.

IBM said its Big SQL version 5.0 is available now.
