April 30, 2014

Databricks Selects Simba ODBC Driver for Shark

VANCOUVER, B.C., April 30 — Simba Technologies Inc., the industry’s expert for Big Data connectivity, announced today that Databricks has licensed Simba’s ODBC Driver as its standards-based connectivity solution for Shark, the SQL front-end for Apache Spark, the next-generation Big Data processing engine. Founded by the creators of Apache Spark and Shark, Databricks is developing cutting-edge systems to enable enterprises to discover deeper insights, faster.

“We believe that Big Data is a tremendous opportunity that is still largely untapped, and we are working to revolutionize what organizations can do with it,” says Ion Stoica, Chief Executive Officer at Databricks and Professor of Computer Science at UC Berkeley. “As part of this mission, we understand that BI tools will continue to be a key medium for consuming data and analytics and are excited to announce the availability of an enterprise-grade connectivity option for users of BI tools. Simba is the trusted name for enterprise Big Data connectivity, and was the clear partner choice for Databricks as we work to reach new heights in Big Data analytics and query speeds.”

“When it comes to distributed data, Shark is cutting edge,” notes Simba Technologies CTO George Chow. “Its innovative distributed memory abstraction enables SQL queries on Big Data at speeds up to 100 times faster than current industry norms. Pair that velocity with Simba’s Shark ODBC Driver to connect industry-leading BI tools (like Tableau and SAP Lumira) with Apache Hadoop distributions, and you’ve got an enterprise solution that revolutionizes Big Data and enables incredibly powerful business insight.”

Shark is an open-source distributed SQL query engine for Hadoop data, originally developed at UC Berkeley’s AMPLab. It delivers state-of-the-art performance and advanced analytics by using the Apache Spark engine to speed up computations. Users can run Hive queries up to 100 times faster in memory or 10 times faster on disk. Shark runs unmodified Hive queries on existing warehouses, is fully compatible with existing Hive data, queries, and UDFs, and can call complex analytics functions, such as machine learning, directly from SQL. Shark also supports mid-query fault tolerance, which lets it scale to very large jobs and serve as a single tool across the spectrum of SQL query workloads. Furthermore, Shark is an integral part of building end-to-end data workflows with Spark that, in addition to SQL, include streaming data, graph computation, and machine-learning functionality.