Big Data • Big Analytics • Big Insight

July 1, 2014

Alteryx and Databricks to Lead Development of SparkR for Hadoop

SAN FRANCISCO, Calif., July 1 -- Alteryx and Databricks today announced they are collaborating to drive the value of Apache Hadoop and Spark into the hands of everyday analysts. The two companies will become the primary committers to SparkR, a subset of the overall Spark framework. In addition, Alteryx and Databricks are announcing a technology and go-to-market partnership to accelerate the adoption of SparkR and SparkSQL, helping analysts get greater value from Spark as the leading open-source in-memory engine.

“We are focused on becoming the most complete option for data analysts across the Hadoop landscape. Our goal is to empower analysts to utilize data everywhere to make the best analytic decisions possible,” said George Mathew, President and COO of Alteryx. “We believe the Apache Spark framework to be the primary method for our customers to achieve scalable, analytic freedom with their Hadoop investment. We’re delighted to be driving the new analytic stack with Databricks.”

Apache Spark, an open-source data analytics framework, has quickly been gaining traction for its fast and scalable in-memory analytic processing capabilities, both inside and independent of Hadoop. SparkR is an R package that enables the R programming language to run inside the Spark framework in order to manipulate data for analytics. The collaboration between Alteryx and Databricks will foster faster delivery of a market-leading in-memory engine for R-based analytics within Hadoop that is available to the Spark community. Together, the companies will work to bring the SparkR package to a 1.0 production version, utilizing a growing array of machine learning algorithms.

“The strong traction that Apache Spark has gained in the industry is a clear indication of the value to the broad user community and the need to further invest in the development of projects such as SparkR,” said Amr Awadallah, chief technology officer at Cloudera. “Eliminating the complexities of analytics in Hadoop for users will enable everyday analysts to deliver highly scalable analytics in their Hadoop-based enterprise data hubs.”

“The Databricks team is putting forward the best technology for the betterment and adoption of Apache Spark,” said Ion Stoica, CEO of Databricks. “Our collaboration with Alteryx on SparkR will only accelerate this value to a wider audience.”

Alteryx and Databricks will also collaborate on joint technology and go-to-market activities to improve the ease of use and speed the adoption of SparkR and SparkSQL technologies for data blending and advanced analytics on the Spark platform.

Alteryx will adopt the Apache Spark framework in a future release of the Alteryx Analytics platform to allow its customers to achieve faster, scalable analytics across all of their data. As an important foundation, Alteryx will support the ability to read and write directly to Hadoop HDFS in an upcoming release of the Alteryx Analytics platform.