Databricks Announces Availability of Apache Spark 2.3 Within its Unified Analytics Platform
SAN FRANCISCO, March 6, 2018 -- Databricks, provider of the leading Unified Analytics Platform and founded by the team who created Apache Spark, today announced the availability of Apache Spark 2.3.0 on Databricks’ Unified Analytics Platform. Databricks is the first vendor to support Apache Spark 2.3 within a compute engine, Databricks Runtime 4.0, which is now generally available. In addition to support for Spark 2.3, Databricks Runtime 4.0 introduces new features, including Machine Learning Model Export to simplify production deployments, as well as performance optimizations.
The Apache Spark community made multiple valuable contributions to the Spark 2.3 release, which was introduced on February 28.
“The community continues to expand on Apache Spark’s role as a unified analytics engine for big data and AI. This is a major milestone to introduce the continuous processing mode of Structured Streaming with millisecond low-latency, as well as other features across the project,” said Matei Zaharia, creator of Apache Spark and chief technologist and co-founder of Databricks. “By making these innovations available in the newest version of the Databricks Runtime, Databricks is immediately offering customers a cloud-optimized environment to run Spark 2.3 applications with a complete suite of surrounding tools.”
The Databricks Runtime, built on top of Apache Spark, is the cloud-optimized core of the Databricks Unified Analytics Platform that focuses on making big data and artificial intelligence simple for enterprise organizations. The enhancements introduced in Spark 2.3, which is supported within the latest Databricks Runtime 4.0, focus on usability, stability, and refinement. In addition to introducing stream-to-stream joins and extending new functionality to SparkR, Python, MLlib, and GraphX, the new release provides a millisecond-latency Continuous Processing mode for Structured Streaming.
Continuous Processing Mode for Structured Streaming
Instead of micro-batch execution, new records are processed immediately upon arrival, reducing latencies to milliseconds and satisfying low-latency requirements. Developers can now choose either mode, continuous or micro-batch, depending on their latency requirements, to build real-time streaming applications at scale while benefiting from the fault-tolerance and reliability guarantees that the Structured Streaming engine affords.
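The latency difference between the two modes can be illustrated with a small toy model. The sketch below is plain Python, not Spark code: the function names, the 500 ms batch interval, and the 1 ms per-record cost are all hypothetical values chosen for illustration, and the model ignores processing time inside a micro-batch. It only shows why a record that must wait for the next batch boundary sees a higher latency than one processed on arrival.

```python
def micro_batch_latencies(arrivals_ms, interval_ms):
    """Toy model: each record waits for the next batch boundary.

    Hypothetical helper, not part of any Spark API.
    """
    latencies = []
    for t in arrivals_ms:
        # First batch boundary at or after the arrival time (integer ceiling).
        boundary = -(-t // interval_ms) * interval_ms
        latencies.append(boundary - t)
    return latencies


def continuous_latencies(arrivals_ms, per_record_cost_ms):
    """Toy model: each record is handled as soon as it arrives.

    Hypothetical helper; the per-record cost is an assumed constant.
    """
    return [per_record_cost_ms for _ in arrivals_ms]


# Assumed arrival times in milliseconds since the stream started.
arrivals = [5, 120, 480, 990, 1500]

batch = micro_batch_latencies(arrivals, interval_ms=500)
cont = continuous_latencies(arrivals, per_record_cost_ms=1)

print("micro-batch latencies (ms):", batch)
print("continuous latencies (ms): ", cont)
```

In this model the worst-case micro-batch latency approaches the batch interval, while the continuous latency stays at the assumed per-record cost regardless of when a record arrives.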
In addition to support for Spark 2.3, Databricks Runtime 4.0 adds the following Databricks features:
Databricks Machine Learning Model Export simplifies Machine Learning Production Deployments
The new model export capability enables data scientists to quickly deploy machine learning models into real-time business processes. Databricks Machine Learning Model Export allows you to export models and full Machine Learning pipelines from Apache Spark and import them into Spark and other custom platforms to do scoring and make predictions.
Databricks Runtime 4.0 is up to 2x faster than Databricks Runtime 3.0
Databricks Caching in Runtime 4.0 automatically caches hot input data for a user and balances the load across a cluster. It leverages advances in NVMe SSD hardware together with state-of-the-art columnar compression techniques and can significantly improve the performance of interactive and reporting workloads. It can cache 30 times more data than Spark’s in-memory cache. Together with other performance improvements, Databricks Runtime 4.0 is up to 2x faster than Databricks Runtime 3.0 on the industry-standard TPC-DS benchmark.
Databricks recently hosted a webinar in which Reynold Xin, key committer to Apache Spark and co-founder and chief architect at Databricks, reviewed how the innovations announced in Spark 2.3 and Databricks Runtime 4.0 can unite big data and machine learning. View the on-demand webinar.
Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Viacom, Shell and HP. For more information, visit www.databricks.com.