August 25, 2015

Spark 1.5 to Incorporate ‘Tungsten’ Upgrades

George Leopold

A preview release of the Apache Spark open source in-memory processing framework incorporates major performance upgrades, according to Databricks Inc., the big data processing company founded by Spark’s creators.

Databricks said Aug. 18 is expects to release Apache Spark 1.5 in a “few weeks,” adding the preview release would allow the Spark community to conduct quality assurance testing.

Databricks said Spark 1.5 would focus on “under-the-hood changes to improve Spark’s performance, usability and operational stability.” Included in the latest release is the first phase of “Project Tungsten,” described as a new execution backend for DataFrames/SQL. The DataFrames API was introduced in February to expand big data processing to a wider audience.

Project Tungsten is touted as the largest change to Spark’s execution engine since the project’s inception and is designed to bring Spark closer to bare metal by improving memory and processing efficiency for Spark applications.

“Through code generation and cache-aware algorithms, Project Tungsten improves the runtime performance with out-of-the-box configurations,” Databricks engineers noted in a recent blog post. “Through explicit memory management and external operations, the new backend also mitigates the inefficiency in [Java virtual machine] garbage collection and improves robustness in large-scale workloads.”

Databricks also provided a performance comparison between Spark 1.4 and the latest version that implements Project Tungsten. The comparison shows “out-of-the-box” performance with no configuration changes for an aggregation query totaling 16 million records and 1 million composite keys. Runtime for an aggregation query on Spark 1.5 appears to be nearly four times faster on the new version, according to the performance comparison.

Source: Databricks Inc.

Another new feature dubbed “back pressure for Spark Streaming” is touted as allowing dynamic control of data ingestion rates ” to adapt to unpredictable variations in processing load,” Databricks said. “This allows streaming applications to be more robust against bursty workloads and downstream delays.”

Other new features in Spark 1.5 include new machine learning algorithms, a “sequential pattern mining” capability, improved R language support and upgrades for reporting memory usage.

San Francisco-based Databricks said Spark 1.5.0 would be released for testing on a “ready-to-go” cluster. Since multiple versions are supported, Spark 1.5 “canary clusters,” or the push of code changes, would be able to run with existing Spark production clusters, the company noted.

Databricks estimates that more than 220 open source developers from over 80 organizations contributed to the latest Spark release.

A Spark overview and a progress report on Apache Spark 1.5 can be found here.

The pace of Spark development is accelerating. Spark 1.4 was released in June, incorporating support for R notebooks and SparkR, the Databricks hosted service.

Spark got another boost in June when IBM announced it would integrate Spark software into the “core” of its analytics and commerce platforms. It also began offering Apache Spark as a service on its Bluemix cloud application development platform.

Recent items:

IBM, Databricks Join Forces to Advance Spark

Hortonworks Hatches a Roadmap to Improve Apache Spark

Applications: Predictive Analytics

Technologies: Frameworks

Sectors: Financial Services, Healthcare, Manufacturing, Retail

Tags: apache spark, databricks, in-memory analytics, Spark 1.5

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

Spark 1.5 to Incorporate ‘Tungsten’ Upgrades

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

May 10, 2024

May 9, 2024

May 8, 2024

May 7, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Top 6 Strategies for Reducing Data Warehouse Costs

Building an Operational Data Warehouse for Real-time Analytics

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

AI & Big Data Expo North America 2024

CDAO Canada Public Sector 2024

AI Hardware & Edge AI Summit Europe

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

Spark 1.5 to Incorporate ‘Tungsten’ Upgrades

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

May 10, 2024

May 9, 2024

May 8, 2024

May 7, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link