June 15, 2015

IBM, Databricks Join Forces to Advance Spark

George Leopold

IBM has jumped on the Apache Spark bandwagon, revealing it would throw its considerable weight behind the open-source in-memory processing framework that has been gaining momentum over the last year.

Separately, Databricks, the company formed by the creators of the analytics engine, released Apache Spark 1.4, which includes the SparkR API, its first new language API since 2012.

IBM said Monday (June 15) it would integrate Spark software into the “core” of its analytics and commerce platforms while offering Apache Spark as a service on its Bluemix cloud application development platform.

The commitment to Apache Spark also gives IBM another vehicle besides its Watson cognitive computing platform for advancing its machine learning technology.

Along with advancing Spark’s machine learning capabilities through collaboration with Databricks, IBM also said it would open a Spark Technology Center in San Francisco while committing more than 3,500 developers and researchers to focus on Spark-related projects.

Backing for Apache Spark also includes the donation of IBM’s SystemML machine learning technology to the Spark open source project. IBM also said it would leverage current partnerships to train as many as 1 million data scientists and engineers on Apache Spark.

It also plans to host Spark applications on its Power and Z Systems infrastructure.

The partners said they plan to introduce new domain-specific algorithms to the Spark ecosystem and add new machine learning primitives to the Apache Spark project.

IBM’s full-throated endorsement of Apache Spark reflects the growing momentum of what has emerged as one of Hadoop’s most popular open-source projects. Last fall, Hortonworks outlined a similar investment in Spark aimed at moving the platform to the enterprise.

In a statement, IBM said it is “fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way.”

Developed by AMPLab (“Algorithms, Machines, People”) at the University of California at Berkeley in 2009, Spark was released by startup Databricks in 2013. It is described as a general-purpose data processing engine packaged to handle SQL queries and advanced analytics like machine learning. The cluster-computing framework with in-memory processing quickly gained traction in the analytics market, with hyper-scale deployments by Internet giants like Yahoo and Baidu.

Spark’s creators said their intent was to forge the next generation of analytics tools to derive insights from heterogeneous data by combining machine learning, hyper-scale computing and “human computation.”

IBM said its data scientists would begin working over the next few months with the Apache Spark open-source community to advance machine-learning capabilities. The initial goal is development of “smart business apps,” the company said.

As part of its plan to integrate Spark into its analytics and commerce platforms, IBM said it would begin offering a beta version of its “Spark-as-a-Service” on its Bluemix cloud platform.

In a blog post, Fred Reiss of IBM’s Spark Technology Center said several hundred data scientists, developers and designers would begin working at the San Francisco center over the next several months. The center was formed to speed IBM’s adoption of new Spark technologies. For example, it integrated an earlier version of Spark (version 1.3.1) into IBM’s Open Platform for Apache Hadoop.

IBM said developers have been steadily reducing Spark’s backlog of bug fixes while working to improve its performance. Reiss said the next step would be contributing new features and components to Apache Spark, with special emphasis on machine learning as the company shifts its technology to the open-source community.

It also expects to begin demonstrating business applications based on Spark in the coming weeks.

The company said more than 300 IBM engineers are already working on Hadoop and Spark open source development efforts.

Meanwhile, Databricks said its 1.4 release of Spark is available for download. The release adds window functions to Spark SQL and its DataFrame library. Databricks said window functions are increasingly popular among data analysts, allowing them to compute statistics over window ranges.
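
For readers unfamiliar with the idea, the sketch below shows what a window function computes: a statistic for each row, taken over a range of neighboring rows in the same group. It is written in plain Python rather than against the Spark API, and the function name `moving_average`, the frame size and the sample sales data are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def moving_average(rows, size=2):
    """Average each row's value with up to size-1 preceding rows.

    rows is a list of (key, order, value) tuples. Rows are grouped by key
    (the partition), sorted by order, and averaged over a sliding frame.
    """
    partitions = defaultdict(list)
    for key, order, value in rows:
        partitions[key].append((order, value))

    result = []
    for key, items in partitions.items():
        items.sort()  # order the rows within each partition
        for i, (order, value) in enumerate(items):
            # The frame is the current row plus up to size-1 rows before it.
            frame = [v for _, v in items[max(0, i - size + 1): i + 1]]
            result.append((key, order, value, sum(frame) / len(frame)))
    return result


# Hypothetical sample data: (store, day, sales)
sales = [
    ("east", 1, 10.0),
    ("east", 2, 20.0),
    ("east", 3, 60.0),
    ("west", 1, 5.0),
    ("west", 2, 15.0),
]

for row in moving_average(sales):
    print(row)
```

Unlike an ordinary aggregate, which collapses each group to a single row, every input row is kept and gains one computed column. That is the property analysts rely on for running totals, rankings and moving averages.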

