November 30, 2015

IBM Expands Spark Support With SystemML

IBM’s machine learning technology has been accepted as an open source project, adding momentum to the steady enterprise shift to open-source development. The development effort also extends IBM’s reach into the Apache Spark ecosystem.

IBM (NYSE: IBM) said last week that SystemML, originally developed for its BigInsights data analytics platform, has been accepted as an Apache Incubator open source project. SystemML is a machine learning algorithm translator designed to help developers build machine-learning models used for predictive analytics across a range of industries.

The open-source version of SystemML is intended to help data scientists move their algorithms into production environments without rewriting the entire code base. That, the company claimed, makes it possible to scale data analysis from a laptop to large clusters.

“This allows for domain- or industry-specific machine learning, providing developers what they need from a base code to customize applications,” Rob Thomas, vice president of development for IBM Analytics, noted in a statement.

IBM announced in June 2015 it would donate SystemML as an open-source platform for building intelligence into applications. Since then, the company said Apache SystemML has produced more than 320 patches, including APIs, “data ingestion,” language and runtime operators as well as additional algorithms, testing and documentation.

Along with more than 15 outside contributors focusing on boosting the capabilities of the core SystemML engine, IBM said the project also has generated more than 90 contributions to the Apache Spark project. These include contributions from more than 25 engineers at IBM’s Spark Technology Center in San Francisco focused on machine learning as well as other components of Apache Spark.

The long-term plan is to integrate SystemML with Apache Spark, including as a “complement framework” for Spark’s MLlib machine learning library, developers said.

According to the Apache SystemML web site, the project would provide “declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations, to distributed computations on Apache Hadoop and Apache Spark.”

SystemML can be used on a single machine, allowing data scientists to develop algorithms locally without a distributed cluster. Those algorithms can then be distributed across Hadoop or Spark, organizers noted. SystemML also can be operated via Java, Python and Scala.

IBM’s SystemML contribution builds on its growing focus on Apache Spark as a core component of its analytics platforms. In October, the company announced it would redesign more than 15 analytics and commerce products with Apache Spark.

It also released a Spark-as-a-service offering on its Bluemix application development platform.

IBM said last month that, since announcing its commitment to Apache Spark in June, it has made more than 60 contributions to the Spark project, in areas including machine learning and SQL. Meanwhile, the IBM Spark Technology Center has hired 35 Apache Spark contributors.

Datanami