June 6, 2016

MapR Unveils Spark-Only Distro

Big data practitioners who want to get started quickly with Apache Spark but don’t want to mess around with Hadoop may be interested in new software that MapR Technologies announced today.

MapR’s new Apache Spark Distribution provides the complete Spark stack, enabling developers to begin building Spark apps that utilize various APIs for batch and stream processing, graph analytics, and SQL.

While the software uses YARN as a resource scheduler and MapR’s file system (which borrows from HDFS), there aren’t any other Hadoop components in the distribution. Customers can add standard Hadoop features like MapReduce, Hive, and Pig if they want–as well as proprietary MapR add-ons like MapR-DB and MapR Streams–but they don’t have to.

MapR calls it a “Spark-focused distribution,” and it’s no coincidence that it’s being unveiled on the first day of Databricks’ Spark Summit event that’s taking place in San Francisco.

“Previously Spark was bundled with Hadoop and optionally converged with all the other options of the MapR platform (NoSQL, etc.),” says Jack Norris, senior vice president of data and applications at MapR. “We’ve seen a lot of interest in Spark and many developers and organizations are starting with Spark directly. So with this Spark Distribution it allows organizations that just want Spark to have a dedicated distro with the integrated data platform.”

Norris says MapR’s Apache Spark Distribution gives big data developers and analysts the Spark functions they need, without forcing them to make compromises.

“If you are looking to have a large scale distributed data store with Spark,” he tells Datanami, “you have to compromise with a platform geared to batch (HDFS) or a NoSQL with eventual consistency (Cassandra).”

MapR will also leverage its Spark Distribution in its Quick Start Solution offerings, which include pre-built templates, configuration, and installation. The most popular use cases for Spark include building data pipelines and developing advanced analytical applications leveraging machine learning.

MapR says it’s seen “significant growth” of customers who are deploying Spark as their primary compute engine. That backs up research by Enterprise Strategy Group that shows 16% of businesses have already deployed Spark to production and that another 47% are “very interested” in implementing Spark. “As such, Spark will power the next wave of big data,” says senior ESG analyst Nik Rouda in MapR’s press release.

The launch of a dedicated Spark distribution is the strongest commitment to Spark among what were once the three “Hadoop” distributors–MapR, Cloudera, and Hortonworks. All three companies include Spark in their distributions, but MapR is the only one that has taken most of the Hadoop components out.

This approach will make it easier for customers to get started with Spark, says Anoop Dawar, vice president of product management for MapR. “We believe this gives our customers a converged compute and storage engine for batch, analytics, and real-time processing that helps build and deploy applications rapidly,” he says in a statement.
