May 12, 2016

MapR Announces Availability of Apache Spark 1.6.1 on the Converged Data Platform

SAN JOSE, Calif., May 12 — MapR Technologies, Inc., provider of the industry’s only Converged Data Platform, today announced the immediate availability of Apache Spark 1.6.1 on the MapR Converged Data Platform. This is the eighth release of the full Spark stack that MapR has made available to its customers. In addition, the free, complete online Spark On Demand Training (ODT) courses offered through MapR Academy have reached their highest enrollment rate since the ODT program first launched.

“We have seen a significant customer adoption of Spark for building data pipelines and advanced analytics,” said Anoop Dawar, vice president of product management, Spark and Hadoop, MapR Technologies. “MapR has fully supported the Spark stack for two years – more than any other vendor in this industry. Based on customer feedback, MapR provides early preview releases so data scientists and developers can try cutting edge features and then follows it up with a GA release for production deployments.”

Spark continues to attract significant interest from developers, and 30% of Spark course registrants have already earned the MapR Certified Spark Developer credential. This industry credential validates a developer’s technical knowledge and skills in using Spark to process large datasets in an enterprise environment.

Apache Spark version 1.6.1 on the MapR Converged Data Platform features:

  • Performance improvements in the core Spark engine

In Spark 1.6.1, automatic memory management lets the split between execution memory (used for shuffles, joins, sorts and aggregations) and storage memory (used for caching data) change dynamically based on workload characteristics. Execution can now borrow free memory from the storage region, and vice versa.
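The borrowing behavior described above can be sketched as a toy model in Python. This is a simplified illustration, not Spark's implementation. The names `unified_region_bytes` and `UnifiedMemoryPool` are hypothetical. The model assumes the Spark 1.6 defaults of 300 MB of reserved heap, `spark.memory.fraction` = 0.75 and `spark.memory.storageFraction` = 0.5.

It also captures the asymmetry in the unified design. Execution can evict cached blocks, but only those that exceed storage's protected share. Storage can only take memory that is currently free.

```python
from dataclasses import dataclass

MB = 1024 * 1024
# Assumption: Spark 1.6 sets aside 300 MB of heap before sizing the shared region
RESERVED_BYTES = 300 * MB


def unified_region_bytes(heap_bytes, memory_fraction=0.75):
    """Size of the shared execution + storage region in this toy model."""
    return int((heap_bytes - RESERVED_BYTES) * memory_fraction)


@dataclass
class UnifiedMemoryPool:
    """Hypothetical toy pool shared by execution and storage (not Spark code)."""
    total: int
    storage_fraction: float = 0.5  # share of cached data protected from eviction
    execution_used: int = 0
    storage_used: int = 0

    @property
    def free(self):
        return self.total - self.execution_used - self.storage_used

    @property
    def protected_storage(self):
        # Cached data below this threshold is never evicted by execution
        return int(self.total * self.storage_fraction)

    def acquire_execution(self, n):
        """Grant up to n units to execution; may return less than requested."""
        if n > self.free:
            # Reclaim memory that storage borrowed beyond its protected share
            evictable = max(0, self.storage_used - self.protected_storage)
            self.storage_used -= min(evictable, n - self.free)
        granted = min(n, self.free)
        self.execution_used += granted
        return granted

    def acquire_storage(self, n):
        """Cache n units only if that much memory is free; never evicts execution."""
        if n > self.free:
            return False
        self.storage_used += n
        return True
```

In this model, storage can first borrow 800 of 1,000 units, well past its protected 500. A later 400-unit execution request then evicts 200 units of cached data to make room. Once the pool is full, a new storage request fails rather than displacing execution memory. With a 4 GB heap, the shared region comes to (4096 − 300) × 0.75 = 2847 MB.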

  • Persistence of machine learning pipelines

Spark 1.6.1 extends machine learning persistence beyond individual models to entire pipelines, including their transformers and estimators. A complete workflow can be saved and reloaded without custom export or import code. This covers both the unfitted pipeline and the fitted model.
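The idea of persisting a whole pipeline can be illustrated with a small plain-Python sketch. It is not Spark's API: `Scaler`, `Clipper`, `Pipeline`, `to_json` and `from_json` are all hypothetical stand-ins. In Spark's Scala API, pipelines and fitted models are saved and loaded with methods such as `save(path)` and `PipelineModel.load(path)`.

The sketch shows what "persisting the entire pipeline" means in practice. Each stage records its type together with its parameters or learned state. That lets both an unfitted and a fitted pipeline be rebuilt without any per-stage export code.

```python
import json


class Scaler:
    """Hypothetical estimator: learns a mean during fit and subtracts it."""
    def __init__(self, mean=None):
        self.mean = mean

    def fit(self, xs):
        return Scaler(mean=sum(xs) / len(xs))

    def transform(self, xs):
        return [x - self.mean for x in xs]

    def params(self):
        return {"mean": self.mean}


class Clipper:
    """Hypothetical transformer: clips values to [lo, hi]; nothing to learn."""
    def __init__(self, lo=-1.0, hi=1.0):
        self.lo, self.hi = lo, hi

    def fit(self, xs):
        return self

    def transform(self, xs):
        return [min(self.hi, max(self.lo, x)) for x in xs]

    def params(self):
        return {"lo": self.lo, "hi": self.hi}


STAGE_TYPES = {"Scaler": Scaler, "Clipper": Clipper}


class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, xs):
        # Fit each stage on the output of the previous fitted stage
        fitted = []
        for stage in self.stages:
            model = stage.fit(xs)
            xs = model.transform(xs)
            fitted.append(model)
        return Pipeline(fitted)

    def transform(self, xs):
        for stage in self.stages:
            xs = stage.transform(xs)
        return xs

    def to_json(self):
        # Persist every stage: its type name plus its parameters or learned state
        return json.dumps(
            [{"type": type(s).__name__, "params": s.params()} for s in self.stages]
        )

    @staticmethod
    def from_json(text):
        return Pipeline(
            [STAGE_TYPES[d["type"]](**d["params"]) for d in json.loads(text)]
        )
```

For example, fitting `Pipeline([Scaler(), Clipper(lo=-2.0, hi=2.0)])` on `[1.0, 2.0, 3.0, 10.0]` learns a mean of 4.0. Reloading the saved JSON gives a pipeline that maps `[4.0, 7.0]` to `[0.0, 2.0]`, just as the original fitted pipeline does.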

  • Dataset API

Spark 1.6.1 introduces a new experimental interface, the Dataset API, which extends the DataFrame API. Datasets use encoders to convert between JVM objects and Spark's internal binary format. The API is available in Scala and Java, with Python support planned for future releases. Its biggest benefit is reduced memory usage: because encoders know the structure of the data, Spark can use a more compact in-memory layout when caching Datasets.
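Encoders save memory by storing typed records in a compact binary layout instead of as individual objects. The plain-Python sketch below illustrates that idea; it is not PySpark, which has no Dataset API in this release. `Person` and `PersonEncoder` are hypothetical. The fixed-width `struct` layout is an assumption made for illustration and is not Spark's actual internal row format.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    id: int
    score: float


class PersonEncoder:
    """Hypothetical encoder: maps Person records to and from fixed-width rows."""
    ROW = struct.Struct("<qd")   # 8-byte int + 8-byte float, no per-object header
    SCORE = struct.Struct("<d")  # layout of the second field on its own

    def encode_all(self, people):
        # Pack all records back to back into one contiguous buffer
        buf = bytearray(self.ROW.size * len(people))
        for i, p in enumerate(people):
            self.ROW.pack_into(buf, i * self.ROW.size, p.id, p.score)
        return bytes(buf)

    def decode_all(self, data):
        return [Person(*fields) for fields in self.ROW.iter_unpack(data)]

    def score_at(self, data, i):
        # Read one field straight from the buffer without rebuilding the record
        offset = i * self.ROW.size + 8
        return self.SCORE.unpack_from(data, offset)[0]
```

Because the schema is known up front, each row occupies exactly 16 bytes. A single field can also be read in place without decoding the whole record, which is the kind of access a schema-aware layout makes possible.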

About MapR Technologies

MapR provides the industry’s only converged data platform that integrates the power of Hadoop and Spark with global event streaming, real-time database capabilities, and enterprise storage, enabling customers to harness the enormous power of their data. Organizations with the most demanding production needs run on MapR. These needs include sub-second response for fraud prevention, secure and highly available data-driven insights for better healthcare, petabyte analysis for threat detection, and integrated operational and analytic processing for improved customer experiences. A majority of customers achieve payback in fewer than 12 months and realize more than 5X ROI. MapR ensures customer success through world-class professional services and free on-demand training, which over 50,000 developers, data analysts and administrators have used to close the big data skills gap. Amazon, Cisco, Google, HPE, Microsoft, SAP, and Teradata are part of the worldwide MapR partner ecosystem. Investors include Google Capital, Lightspeed Venture Partners, Mayfield Fund, NEA, Qualcomm Ventures and Redpoint Ventures. Connect with MapR on LinkedIn and Twitter.


Source: MapR

Datanami