February 24, 2016

Spreading Spark Enterprise-wide

Doug Black

Spark is in the spotlight. Companies with big data analytics needs are increasingly looking at the open source framework for lightning-quick in-memory performance – reputedly up to 100X faster than Hadoop MapReduce (according to http://spark.apache.org/). As the data tsunami rolls on and quintillions of bytes of data are generated every day, Spark is one of the answers to the daunting task of pulling insight and value out of oceanic data sets.

But it’s also often the case that business analysts and data scientists in the enterprise are so eager to get their hands on Spark that they stray off the IT reservation and set up ad hoc Spark clusters, causing resource strains, siloed data, security risks and other management challenges.

The launch of IBM’s Platform Conductor for Spark is intended to keep Spark under the big IT tent, enabling production-ready, IT-approved management of multiple Spark instances across the enterprise. IBM calls it a hyperconverged, multi-tenant offering that uses Spectrum Scale (formerly GPFS) File Placement Optimizer to bring the Spark environment to massive data sets.

To read the rest of the article, see www.enterprisetech.com/2016/02/23/spreading-spark-enterprise-wide.
