Apache Spark Surrounded By Cloud Data Services at IBM
IBM has made no secret of its admiration for Apache Spark, which it sees as the future of in-memory analytics. Today the IT giant unveiled a host of new cloud-based data services that bolster its hosted Apache Spark business with NoSQL, graph, and machine learning capabilities.
IBM added a number of new offerings to Cloud Data Services, which is a portfolio of more than two dozen database engines and related data-oriented services that it runs for customers as part of its Bluemix cloud platform. A hosted version of Apache Spark is one of those Cloud Data Services, and now IBM is surrounding Spark with new engines that can feed more data into Spark for analysis, as well as put insights derived from Spark into production.
IBM is reinforcing that virtuous cycle between analysis and production with six new data services spanning NoSQL and relational databases as well as messaging. Support for hosted MongoDB, Redis, Elasticsearch, PostgreSQL, RethinkDB, and RabbitMQ comes by way of Compose, a provider of hosted databases that IBM bought last year. IBM now offers these hosted services through Compose Enterprise.
The idea behind Compose Enterprise is to empower developers to adopt open-source database management systems without needlessly frightening IT departments in the process, says Adam Kocoloski, CTO of IBM Cloud Data Services.
“As these technologies crop up in experiments in lines of business, there’s a battle over whether they can actually move forward because the IT organization doesn’t have the confidence level on that offering yet,” Kocoloski tells Datanami. “We’re trying to give the IT organization a way to restore the confidence and to know that as the landscape evolves and new technologies arrive, they can be plugged into this framework and managed in a consistent way.”
IBM Graph is the first commercially available graph database built atop the Apache TinkerPop API, according to Kocoloski. IBM is not saying which graph database engine it runs under the covers, but says that matters little, because the TinkerPop API provides an abstraction layer above the actual graph engine. IBM is bullish on graphs and expects its graph service to be adopted for fraud detection, recommendation systems, and route analysis.
“We feel that graph database and graph traversal algorithms have a number of use cases that people are only just starting to explore,” Kocoloski says. “By providing this as a managed service we lower the barrier to adoption. We’re not asking people to learn how to install and operate a potentially unfamiliar system. We’re taking care of that for you. We’re saying ‘Here’s an API that allows you to immediately start exploring the things that can be done when you model your enterprise data in a graph.’”
The third leg of today’s announcement, dubbed IBM Predictive Analytics, gives developers access to a library of machine learning models they can use to imbue applications with predictive powers. IBM says the predictive capabilities can be used without the assistance of a data scientist and can be put into production through a simple API call.
“The predictive analytic service is designed to provide an easy onramp into this world of data modeling and machine learning,” Kocoloski says. “It has an auto-modeling capability that can take a look at a data set and suggest multi-variate models that might be the most appropriate for extracting a signal from the noise.”
Finally, IBM announced its Analytics Exchange, which provides access to more than 150 public datasets curated by IBM. Most of the data is already publicly available, for example through the federal government’s http://www.data.gov site and other sources, but IBM brings it all together in one place for easy pickings.
“The attempt here is to provide a broad spectrum of things that are already accessible but accessible in a variety of different locations and not curated in a way that makes it easy for them to be incorporated into an analysis,” Kocoloski says.
IBM sees these services helping to grow its Spark business. “We continue to augment that core Spark service, which is the foundation of our analytics platform, with a number of other services that help data engineers and data scientists and data analysts collaborate together,” Kocoloski says. “The important point here is we’re not treating the application development world in isolation, in a silo. There’s a connection back to the world of analytics and the connectivity happens through Spark.”