October 28, 2013

Revolution Ships RRE7 to Marry R with Hadoop

Alex Woodie

Organizations that are looking to apply statistical analysis to data stored in their Hadoop clusters got a shot in the arm today when Revolution Analytics formally announced the availability of Revolution R Enterprise version 7 (RRE7), the newest version of its parallelized library of R statistical functions.

With RRE7, which is currently in limited availability with general availability (GA) expected December 13, Revolution is enabling scores of R functions to run directly on a Hadoop data store. Specifically, the company supports Cloudera’s CDH3 and CDH4 and Hortonworks Data Platform version 1.3. Those are not the latest versions of their respective Hadoop distributions, but they are the most widely deployed, especially as Hadoop version 2 is just getting off the ground.

Revolution is encouraging its users to adopt in-database analytics, whereby its R functions run directly on the big data store rather than requiring the data to be moved to a dedicated box or cluster for analysis. In addition to partnerships with Cloudera and Hortonworks, Revolution has a deal in place with Teradata to enable its analytics to run directly on both the Teradata Data Warehouse and the Aster data discovery platform.

Revolution appears to be gaining some traction in the big data analytics space with RRE, a commercialized version of the open source R statistics language that the company tweaked to run efficiently on big clusters. Specifically, Revolution says it eliminated the requirement that R functions hold all of their data in memory, and enabled them to execute simultaneously across multiple processor threads. Without that work, R simply doesn’t scale on big data clusters, says David Smith, Revolution’s vice president of corporate marketing and community relations.

Revolution is now scaling up the number of big data platforms it supports with its Parallel External Memory Algorithms (PEMAs), Smith says. “What ties it all together is ‘run once, deploy anywhere,’” he says. “There’s a little bit of confusion in the big data landscape. Things are changing pretty rapidly for CIOs in terms of what platform they should adopt for advanced analytics.” The ability to support a number of big data platforms gives Revolution an advantage, he says.

Cloudera and Hortonworks aren’t the only Hadoop distributors, of course, and MapR Technologies and Intel are noticeably absent from Revolution’s list of supported vendors. MapR isn’t being supported, “at least in this announcement,” Smith said. The wording was slightly less optimistic when it came to Intel. There’s nothing stopping users from running Revolution’s plain-vanilla R-based algorithms on data stored in Intel’s Hadoop distribution, he pointed out. But that isn’t RRE7, and it won’t scale beyond a single thread.

“There are lots of products, including MapR, that have connections to open source R,” Smith says. “But it is just that: it’s open source R, it’s in-memory, single-threaded, and, most importantly, especially for a distributed platform like MapR, if you wanted to do a big data logistic regression, you can only use one thread on one node to do that. If you want to use all the nodes at once, you have to go through all the mathematics of figuring out how to split up the data, how to parallelize the computation, and how to aggregate the results. That’s what we have done with ScaleR.”
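The split, parallelize, and aggregate steps Smith describes can be illustrated with a small sketch. Revolution’s PEMA implementation is proprietary and is not shown here; the Python below is our own minimal example, assuming a one-feature logistic regression fitted by Newton-Raphson. Each chunk of rows (standing in for a block of data on one node) contributes a partial gradient and information matrix, and only those small summaries are summed centrally. The names `chunk_stats`, `split`, and `fit_logistic` are hypothetical, and a thread pool stands in for cluster nodes; because of CPython’s global interpreter lock, it demonstrates the structure rather than a real speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def chunk_stats(chunk, b0, b1):
    """Partial gradient and information matrix of the logistic log-likelihood
    for one chunk of (x, y) rows, evaluated at coefficients (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in chunk:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def split(rows, n_chunks):
    """Hypothetical stand-in for distributing rows across nodes."""
    size = math.ceil(len(rows) / n_chunks)
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def fit_logistic(chunks, iterations=25, workers=4):
    """Newton-Raphson in which each iteration maps over chunks and sums the results."""
    b0 = b1 = 0.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            # Map: each chunk is processed independently at the current coefficients.
            parts = list(pool.map(chunk_stats, chunks, repeat(b0), repeat(b1)))
            # Reduce: the statistics are sums over rows, so chunk totals simply add.
            g0, g1, h00, h01, h11 = (sum(col) for col in zip(*parts))
            det = h00 * h11 - h01 * h01
            # Newton step: beta += I^{-1} g, using the closed-form 2x2 inverse.
            b0 += (h11 * g0 - h01 * g1) / det
            b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic data drawn from an assumed model (intercept 0.5, slope 2.0).
rng = random.Random(42)
data = []
for i in range(600):
    x = -3.0 + 6.0 * i / 599
    p_true = 1.0 / (1.0 + math.exp(-(0.5 + 2.0 * x)))
    data.append((x, 1 if rng.random() < p_true else 0))

chunked = fit_logistic(split(data, 8))
whole = fit_logistic([data], workers=1)
print(chunked, whole)
```

Fitting on eight chunks and on a single chunk gives the same coefficients up to floating-point rounding. That is the property that makes this kind of aggregation valid: the gradient and information matrix are sums over rows, so they can be computed wherever the data lives and added afterward, without any node holding the full data set in memory.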

RRE7 also introduces integration with Alteryx and its front-end visualization tool, called Strategic Analytics, which we covered earlier this month. That gives Revolution customers the option of viewing the results of R-based analytic processes in a range of front-end BI and visualization tools, including QlikTech and Jaspersoft.

Revolution has added several new algorithms to its collection with RRE7. Smith singled out the new Decision Forest algorithms as particularly useful for clients. “It’s basically a very powerful predictive analytics technique,” he says. “This is an ensemble tree method. Tree models have been around for many years. An ensemble tree is the next generation of that.”
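The article does not describe how RRE7’s Decision Forest algorithms work internally. As a hedged illustration of the general ensemble-tree idea Smith refers to, the Python sketch below trains many one-split trees (decision stumps) on bootstrap resamples of a toy data set and combines them by majority vote, a technique known as bagging. A real decision forest grows deeper trees and typically also samples a subset of features at each split; all function names here are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Pick the threshold and left-side label that minimize training errors
    for a one-feature, one-split tree over (x, label) rows."""
    best = None
    for t in sorted({x for x, _ in rows}):
        for left in (0, 1):
            errors = sum((left if x <= t else 1 - left) != y for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, t, left)
    return best[1], best[2]


def predict_stump(stump, x):
    """Predict with a single stump: one label at or below the threshold, the other above."""
    t, left = stump
    return left if x <= t else 1 - left


def fit_forest(rows, n_trees=25, seed=0):
    """Bagging: each tree is fitted to a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Combine the trees' predictions by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: label is 1 when x > 5.
rows = [(x, 1 if x > 5 else 0) for x in range(11)]
forest = fit_forest(rows)
print([predict_forest(forest, x) for x in (0, 3, 7, 10)])
```

Each individual tree sees a slightly different sample and so learns a slightly different boundary; averaging their votes is what makes an ensemble more stable than any single tree.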


Datanami