Radoop: A Predictive Analytic Alternative to R on Hadoop
The Hungarian firm Radoop today unveiled the second version of its eponymous product, which integrates RapidMiner’s data mining and predictive analytic tools atop Apache Hadoop. Radoop 2.0 brings many of the features available in Hadoop version 2, including support for YARN, as well as new operators and easier scoring of models.
The Radoop software products are designed to make Hadoop implementations more powerful and easier to use. The software presents operators as building blocks that users can assemble to perform specific operations upon their data, such as ETL and data cleansing, as well as aggregations and joins. It also brings the full power of RapidMiner’s suite of predictive algorithms (such as neural networks, linear and logistic regression, K-means, and decision trees) to bear on data stored in any Apache Hadoop or commercial Hadoop distribution.
With Radoop version 2, the company is now providing more than 50 of its own operators (not including the RapidMiner operators, which number well over 100). For example, the new Generate Rank operator can assign a rank to log records, and will be very useful in Web analytics, where users want to assign sequential IDs to individual clicks in clickstream data. Version 2 also brings new helper operators, such as Remove Duplicates, Union, and Split Data.
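To make the clickstream example concrete, here is a minimal Python sketch of the general idea behind ranking log records: group clicks by session, order them by timestamp, and number them sequentially. It is only an illustration of the concept, not Radoop's Generate Rank operator or its API; the function name `generate_rank`, the field names (`session`, `ts`, `url`, `rank`), and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


def generate_rank(records, partition_key, order_key):
    """Return copies of the records with a 1-based 'rank' inside each partition.

    Hypothetical helper for illustration only; it does not reflect how
    Radoop implements its operator on Hadoop.
    """
    # Group records by the partition field (for example, a session ID).
    groups = defaultdict(list)
    for record in records:
        groups[record[partition_key]].append(record)

    ranked = []
    # Partitions are visited in the order they first appear in the input.
    for key in groups:
        ordered = sorted(groups[key], key=lambda r: r[order_key])
        # Assign sequential IDs starting at 1 within the partition.
        for position, record in enumerate(ordered, start=1):
            ranked.append({**record, "rank": position})
    return ranked


# Made-up clickstream records, deliberately out of order.
clicks = [
    {"session": "a", "ts": 30, "url": "/cart"},
    {"session": "b", "ts": 12, "url": "/home"},
    {"session": "a", "ts": 10, "url": "/home"},
    {"session": "a", "ts": 20, "url": "/product"},
]

ranked_clicks = generate_rank(clicks, "session", "ts")
for click in ranked_clicks:
    print(click["session"], click["rank"], click["url"])
```

On a cluster, the same result would more likely come from a distributed query than from in-memory grouping; the sketch only shows what "assigning sequential IDs to individual clicks" means for the data.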
Radoop has also made it easier to validate predictive models for classification and regression. “It is essential in predictive analytics to have standard model descriptions that allow compatibility with other tools,” the company writes. To that end, Radoop’s classification and clustering models are fully compatible with RapidMiner. They can also be exported to and imported from Predictive Model Markup Language (PMML).
The connection to RapidMiner positions Radoop well. RapidMiner, which develops data mining and predictive analytics tools, recently moved from Germany to Cambridge, Massachusetts, to be closer to the burgeoning market for big data and predictive analytic tools in the United States. Ingo Mierswa, co-founder and CEO of RapidMiner, says his company’s offering gives users capabilities similar to those of products from SAS or the open source language R, but without years of specialized training.
“We decided to build a platform that has the flexibility and the feature richness of a program like R, but at the same time can make use of the fact that we are in a specific domain of advanced analytics, and support the user with a graphical user interface, so they can actually create advanced analytic applications without writing a single line of code,” Mierswa told Datanami in an interview last month. “Others tried and failed to create something like that, and we succeed.”
Support for Hadoop’s YARN should also enable Radoop users to better allocate cluster resources among the various Hadoop engines that Radoop works with, including MapReduce, Hive, Pig, Mahout, and Impala. Radoop 2 is certified for Cloudera’s Distribution for Hadoop (CDH) version 5.
The company also announced new customers, including Schneider Electric, a €26-billion French multinational that specializes in electricity distribution; Fractal Analytics, a provider of predictive analytics software and services; Prezi, an online presentation company; and video streaming service Ustream.
“We are happy to see so many companies extracting value from big data by using our products and services,” Radoop CEO Zoltan Prekopcsak says in a press release. “We continue to innovate and add the latest functionalities of the vibrant Hadoop ecosystem to our products and present them on a streamlined user interface. It enables analysts to extract business value without the need to be an expert in dozens of Hadoop components.”