October 29, 2013

0xdata Aims to Turn Hadoop Into a Fortune-Telling Math Wiz

Isaac Lopez

One of the interesting phenomena surrounding the rise of NoSQL databases is the crop of companies springing up to turn that data into predictive insight. One company, 0xdata, has developed a toolset that it says is like using Legos to build predictive analytic models.

The toolkit, dubbed H2O, is 0xdata’s attempt to get in on the “let’s simplify predictive analytics” bandwagon by offering a Google search-like interface and a syntax familiar to users of the R statistical computing language. This week, the company announced the second-generation release of H2O.

“We developed H2O to unlock the predictive power of big data through better algorithms,” said SriSatish Ambati, CEO and co-founder of 0xdata, in a statement. “H2O is simple, extensible and easy to use and deploy from R, Excel, and Hadoop.”

One of the more interesting features of H2O is the ability for users to iterate on their predictive models in real time. The company says that rather than waiting for an entire job to finish, H2O provides approximate results at every step of the analysis process, giving analysts a general idea of how the data is shaping up. If users don’t like what they see, they can kill the model and start over until they get results in their anticipated range.
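The "approximate results at every step" idea can be illustrated with a minimal Python sketch. This is not H2O's API or implementation; the function names (`progressive_mean`, `run_with_early_kill`), the choice of a running mean as the statistic, and the `warmup_rows` parameter are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def progressive_mean(chunks: Iterable[List[float]]) -> Iterator[Tuple[int, float]]:
    """Yield (rows_seen, running_mean) after each non-empty chunk is processed."""
    total = 0.0
    count = 0
    for chunk in chunks:
        if not chunk:
            continue  # nothing new to report for an empty chunk
        total += sum(chunk)
        count += len(chunk)
        yield count, total / count


def run_with_early_kill(
    chunks: Iterable[List[float]],
    low: float,
    high: float,
    warmup_rows: int,
) -> Tuple[str, int, Optional[float]]:
    """Watch the partial estimates and stop as soon as one leaves [low, high].

    Estimates based on fewer than warmup_rows rows are ignored, since very
    early estimates are too noisy to act on (hypothetical policy).
    """
    rows, estimate = 0, None
    for rows, estimate in progressive_mean(chunks):
        if rows >= warmup_rows and not (low <= estimate <= high):
            return "killed", rows, estimate
    return "finished", rows, estimate


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    for rows, estimate in progressive_mean(data):
        print(f"after {rows} rows: mean is roughly {estimate:.2f}")
    print(run_with_early_kill(data, low=0.0, high=2.5, warmup_rows=4))
```

The point of the generator is that the caller sees a usable estimate after every chunk and can abandon the job early, rather than waiting for a single final answer.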

While the H2O platform can be used to explore and model data from a myriad of sources, including Amazon S3, SQL, and NoSQL data sources, the company puts a special focus on Hadoop, R, and Excel, where it claims to be the fastest prediction engine, running up to 100X faster than other predictive analytics providers.

Per 0xdata:

“H2O’s in-memory columnar compression and fine-grain parallelism via Map Reduce provides unmatched speed, scale and extensibility for advanced algorithms on big data. Customers can extend the Lego-like architecture and run their own algorithms and models. Or take advantage of 0xdata’s latest algorithms for Distributed Trees and Regression, such as Gradient Boosting Machine (GBM), Random Forest (RF), Generalized Linear Modeling (GLM), k-Means and Principal Component Analysis (PCA). The speed is blazing fast. H2O’s GLM on a dataset with 150 million rows and 750 categorical columns clocked less than five seconds for Logistic Regression on commodity hardware.”
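As a rough illustration of how a logistic regression fit can be split into map and reduce steps, here is a minimal Python sketch. It is not 0xdata's code and does not use H2O's API: the function names, the plain gradient descent optimizer, the learning rate, and the toy data are all assumptions, and real systems use far more sophisticated solvers and data layouts.

```python
import math
from functools import reduce
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]       # (features, label in {0, 1})
Partial = Tuple[List[float], int]   # (gradient sum, row count)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_gradient(partition: Sequence[Row], weights: List[float]) -> Partial:
    """Map step: sum the log-loss gradient over one partition of rows."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad, len(partition)


def reduce_gradients(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine two partial results element-wise."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def fit(partitions: Sequence[Sequence[Row]], n_features: int,
        lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Gradient descent; assumes at least one non-empty partition."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        # Each partition could be processed on a different worker.
        partials = (map_gradient(p, weights) for p in partitions)
        grad, n = reduce(reduce_gradients, partials)
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Toy data: features are [bias, x]; label is 1 for larger x.
    parts = [
        [([1.0, 0.0], 0), ([1.0, 1.0], 0)],
        [([1.0, 2.0], 1), ([1.0, 3.0], 1)],
    ]
    w = fit(parts, n_features=2)
    print("weights:", w)
    print("P(y=1 | x=3):", sigmoid(w[0] + w[1] * 3.0))
```

Because the gradient is a plain sum over rows, each partition's contribution can be computed independently and merged in any order, which is what makes this kind of model a natural fit for map/reduce-style parallelism.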

Ambati says the idea behind 0xdata is to level the playing field between the algorithm-haves and have-nots. “With our viral and open Apache software license philosophy, along with close ties into the math, Hadoop and R communities, we bring the power of Google-scale machine learning and modeling without sampling to the rest of the world.”

Along the way, 0xdata has picked up customers that would definitely fall into the “algorithm-haves” category, including streaming-media darling Netflix, which has been deploying H2O for modeling in the cloud.

“H2O is the platform for big analytics that we have found gives us the biggest advantage compared with other alternatives,” said Chris Pouliot, Director of Algorithms and Analytics at Netflix – who also happens to be an advisor to 0xdata. “Our data scientists can build sophisticated models, minimizing their worries about data shape and size on commodity machines.”

Founded last year, 0xdata received a $1.7 million round of venture funding this past January from Nexus Venture Partners. The company has used the money to expand the functionality of the H2O product while working to build a community around the framework. According to 0xdata, it has sponsored or participated in more than two dozen meetups in the Bay Area since April 2013.
