March 11, 2014

DataRPM Gets $5M to Liberate Data from Warehouses

Alex Woodie

What if you could ask your Hadoop cluster questions about your data in a natural, conversational style, and get the results served to you in a visual interface? That is essentially what the data analytics startup DataRPM has created, and today, the company announced $5.1 million in Series A venture funding to take that product to market.

DataRPM is aiming to liberate data and analytics by empowering regular business users to get the information they need out of data warehouses. The company says users are frustrated with writing complex queries in SQL, futzing around with OLAP, or building complex data models that are obsolete before they make it into production. Why can’t querying a data warehouse be as simple as writing a Google Web search query?

That’s essentially what the company claims it has done with its first offering, which combines natural language processing (NLP) and machine learning (ML) algorithms with a graph processing engine and a bitmapped search index based on the Lucene search engine on the back-end, and a Web-based user interface on the front-end that serves up results of queries in a graphical manner.

DataRPM’s approach lets a user write a query in plain English, such as “What is my last year revenue by location?” A user must have a good idea of what’s in their database before using the tool, of course. They must know, for example, that revenue is tracked by time and by location. But instead of writing complex SQL with joins and filters, they can simply bust out those eight simple words, and DataRPM’s algorithms will do the hard work of figuring out how to complete the query.
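For a sense of what that plain-English question replaces, here is a minimal sketch of the hand-written equivalent, using Python's built-in sqlite3 module. The `sales` and `locations` tables, their columns, and the sample rows are hypothetical, invented purely for illustration; nothing here reflects DataRPM's actual schema or the queries it generates.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a made-up schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales (location_id INTEGER, sale_date TEXT, amount REAL);
        INSERT INTO locations VALUES (1, 'Boston'), (2, 'Denver');
        INSERT INTO sales VALUES
            (1, '2013-02-01', 100.0),
            (1, '2013-09-15', 50.0),
            (2, '2013-06-30', 75.0),
            (2, '2014-01-10', 999.0);
    """)
    return conn


def revenue_by_location_last_year(conn, current_year):
    """Hand-written SQL for "What is my last year revenue by location?"."""
    sql = """
        SELECT l.name AS location, SUM(s.amount) AS revenue
        FROM sales AS s
        JOIN locations AS l ON l.id = s.location_id   -- the join
        WHERE strftime('%Y', s.sale_date) = ?         -- the "last year" filter
        GROUP BY l.name                               -- the "by location" part
        ORDER BY l.name
    """
    return conn.execute(sql, (str(current_year - 1),)).fetchall()


if __name__ == "__main__":
    for location, revenue in revenue_by_location_last_year(build_demo_db(), 2014):
        print(location, revenue)
```

Even in this toy case, the analyst has to know the table names, the join key, and how dates are stored, which is the kind of knowledge a natural language layer aims to hide.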

“The whole genesis of the application is that it just takes way too long to make data available and to get insights and take actions on data in the traditional world,” says DataRPM co-founder and CEO Sundeep Sanghavi. “When you have a business question as a business analyst or C-level executive, you don’t want to sit there and learn SQL and build complex reports. You just want to ask questions of your data. And we use machine learning to figure out the computation behind the scene to give you the visualization in a very easy and intuitive manner.”

DataRPM was founded two years ago by Sanghavi and his two partners: chief product officer Ruban Phukan, who was formerly a data scientist at Yahoo, and CTO Shyamantak Gautam, who previously worked at IBM. The company has more than a dozen customers to date, with dozens more in the pipeline.

DataRPM runs on Linux, either on-premises or in the cloud. The software can pull data from traditional data warehouses, such as Teradata, from newer big data platforms like Hadoop, or from any other source of log files and CSVs. As the data is pulled in, the graph engine models it, checking whether each field is date and time oriented, dimensional in nature, or a measurement. The data is then compressed into a bitmapped index and presented to the user via the NLP layer.
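The two ideas in that description, classifying each field and indexing it with bitmaps, can be illustrated with a short sketch. This is not DataRPM's implementation, which the company has not published here; the function names, the ISO date format, and the string-valued input (as if read from a CSV) are all assumptions made for the example.

```python
from datetime import datetime


def _is_date(value):
    """True if the string parses as an ISO date (assumed format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def _is_number(value):
    """True if the string parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify_column(values):
    """Guess whether a column is a date, a measure, or a dimension."""
    if not values:
        return "dimension"
    if all(_is_date(v) for v in values):
        return "date"
    if all(_is_number(v) for v in values):
        return "measure"
    return "dimension"


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_matching(bitmap):
    """Expand a bitmap back into the list of row numbers it covers."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


if __name__ == "__main__":
    locations = ["Boston", "Denver", "Boston"]
    index = build_bitmap_index(locations)
    print(classify_column(locations), rows_matching(index["Boston"]))
```

The appeal of a bitmap index is that filters combine with cheap bitwise operations: the rows matching two conditions are simply the AND of the two bitmaps.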

It’s all about enlivening the data. “We genuinely believe that data warehouses are where data goes to die,” Sanghavi says. “If you look at traditional data warehouses, you sit with teams, figure out what they want to see. You design universe or an OLAP cube. You come back, and by the time you come back, the requirements have already changed. And on and on and on.”

This approach caught the eye of investors, including InterWest Partners and CIT GAP Funds, which led the Series A round. Khaled Nasr, a partner at InterWest, says DataRPM is making business intelligence “a no-brainer.” “Until now, BI solutions and big data have largely ignored the data modeling process,” Nasr says. “DataRPM uses sophisticated algorithms to automate what is otherwise a heavy manual lift. Their combination of affordability and ease-of-use creates the opportunity for companies of all sizes to get meaningful insights from their data.”

If NLP is the future of generating data analytic queries, will we be able to literally speak into our PC microphones to retrieve data from Hadoop? The answer is yes. The company is using Google Speak to convert audio into text that can be parsed by the NLP algorithms.