Arcadia Emerges with Visual Analytics Running Directly On Hadoop
Arcadia Data today emerged from stealth mode by announcing $11.5 million in Series A funding and the limited availability of its flagship product: a Web-based visual analytics tool served directly from Hadoop.
In Hadoop, most visualization and BI tools work by sending SQL queries to data stored in HDFS and then moving the results over the network to a separate box that drives the analytics and visualizations. While this works, it limits the amount of data that can be visualized, and it can be expensive and time-consuming too. The folks at Arcadia Data, who hail from tier-one IT firms like IBM, Oracle, and Teradata, decided there had to be a better way.
“When we were at Aster Data and Teradata, we’d see customers who have these Hadoop systems in place, where they would store all the data initially in Hadoop as a data lake, and then they’d use traditional data marts to move the data into and use traditional BI tools to connect to that information,” says Arcadia co-founder and CEO Sushil Thomas. “We thought that was a fractured and inefficient architecture, so we started Arcadia to create a unified visual analytics and BI platform that works inside Hadoop.”
The YARN-compliant software is designed to help non-technical business users make sense of Hadoop-resident data without involving expensive programmers or data engineers. The product has two elements: processes that help model the data, and visual tools for data discovery and analytics.
Nobody’s data is perfectly clean and labeled, especially in the Hadoop world. Arcadia picks up where Informatica and other ETL tools leave off by providing an intuitive drag-and-drop interface that guides the user through the all-important data munging and modeling process, which can’t be avoided but can be automated to a certain extent.
“You build data-driven applications completely within the browser,” Thomas tells Datanami. “Underneath the covers, we have an active data layer as part of the data platform. When you’re dragging and dropping within the visual tool, we have rich semantic information about exactly what’s relevant to your business, because you’re telling us which measures and dimensions are relevant. We use that information to model out the data, to store the right form of the data, so that we can accelerate that access and give you interactive speed of analysis on very, very large data sets.”
Once the data is modeled into a star or snowflake schema, users are ready to interactively explore and analyze their Hadoop-resident data without first extracting it from HDFS. That gives Arcadia an advantage over traditional BI tools, which require the data to be moved first.
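For readers unfamiliar with the terms, a star schema pairs a central fact table of measures with dimension tables that describe them. The sketch below is a generic illustration in plain Python, not Arcadia's implementation; the table names, columns, and the `total_by_dimension` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical dimension table: one row per store, keyed by store_id.
dim_store = {
    1: {"region": "West"},
    2: {"region": "East"},
}

# Hypothetical fact table: one row per sale, holding a foreign key
# into the dimension table and a numeric measure.
fact_sales = [
    {"store_id": 1, "amount": 120.0},
    {"store_id": 1, "amount": 80.0},
    {"store_id": 2, "amount": 50.0},
]


def total_by_dimension(facts, dim, key, attribute, measure):
    """Sum a measure from the fact table, grouped by a dimension attribute."""
    totals = defaultdict(float)
    for row in facts:
        # Follow the foreign key to look up the dimension attribute.
        group = dim[row[key]][attribute]
        totals[group] += row[measure]
    return dict(totals)


sales_by_region = total_by_dimension(
    fact_sales, dim_store, "store_id", "region", "amount"
)
print(sales_by_region)
```

Here "amount" plays the role of a measure and "region" the role of a dimension, the two kinds of fields Thomas says users identify when dragging and dropping in the tool.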
“There’s no subset of the data that you ever copy out of Hadoop. That’s one of the big differentiators,” says Priyank Patel, co-founder and head of product for Arcadia. “They are sucking the data from Hadoop and moving it to a different system before providing the analysis. But that’s where the highest granularity of data is present. As soon as you move it out, you’re forced to summarize, you’re forced to sample, and you lose the fidelity of the data. That’s something that is very core to the value proposition of Arcadia.”
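Patel's point about lost fidelity can be shown with a toy example. The following is a minimal sketch with made-up data, not anything drawn from Arcadia; the `raw` rows and the `summarize_by_day` helper are hypothetical.

```python
# Hypothetical raw events, at the finest granularity available.
raw = [
    {"day": "mon", "amount": 5.0},
    {"day": "mon", "amount": 995.0},
    {"day": "tue", "amount": 500.0},
    {"day": "tue", "amount": 500.0},
]


def summarize_by_day(rows):
    """Collapse raw rows into one total per day, as an extract might."""
    totals = {}
    for r in rows:
        totals[r["day"]] = totals.get(r["day"], 0.0) + r["amount"]
    return totals


daily = summarize_by_day(raw)

# Both days total the same amount, so the summary cannot tell them apart.
print(daily)

# Only the raw rows can still answer a finer-grained question,
# such as the size of the largest single transaction.
largest = max(r["amount"] for r in raw)
print(largest)
```

Once only the daily totals have been copied out, the two days look identical, and the unusually large Monday transaction can no longer be found.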
The responsiveness of the Arcadia product was evident during a product demonstration last week. Thomas was able to ask questions of Hadoop data interactively, in a visual drag-and-drop manner. The screen would automatically refresh as the parameters of the query were changed, clearly showing patterns that would be tough to spot using non-visual approaches.
It’s all about abstracting away the complexity of doing BI on Hadoop, and getting it as close to the data as possible, Thomas says. “We don’t have a thin SQL pipe into the system. We sit right next to the data,” he says. “We own the access to the raw data, and we converge the BI functionality with the visual analytics to give customers a unified experience that gives them very fast access, very high concurrency, and net new analytics that they couldn’t do with the traditional tools that still connect to these very thin SQL pipes.”
While the likes of Tableau, Qlik, and MicroStrategy probably are not quaking in their boots at the moment, they would do well to keep an eye on this San Mateo, California, startup, which was in stealth mode for two years. Arcadia used this week’s Hadoop Summit in San Jose as its coming-out party and to announce that it has $11.5 million in the bank, courtesy of Mayfield, Blumberg Capital, and Intel Capital.
Jason Waxman, vice president of Cloud Platforms Group at Intel, says he’s excited to invest in Arcadia. “Organizations understand the value of using data-driven insights, but existing analytics and business intelligence tools are built on outdated, fractured models that leave users guessing the answers for critical business questions,” he says.
Arcadia has just a handful of customers so far, but some of them are big shops that are putting Arcadia through its paces. According to Thomas, one Fortune 200 company is running the tool against more than 100 billion rows of data in Hadoop.
The company has two products. Arcadia Instant is free software that includes the visual analytics. The enterprise version includes more extensive capabilities and is slated for availability in late 2015.