Stealth Startup Looks to Crack the Time Barrier
One of the challenges of traditional databases and analytic tools is that they don’t natively factor time into the equation. But tracking how things change over time has emerged as a critical need in today’s data-deluged world. Now a Silicon Valley startup called Interana has emerged with a database designed specifically for asking these types of questions.
The concept of time may seem like a relatively simple thing to account for in a database. But it’s actually quite complex and not something that the original developers of SQL and relational databases were trying to solve so many years ago. The multi-dimensional databases that emerged in the 1990s allowed users to slice data by time, but the OLAP approach just doesn’t fly in today’s big data world.
Interana co-founder and CTO Bobby Johnson had a front-row seat for how intractable the time problem could be while he was director of engineering at Facebook. The social media giant had a monster SQL query that would calculate which of its users were active six out of the last seven days. The code alone was several pages in length, and the query would run all night, every night, to give Facebook the answer.
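To make the question concrete, here is a minimal Python sketch of the "active six out of the last seven days" calculation over a small in-memory event log. It is not Facebook's query or Interana's implementation; the function name `active_users`, the `(user_id, day)` event shape, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def active_users(events, as_of, window_days=7, min_active_days=6):
    """Return users seen on at least `min_active_days` distinct days
    in the `window_days`-day window ending on `as_of` (inclusive)."""
    window_start = as_of - timedelta(days=window_days - 1)
    days_seen = defaultdict(set)
    for user_id, day in events:
        if window_start <= day <= as_of:
            days_seen[user_id].add(day)  # a set ignores repeat visits on one day
    return {u for u, days in days_seen.items() if len(days) >= min_active_days}


# Hypothetical sample data: the date and user names are made up.
today = date(2014, 10, 7)
events = [("alice", today - timedelta(days=d)) for d in range(7) if d != 3]  # 6 distinct days
events += [("bob", today - timedelta(days=d)) for d in range(3)]  # 3 distinct days
events += [("alice", today)]  # duplicate event on a day already counted

result = active_users(events, today)
print(result)
```

The logic is simple on a toy list; the difficulty the article describes comes from expressing it in SQL and running it over a very large event history every night.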
It was Johnson’s job to make sure Facebook’s infrastructure could scale, and by all accounts he did a great job helping Facebook grow from 10,000 users to 1 billion. But he realized there had to be better solutions to some of these big data problems, particularly those involving time- and event-based data.
“As storage gets cheaper and people are able to collect more data, they collect everything that happened to everything, so you have these sequences over time,” Johnson tells Datanami. “There’s really rich stuff you can find, and that’s been the promise of big data so far. The problem is, taking these old relational tools and trying to shoehorn them onto the data is cumbersome.”
So in late 2012 Bobby Johnson co-founded Interana with his wife, Ann Johnson, and Lior Abraham, another former Facebook developer who is best known as the creator of SCUBA, a visual interface to SQL. The Palo Alto, California company has been in stealth mode until today, when it officially launched its first product.
The Interana product actually has two parts. On the back end is a column-oriented, scale-out database designed specifically for analyzing large collections of time-based data, such as Web clickstreams, call detail records, sensor data, or other data sources that can generate a timestamp. On the front end is a visual interface for SQL designed to facilitate data discovery. The company has included a number of pre-built dashboards to enable users to view and explore time and event data.
Time is the priority for Interana, which is short for “interactive analytics.” “Time is a first-order principle,” says CEO Ann Johnson, who is an electrical engineer. “Everything is ordered by time. We actually save it physically on disk by time, and secondarily by users, so you can see how users change over time. By doing that, we get a lot of efficiencies that allow us to never have to do sorts over the entire network, which are really slow and hinder your ability to scale the system.”
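The benefit of keeping data ordered by time can be sketched in a few lines of Python. This is only an illustration of the general idea, not Interana's proprietary storage format; the class `TimeOrderedLog`, its `scan` method, and the `(timestamp, user_id, payload)` row shape are hypothetical. Rows are sorted once by time and then by user, so a time-range query becomes a binary search followed by one contiguous read, with no sort at query time.

```python
from bisect import bisect_left


class TimeOrderedLog:
    """Toy event store kept sorted by (timestamp, user_id)."""

    def __init__(self, events):
        # events: iterable of (timestamp, user_id, payload) tuples
        self._rows = sorted(events, key=lambda r: (r[0], r[1]))
        self._times = [r[0] for r in self._rows]  # parallel list used for searching

    def scan(self, start, end):
        """Return rows with start <= timestamp < end as one contiguous slice."""
        lo = bisect_left(self._times, start)
        hi = bisect_left(self._times, end)
        return self._rows[lo:hi]


# Hypothetical sample data with integer timestamps.
log = TimeOrderedLog([
    (30, "u2", "click"),
    (10, "u1", "view"),
    (20, "u2", "view"),
    (20, "u1", "click"),
])
rows = log.scan(10, 30)
print(rows)
```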
Because it was designed from the get-go to answer time-related questions, the Interana database is very fast for time-related queries. For starters, the database reads data with sequential disk scans rather than disk seeks, which are slower and less efficient. The data is stored in a proprietary format, and compressed for efficiency. The software doesn’t yet run on Hadoop, but that is on the roadmap.
The combination of these techniques enables the database to scan 100 million lines per second per core, significantly faster than other approaches. When you spread that level of performance out across multiple nodes in a cluster (Interana runs on Linux), you suddenly have the ability to sift through billions of event-based records in an interactive manner, as opposed to programming ETL jobs to run overnight.
“The way we built it, it’s like we’re buttoning down for a hurricane,” Ann Johnson says. “We’re preparing for a lot of data. Big data in the future is going to be so much larger than it is today.”
So what can you do with a super-fast, time- and event-based database? The early use cases revolve around understanding product uptake and customer behavior on mobile devices and the Web. The initial target customers are social media sites, e-commerce companies, media and entertainment outfits, gaming, telecommunications and SaaS vendors.
The database lets you ask really detailed questions, such as showing the users who did A and B but not C, and then correlating that with the length of their sessions. “Things like that that you can just barely ask of a lot of systems, but it requires programmers to write programs and it’s typically run as an ETL job overnight,” Bobby Johnson says. “With Interana, you can ask a lot of questions about those things. And then also you can visually dive in and really explore that rather than spending a lot of time thinking about what’s the one thing that I always want to know and want to calculate every night.”
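As a rough illustration of the kind of question described here, the following Python sketch finds users who did A and B but not C and then averages their session lengths. It says nothing about how Interana executes such a query; the function names, the action labels, and the sample data are assumptions chosen for the example.

```python
def did_a_and_b_not_c(events, a, b, c):
    """Return users whose event history contains actions a and b but never c."""
    actions_by_user = {}
    for user_id, action in events:
        actions_by_user.setdefault(user_id, set()).add(action)
    return {
        user
        for user, actions in actions_by_user.items()
        if a in actions and b in actions and c not in actions
    }


def mean_session_length(sessions, users):
    """Average session length in seconds for the given users, or None if no sessions."""
    lengths = [seconds for user_id, seconds in sessions if user_id in users]
    return sum(lengths) / len(lengths) if lengths else None


# Hypothetical sample data: (user_id, action) events and (user_id, seconds) sessions.
events = [
    ("u1", "search"), ("u1", "add_to_cart"),
    ("u2", "search"), ("u2", "add_to_cart"), ("u2", "purchase"),
    ("u3", "search"),
]
sessions = [("u1", 120), ("u1", 180), ("u2", 600)]

cohort = did_a_and_b_not_c(events, "search", "add_to_cart", "purchase")
average = mean_session_length(sessions, cohort)
print(cohort, average)
```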
While the time- and event-based database represents the secret sauce, the SCUBA-like visual interface developed by Abraham is also critical for enabling customers to run ad hoc queries and perform exploratory analytics. The company aims to sell a complete package.
This is the first joint venture for the Johnsons, who met while both were studying at Caltech. The trio of co-founders also pulled in Ivo Dujmovic, a former Oracle BI product manager, and Joe Adler, a data scientist from LinkedIn who wrote the O’Reilly book on R. The company has racked up $8.2 million in Series A funding from Battery Ventures, Data Collective, SV Angel, Fuel Capital and Y Combinator.