Why This Spreadsheet Interface for Cloud DWs Is Turning Heads
There’s often no escape from complexity in big data analytics. For better or for worse, it’s just part of the game. But at the same time, there are places where a simpler approach to complex problems can yield positive results. The new spreadsheet-style user interface for cloud data warehouses from Sigma Computing just might be one of those cases.
Sigma Computing emerged last fall with a new product that essentially delivers an Excel-style interface for cloud data warehouses, including Snowflake, Amazon Redshift, Microsoft Azure SQL Data Warehouse, and Google BigQuery.
“We love the technology of a cloud data warehouse,” Sigma Computing co-founder and CEO Rob Woollen told Datanami at the Strata Data Conference last week. “My co-founder [Jason Frantz] and I were at Sutter Hill Ventures right when Snowflake was founded. We saw the technology. We saw how that side was going.
“But the pain we felt is that that technology is really only available to programmers,” Woollen continued. “Our goal is to essentially figure out, how do we expose that technology to a mass set of people? And so we married the interface of the spreadsheet with that warehouse. That’s what Sigma is.”
Microsoft Excel is, hands down, the most widely used analytics tool in the history of computing. Numerous studies over the years have shown that. People love Excel (and similar spreadsheet products) because it provides a clean, simple interface where they can see the data and interact with it. The experience is so addictive that users have been labeled “Excel junkies.”
The problem, of course, is that Excel doesn’t scale. Even the most powerful laptop lacks the disk, memory, and compute power to serve the data analysis needs of even small companies these days. There’s a good reason why businesses adopted column-oriented, massively parallel processing (MPP) relational databases like Teradata, Vertica, and IBM Netezza, and more recently, why Hadoop-oriented SQL offshoots like Hive, Impala, and Presto have emerged to process big data.
But the MPP momentum has shifted and now we see cloud data warehouses gaining market share at a rapid clip. Woollen and his colleagues at Sigma Computing saw that cloud data warehouses were suffering from the same old complexity of SQL programming as the MPP and Hadoop databases that came before them, so they decided to do something about it.
The solution was straightforward: Marry the simplicity of the Excel user interface with the scalability of the cloud data warehouse. As Woollen explains, Sigma users can view and work with data residing in the cloud data warehouse through their spreadsheet-like interface delivered from a Web browser.
“Everything you do in our spreadsheet, we translate those formulas you write into SQL and run it against the warehouse,” Woollen says. “You don’t need to write any SQL. But you can build the equivalent of any SQL query.”
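To make the idea of translating formulas into SQL concrete, here is a minimal sketch in Python. It is an illustration only, not Sigma's implementation: the formula syntax (column names in square brackets), the function names in FUNCTIONS, and the helpers translate_formula and build_query are all hypothetical, invented for this example.

```python
import re

# Hypothetical mapping from spreadsheet-style function names to SQL aggregates.
# This is an assumption for illustration, not Sigma's actual formula language.
FUNCTIONS = {"Sum": "SUM", "Avg": "AVG", "Count": "COUNT", "Max": "MAX", "Min": "MIN"}


def translate_formula(formula):
    """Turn a formula such as 'Sum([Price] * [Quantity])' into a SQL expression."""
    # [Column Name] references become double-quoted SQL identifiers.
    sql = re.sub(
        r"\[([^\]]+)\]",
        lambda m: '"' + m.group(1).replace('"', '""') + '"',
        formula,
    )
    # Known function names are swapped for their SQL equivalents.
    for name, sql_name in FUNCTIONS.items():
        sql = re.sub(rf"\b{name}\(", f"{sql_name}(", sql)
    return sql


def build_query(table, group_by, computed):
    """Build a grouped SELECT from grouping columns and named formulas."""
    keys = ", ".join(f'"{col}"' for col in group_by)
    cols = ", ".join(
        f'{translate_formula(formula)} AS "{alias}"'
        for alias, formula in computed.items()
    )
    return f'SELECT {keys}, {cols} FROM "{table}" GROUP BY {keys}'


# An analyst groups an "orders" sheet by Region and adds a Revenue column.
print(build_query("orders", ["Region"], {"Revenue": "Sum([Price] * [Quantity])"}))
# SELECT "Region", SUM("Price" * "Quantity") AS "Revenue" FROM "orders" GROUP BY "Region"
```

A real translator would parse formulas into a syntax tree and account for each warehouse's SQL dialect. The string rewriting here only shows the general shape of the mapping: the analyst writes a formula, and the warehouse receives a query.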
Analysts can play around with the data from the cloud data warehouse, sorting it as they see fit through Sigma’s spreadsheet-like interface. They can sort the data, create new columns, apply transformations, and write custom formulas just like they’re used to doing in Excel, but
“Essentially you no longer have to download little bits of data into your PC. You can directly work on live data in your warehouse,” Woollen says. “The data stays in the warehouse. The IT team can still watch the warehouse and they know it’s secure. They can control who has access.”
An analyst working with Sigma can easily create and submit 400 to 500 queries to a cloud data warehouse per day, Woollen says. That’s going to outperform just about any human analyst writing SQL code by hand (Jeff Dean notwithstanding).
“I used to work at Salesforce for many years,” Woollen said. “I wrote thousands of production queries. There’s no chance that I could beat you using Sigma writing queries by hand. It’s fast. A machine competing against a programmer: it’s not fair.”
Sigma’s product handles about 99% of ANSI SQL. That includes lots of complex aggregations and windowing functions. Not all activities in the product require the generation of SQL on the backend. But users will see no difference when they go from playing around with data and trying out different formulas to submitting massive SQL queries that touch billions of rows of JSON.
“Our view is, if you’re a spreadsheet user, you should sit down and feel comfortable. You should feel like this is something I can figure out quickly,” Woollen said. “You can build virtually any query in our interface. You just get the data and you essentially start playing, figuring out what question you want to ask.”
Marrying the simplicity of the spreadsheet interface with the complexity of SQL jobs running at massive scale was part of the design criteria that drove Woollen and his colleagues to work hard to get it right.
“It’s easy to say. It’s quite hard to actually do it. That was the technology that we had to figure out,” he said. “It took us a lot of effort. We have been steadfast that the interface [would allow you] to be able to build any query. Some of these interfaces say, you can build simple queries in this interface, and then if you switch over to this other pane you can write SQL. We were one of the first to say, we’re going to build an interface, and we’re not ever going to tell you that you have to stop. You can do whatever you want with this interface. That’s core to us to really build out that model.”
Sigma’s offering runs against the four big cloud data warehouses, in addition to Postgres in the cloud. Pricing was not disclosed.