Combating the High Cost of Cloud Analytics
Cloud providers are growing very quickly at the moment, as companies flock to utilize their various storage and processing services. However, while some companies are finding cloud platforms save them money, others are finding the opposite to be true.
One big data company that’s seeing plenty of cloud action of late is AtScale, which develops a data aggregation and query acceleration layer that sits between data warehouses on the one side, and traditional BI tools on the other.
“Every enterprise you talk to, it’s all focused on ‘How do I get to the cloud?’” says Scott Howser, chief product officer at AtScale. “They say ‘I want to get out of the IT business. I’m tired of spending $20 million or $30 million a year with Teradata.’”
The Great Cloud Migration has been a boon for AtScale, since customers look to the company and its software to streamline data access to cloud sources. Some Fortune 50-sized firms have relied on AtScale to smooth their migrations to cloud infrastructure. That migration is possible because AtScale acts as a virtualization layer that sits between customers’ BI stacks and the physical platform where the data resides, Howser says.
“It lets them maintain a logical presentation of the data, no matter where it lives or how it’s materialized,” he continues. “So maybe today it’s sitting on-prem, whether it’s Teradata or Oracle or Hadoop or whatever, and tomorrow it’s on Google BigQuery or Snowflake or AWS Redshift. The users don’t know, because their experience from a data presentation hasn’t changed. But in most cases, what they see happening is the queries get faster, they get access to more data, and with a better SLA.”
It might seem like a no-brainer to pair an experienced Tableau power user with a massive Redshift data warehouse, and tell her to just go at it and find insights that make the company money or customers’ experiences better. It might sound like a win-win-win — good for the customer, good for the cloud provider, and good for the tools vendors. But there’s the potential for a big “L” in there for the customer if they’re not careful with how they put it all together, says Matt Baird, AtScale’s co-founder and chief technology officer.
“You make self-service data available and guess what happens? People self serve,” he says. “It’s not a zero-sum game. There’s an unlimited appetite for data and analytics.”
The cloud providers are well positioned to capitalize on that growing appetite for analytic workloads, Baird says, even if it ends up costing customers more in the long run.
“Every enterprise you talk to says ‘I’m spending $30 million a year on all this infrastructure, the cloud is going to be cheaper.’ And what they find out pretty quickly is that it’s not cheaper,” he says. “In many cases, it’s a lot more expensive.”
AtScale’s approach to reining in cloud costs involves optimizing the queries. Instead of allowing a query on Redshift to conduct raw table scans, the company’s software presents a layer of aggregated data that in many cases can eliminate the need for raw table scans, Baird says.
“We’ve been doing that with Hadoop since day one,” says Chris Oshiro, vice president of engineering at AtScale and one of the company’s first employees. “The reason why we can accelerate query performance in a Hadoop world is because we’ve figured out a way to take raw data, understand the heuristics of queries, and be able to do intelligent aggregations and maintain those intelligent aggregations over time.”
The whole point is to avoid conducting table scans when they’re completely unnecessary, he says. “Not only does that cost you from a performance perspective, but in the cloud that can actually cost you money.”
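To make the idea concrete, here is a minimal Python sketch of answering a query from a pre-computed aggregate instead of scanning raw rows. It is an illustration only, not AtScale's implementation: the sample data, the table layout, and the function names (build_aggregate, total_by_region_raw, total_by_region_agg) are all hypothetical, and "rows scanned" is used as a rough stand-in for the work a cloud warehouse might bill for.

```python
from collections import defaultdict

# Hypothetical raw fact table: one row per sale (region, product, amount).
RAW_SALES = [
    ("east", "widget", 10.0),
    ("east", "gadget", 5.0),
    ("west", "widget", 7.5),
    ("west", "widget", 2.5),
    ("east", "widget", 4.0),
]


def build_aggregate(rows):
    """Pre-compute the total amount per region in one pass over the raw rows."""
    totals = defaultdict(float)
    for region, _product, amount in rows:
        totals[region] += amount
    return dict(totals)


def total_by_region_raw(rows, region):
    """Answer the query by scanning every raw row.

    Returns (total, rows_scanned).
    """
    total = 0.0
    scanned = 0
    for row_region, _product, amount in rows:
        scanned += 1
        if row_region == region:
            total += amount
    return total, scanned


def total_by_region_agg(aggregate, region):
    """Answer the same query from the aggregate, reading a single entry.

    Returns (total, entries_read).
    """
    return aggregate.get(region, 0.0), 1


# Build the aggregate once, then reuse it for repeated queries.
AGGREGATE = build_aggregate(RAW_SALES)

raw_total, raw_scanned = total_by_region_raw(RAW_SALES, "east")
agg_total, agg_scanned = total_by_region_agg(AGGREGATE, "east")

print("raw scan: ", raw_total, "rows scanned:", raw_scanned)
print("aggregate:", agg_total, "entries read:", agg_scanned)
```

Both paths return the same total, but the raw path touches every row on every query, while the aggregate path pays the scan cost once when the aggregate is built. A real system would also have to keep the aggregate current as new data arrives, which this sketch does not attempt.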