May 24, 2021

Do Customers Want Open Data Platforms?


Snowflake turned some heads in the big data market with recent blog posts and articles that cast doubt on the benefits of open data architectures. Customers should “choose open wisely,” the data warehouse giant says, and avoid “fall[ing] into the trap of letting the means get confused with the end.”

Snowflake lobbed the opening salvo in its latest battle against open platforms with a 3,800-word blog post on March 31 written by founders Benoit Dageville and Thierry Cruanes, along with two other executives. Dageville followed that up with a 2,300-word opinion piece in InfoWorld’s Insider, which is not open to the public and requires reader registration (although the first taste is free).

In summary, Snowflake cautions against embracing open in all its permutations–open data standards, open format, and open source software–without careful consideration of the possible downsides.

“We see strong opinions for and against open, we see table pounding demanding open and chest pounding extolling open, often without much reflection on benefits versus downsides for the customers they serve,” Snowflake execs write in the blog piece. “For many organizations who have fallen into the trap of assuming that open is synonymous with innovative and cost-effective, they have learned the hard way that neither is the case.”

Open Enough?

To be sure, there have been some colossal implementation failures with open data platforms. It wasn’t long ago that the big data ecosystem was dominated by Hadoop, which many saw as the open answer to long closed data warehouse platforms. But the stunning fall of Hadoop, along with the meteoric rise of cloud data lakes and warehouses–Snowflake included–shows how quickly ideas about data architecture can change.

Snowflake was an early detractor of Hadoop. Former CEO Bob Muglia had an especially harsh take on the failures of that open data platform. Under new CEO Frank Slootman, Snowflake has found tremendous success with its cloud data warehouse. This success can be traced, at least in part, to Snowflake offering a much simpler user experience for analytics customers, at least compared to setting up and running a Hadoop-based system.

Snowflake supports SQL-powered data analytics, as well as data science and machine learning, in its cloud

But how much of Snowflake’s success is tied to the fact that it stores and processes data in a proprietary data format? That’s an open question, but one that the company clearly felt compelled to answer. In the blog post, the company makes a compelling case that shielding users from technical complexity is worth giving up that control.

“At first glance, the idea of any data consumer or any application being able to directly access files in a standard, well-known format sounds appealing,” the Snowflake executives write. “Of course that is until a) the format needs to evolve, b) the data needs to be secured and governed, c) the data requires integrity and consistency, and/or d) the performance of the system needs to improve.

“What about an enhancement in the file format that enables better compression or better processing?” they ask. “How do we coordinate across all possible users and applications to understand the new format? Or what about a new security capability where data access depends on a broader context? How do we roll out a new privacy capability that reasons through a broader semantic understanding of the data to avoid re-identification of individuals?”

The challenges of pursuing transactional integrity, performance optimization, and application coordination further add to the misery of the open data architect, they add.

“Decades of experience navigating through these very trade-offs give us a strong understanding of and conviction about the superior value of providing abstraction and indirection versus exposing raw files and file formats,” the executives write. “We strongly believe in API-driven access to data, in higher level constructs abstracting away physical storage details. It’s not about rejecting open; it is about delivering better value for our customers. We balance this with making it very easy to get data in and out in standard formats.”

An Open Defense

But not everybody agrees. For Dremio co-founder and Chief Product Officer Tomer Shiran, Snowflake’s ode to delivering superior customer value is just a veiled attempt to lock customers into the platform, which in turn creates greater shareholder value.

“To use Snowflake, you have to load data into Snowflake, and it’s stored internally in their cloud, in their system, in proprietary formats that you can’t access with Dremio or Databricks or Athena or with Spark or Dask or anything,” Shiran says. “You have to use Snowflake to access that data. The only functionality you’re going to get is what Snowflake is going to provide you. And you’re never going to be able to get off. Just like nobody can ever get off Teradata. You basically get locked in, and that’s it.”


In Shiran’s view, Snowflake is waging its preemptive assault on open data platforms because the company views them as a threat. In the big picture, the war being waged is between open data lakes versus closed data warehouses, he says. He points out that this week, Databricks is kicking off its Data + AI Summit, and the theme for the conference is “The future is open.” That’s the opposite of what Snowflake is calling for, he says.

“Why are they going to war? They’re starting to see a lot of companies choose this modern architecture, which is the non-data warehouse approach,” he says. “It’s between the warehouse and the lakehouse and there’s starting to be a big shift toward the open architecture.”

Openness is a fundamental element of Databricks’ view of the world, says Joel Minnick, vice president of marketing for Databricks.

“When it comes to data, getting value out of data is about getting access to the broadest sets of data that I can, and being able to bring as many tools and innovations to that data as I can,” he tells Datanami. “I think continuing to try to drive a mantra out there that openness around data is a bad thing–I think that’s on the wrong side of history.”

Hadoop Hangover

Customers want a lot of things when it comes to their data and their data platforms, and sometimes those wants are at odds with one another. We saw this play out with the Hadoop experiment, where openness and flexibility sometimes came at the expense of performance, integration, and security. Cloud providers, particularly data warehouse companies like Snowflake, have capitalized on those poor Hadoop experiences by providing a much simpler, more secure, and highly performant environment for analyzing big data. It just so happens that, in Snowflake’s case, the new environment is not open.

Getting the Hadoop farm animals to play nicely with an on-prem cluster was a huge drain on productivity

But according to Shiran, customers don’t have to trade openness to get performance, integration, and security. Technological advances in the data ecosystem, particularly with what is occurring in open cloud-based data ecosystems, have rendered moot those old Hadoop-era compromises.

“What made Hadoop difficult was really an on-prem complexity, as opposed to data lakes or lakehouses,” says Shiran, who was employee number five at MapR Technologies (now part of HPE) and its vice president of product. “People don’t go and run Cloudera in the cloud, a big Hadoop distribution where you run 25 different open source projects on the same set of nodes and have to figure out how to make it work.”

Hadoop was the right vision for an open data ecosystem, but the execution had problems. With the cloud, it’s easier to execute on that vision without compromising on openness, he says.

“What’s happening now if you think about how these data lakes look in the cloud with the separation of compute and data…these services are much easier to operate and manage,” he says. “Different services are not running on the same hardware. So if a customer uses Databricks and Dremio together, they’re each running on their own set of instances, which is very different from the on-prem Hadoop stuff, where you have these 25 things running together on the same boxes, with all the conflicts and version issues. It was just a nightmare.”

For Databricks’ Minnick, the most compelling evidence that openness is a virtue to strive for and not a drawback to be avoided is that customers desire openness. In a recent study that Databricks ran with MIT Technology Review, the two groups asked CDOs and other data leaders what they would do differently with their data investments, if they could do it again.

“The number one answer to that question was, I would have invested more in open source,” Minnick says. “I don’t think people are scared of open….The wind right now in the market is open solutions are where organizations want to go. I think that is what customers want.”

Related Items:

Data Lakes Are Legacy Tech, Fivetran CEO Says

Open Source Still Rolling, But Roadblocks Loom

Hadoop Has Failed Us, Tech Experts Say
