Newly ‘Headquarterless’ Snowflake Makes a Flurry of Announcements
Snowflake is best known as a cloud data warehouse, but it’s delivering capabilities that go well beyond fast answers to SQL queries. These wider “data cloud” ambitions are on display this week, as the newly “headquarterless” company holds its first Snowflake Summit since its massive IPO last year.
According to Christian Kleinerman, Snowflake’s SVP of product, the biggest announcement to come out of Snowflake Summit this week revolves around Snowpark, the new development tool and runtime it unveiled last November at the Data Cloud Summit.
Snowpark gives customers the ability to develop and run Java-based programs against data they store in Snowflake. These programs could perform ETL/ELT, data transformation, or feature engineering tasks that are needed for data analytics, data science, and data engineering workflows.
“It’s an alternative to Spark or Dask or all those frameworks that exist to program to data in Java or Python,” Kleinerman tells Datanami. “It’s a programming model on top of the Snowflake entity.”
Snowpark will support Scala (a JVM-compatible language) first. All Snowpark customers on AWS will have it by next Monday, according to Kleinerman. Support for Java, Python, and related libraries and routines is expected later this year.
On a related note, Snowflake is bringing to Snowpark a new Java user-defined function (UDF) capability, which will enable users and partners to bring their custom Java code and run it within the Snowflake paradigm. This is still in private preview; a public preview is expected soon.
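The core idea behind Snowpark — dataframe-style code that is compiled into SQL and executed inside the warehouse, next to the data, rather than pulled out to a client — can be illustrated with a toy query builder. This is a self-contained sketch of the pattern, not the actual Snowpark API; every name below is hypothetical.

```python
# Toy illustration of the Snowpark idea: dataframe-style calls are recorded
# lazily, then the whole chain collapses ("pushes down") into one SQL
# statement that the warehouse runs next to the data.
# All names here are hypothetical -- this is NOT the real Snowpark API.

class ToyDataFrame:
    def __init__(self, table, filters=None, columns=None):
        self.table = table
        self.filters = filters or []
        self.columns = columns or ["*"]

    def filter(self, condition):
        # Lazily record the predicate; nothing is executed yet.
        return ToyDataFrame(self.table, self.filters + [condition], self.columns)

    def select(self, *cols):
        # Lazily record the projection.
        return ToyDataFrame(self.table, self.filters, list(cols))

    def to_sql(self):
        # "Pushdown": emit a single SQL statement for the whole chain.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

df = ToyDataFrame("orders").filter("amount > 100").select("customer_id", "amount")
print(df.to_sql())
# SELECT customer_id, amount FROM orders WHERE amount > 100
```

A Java UDF follows the same principle in the other direction: instead of the dataframe code becoming SQL, custom code becomes a function that SQL queries can call in place.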
About 50 partners have already adopted Snowpark or have committed to adopting it, which is proof that Snowpark is getting traction, Kleinerman says. “[Snowpark] has been rolling out to different customers and partners for the last three months or so, and right now it’s ramping up,” he says. “We have customers and partners talking benefits, performance, throughput, and cost.”
Snowflake is also announcing support for unstructured data, such as images, video, and text. According to Kleinerman, this will help complete the data analytics picture for customers with diverse data ambitions.
“Snowflake was born with structured data and semi-structured data as first-class capabilities,” the product manager says. “I hear customers say, I like the no-silo story. But I want all my data there, not just structured and semi-structured. So now we’re bringing full support for unstructured data in the form of file support.”
Customers can now store any file in Snowflake, and the company will provide the same guarantees around data governance, management, and replication atop that data, Kleinerman says. What’s more, with Snowpark providing support for Java-based programs (and soon Python-based programs built on libraries like PyTorch and TensorFlow), customers can start to do analytics atop that data.
“For example, customers could perform sentiment analysis on text data or voice data,” Kleinerman says. “I have some speech. I can use some library to convert it to text. Then I can use some other library to extract sentiment from it.”
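The two-stage pipeline Kleinerman describes — speech-to-text, then sentiment extraction — could be sketched as below. The speech-to-text step is stubbed out, and the lexicon-based scorer is a deliberately simple stand-in for a real NLP library; both are illustrative assumptions, not Snowflake features.

```python
# Minimal stand-in for the pipeline described above: a speech-to-text step
# (stubbed here) followed by sentiment extraction. A real workload would
# call proper libraries for both stages; this lexicon scorer is a toy.

POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "poor", "angry"}

def speech_to_text(audio_bytes):
    # Placeholder: a real pipeline would invoke a speech-recognition model.
    return "I love the new dashboard but the latency is bad"

def sentiment_score(text):
    # Score = (#positive words - #negative words) / total words.
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

text = speech_to_text(b"...")
print(sentiment_score(text))  # one positive and one negative word -> 0.0
```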
Snowflake is a central player in the ongoing battle that pits cloud data warehouses and cloud data lakes against each other. Proponents of cloud data warehouses, like Snowflake, proclaim that customers are better off using a more closely managed (and proprietary) data warehouse to analyze data, whereas data lake supporters, such as Dremio, argue that customers are better off using less closely managed (and open) data lakes. Features like support for unstructured data and the ability to bring Java- and Python-based functions to bear on that data indicate that Snowflake is responding to these customer concerns, at least in part.
Snowflake is also announcing that customers are benefiting from an across-the-board increase in compression rates, in some instances by up to 30%. Kleinerman says this is exactly the type of improvement that users can expect because Snowflake closely manages its data format.
The 30% increase, which comes atop compression rates that are already around 10x for some data types, actually led Snowflake’s CFO to announce on the analyst call last quarter that its annual revenue will decline by $13 million, Kleinerman says. “It’s direct money that we are not recognizing because the economics are better for customers,” he says. “Each time we make the system faster, we hurt our topline a little bit. But we’re in this for the long run.”
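The compounding here is worth making explicit: a further 30% reduction in stored bytes on top of an existing ~10x ratio lifts the effective compression ratio to roughly 14x. (The 10x baseline and 30% figure come from the article; the calculation below simply works through the arithmetic.)

```python
# Compounding the announced improvement: data already compressed ~10x
# shrinks by a further (up to) 30%, raising the effective ratio to ~14x.
baseline_ratio = 10.0   # prior compression for some data types (per article)
improvement = 0.30      # up-to-30% further reduction in stored bytes

stored_fraction = (1 / baseline_ratio) * (1 - improvement)  # bytes kept on disk
effective_ratio = 1 / stored_fraction
print(round(effective_ratio, 1))  # 14.3
```

Since customers pay for the bytes they store, that smaller footprint flows directly into the revenue hit Kleinerman describes.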
Snowflake also is making news on the data marketplace front. Buyers and sellers using the company’s Data Marketplace, which it launched in 2019, can now complete their transactions within the marketplace instead of completing their deals offline. Snowflake is also implementing a usage-based pricing model for the marketplace, which calculates costs based on the compute time associated with a given piece of data.
The marketplace has doubled in size in the past year, and now has about 500 data listings from 160 providers, the company says. “It’s growing quite well,” Kleinerman says. “We’re trying to lower the bar on how easy it is for organizations to monetize their data.”
Selling or sharing data in the marketplace can be done more securely, thanks to steps that Snowflake has taken to prevent sensitive data from leaking. This includes a new sensitive data classifier that can automatically spot potentially problematic combinations of data, Kleinerman says.
Researchers have shown that, even in data that has been aggregated and isn’t explicitly tied to individuals’ identities, people can be re-identified by linking together disparate pieces of data. “If you take anyone’s date of birth, gender, and ZIP Code, you can pretty much uniquely identify them,” Kleinerman says. “Our classifier not only will tell you this is sensitive, but it also has a concept of quasi-identifiers, so it will help customers identify combinations of data that might be identifying.”
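The risk Kleinerman describes is easy to demonstrate: count how many records share each (date of birth, gender, ZIP) combination, and any combination held by exactly one record uniquely identifies that person. The check below is a toy illustration of the quasi-identifier idea, not Snowflake’s classifier; the sample records are invented.

```python
from collections import Counter

# Toy demonstration of quasi-identifier risk: how many records are uniquely
# identified by the (date of birth, gender, ZIP) combination alone?
# Invented sample data; this is an illustration, not Snowflake's classifier.

records = [
    {"dob": "1984-03-01", "gender": "F", "zip": "94105"},
    {"dob": "1984-03-01", "gender": "F", "zip": "94105"},  # shares a group
    {"dob": "1990-07-22", "gender": "M", "zip": "59715"},  # unique combination
    {"dob": "1975-11-30", "gender": "F", "zip": "10001"},  # unique combination
]

def uniquely_identified(rows, quasi_ids=("dob", "gender", "zip")):
    # Count occurrences of each quasi-identifier combination, then return
    # the rows whose combination appears exactly once -- those individuals
    # can be re-identified by linking on these fields.
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return [r for r in rows if counts[tuple(r[q] for q in quasi_ids)] == 1]

print(len(uniquely_identified(records)))  # 2 of the 4 records stand alone
```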
The company has also launched something called anonymized views, which provides an anonymized version of a data set that reduces the risk of re-identification while still providing analytic value. The technology uses the k-anonymity and differential privacy algorithms, Kleinerman says. “We think this is going to accelerate the confidence that people have to share data with one another,” he says.
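K-anonymity, one of the two techniques named above, requires that every combination of quasi-identifier values appear in at least k records, so no single row stands out. Here is a minimal sketch of that property, assuming values have already been generalized (age bands instead of birth dates, ZIP prefixes instead of full codes) — the data and field names are invented, and this is not Snowflake’s implementation.

```python
from collections import Counter

# Minimal k-anonymity check: a table satisfies k-anonymity over its
# quasi-identifiers if every combination of their values occurs at least
# k times. Values below are already generalized (age bands, ZIP prefixes),
# which is how an anonymized view trades precision for privacy.

def is_k_anonymous(rows, quasi_ids, k):
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return all(c >= k for c in counts.values())

generalized = [
    {"age_band": "30-34", "zip3": "941", "diagnosis": "flu"},
    {"age_band": "30-34", "zip3": "941", "diagnosis": "cold"},
    {"age_band": "30-34", "zip3": "941", "diagnosis": "flu"},
    {"age_band": "45-49", "zip3": "100", "diagnosis": "asthma"},
    {"age_band": "45-49", "zip3": "100", "diagnosis": "flu"},
]

print(is_k_anonymous(generalized, ("age_band", "zip3"), k=2))  # True
print(is_k_anonymous(generalized, ("age_band", "zip3"), k=3))  # False
```

Differential privacy takes a different approach — adding calibrated noise to query results — but serves the same goal of keeping individual rows unrecoverable.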
Last but not least, Snowflake today is announcing its “Powered by Snowflake” program to help build and grow its data cloud. Snowflake has worked with partners for years, but the new program will more clearly lay out the benefits partners receive in areas such as application development, go-to-market strategy, and tech support.
Snowflake CEO Frank Slootman will be speaking at 9 a.m. PT today at the Snowflake Summit. The event will be virtual, just like the company, which has abandoned its Silicon Valley headquarters and announced it has become fully distributed, or “headquarterless,” save for its “principal executive office” in Bozeman, Montana, which is where Slootman and CFO Mike Scarpelli share a ZIP Code. For more info and the conference agenda, see www.snowflake.com/summit/agenda/.
Editor’s note: This story has been corrected. Snowflake’s annual revenue will take a $13 million hit as a result of the up-to 30% data compression that Snowflake just implemented, not $13 million per quarter as first reported. Datanami regrets the error. It was also updated to reflect the timing of support for Scala, Java, and Python in Snowpark.