What Is a Data Cloud? And 11 Other Snowflake Enhancements
Snowflake has changed how the industry thinks about data warehouses with its cloud-native offering, which has been adopted by 4,000 organizations, including 2,000 in the last year alone. Now the company is taking the concept one step further with the introduction of a data cloud, which the company is positioning as a one-stop shop where organizations can execute a full range of data-oriented tasks – not just data warehousing and SQL analytics, but also machine learning, data engineering, and monetization of third-party data.
According to Snowflake CEO Frank Slootman, the data cloud concept has its origins in the rise of public clouds from Amazon Web Services, Microsoft Azure, and Google Cloud, which Snowflake sits atop. Simultaneously, the rise of software-as-a-service (SaaS) applications, or “application clouds,” from the likes of Salesforce, Workday, and SAP has provided transactional data to process on those clouds.
But getting that transactional data into the public cloud so the two can work together in a harmonious manner is much easier said than done, Slootman says.
“All these different data center and application clouds have not helped to solve a problem we have endured for generations, something we call the siloing of data,” Slootman said in a pre-recorded keynote released today.
The problems with data silos are numerous, he says.
“First it’s difficult to join, augment, and integrate data from different silos, preventing us from gaining critical insight and understanding,” he says. “Second, fragmented and proliferated data is of course a data governance nightmare, escalating security and privacy concerns that simply cannot be mitigated away. For companies that value their brand beyond anything, this is an intolerable scenario.”
Lastly, the economics of cloud-based processing of siloed data just don’t add up. “We end up paying way more than we need because it’s impossible to manage to use the capacity in such a fragmented environment,” Slootman says.
These data storage and integration challenges gave rise to the data lake, including solutions based on Hadoop that were popular in the early days of the big data revolution. More recently, as Hadoop’s influence has waned, we’ve seen momentum build for data lakes based upon cloud-based object stores.
However, these data lakes didn’t solve the data silo challenges, Slootman says. Instead, the proliferation of data lakes just moved the data silo problem to a different level of the stack. “With data stored in their original formats, the silos live on,” he says.
In Snowflake’s vision of a data cloud, the Snowflake Cloud Data Platform becomes the hub of an organization’s data activities. Once they move data into the Snowflake cloud, they can say goodbye to the integration and governance challenges that existed when data was siloed. Or at least that’s the theory.
But customers don’t have to move all of their data into Snowflake to benefit from its vision of the data cloud, according to the company. With a range of enhancements unveiled today, Snowflake is attempting to extend its influence beyond the cloud itself and into its customers’ data centers. Among the enhancements:
- Snowsight. Introduced last year, Snowsight now includes a new schema browser that lets users discover and understand their data. It also gains auto-complete for SQL queries, as well as the capability to share queries while maintaining control with granular permissions, says Christian Kleinerman, Snowflake’s senior vice president of product.
- Partition-aware exports. Currently in public preview, this feature is designed to make it easier for tools outside of Snowflake to consume Snowflake data, without users having to worry about the underlying file structure.
- External functions. Snowflake is enabling customers to bring functions that execute outside of the Snowflake environment, such as an external ML-based scoring system, to bear on Snowflake-resident data through a REST web service. Currently in public preview.
- User-defined functions. Snowflake is adding support for Java-based UDFs to give customers more freedom in how they construct Snowflake workflows, such as data transformation pipelines (which are traditionally developed in SQL in Snowflake). Support for Python will likely be next. The Java-based UDF support is in public preview.
- Integration with Salesforce and Tableau. The two companies (Salesforce owns Tableau) are working to enable direct queries of Snowflake resident data from Salesforce’s Einstein Analytics, eliminating the need to move the data. This function is in open beta. They’re also working to build an outbound connector to synchronize data from Salesforce into Snowflake. The status of the connector was not immediately available.
- Search optimization. This feature can be activated on a table-by-table basis, Kleinerman says, giving customers the benefit of fast search results by eliminating the need to perform full table scans.
- 5XL and 6XL clusters. To satisfy the need for bigger clusters to handle the biggest data workloads, the company has rolled out 5XL and 6XL clusters. In Snowflake’s product naming convention, each successive size represents a doubling of capacity, so the 6XL cluster has four times the computing power of a 4XL.
- Geospatial data. Snowflake is adding a new geography data type that uses a round-earth coordinate system to store geospatial data. Kleinerman says the new data type requires no tuning knobs or spatial indexes to get good performance.
- Data governance. A new data masking policy will allow customers to mask data residing in specific columns. It’s also adding support for external tokenization services.
- Data Marketplace. Snowflake is changing the name of the Snowflake Data Exchange, which was unveiled last year, to the Snowflake Data Marketplace.
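The benefit of the search optimization item above boils down to trading a full table scan for an index lookup on the searched column. A minimal Python sketch of that trade-off (the dictionary-based index is purely illustrative; Snowflake does not publicly specify its search access structures):

```python
# Sketch: search optimization replaces a full table scan (touches
# every row, O(n)) with an index lookup (touches only matching rows,
# O(1) on average). The dict index below is an illustration only.

rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(100_000)]

def full_scan(table, email):
    """Point lookup without an index: examines every row."""
    return [r for r in table if r["email"] == email]

# Build a one-time index on the search column.
index = {}
for r in rows:
    index.setdefault(r["email"], []).append(r)

def indexed_lookup(idx, email):
    """Point lookup with an index: jumps straight to matching rows."""
    return idx.get(email, [])

# Both paths return the same result; only the work done differs.
assert full_scan(rows, "user42@example.com") == indexed_lookup(index, "user42@example.com")
```

The one-time cost of building (and maintaining) the index is the price paid for fast point lookups, which is why the feature is enabled table by table rather than globally.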
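The doubling convention behind the 5XL and 6XL sizes means relative compute grows as a power of two with each step up. A quick sketch of the arithmetic (the size names and the doubling rule come from the announcement; the helper function is illustrative):

```python
# Warehouse t-shirt sizes in Snowflake's naming convention;
# each step up doubles compute capacity.
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL"]

def relative_capacity(size_a: str, size_b: str) -> int:
    """How many times more compute size_a has than size_b,
    assuming each successive size doubles capacity."""
    return 2 ** (SIZES.index(size_a) - SIZES.index(size_b))

print(relative_capacity("6XL", "4XL"))  # 4
```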
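A round-earth coordinate system, as mentioned in the geospatial item, means distances follow the curve of the sphere rather than a flat plane. The standard haversine formula illustrates the difference (this is textbook great-circle math, not Snowflake's internal implementation):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# San Francisco to New York: roughly 4,130 km by great circle,
# noticeably more than a naive flat-plane calculation would give.
print(round(haversine_km(37.7749, -122.4194, 40.7128, -74.0060)))
```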
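Column masking of the kind described in the data governance item can be thought of as a policy function applied to a column at read time: privileged roles see the raw value, everyone else sees a redacted form. A hypothetical Python sketch (the role names and masking rule are invented for illustration; Snowflake expresses such policies in SQL):

```python
def mask_email(value: str, role: str) -> str:
    """Illustrative masking policy: privileged roles see the real
    value; all other roles get a redacted form. Role names are
    hypothetical, not Snowflake built-ins."""
    if role in {"SECURITY_ADMIN", "DATA_OWNER"}:
        return value
    local, _, domain = value.partition("@")
    return local[0] + "***@" + domain

row = {"name": "Ada", "email": "ada@example.com"}

# Apply the column policy at query time, leaving stored data intact.
visible = {**row, "email": mask_email(row["email"], role="ANALYST")}
print(visible["email"])  # a***@example.com
```

The key property, mirrored in the sketch, is that masking happens on read: the underlying stored value never changes, so governance is enforced without duplicating or rewriting data.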
“Snowflake is aggressively developing its cloud data platform to ever better support its use as a data cloud,” Slootman says. “That means many more scale and performance enhancements, a greatly expanded scope of workload types, and a world-class data governance capability.”