February 13, 2018

Snowflake Taps Qubole for Deep Machine Learning in the Cloud

Organizations storing big data in Snowflake’s cloud data warehouse can now run machine learning and deep learning algorithms against that data thanks to a new partnership with Qubole.

The two companies today announced a partnership that will allow Qubole’s big data processing engines, including Apache Spark and TensorFlow, to read data from and write data to Snowflake’s data warehouse.

While Snowflake customers ostensibly had access to the Apache Spark framework through a data connector, the integration with Qubole’s platform will make it much easier for customers to access Spark capabilities, says Davis Hsieh, Qubole’s senior vice president of marketing.

“First of all we think we have the best Spark engine in the cloud,” Hsieh says. “They have a generic Spark connector technology, but the way it was implemented required a lot of deployment and configuration effort on the part of the customer.”

Handling security and sign-on credentials was the major bugaboo, while the need to stand up and manage a Spark cluster was also a concern. “Snowflake wanted something as easy to use as their cloud data warehouse but to be able to stand up a Spark cluster and use it for machine learning,” he says.

The key to the integration between Qubole’s big data service and Snowflake’s data warehouse is the addition of Snowflake credential management functionality in the Qubole platform. “Now it’s a completely turnkey capability,” Hsieh says. “You literally just point your notebook where you’re going to use Spark as a processing engine to Snowflake as a data source, and everything else happens seamlessly behind the scenes.”

The partnership gives joint customers an “optimized” environment for using Spark and Snowflake together, says Snowflake Vice President of Alliances Walter Aldana. “Customers can now benefit through our integration with Qubole to transform an organization’s decision-making abilities through more advanced analytics,” he stated in a press release.

At a technical level, the integration uses Qubole’s DataFrame API for Apache Spark and is exposed to Scala and Python. Both companies already run in the cloud, so minimal data movement is required on the part of joint customers. Snowflake is available on Amazon Web Services and Microsoft Azure, while Qubole runs on AWS, Azure, Google Cloud Platform, and Oracle Cloud.
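
For readers who want a feel for what DataFrame-level access to Snowflake looks like from Python, the sketch below may help. It is not Qubole’s integration, whose API the companies did not detail. It assumes the generic open-source Snowflake Connector for Spark is installed on the cluster, and every account value, table name, and helper function name in it is a hypothetical placeholder.

```python
# Sketch only: uses the generic open-source Snowflake Connector for Spark,
# NOT Qubole's managed integration. Helper names and all account values
# are hypothetical placeholders.

# Data source name registered by the Snowflake Connector for Spark.
SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"


def snowflake_options(account_url, user, password, database, schema, warehouse):
    """Build the option map the connector expects for a session."""
    return {
        "sfURL": account_url,
        "sfUser": user,
        "sfPassword": password,
        "sfDatabase": database,
        "sfSchema": schema,
        "sfWarehouse": warehouse,
    }


def read_table(spark, options, table):
    """Load a Snowflake table into a Spark DataFrame.

    `spark` is an existing SparkSession, as provided by a notebook.
    """
    return (
        spark.read.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .load()
    )


def write_table(df, options, table, mode="overwrite"):
    """Write a Spark DataFrame back to a Snowflake table."""
    (
        df.write.format(SNOWFLAKE_SOURCE)
        .options(**options)
        .option("dbtable", table)
        .mode(mode)
        .save()
    )
```

In a notebook this would be used along the lines of `df = read_table(spark, opts, "ORDERS")`, followed by ordinary Spark transformations or model training. Supplying and protecting the credentials in that option map is the kind of configuration work Hsieh says the Qubole integration now handles behind the scenes.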

Hsieh sees Snowflake customers using Qubole in two main ways. The first is accessing machine learning and deep learning capabilities in Spark and TensorFlow. The second involves accessing more advanced ETL capabilities.

“If you have a complex ETL workflow and you want to have access to Airflow and you basically want processing capabilities of Spark to do more advanced types of ETL – data wrangling, data augmenting – that’s the bread and butter for Qubole,” Hsieh says.

Just as Snowflake customers could find some other way to bring Spark and TensorFlow to bear against their cloud data warehouse, Qubole customers can pit their big data computational power against other cloud data warehouses (such as AWS Redshift). But Snowflake is the only cloud data warehouse with which Qubole has “an active partnership,” Hsieh says.

The integration between the two environments is available now as a feature in Qubole Data Service, Enterprise Edition.
