February 13, 2018

Snowflake Taps Qubole for Deep Machine Learning in the Cloud

Alex Woodie

Organizations storing big data in Snowflake’s cloud data warehouse can now run machine learning and deep learning algorithms against that data thanks to a new partnership with Qubole.

The two companies today announced a partnership that will allow Qubole’s big data processing engines, including Apache Spark and TensorFlow, to read and write data to Snowflake’s data warehouse.

While Snowflake customers ostensibly had access to the Apache Spark framework through a data connector, the integration with Qubole's platform will make it much easier for customers to access Spark capabilities, says David Hsieh, Qubole's senior vice president of marketing.

“First of all we think we have the best Spark engine in the cloud,” Hsieh says. “They have a generic Spark connector technology, but the way it was implemented required a lot of deployment and configuration effort on the part of the customer.”

Handling security and sign-on credentials was the major bugaboo, while the need to stand up and manage a Spark cluster was also a concern. "Snowflake wanted something as easy to use as their cloud data warehouse but to be able to stand up a Spark cluster and use it for machine learning," he says.

The key to the integration between Qubole's big data service and Snowflake's data warehouse is the addition of Snowflake credential management functionality in the Qubole platform. "Now it's a completely turnkey capability," Hsieh says. "You literally just point your notebook where you're going to use Spark as a processing engine to Snowflake as a data source, and everything else happens seamlessly behind the scenes."

The partnership gives joint customers an “optimized” environment for using Spark and Snowflake together, says Snowflake Vice President of Alliances Walter Aldana. “Customers can now benefit through our integration with Qubole to transform an organization’s decision-making abilities through more advanced analytics,” he stated in a press release.

At a technical level, the integration uses Qubole's DataFrame API for Apache Spark, which is exposed through Scala and Python. Both companies already operate in the cloud, so joint customers need to move very little data. Snowflake is available on Amazon Web Services and Microsoft Azure, while Qubole runs on AWS, Azure, Google Cloud Platform, and Oracle Cloud.
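For context, the generic connector workflow that Qubole's turnkey integration streamlines can be sketched in Python using Snowflake's publicly documented Spark connector. This is a minimal sketch, not Qubole's implementation: the connection values below are placeholders, and the `net.snowflake.spark.snowflake` source name and `sf*` option keys come from Snowflake's own Spark connector documentation.

```python
# Sketch of reading a Snowflake table into a Spark DataFrame via the
# generic Snowflake Connector for Spark. All connection values are
# placeholders -- this is the configuration burden Qubole's credential
# management is meant to remove.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",  # placeholder account URL
    "sfUser": "<user>",                           # placeholder credentials
    "sfPassword": "<password>",
    "sfDatabase": "ANALYTICS",                    # hypothetical database
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",                  # hypothetical warehouse
}

# With a live SparkSession named `spark`, the read would look like:
# df = (spark.read
#           .format(SNOWFLAKE_SOURCE_NAME)
#           .options(**sf_options)
#           .option("dbtable", "SALES")           # hypothetical table
#           .load())
```

The resulting DataFrame can then feed Spark MLlib or TensorFlow pipelines; in the Qubole integration, the credential options above are managed by the platform rather than hand-configured per notebook.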

Hsieh sees Snowflake customers using Qubole in two main ways. The first is accessing machine learning and deep learning capabilities in Spark and TensorFlow. The second involves accessing more advanced ETL capabilities.

"If you have a complex ETL workflow and you want to have access to Airflow and you basically want processing capabilities of Spark to do more advanced types of ETL – data wrangling, data augmenting – that's the bread and butter for Qubole," Hsieh says.

Just as Snowflake customers could find some way to bring Spark and TensorFlow to bear against their cloud data warehouse, Qubole customers can pit their big data computational power against other cloud data warehouses, such as AWS Redshift. But Snowflake is the only cloud data warehouse where Qubole has "an active partnership," Hsieh says.

The integration between the two environments is available now as a feature in Qubole Data Service, Enterprise Edition.

Related Items:

Anatomy of a Hadoop Project Failure

Workload-Aware Auto-Scaling: A New Paradigm for Big Data Workloads

Why the Cloud and Big Data? Why Now?
