February 24, 2020

StreamSets Expands Databricks Partnership Extending Ingestion Capabilities for Delta Lake

SAN FRANCISCO, Feb. 24, 2020 — StreamSets, provider of the industry’s first DataOps platform, today announced an expansion of its partnership with Databricks by participating in Databricks’ newly launched Data Ingestion Network. As part of the expanded partnership, StreamSets is offering additional functionality with a new connector for Delta Lake, an open source project that provides reliable data lakes at scale. With it, users can configure their pipelines to write data from any source moving in batch or streaming mode directly into Delta Lake. Now, data teams can deliver all of their data in a shorter time frame, driving BI, analytics and ML.

Today, companies require systems that support diverse data applications such as real-time monitoring, machine learning and data science, and that can process unstructured data such as text, images, video and audio. A decade ago, data lakes replaced data warehouses as the best repositories for this raw data; however, they neither support transactions nor enforce data quality. They also lack consistency guarantees, making it almost impossible to mix batch and streaming jobs, or appends and reads.

Combining the best of data warehouses and data lakes, lakehouses remedy these limitations, but friction in ingesting fresh data remains. With this partnership, Databricks users can now capitalize on the lakehouse paradigm without that friction. They can connect to StreamSets Cloud and use out-of-the-box connectors to load batch, change data capture (CDC) or streaming data from any source (such as cloud applications, relational data, on-premises data lakes and warehouses) into Delta Lake. With StreamSets, data engineers can build and operate data pipelines for modern and legacy data sources to migrate to a lakehouse and continuously refresh it with relevant data.

Specifically, the new StreamSets connector for Delta Lake offers several key benefits that give teams greater operational control over the full life cycle of data:

- Faster migration to the cloud with fewer data engineering resources
- Drag-and-drop interface to simplify data movement from multiple disparate sources
- Improved management of operations and performance for lakehouses
- Change-data-capture capability from several data sources into Delta Lake
- Built-in Kubernetes containerization and native cloud scaling

Combined with Delta Lake, which provides ACID transactions, the connector also makes it possible to unify batch and streaming data to support the timeliness of transactional operations.

“Databricks Ingest brings an opportunity for organizations to build a central lakehouse without worrying about repetitive data movement,” said Michael Hoff, senior vice president of Business Development and Partners at Databricks. “With StreamSets’ expanded support for Delta Lake, small and midsize companies now have an easy way to ingest data from their cloud-based service into Delta Lake so they can maximize their analytics efforts with fresh data in their lakehouse.”

“This connector is another step forward in our alliance with Databricks to deliver more data, faster, to drive traditional BI and machine learning initiatives — which is critical to the survival and success of today’s organizations,” said Jobi George, general manager of Cloud Business at StreamSets. “We’re excited to continue our work with Databricks to drive innovation in the industry.”

The connector is currently available for Databricks customers.

To learn more, save a spot in Databricks’ upcoming webinar, “Accelerate building lakehouses for Business Intelligence and Machine Learning.”

About DataOps

Analytics has modernized in our always-on, always-changing world. How you deliver data to drive analytics has to modernize, too. DataOps is a set of practices and technologies that operationalizes data management and integration to ensure resilience and agility despite ceaseless change. It combines the DevOps principles of continuous delivery with the ability to tame data drift (unexpected and undocumented changes to data). By embedding these principles, DataOps makes it possible to deliver the continuous data needed to drive modern analytics and digital transformation.

About StreamSets

StreamSets built the industry’s first multi-cloud DataOps platform for modern data integration, helping enterprises to continuously flow big, streaming and traditional data to their data science and data analytics applications. The platform uniquely handles data drift, those frequent and unexpected changes to upstream data that break pipelines and damage data integrity. The StreamSets DataOps Platform allows for execution of any-to-any pipelines, ETL processing and machine learning with a cloud-native operations portal for the continuous automation and monitoring of complex multi-pipeline topologies.

Source: StreamSets
