October 7, 2019

Data Lakes Get Structured

George Leopold

The explosion of unstructured and partially structured data has made traditional data lakes harder to manage. Adding to the challenge are “brittle” data pipelines that are both time-consuming to create and short-lived.

Or to put it another way, “Pipelines Suck,” asserts autonomous dataflow startup Ascend, which is rolling out a “structured data lake” designed to connect existing data processing engines, business intelligence tools and notebooks on its data management platform.

The startup based in Palo Alto, Calif., emerged in July with its dataflow service designed to allow data engineering teams to build and scale Apache Spark-based data pipelines. Ascend claims its service enables pipeline creation with 85 percent less code and reduces the time from prototype to production by 90 percent.

The structured data lake is touted as addressing the dataflow challenges that often sink AI and big data deployments by providing a tool for accelerating data development across managed storage. The idea is to give development teams access to more organized data. “We are eliminating siloed access based on preferred tools or skills,” said Sean Knapp, Ascend’s founder and CEO.

Ascend’s structured data lake is implemented on Amazon Web Services’ (NASDAQ: AMZN) Simple Storage Service (S3) API, an interface the startup said is best suited to working with external data processing platforms. Along with Apache Spark for handling S3 data paths, the new data lake uses MinIO, an open-source, S3-compatible object storage server aimed at AI workloads, to simplify implementation of the S3 API layer.

In Ascend’s framework, MinIO handles processing of the S3 API protocol, so the only logic that needs to be implemented is the mapping between virtual data paths and the underlying objects.
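To make the idea of mapping virtual data paths to underlying objects concrete, here is a minimal sketch in Python. It is not Ascend’s implementation, which the company has not detailed here; the class name `VirtualPathMap`, its methods, the bucket name and the object keys are all hypothetical and chosen only for illustration.

```python
class VirtualPathMap:
    """Hypothetical mapping layer: virtual data-lake paths -> stored object keys."""

    def __init__(self, bucket):
        self.bucket = bucket
        self._entries = {}  # normalized virtual path -> object key

    @staticmethod
    def _normalize(path):
        # Drop leading, trailing and repeated slashes so that
        # "/sales//daily/" and "sales/daily" refer to the same entry.
        return "/".join(part for part in path.split("/") if part)

    def register(self, virtual_path, object_key):
        # Record which underlying object backs a virtual path.
        self._entries[self._normalize(virtual_path)] = object_key

    def resolve(self, virtual_path):
        # Translate a virtual path into the S3-style URI of the backing object.
        key = self._entries[self._normalize(virtual_path)]
        return f"s3://{self.bucket}/{key}"

    def list_prefix(self, prefix):
        # Return the registered virtual paths under a given "directory", sorted.
        norm = self._normalize(prefix)
        return sorted(
            p for p in self._entries
            if p == norm or p.startswith(norm + "/")
        )


# Illustrative usage with made-up names.
paths = VirtualPathMap(bucket="example-lake")
paths.register("/sales/daily/2019-10-07", "objects/3f9a.parquet")
paths.register("/sales/daily/2019-10-08", "objects/77c1.parquet")
print(paths.resolve("sales/daily/2019-10-07/"))
```

In a design like this, the S3 protocol handling (request parsing, authentication, listing semantics) would be left to the S3-compatible layer, and only a lookup such as `resolve` would be custom logic.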

Other capabilities of the structured data lake include automated storage maintenance, de-duplication of redundant storage and operations, and tighter management of all data and updates.
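The article does not say how Ascend de-duplicates storage. One common general technique is content addressing, in which data is stored under a hash of its bytes so that identical content is kept only once. The sketch below illustrates that technique only; the `DedupStore` class and its methods are hypothetical and are not drawn from Ascend’s product.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of identical content."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> bytes (stored once per digest)
        self.paths = {}  # logical path -> digest of its content

    def put(self, path, data):
        # Address the content by its hash; write it only if it is new.
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.paths[path] = digest
        return is_new  # True when bytes were actually stored

    def get(self, path):
        # Follow the path to its digest, then to the shared blob.
        return self.blobs[self.paths[path]]


# Illustrative usage: two paths with identical content share one blob.
store = DedupStore()
store.put("raw/orders.csv", b"id,total\n1,20\n")
store.put("staging/orders_copy.csv", b"id,total\n1,20\n")
print(len(store.blobs), "blob(s) stored for", len(store.paths), "paths")
```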

“Managed data is unified and dynamically synchronized with the pipelines that operate on it,” the company noted in a blog post. Those capabilities would allow data scientists and engineers to “build on top of a common data lake that automatically ensures data integrity, tracks data lineage, and optimizes performance.”

Ascend announced a Series A funding round in July led by Accel with participation from Sequoia Capital, Lightspeed Venture Partners and 8VC. Among the startup’s advisors are Scott McNealy, former CEO of Sun Microsystems, and Microsoft (NASDAQ: MSFT) CTO Kevin Scott.

Recent items:

Ascend Launches from Stealth With $19M

Four Ways Automation Can Rescue Your Data Lake

