October 23, 2019

Apache Arrow Takes ‘Flight’ with Big Data Net

George Leopold

A data transport framework released by the Apache Arrow community aims to alleviate some of the pain associated with accessing large data sets over networks.

Apache Arrow Flight is described as a general-purpose, client-server framework intended to ease high-performance transport of big data over network interfaces. A recent release of Apache Arrow includes Flight implementations in C++ and Java, the former with Python bindings.

“One of the biggest features that sets apart Flight from other data transport frameworks is parallel transfers, allowing data to be streamed to or from a cluster of servers simultaneously,” Apache Arrow evangelist Wes McKinney explained in a blog post unveiling Arrow Flight. “This enables developers to more easily create scalable data services that can serve a growing client base.”

Developers note that the performance of standard network protocols can vary significantly depending on use case. Flight is designed as a new protocol for data services that uses the Apache Arrow columnar format as both the data representation sent over the network and the public API presented to developers. The approach seeks to reduce serialization penalties associated with data transport while increasing the overall efficiency of distributed data platforms, McKinney said.

Flight libraries allow developers to roll out networking services capable of sending or receiving data streams. A Flight server can respond to requests to list the available data streams, return the schema of a data stream, and send a requested data stream to a client.

Early benchmark testing of the C++ version of Flight delivered throughput of 2 to 3 GB/s, transferring about 12 gigabytes of data in roughly four seconds.
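The benchmark figures can be cross-checked with simple arithmetic. The short sketch below uses a hypothetical helper, throughput_gb_per_s, to convert the reported transfer (about 12 gigabytes in about four seconds) into a rate, and then expresses that rate in gigabits per second for comparison with network link speeds.

```python
def throughput_gb_per_s(gigabytes, seconds):
    """Average throughput in gigabytes per second."""
    return gigabytes / seconds


# Figures reported for the benchmark: about 12 GB in about 4 seconds.
rate_gbyte = throughput_gb_per_s(12, 4)

# One byte is eight bits, so multiply by 8 for gigabits per second.
rate_gbit = rate_gbyte * 8

print(f"{rate_gbyte:.1f} GB/s is roughly {rate_gbit:.0f} Gbit/s")
```

The transfer works out to about 3 gigabytes per second, or roughly 24 gigabits per second.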

Flight’s proponents note that many distributed database systems transport data sets multiple times in delivering them to clients. That approach “presents a scalability problem for getting access to very large data sets,” McKinney said. “We wanted Flight to enable systems to create horizontally scalable data services without having to deal with such bottlenecks.”

Flight libraries are deemed sufficiently mature for beta users, though developers expect some “minor” API or protocol changes as Flight is wrung out. Examples of a Flight client and server using the Python API are available in the Apache Arrow codebase.

Meanwhile, Arrow community member and data lake specialist Dremio has developed a connector based on Arrow Flight that delivered as much as a 50-fold performance increase over the Open Database Connectivity (ODBC) standard API. McKinney said a data source implementation aimed at Apache Spark users connects to Flight-based network endpoints.

Future development is also expected to focus on creating data services enabled by the data transport scheme. “Since Flight is a development framework, we expect that user-facing APIs will utilize a layer of API veneer that hides many general Flight details and details related to a particular application of Flight in a custom data service,” McKinney added.
