July 22, 2019

LinkedIn Unleashes ‘Nearline’ Data Streaming

George Leopold


LinkedIn is releasing its Brooklin data ingestion service to the open source community.

Brooklin has been running in production on the social media platform since 2016. The stateless and distributed service is used primarily for streaming data in near real time—also known as “nearline”—at scale. LinkedIn estimates the service handles more than 2 trillion messages per day as well as thousands of data streams.

The data service was developed in part to meet growing demand for low-latency data pipelines that would scale. “Moving massive amounts of data reliably at high rates was not the only problem we had to tackle,” said LinkedIn’s Celia Kung.

“Supporting a rapidly increasing variety of data storage and messaging systems has proven to be an equally critical aspect of any viable solution,” she added in a blog post announcing the open-source release of Brooklin.

LinkedIn has been using Brooklin to stream data across a variety of data sources, including Espresso and Oracle, along with messaging systems ranging from Amazon Web Services Kinesis and Microsoft Azure Event Hubs to Kafka.

A hypothetical example of a single Brooklin cluster being used as a streaming bridge to move data from AWS Kinesis into Kafka and data from Kafka into Azure Event Hubs. (Source: LinkedIn)

Brooklin’s use cases include serving as a “streaming bridge” and as an upgraded version of Kafka mirroring. In the first use case, LinkedIn touts Brooklin as a means of streaming data across different cloud services such as AWS Kinesis and Azure Event Hubs. It can also move data between different clusters within a datacenter or across different datacenters.

That feature allows application developers to focus on data processing rather than data movement. The service can also be configured to stream incoming data in a specified format while encrypting outgoing data, the company said.

In the Kafka mirroring scenario, LinkedIn said it previously used a Kafka feature called MirrorMaker to shift data among different Kafka clusters. Brooklin allowed developers to consolidate Kafka mirroring into a single streaming data service. The Microsoft unit also uses Brooklin to move large volumes of Kafka data between its internal cloud and Azure.

The tool is used to mirror trillions of LinkedIn messages each day, the company noted.

A related multi-tenancy capability in Brooklin addresses a limitation in Kafka MirrorMaker, in which each MirrorMaker cluster can only be configured to mirror data between two Kafka clusters. Brooklin is designed to handle several independent data pipelines concurrently, meaning a single Brooklin cluster can synchronize multiple Kafka clusters.
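The multi-tenancy idea can be illustrated with a small conceptual sketch. This is not Brooklin's API: the `Pipeline` and `MirrorCluster` classes and the cluster names are hypothetical, chosen only to show one mirroring cluster hosting several independent source-to-destination pipelines at once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pipeline:
    """One independent mirroring pipeline (hypothetical model)."""
    name: str
    source: str        # name of the source Kafka cluster
    destination: str   # name of the destination Kafka cluster


@dataclass
class MirrorCluster:
    """Hypothetical multi-tenant mirroring cluster hosting many pipelines."""
    pipelines: dict = field(default_factory=dict)

    def add_pipeline(self, pipeline):
        # Pipelines are keyed by name so each one stays independent.
        if pipeline.name in self.pipelines:
            raise ValueError(f"pipeline {pipeline.name!r} already exists")
        self.pipelines[pipeline.name] = pipeline

    def kafka_clusters(self):
        """Return every distinct Kafka cluster touched by any pipeline."""
        clusters = set()
        for p in self.pipelines.values():
            clusters.update((p.source, p.destination))
        return clusters


# A single mirroring cluster serving three pipelines across four Kafka clusters.
cluster = MirrorCluster()
cluster.add_pipeline(Pipeline("tracking", "kafka-a", "kafka-b"))
cluster.add_pipeline(Pipeline("metrics", "kafka-c", "kafka-b"))
cluster.add_pipeline(Pipeline("backup", "kafka-a", "kafka-d"))
print(sorted(cluster.kafka_clusters()))
```

Under the one-pair-per-cluster limitation described above, the same three pipelines would need three separate mirroring deployments.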

Further, Brooklin’s mirroring feature can detect errors at a partition level and automatically pause mirroring when errors arise.
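A rough sketch of what partition-level pausing could look like follows. It is an illustration under assumptions, not Brooklin's implementation: `PartitionMirror`, `flaky_send`, and the topic name are all hypothetical. The point is that a failure on one partition pauses only that partition while the others keep flowing.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionMirror:
    """Tracks mirroring state per (topic, partition). Hypothetical model."""
    paused: set = field(default_factory=set)
    mirrored: list = field(default_factory=list)

    def mirror(self, topic, partition, record, send):
        key = (topic, partition)
        if key in self.paused:
            return False  # skip paused partitions; others are unaffected
        try:
            send(topic, partition, record)
        except Exception:
            # Pause only the failing partition instead of the whole pipeline.
            self.paused.add(key)
            return False
        self.mirrored.append((topic, partition, record))
        return True

    def resume(self, topic, partition):
        """Re-enable mirroring for a partition once the problem is fixed."""
        self.paused.discard((topic, partition))


def flaky_send(topic, partition, record):
    """Stand-in for a producer call; fails for partition 1 only."""
    if partition == 1:
        raise RuntimeError("destination rejected record")


mirror = PartitionMirror()
results = [mirror.mirror("events", p, f"msg-{p}", flaky_send) for p in (0, 1, 2, 1)]
print(results)
print(mirror.paused)
```

In this sketch, partitions 0 and 2 continue to be mirrored while partition 1 stays paused until `resume` is called.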

The data streaming system is also promoted as providing better isolation of computing resources among applications and online storage. That feature is said to allow applications to scale independently of databases, thereby reducing the risk of a database failure.

The source code for Brooklin is available now on GitHub.


Applications: Complex Event Processing, Enterprise Analytics

Technologies: Cloud, Frameworks, Storage

Sectors: Other

Vendors: AWS, LinkedIn, Microsoft

Tags: Brooklin, Celia Kung, data streaming, GitHub, Kafka, LinkedIn, MirrorMaker, near line
