December 11, 2020

LinkedIn’s Translation Engine Linked to Presto

An SQL translation engine unveiled this week by LinkedIn integrates with open-source SQL query engines such as Presto, a combination aimed at bulging data lakes.

The Microsoft unit’s Coral engine handles analysis and rewrite along with translation duties. Along with Presto, Coral integrates with Spark and Pig. LinkedIn said Thursday (Dec. 10) Coral is also integrated with Dali Catalog, LinkedIn’s data access tool that defines and “evolves” a data set.

LinkedIn describes Dali Catalog as its common data layer enabling high-velocity data while abstracting the details of data access from compute engines.

The catalog includes Dali tables and views. A view is a “relation” that refers to logic applied on base tables. Dali views enable data transformation, cleaning and aggregation from multiple sources as well as adding semantic meaning to data, LinkedIn said in a blog post unveiling Coral.
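As a generic illustration of a view as logic applied on base tables, the following minimal sketch uses Python's built-in sqlite3 module rather than Dali or Hive. The table names, column names and data are hypothetical and chosen only for the example.

```python
import sqlite3

# Hypothetical base tables and view; none of these names come from Dali.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (member_id INTEGER, page TEXT);
    CREATE TABLE members (member_id INTEGER, country TEXT);
    INSERT INTO page_views VALUES (1, 'feed'), (1, 'jobs'), (2, 'feed');
    INSERT INTO members VALUES (1, 'US'), (2, 'DE');

    -- The view stores logic (a join plus an aggregation), not data.
    CREATE VIEW views_by_country AS
    SELECT m.country AS country, COUNT(*) AS views
    FROM page_views AS p
    JOIN members AS m ON m.member_id = p.member_id
    GROUP BY m.country;
""")

# Consumers query the view like a table; the join and count run underneath.
rows = conn.execute(
    "SELECT country, views FROM views_by_country ORDER BY country"
).fetchall()
print(rows)  # [('DE', 1), ('US', 2)]
```

The consumer of the view never sees the join or the grouping, which is the sense in which a view abstracts transformation and aggregation over its sources.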

Dali views are also readable in Hive, Spark, Pig and Presto.

Coral is designed to make Dali views “more user-friendly, agile, secure and portable,” its maintainers added. Portability, meaning the query definition language is not tied to the underlying engine, is achieved through view virtualization, view translation and rewrite as well as Coral’s integration with Presto, Spark and Pig.

For view virtualization, the Coral module is used to access database, table and view information. The Dali Catalog uses a Coral Hive module to interface with view definitions stored in Hive. The module also houses a parser, validator and converter to handle representations of Hive query language view definitions.


“Coral rewrites view definitions into a number of engine-compliant languages and SQL dialects,” LinkedIn said. “During that rewrite, it maps functions to their equivalent ones in the target engines so they are semantically equivalent.”
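The idea of mapping functions to equivalents in a target dialect can be sketched in a few lines of Python. This is not Coral's implementation or API: the mapping table, the rewrite_functions helper and the sample query are hypothetical, and the two pairs shown (Hive's nvl and instr, Presto's coalesce and strpos) are used only as familiar examples.

```python
import re

# Illustrative mapping only; this is not Coral's actual function table.
HIVE_TO_PRESTO = {"nvl": "coalesce", "instr": "strpos"}


def rewrite_functions(sql: str, mapping: dict[str, str]) -> str:
    """Rename function calls whose names appear in `mapping` (hypothetical helper)."""
    names = "|".join(re.escape(name) for name in mapping)
    # Match a mapped name followed by an opening parenthesis, ignoring case.
    pattern = re.compile(r"\b(" + names + r")\s*\(", re.IGNORECASE)
    return pattern.sub(lambda m: mapping[m.group(1).lower()] + "(", sql)


hive_sql = "SELECT nvl(title, 'n/a'), instr(url, '/jobs') FROM page_views"
presto_sql = rewrite_functions(hive_sql, HIVE_TO_PRESTO)
print(presto_sql)
```

A text substitution like this ignores string literals, argument order and type differences. A production translator would more plausibly operate on a parsed representation of the query, consistent with the parser, validator and converter LinkedIn describes.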

Meanwhile, the Presto integration rewrites Dali view definitions into Presto-compliant SQL queries. Similarly, the Coral Spark implementation rewrites view definitions for the Spark engine.

LinkedIn said it has worked with the Presto community to integrate Coral functionality into the Presto Hive connector, a step that would enable the querying of complex views using Presto.

Ongoing enhancements include developing more “frontend” query APIs, including those suitable for querying graph data and defining machine learning features. LinkedIn said work is also underway to support Dali views in streaming and online query engines.

The Coral code repository is available on GitHub.
