March 11, 2020

Alluxio Unveils its Alluxio Structured Data Service

SAN MATEO, Calif., March 11, 2020 – Alluxio, the developer of open source cloud data orchestration software, announced the availability of Alluxio Structured Data Service (SDS) featuring a data Catalog Service and Transformation Service, two major new architectural components of its Data Orchestration Platform. Data engineers, architects and developers can now spend fewer resources storing data and more time delivering data to analytical compute engines.

As users and enterprises leverage widely available analytics engines such as Presto, Apache Spark SQL or Apache Hive, they often run into inefficient data formats and face performance challenges. Typically, those engines consume structured data in different databases with “tables” consisting of “rows” and “columns”, rather than “offset” and “length” in files or objects. This gap creates multiple challenges and inefficiencies, such as creating mappings or converted copies of the data. With this announcement, users benefit from a simpler data platform that enables connections to different catalogs for access to structured data, with fewer copies and pipelines and more compute-optimized data.

“Alluxio now provides just-in-time data transform of data to be compute-optimized, independent of the storage format for OLAP engines, such as Presto and Apache Spark,” said Haoyuan Li, founder and CTO, Alluxio. “These schema-aware optimizations are made possible with the new Alluxio Catalog Service which abstracts the widely-used Apache Hive Metastore, so regardless of how the data was initially stored – CSV and text formatted files, for example – the data is now transformed into the generally recognized compute-optimized Parquet format. Almost every organization has a surprising amount of data in CSV or other text formats and this removes the manual work to make that data more usable. A second type of transformation will coalesce many smaller files, enabling the data to be combined into fewer files, which is more efficient to process for SQL engines. And yet a third type of transformation is for sorting, enabling table columns to be sorted adding to the efficiency of queries, newly available in our Enterprise Edition.”

“We can thank Kubernetes for distributed compute; and Alluxio for distributed data. The combination of these technologies offers tremendous promise for our data-driven hybrid and multicloud future,” said Eric Kavanagh, CEO, Bloor Group.

Alluxio Structured Data Service

With Alluxio Structured Data Service, Alluxio exposes data so that SQL engines can access it efficiently, independent of how and where the data is stored. New capabilities and services include:

Presto Connector for Alluxio – A new Presto connector for Alluxio is now available. This allows easy integration and configuration of Alluxio with Presto.
Catalog Service – The new Alluxio Catalog Service manages the metadata of structured data in the system. It is responsible for all the database, table, and schema information, as well as the location of all the stored data. There is no longer a need to change any table locations in the Hive metastore, or to restart or reconfigure any Hive services. The Alluxio Catalog Service enables schema-aware optimizations for any type of structured data. For example, once the Hive metastore is attached to the Alluxio Catalog Service, the catalog service will automatically mount the appropriate table locations, and automatically serve the table metadata with the Alluxio locations.
Transformation Service – The new Alluxio Transformation Service transforms data into a compute-optimized representation that is independent of the storage-optimized format. This enables physical data independence. Three types of transformations are available for tables: coalesce, format conversion, and sorting. While results depend on the specific formats and workloads, internal tests have shown query performance improvements of more than 2.5x.
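
To make the coalesce and sorting transformations concrete, the following is a minimal Python sketch of the general idea only. It does not use or represent Alluxio's API or implementation: the function names `coalesce` and `sort_rows`, the `max_files` parameter, and the in-memory "files" (lists of row dictionaries) are all hypothetical and chosen for illustration. Format conversion (for example, CSV to Parquet) is omitted because it would require a third-party library.

```python
from typing import Dict, List

Row = Dict[str, object]


def coalesce(small_files: List[List[Row]], max_files: int) -> List[List[Row]]:
    """Combine many small 'files' (lists of rows) into at most max_files files."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    rows = [row for f in small_files for row in f]
    if not rows:
        return []
    per_file = -(-len(rows) // max_files)  # ceiling division
    return [rows[i:i + per_file] for i in range(0, len(rows), per_file)]


def sort_rows(files: List[List[Row]], column: str) -> List[List[Row]]:
    """Sort the rows inside each file by one column (hypothetical helper)."""
    return [sorted(f, key=lambda r: r[column]) for f in files]


# Ten one-row "files", as might result from many small writes.
small = [[{"id": i, "city": "c%d" % i}] for i in range(9, -1, -1)]
merged = sort_rows(coalesce(small, max_files=2), column="id")
```

In this sketch a query engine would open two files instead of ten, and rows within each file are ordered by `id`, which is the kind of layout that the coalesce and sorting transformations described above aim for.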

Availability

Alluxio 2.2 Community and Enterprise Edition with Structured Data Service are generally available for download here: https://www.alluxio.io/download/

About Alluxio

Alluxio is the developer of open source data orchestration software for the cloud. Alluxio moves data closer to big data and machine learning compute frameworks in any cloud across clusters, regions, clouds and countries, providing memory-speed data access to files and objects. Intelligent data tiering and data management deliver consistent high performance to customers in financial services, high tech, retail and telecommunications. Alluxio is in production use today at seven out of the top ten internet companies. Venture-backed by Andreessen Horowitz and Seven Seas Partners, Alluxio was founded at UC Berkeley’s AMPLab by the creators of the Tachyon open source project. For more information, contact [email protected] or follow us on LinkedIn or Twitter.

Source: Alluxio
