Nexla Announces Support for Apache Parquet
MILLBRAE, Calif., Nov. 28, 2017 -- Nexla, the inter-company Data Operations platform, announced support for Apache Parquet, a free and open-source column-oriented data storage format. With this release, companies can convert nearly any data into Parquet for highly optimized, cost-effective queries in Amazon Athena and Redshift Spectrum.
Nexla’s revolutionary platform can connect to almost any data source containing CSV, XML, EDI, Avro, JSON or any arbitrary text-delimited data, transform it using an intuitive point-and-click interface, and convert it to Parquet. This allows Nexla users to immediately start querying their data and gaining insights from it with Amazon Athena and Redshift Spectrum, without engineering effort.
This has important implications for companies using Amazon Athena or Redshift Spectrum, both of which run queries directly against files stored in S3. While these technologies support multiple file formats, Parquet offers a significant cost and performance benefit. Pricing for both Athena and Redshift Spectrum is based on the amount of data scanned to execute a query. For example, if a query runs across 1 TB of CSV files and sums one of the 20 columns, the entire set of CSV files is scanned, which means the user is billed for the full 1 TB of data even though only a fraction of it was relevant to computing the result.
“If the same data was in columnar format such as Parquet, then only the relevant column from the files would be scanned,” explained Saket Saurabh, CEO & Co-Founder of Nexla. “We are pleased to offer companies the ability to automatically convert data into Parquet, resulting in 45% lower query costs, on average.” Nexla can perform additional optimization by appropriately partitioning data so that, once again, only the relevant data for a query is scanned. Additionally, Nexla’s ability to automatically combine incoming data with disparate structures into a single Parquet format without custom code saves time and money.
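The scan-based billing described above can be worked through as a rough calculation. The sketch below is illustrative only: it assumes all 20 columns are the same size, ignores compression and file metadata, and uses a placeholder price per terabyte (PRICE_PER_TB_USD is hypothetical, not a quoted rate). It is separate from the average savings figure cited by Nexla.

```python
# Rough comparison of data scanned for a single-column sum over 1 TB of data.
# Assumptions (not from the announcement): equal-width columns, no compression,
# and a placeholder price per terabyte scanned.

PRICE_PER_TB_USD = 5.0  # hypothetical rate, for illustration only


def scanned_terabytes(total_tb: float, total_columns: int, columns_read: int) -> float:
    """Data a columnar reader would scan if every column were the same size."""
    return total_tb * columns_read / total_columns


def query_cost(scanned_tb: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Cost of a query billed purely by terabytes scanned."""
    return scanned_tb * price_per_tb


# Row-oriented CSV: the whole 1 TB is read, even for one column.
csv_scanned_tb = 1.0
# Columnar format: only 1 of the 20 columns is read.
columnar_scanned_tb = scanned_terabytes(total_tb=1.0, total_columns=20, columns_read=1)

csv_cost = query_cost(csv_scanned_tb)
columnar_cost = query_cost(columnar_scanned_tb)

if __name__ == "__main__":
    print(f"CSV scan:      {csv_scanned_tb:.2f} TB, cost ${csv_cost:.2f}")
    print(f"Columnar scan: {columnar_scanned_tb:.2f} TB, cost ${columnar_cost:.2f}")
```

Under these simplifying assumptions the columnar query scans one twentieth of the data. Real savings depend on column sizes, compression, and how the data is partitioned.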
Looker, a data analytics platform and Nexla partner, connects directly to Redshift Spectrum and Athena to perform analytics in-database. “Nexla’s ability to convert many data formats into Parquet enables Looker users to run faster, more advanced queries on Athena and Redshift Spectrum. Now, rather than needing a data engineer to maintain ETL scripts and an EMR cluster, data analysts can convert data to Parquet in Nexla’s graphical user interface, meaning a wider audience can leverage the speed of Parquet. In several use-cases, we’ve seen query speed increase at least 25x! We’re thrilled to work with Nexla’s DataOps platform to help our customers drive more advanced business insights with Looker,” said Dillon Morrison, Looker’s Data Platform Lead.
Parquet conversion is available now in the Nexla inter-company Data Operations platform.
Nexla is a scalable Data Operations platform that can manage inter-company data collaboration securely and in real time. Nexla automates DataOps so companies can quickly derive value from their data, with minimal engineering required. Our secure platform runs in the cloud or on-premises. It allows business users to send, receive, transform, and monitor data in their preferred format via an easy-to-use web interface. Learn more at nexla.com.