February 2, 2016

Redis Connector Aims to Boost Spark Performance

Adoption of the Apache Spark big data framework continues to gain momentum as NoSQL leader Redis Labs releases an integration with Spark SQL along with its Spark connector package.

Redis Labs said Tuesday (Feb. 2) that it is releasing its Spark connector package as open source. The package includes a library for writing to and reading from a Redis cluster, and it exposes all Redis data structures to Spark through the resilient distributed dataset (RDD) API. Redis Labs, based in Mountain View, Calif., also said the connector package more closely aligns Spark with Redis clusters, which is intended to reduce network overhead and improve processing performance.

The company claimed that its performance benchmark, based on time-series data, showed Spark running on Redis as a data store processing data as much as 135 times faster than Spark using the Hadoop Distributed File System (HDFS). Redis Labs said Spark ran as much as 45 times faster on its platform than when using the Tachyon in-memory file system with Spark storing data in an on-heap data structure.

Redis Labs said other advantages of using Spark with its platform include a more than 100-fold increase in Spark performance for applications such as Spark time-series, which works with long sequences of measurements collected over time. The company also said its data structures allow individual data elements to be accessed directly, which reduces serialization and deserialization overhead and avoids having to transfer large batches of data.
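The point about accessing data elements individually can be illustrated with a toy comparison. This is a minimal sketch under stated assumptions: it does not use Redis, Spark or the connector, and the data, function names (`range_query_blob`, `range_query_elements`) and the sorted timestamp index are hypothetical stand-ins chosen for illustration. It contrasts a time series stored as one serialized blob, where any query must first decode everything, with one stored element by element behind a sorted index (loosely analogous to a Redis sorted set), where a small range query decodes only the elements it needs.

```python
import bisect
import pickle

# Hypothetical toy time series of (timestamp, value) pairs, sorted by timestamp.
# Illustrative data only; not from the benchmark described in the article.
series = [(t, t * 0.5) for t in range(10_000)]

# Approach 1: store the whole series as one serialized blob.
# Any query, however small, must deserialize every element first.
blob = pickle.dumps(series)

def range_query_blob(payload, start, end):
    """Decode the entire blob, then filter; returns (rows, bytes decoded)."""
    data = pickle.loads(payload)
    rows = [(t, v) for t, v in data if start <= t <= end]
    return rows, len(payload)

# Approach 2 (assumed stand-in for a Redis structure such as a sorted set):
# serialize each element separately and keep a sorted index of timestamps,
# so only the requested elements are decoded.
timestamps = [t for t, _ in series]
elements = [pickle.dumps(pair) for pair in series]

def range_query_elements(start, end):
    """Binary-search the index and decode only the matching elements."""
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_right(timestamps, end)
    picked = elements[lo:hi]
    rows = [pickle.loads(e) for e in picked]
    return rows, sum(len(e) for e in picked)

rows_blob, bytes_blob = range_query_blob(blob, 100, 109)
rows_elem, bytes_elem = range_query_elements(100, 109)
print(rows_blob == rows_elem)   # True: both return the same 10 rows
print(bytes_elem < bytes_blob)  # True: far fewer bytes decoded per query
```

In this toy setup, per-element storage pays off for selective queries such as a short time range; a query that scans the whole series gains little from it, since every element must be decoded either way.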

Yiftach Shoolman, cofounder and CTO of Redis Labs, emphasized that the company’s Apache Spark connector addresses the growing demand to extract big data insights in real time. To that end, the company focused on fine-tuning its distributed memory capabilities to accelerate Spark performance.

“Our goal is to make Redis the de-facto data store for any Spark deployment,” Shoolman noted in a statement.

As a result, a Redis cluster can be used as distributed memory infrastructure for Spark. The company also said the combination makes its data structures available to Spark applications when they are exposed through Spark RDDs and the Dataset API. Databricks Inc., which was founded by the creators of Spark, included the Dataset API in the 1.6 release of Spark.

San Francisco-based Databricks said it worked closely with Redis Labs to develop the connector package with the goal of delivering real-time analytics.

Redis Labs also said its integration with Spark would support Spark SQL, via the DataFrame and Dataset APIs, as a standard query interface.

Future enhancements to the Spark-Redis connector package will extend it to new use cases such as graph computation and machine learning, Redis Labs added.

Meanwhile, the Apache Spark community will gather in New York City Feb. 16-18 for a summit focusing on advances in the open source processing engine, Spark SQL and Spark Streaming.
