January 10, 2017

Spark Gets In-Memory Boost

George Leopold

Apache Spark is getting an open source computing and storage boost through integration with a widely used in-memory data platform.

Hazelcast Inc. said Tuesday (Jan. 10) its in-memory data grid adds connector support for Spark, giving developers access to open source tools for data storage and computing that the company says go beyond the limits of a single Java virtual machine.

The Hazelcast connector uses a Spark API called Resilient Distributed Dataset (RDD), a distributed collection of data elements partitioned across cluster nodes to, among other things, provide parallel access to data. The combination of RDD with the Hazelcast in-memory grid is designed to serve as the basis for the improved distributed computing required for large datasets.
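The idea behind a partitioned dataset can be sketched in plain Python. This is only an illustration of the concept, not the Spark RDD API or the Hazelcast connector: the function names `partition` and `parallel_sum_of_squares` are hypothetical, and a local thread pool stands in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split data into num_partitions chunks, assigning elements round-robin."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def parallel_sum_of_squares(data, num_partitions=4):
    """Sum the squares within each partition in parallel, then combine the results."""
    parts = partition(data, num_partitions)
    # Each worker handles one partition; in a real cluster these would be
    # separate nodes rather than threads in one process (an assumption here).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(lambda p: sum(x * x for x in p), parts))
    # Merge the per-partition results into a single answer.
    return reduce(lambda a, b: a + b, partial_sums, 0)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

Because each partition is processed independently, adding partitions (and workers to serve them) is what lets this style of computation grow past a single machine.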

“Any big data solution needs to be able to distribute processing and storage across machines whilst maintaining a flexible and convenient programming interface,” Hazelcast CEO Greg Luck stated in introducing the Spark integration. “Without these functionalities, it becomes impossible to build enterprise applications which are expected to process more and more data.”

The in-memory specialist, based in Palo Alto, Calif., is therefore positioning its Spark entry as an open source alternative for boosting data storage and distributed computing for data streaming, machine learning and SQL workloads, all of which require fast iterative access to large datasets.

The Spark integration comes one year after Hazelcast released what it described as the “platform” version of its data grid that incorporates support for cloud management and application containers. The grid allows users to share and partition application data across installed clusters and servers.

The company also developed a sports betting application as a way of demonstrating the performance advantages of integrating Spark with its in-memory grid. The “bet engine” was designed to scale across multiple Java virtual machines with events shared across data grid partitions. The query engine used Spark to provide real-time risk and analytics. The combination of in-memory computing and distributed storage along with Spark’s query and analytics capabilities formed the basis for a future gaming application, the company claimed.
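A rough sketch of how events might be spread across partitions and then queried for risk follows. It is not taken from Hazelcast's demo code: the bet fields (`event`, `stake`, `odds`), the CRC32-based partitioning and the exposure calculation are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition using a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribute(bets, num_partitions):
    """Place each bet in the partition that owns its event id."""
    partitions = [[] for _ in range(num_partitions)]
    for bet in bets:
        partitions[partition_for(bet["event"], num_partitions)].append(bet)
    return partitions


def exposure_by_event(partitions):
    """Total potential payout (stake * odds) per event, merged across all partitions."""
    totals = defaultdict(float)
    for part in partitions:
        for bet in part:
            totals[bet["event"]] += bet["stake"] * bet["odds"]
    return dict(totals)


if __name__ == "__main__":
    bets = [
        {"event": "match-1", "stake": 10.0, "odds": 2.0},
        {"event": "match-2", "stake": 5.0, "odds": 3.0},
        {"event": "match-1", "stake": 4.0, "odds": 2.5},
    ]
    print(exposure_by_event(distribute(bets, 4)))
```

Keying on the event id keeps all bets for one event in the same partition, so a per-event risk figure can be computed locally before the results are merged.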

Hazelcast has made the code for the sports betting application available online.

Hazelcast touts the interoperability of its in-memory data grid with a range of programming languages, including Java, Python, R and Scala, which are also supported by Spark. Hence, the company said the combination of Spark and its in-memory data grid could be used across stacks based on multiple programming languages.

The integration also underscores how platform developers focusing on big data applications are gradually shifting from current technology such as Hadoop and Storm to Apache Spark’s real-time streaming data capabilities. At the same time, Hazelcast claims its data grid boosts the in-memory performance of applications running in Hadoop clusters.

The company said the open source connector is available with version 3.7 of its in-memory data grid, which can then be used as a storage medium for Spark.
