MemSQL Adds Spark Pipeline
A Spark “Streamliner” introduced this week by in-memory database vendor MemSQL aims to provide Spark users quick access to real-time analytics and transactions.
San Francisco-based MemSQL said Wednesday (Sept. 24) its new platform addresses the growing enterprise need to “synthesize varied data types,” including historical data. To that end, the real-time data pipeline between Apache Spark and MemSQL is intended to make it easier to deploy multiple pipelines that keep up with dynamic data flows.
The Streamliner tool is designed as a single-click deployment of integrated Apache Spark “to eliminate the pain of batch ETL,” the company said. A web-based user interface is designed to allow for multiple real-time data pipelines.
The tool also capitalizes on Apache Spark’s inroads in the enterprise. In June, for example, IBM said it would integrate the open source in-memory processing framework into the “core” of its analytics and commerce platforms. It also said it would work closely with Databricks, the company formed by the creators of the analytics engine.
Databricks released Apache Spark 1.4 in June.
Eric Frenkiel, CEO at MemSQL, said the company’s Spark integration would allow enterprises to move beyond “many narrow purpose solutions to fewer multi-purpose solutions.” Frenkiel added in a statement: “Our vision is to operationalize Spark for a wide range of use cases so customers and partners can easily take advantage of the data processing framework available in Spark and spend their time gaining actionable insights from data.”
The ability to deploy and manage multiple real-time pipelines with a single interface and shared resource pool is expected to benefit applications ranging from trading analytics and cyber-security to omnichannel retail and Internet of Things use cases, the company said.
MemSQL said its Spark Streamliner could support thousands of simultaneous users running real-time analytics queries. The platform is also touted as reducing latency by streaming data directly into the MemSQL database, in either its memory-based row store or its disk-based columnar store.
MemSQL is among an emerging class of in-memory relational databases gaining momentum for their capability to ingest and analyze large amounts of data in near real time. MemSQL 3.0, unveiled last year, added a flash-based columnar store designed for storing and analyzing historical data.
The in-memory database is intended to address a familiar problem: organizations have previously relied on big data warehouses or Hadoop to crunch large volumes of historical data and to create data models. After being created by a Hadoop cluster or a Teradata warehouse, these data models are then used by operational systems, such as NoSQL databases, to make real-time decisions.
However, simply moving data through batch ETL and CDC processes can take many hours, if not days. Moreover, data models contain older data that could translate into missed opportunities. Hence, MemSQL integrates both pieces of the data analytics puzzle in the same place: the data model that informs analytic decision making and the operational data store that acts on those decisions.
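The contrast with batch ETL can be made concrete with a small sketch. The Python below is not MemSQL's or Spark's API; it is an illustrative micro-batch pipeline in which every name (extract, transform, load, run_pipeline, the trades table) is hypothetical and an in-memory SQLite database stands in for the operational SQL store. It assumes events arrive as "symbol,price" strings and shows each small batch becoming queryable as soon as it is loaded, rather than hours later.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def extract(events: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group an incoming event stream into small micro-batches."""
    batch: List[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


def transform(batch: List[str]) -> List[Tuple[str, float]]:
    """Parse 'symbol,price' strings into typed rows, skipping malformed ones."""
    rows: List[Tuple[str, float]] = []
    for line in batch:
        parts = line.split(",")
        if len(parts) != 2:
            continue
        try:
            rows.append((parts[0].strip(), float(parts[1])))
        except ValueError:
            continue
    return rows


def load(conn: sqlite3.Connection, rows: List[Tuple[str, float]]) -> None:
    """Insert one micro-batch and commit so it is immediately queryable."""
    conn.executemany("INSERT INTO trades (symbol, price) VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection, events: Iterable[str],
                 batch_size: int = 2) -> int:
    """Run extract -> transform -> load per micro-batch; return rows loaded."""
    loaded = 0
    for batch in extract(events, batch_size):
        rows = transform(batch)
        load(conn, rows)
        loaded += len(rows)
    return loaded


# SQLite is only a stand-in for the SQL database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL)")

sample = ["ACME,10.5", "ACME,11.5", "bad-record", "INIT,3.0"]
count = run_pipeline(conn, sample)
avg = conn.execute(
    "SELECT AVG(price) FROM trades WHERE symbol = 'ACME'"
).fetchone()[0]
print(count, avg)
```

In this toy version the analytic query runs against the same table the pipeline writes to, which is the design point the company describes: the data feeding decisions and the store acting on them sit in one place.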
The company said this week its Streamliner tool could integrate Apache Spark to provide immediate access to real-time analytics. MemSQL said its Spark Streamliner is available as open source on GitHub. The open source approach is intended to spur development of applications based on real-time data and easy access via transactional SQL.