Put a Data Warehouse In Your Operational Data Store, MemSQL Says
MemSQL is one of a new class of in-memory relational databases that’s gaining momentum for its capability to ingest and analyze large amounts of data in near real time. With today’s unveiling of MemSQL 3.0, the NewSQL database gets a new Flash-based columnar store designed to store and analyze historical data too. It’s all about enabling organizations to make the best decisions as fast as possible.
The new column-oriented data store will complement and co-exist with the row-based storage mechanism that MemSQL used through the first two versions of its product, says MemSQL’s director of product marketing, Mark Horton.
“In memory is really good for rapid ingestion of data, and columnar is good for compression and deeper analysis,” Horton tells Datanami. “Being able to create a single platform where you can run queries on both the row and column store, and be able to get results back in seconds, adds business value.”
MemSQL’s new dual-pronged approach makes sense when one considers the latencies involved in traditional data analytics scenarios. Currently, organizations often rely on big data warehouses or Hadoop to crunch vast reams of historical data and to create data models that describe, for example, online customer behavior. After being created by a Hadoop cluster or a Teradata warehouse, these data models are then used by operational systems, such as NoSQL databases, to make real-time decisions.
The problem with this approach is twofold. First, just moving the data through batch ETL and CDC processes can take many hours, if not days. Second, because the data is older, the data model is not as current as it could be, and that could translate into missed opportunities. This is why MemSQL wants to house both pieces of the data analytics puzzle (the data model that informs analytic decision making and the operational data store that acts on those decisions) in the same place.
“We’re really looking to energize people’s data warehouses,” Horton says. “We’re more or less the front-line, the point of ingestion where you’re going to run all your mission-critical analytics, operational analytics, or data processes. And then as that data cools down, you can seamlessly transfer that data out of MemSQL into an existing data warehouse or a Hadoop instance, where you can do long-time archival, or even do additional analysis that maybe’s not time-sensitive.”
The columnar data store is a logical extension for MemSQL, which former Facebook engineers Eric Frenkiel and Nikita Shamgunov founded in 2011 to deliver a simple, SQL-based data store that could run both analytics and transactional workloads. Both MemSQL data stores are accessible through traditional SQL and run on commodity x86 boxes, which the company says gives it an advantage over bigger and more complex relational databases from IBM, Oracle, and SAP that blend OLTP and OLAP workloads.
The company has always focused on analytics, Horton says, but the new column-oriented store brings that focus to new heights. The benefits of such a system were demonstrated with CPXi, a company that holds auctions for Web-based advertisements and which was an early adopter of MemSQL.
According to a newly released case study on CPXi, the latency involved with moving critical data among different systems was hurting business. The old process involved ingesting billions of records collected from Web logs into MemSQL’s row store. The data would then be processed and converted into a flat-file format that was then stored in EC2. The Web log data was then moved into a separate columnar store, where it was used to create pricing models that fueled the real-time bidding for targeted ads.
This data distribution and processing took from 12 to 24 hours to complete, which meant prospective ad buyers were not working with the freshest data. Following the implementation of MemSQL’s column-based data store, this whole process was reduced to a matter of seconds, says Horton, which increased the accuracy of the advertising purchases.
MemSQL is a company to keep an eye on. In late January, it completed a $35 million Series B round of investments led by Accel Partners, the Silicon Valley VC firm that has also invested in Cloudera and Couchbase. Kevin Efrusy, general partner for Accel, says MemSQL has a place in a “very broad” database landscape.
“We’ve backed … Couchbase on the transactional side and Cloudera, with its enterprise data hub, on the analytic side, but we also believe there is a very distinct place for MemSQL in real-time analytics,” Efrusy said in a statement. “No executive wants to wait for insights anymore, and MemSQL’s in-memory, SQL technology means they don’t have to.”