January 20, 2014

Taking NoSQL to the Next Level with In-Memory Computing

Nikita Ivanov

Real-time insight is quickly becoming less of a differentiator, and more of something that you need in order to simply keep pace. Your customers have too many easy options–you can’t afford to keep anyone waiting. Furthermore, the digital landscape has dramatically reduced the time it takes for the world to find out about a misstep, and for negative sentiment to snowball into a full-blown crisis. Companies can’t afford to sit on top of mounds of data–they need to get immediate, real-time insight from it.

Can NoSQL keep up with the data demands that most organizations either now face, or will shortly face? It’s an interesting question. NoSQL was designed with a more elastic consistency model, making it an option of preference for those seeking design simplicity, horizontal scaling and more granular control over data. Though NoSQL databases can be optimized to perform well enough to support the big data and real-time applications of the recent past, the realities of business today, in terms of sheer volume and velocity, may yet overwhelm NoSQL-based infrastructure.

Every day businesses create 2.5 quintillion bytes of data, emanating from virtually any part of an organization where information can be analyzed. For most people, this number is so large that it is difficult to even fathom. For most organizations it’s much more than they can possibly analyze, and it winds up being overwhelming rather than helpful. But if an organization could use NoSQL to harness this data, it would gain incredible insight into its customers, its market and just about everything else that informs business decisions. It could identify problems, opportunities and trends in real time and take immediate action.

Overcoming the Natural Limits of NoSQL

Industries that must increasingly make real-time, data-derived decisions–such as financial services, logistics and global supply chain, oil and gas, and others–may only have a fraction of a second to analyze a wide range of enormous data sets. Processing and deriving live action from streaming data is a major challenge for most organizations, largely due to the fact that conventional computing has reached its natural limits. It’s too slow and rigid to enable companies to gain the real-time insights that they need from the massive amounts of data they’re taking in.

As a result, the majority of NoSQL usage falls into the “non-critical” realm. However, its use does not need to be limited in this manner, and recent moves by MongoDB make it clear that many anticipate NoSQL will play a major role in a data-driven economy. Despite NoSQL’s limitations, they may in fact be correct. How so? Some forward-thinking companies are beginning to employ in-memory technology to analyze massive quantities of data in real time. And in-memory technology, it turns out, may be the key that enables organizations to unlock the potential of NoSQL.

Kirill Sheynkman of RTP Ventures defines in-memory computing as: “…based on a memory-first principle utilizing high-performance, integrated, distributed main memory systems to compute and transact on large-scale data sets in real-time – orders of magnitude faster than traditional disk-based systems.”

In layman’s terms, in traditional computing, you take the data to the computation. In the ‘old days’ this was a somewhat time-consuming, resource-intensive process, but nevertheless something that a competent IT department could pull off, because they were dealing with relatively small amounts of data.

Fast forward to today, and add a series of zeros to the size of your data sets, and the old model doesn’t work. It’s like comparing flying to Pittsburgh to traveling to another solar system. In-memory computing reverses the process by bringing the computation to the data. This new model, which is orders of magnitude faster and frees resources, provides a leap forward in performance similar to a jet engine vs. a combustion-driven propeller. In-memory processing ranges from 100x to 1000x faster than disk-based processing.
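To make the difference between the two models concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's API: the `Node` class, its `ship_data` and `run_local` methods, and the sample order records are all hypothetical. It simply counts how many items cross the "network" when the data travels to the computation versus when the computation travels to the data.

```python
class Node:
    """A hypothetical grid node holding one partition of the data in RAM."""

    def __init__(self, name):
        self.name = name
        self.rows = []  # this node's in-memory partition

    def ship_data(self):
        """Old model: copy every row back to the caller."""
        return list(self.rows)

    def run_local(self, fn):
        """New model: run fn next to the data and return only its result."""
        return fn(self.rows)


# Spread nine made-up order records across three nodes.
nodes = [Node(f"node-{i}") for i in range(3)]
for i in range(9):
    nodes[i % 3].rows.append({"order_id": i, "amount": 10 * i})

# Data to computation: every row is moved before anything is computed.
all_rows = [row for node in nodes for row in node.ship_data()]
total_central = sum(row["amount"] for row in all_rows)
rows_moved_central = len(all_rows)

# Computation to data: each node sums its own rows; only the sums move.
def sum_amounts(rows):
    return sum(row["amount"] for row in rows)

partials = [node.run_local(sum_amounts) for node in nodes]
total_local = sum(partials)
values_moved_local = len(partials)

print(total_central, rows_moved_central)  # same total, nine rows moved
print(total_local, values_moved_local)    # same total, three values moved
```

Both approaches arrive at the same answer. What changes is how much has to move to get there, and that gap widens as the data sets grow.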

Pimp My NoSQL

The TV show “Pimp My Ride” takes ordinary automobiles and turns them into super cars. Using in-memory data grids you can do essentially the same for NoSQL, accelerating it to work with in-memory technology to achieve performance that makes it suitable for mission-critical, real-time functions.

Let’s examine, for example, how a natively distributed in-memory database with support for the MongoDB driver protocol can be “pimped” to meet extreme demands:

● Enable configuration at the database level, which makes integration much easier because you can skip code changes to user applications. Ideally, user applications require no changes at all and continue to use their native MongoDB drivers for their target languages.

● Keep all data in RAM, with secondary on-demand disk persistence. This allows you to sidestep the resource drains of memory-mapped file paging.

● Configure your system to work with specific databases or individual collections. By passing through collections that aren’t processed, you free computing power rather than tying it up with needlessly redundant processing.

● Even with unstructured data, field name repetition between documents may create excessive storage overhead. A field compaction algorithm can save up to 40 percent of memory space by internally indexing every field.

● By dividing the database’s elements into distinct parts through data partitioning, and reducing computational overhead by distributing natively, you can scale up to several thousand nodes in production settings and avoid many of the pitfalls of sharding.

● Retaining indexes and data in memory gives you the ability to run on large commodity clusters and significantly increases performance and scalability for most operations.
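The field compaction idea in the list above can be sketched in a few lines of Python. This is one plausible reading of “internally indexing every field” (a shared dictionary that maps each field name to a small integer), not a description of any particular product’s algorithm. The `FieldDictionary` class and the sample documents are hypothetical, and the sketch handles only flat documents.

```python
class FieldDictionary:
    """Maps repeated field names to small integer ids (hypothetical sketch)."""

    def __init__(self):
        self._id_by_name = {}   # field name -> id
        self._name_by_id = []   # id -> field name

    def encode(self, doc):
        """Return a copy of doc whose keys are field ids instead of names."""
        compact = {}
        for name, value in doc.items():
            field_id = self._id_by_name.get(name)
            if field_id is None:
                # First time this field name is seen: assign the next id.
                field_id = len(self._name_by_id)
                self._id_by_name[name] = field_id
                self._name_by_id.append(name)
            compact[field_id] = value
        return compact

    def decode(self, compact):
        """Restore the original field names."""
        return {self._name_by_id[fid]: value for fid, value in compact.items()}


fields = FieldDictionary()
docs = [
    {"customer_name": "Ann", "customer_city": "Oslo"},
    {"customer_name": "Bob", "customer_city": "Rome"},
]
encoded = [fields.encode(doc) for doc in docs]
decoded = [fields.decode(doc) for doc in encoded]
print(encoded)
print(decoded == docs)
```

Each field name is stored once in the dictionary, however many documents use it. The actual savings depend on how long the field names are relative to the values, so the sketch makes no claim about a specific percentage.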

Traditional models of computing won’t make a dent in the avalanche of data most companies must handle to compete. With a natively distributed in-memory architecture, organizations can achieve scale-out partitioning, increased performance and improved scalability.
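For readers who want a feel for what scale-out partitioning involves, here is a minimal Python sketch of hash partitioning, one common way to spread keys across nodes. It is an assumption for illustration, not necessarily how any given in-memory database distributes data. The function names `partition_for` and `distribute` and the sample keys are hypothetical.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition number using a stable hash."""
    # hashlib gives the same result on every machine and every run,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribute(keys, num_partitions):
    """Group keys by the partition that owns them."""
    partitions = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


keys = [f"order-{i}" for i in range(1000)]
layout = distribute(keys, 4)
for partition, owned in layout.items():
    print(partition, len(owned))
```

Because the owner of a key is computed from the key itself, any node can work out where a record lives without consulting a central directory. A plain modulo scheme like this one does reshuffle most keys when the number of partitions changes, which production systems typically work to avoid.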


About the Author:

Nikita Ivanov founded GridGain Systems, which was started in 2007 and funded by RTP Ventures and Almaz Capital. Nikita has led GridGain to develop distributed in-memory data processing technologies.