February 04, 2013

New Ropes for Scaling the SQL Wall


Surveying the database landscape requires more than one pair of binoculars. Two camps with established roots in enterprise soil—SQL and, increasingly, NoSQL—have expanded and encroached on one another’s territory, in part because they are still somewhat reliant on one another to survive. While one boasts features honed over years of development, the other flies the bright flag of scalability.

In the war being waged against big data, however, there might not be a clear victor. In fact, some have made the argument that companies need both camps to scale into a new era and colonize the uncivilized masses of teeming, wild data—much of it galloping into real-time application engines.

One company keen to make that argument is Splice Machine, which is fresh off a funding round and ready to ride into the sunset with what it calls “the first SQL-compliant database” designed specifically for big data applications.

The rollout of their Splice SQL Engine was set to coincide with demand for a massively scalable database that chucks the “compromises” of NoSQL and traditional relational databases. With an HBase backbone supporting big data access without requiring a clunky rewrite of existing SQL applications, the company feels it can address the big enterprise data struggles: scalability, inflexible schemas, high availability, transaction capabilities and trusty SQL optimizations.
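To make the “no clunky rewrite” claim concrete, here is a minimal sketch of what such an application might look like; the JDBC URL, driver, credentials and schema are all hypothetical placeholders rather than Splice Machine specifics. The point is simply that, in principle, only the connection string changes while the SQL itself stays put.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExistingSqlApp {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC URL for an HBase-backed SQL engine; swapping
        // this connection string is the only change -- the SQL is untouched.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:example://hbase-cluster:1527/salesdb", "app_user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT region, SUM(amount) FROM orders GROUP BY region")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getBigDecimal(2));
            }
        }
    }
}
```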

We’ve heard talk about these database limitations before, but in this case, it’s worthwhile to see the bigger picture from the standpoint of a company that teeters on the edge of the SQL/NoSQL bridge. In a recent interview, the Splice Machine CEO and co-founder, Monte Zweben, put forth his own definition of the big data applications his company is targeting. The former NASA Ames AI Deputy Branch Chief and software startup soldier claims that his definition is more specific than general “big data” definitions because it is based on a very particular set of user needs. Specifically, many companies already have extensive investments in SQL (everything from a number of existing applications to trained personnel), but are hitting the SQL wall on the data volume and complexity front.

“The NoSQL community threw out the baby with the bath water. They got it right with flexible schemas and distributed, auto-sharded architectures, but it was a mistake to discard SQL,” said Zweben. “The Splice SQL Engine enables companies to get the cost-effective scalability, flexibility and availability their Big Data, mobile and web applications require – while capitalizing on the prevalence of the proven SQL tools and experts that are ubiquitous in the industry.”

The Splice Machine definition of a big data application is worth paying attention to since it breaks from the tired old framework we’re all used to hearing in the BD conversation. They point to the new breed of enterprise apps that require the sharding or distribution of data across a commodity cluster. Zweben says these applications require the ability to perform all CRUD (create, read, update and delete) operations while scaling from a few terabytes into the petabytes. Just as important, they need to reach that scale and beyond without losing the all-important features of a time-tested SQL approach.
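As a rough sketch of what “all CRUD operations” means in practice, the following shows the four operations issued through standard JDBC; the table, columns and connection details are invented for illustration, not taken from Splice Machine.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CrudSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection; table name and columns are illustrative only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:example://hbase-cluster:1527/appdb")) {
            // Create
            try (PreparedStatement ins = conn.prepareStatement(
                    "INSERT INTO users (id, email) VALUES (?, ?)")) {
                ins.setLong(1, 42L);
                ins.setString(2, "ada@example.com");
                ins.executeUpdate();
            }
            // Read
            try (PreparedStatement sel = conn.prepareStatement(
                    "SELECT email FROM users WHERE id = ?")) {
                sel.setLong(1, 42L);
                try (ResultSet rs = sel.executeQuery()) {
                    while (rs.next()) System.out.println(rs.getString(1));
                }
            }
            // Update
            try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE users SET email = ? WHERE id = ?")) {
                upd.setString(1, "ada@newdomain.com");
                upd.setLong(2, 42L);
                upd.executeUpdate();
            }
            // Delete
            try (PreparedStatement del = conn.prepareStatement(
                    "DELETE FROM users WHERE id = ?")) {
                del.setLong(1, 42L);
                del.executeUpdate();
            }
        }
    }
}
```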

With that in mind, their focus on the term “SQL-compliant” has a bit more context. They refer to the database features that developers expect from traditional relational databases, including real-time updates, full SQL support, secondary indices, as well as transactional and join capabilities. The goal is to help developers avoid having to develop these features (often sub-optimally) in their own application code while taking advantage of the benefits SQL provides.

For instance, Zweben says that when it comes to real-time updates, analytic databases that require a re-run of their batch ETL to make a single update aren’t appropriate for most real-time applications. Similarly, many users need the ability to create secondary indices on any column in order to run flexible, high-performance queries.
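A minimal sketch of that secondary-index capability, with invented table and index names: the index is declared once in SQL, and the database’s query planner, not the application, decides when to use it.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class SecondaryIndexSketch {
    // Declare a secondary index on an arbitrary column, then run a selective
    // query that the database's planner can satisfy through that index.
    static void indexAndQuery(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE INDEX idx_orders_status ON orders (status)");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT id, amount FROM orders WHERE status = 'PENDING'")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + ": " + rs.getBigDecimal("amount"));
                }
            }
        }
    }
}
```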

The company is seeing that while terabyte-scale, read-only analytical applications are more prevalent, the folks they’re speaking with are looking at incredible data growth and dreading an impending “forklift upgrade” to keep pace, particularly for performance-hungry real-time applications.

For the users they’re working with who are up against the SQL wall, the company argues, there aren’t any truly workable solutions on the level of what they’ve been cooking. And further, they claim that nothing out there is really SQL-compliant while still addressing the brick wall. For instance, traditional RDBMSs are obviously SQL-compliant, but they often fail to scale past a terabyte without resorting to manual sharding or specialized hardware. The “big data” databases out there can indeed move past the petabyte barrier, but Zweben says that, like NoSQL databases, they often have poor SQL-compliance, with large gaps on the transactional, real-time updating and full SQL language support sides. And when it comes to the NoSQL camp, the name itself bars any likeness to SQL, and further, says Zweben, these systems do not, contrary to popular opinion, have the ability to scale to the petabyte level.
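What “manual sharding” means for an application team is worth spelling out. In a hypothetical hand-sharded RDBMS deployment, routing logic like the sketch below lives in the application itself (the shard URLs are invented), and every cross-shard join, transaction and rebalance becomes the developers’ problem rather than the database’s.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ManualSharding {
    // Hypothetical shard endpoints; adding a shard means editing this list
    // and rebalancing existing rows by hand.
    private static final String[] SHARD_URLS = {
        "jdbc:example://db-shard-0:5432/appdb",
        "jdbc:example://db-shard-1:5432/appdb",
        "jdbc:example://db-shard-2:5432/appdb",
    };

    // Route each user's rows to one shard by hashing the key. Queries that
    // span users (joins, aggregates) must fan out across every shard and be
    // stitched back together in application code.
    static Connection connectionFor(long userId) throws Exception {
        int shard = (int) Math.floorMod(userId, (long) SHARD_URLS.length);
        return DriverManager.getConnection(SHARD_URLS[shard]);
    }
}
```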

“For instance,” argued Zweben, “consider Cassandra, probably the most scalable NoSQL database. It has limited SQL compliance, no joins, no transactions, and weak (eventual) consistency.” He also notes that the largest known Cassandra cluster has over 300 TB of data spread across more than 400 machines, a fact that Apache shares on its Cassandra page.

In the end, however, Zweben says it’s not a simple matter of customers choosing SQL over NoSQL. The brains behind Splice Machine say it’s more an issue of customers wanting SQL and NoSQL. “Customers mostly want NoSQL for its scalability (and sometimes schema flexibility). However, since customers have huge investments in SQL already—existing applications, BI tools, SQL analysts and SQL developers—they also want the [SQL] capabilities like joins, strong consistency and transactions that are invaluable and very expensive and risky for each developer to implement individually.”

With this in mind, they’re making the case that companies want to tap into the scalability of NoSQL but with the familiarity and reliability of SQL. “Since we’re built on top of a NoSQL database, we’re bringing the best of both worlds,” adds Zweben.

The team is trying to tap into the two worlds dominating enterprise IT at the moment—Hadoop (and its companion database pieces) and trusty SQL. They claim they are drafting off the market momentum of Hadoop with their platform’s HBase foundations. This makes decent sense, since there are a number of companies that have climbed aboard the Hadoop express and have an existing HBase deployment, but want to find a captain-like interface that speaks SQL.

Further, this blend of the two approaches to big data could hold appeal for companies tapping into NoSQL databases that find themselves re-implementing what Zweben calls a “poor man’s version” of features like transactions and joins in each application. In fact, the company says it has been encountering this half-hearted approach far more often than expected.
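To give a feel for that “poor man’s version,” here is a sketch of a hand-rolled, application-side join of the kind Zweben describes; the record types are invented for illustration and stand in for data already fetched from a store that lacks join support. There is no query planner, no index selection and no transactional snapshot here, which is exactly the cost he is pointing at.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PoorMansJoin {
    // Invented row types standing in for rows fetched, in separate requests,
    // from a store without join support.
    record Order(long orderId, long customerId, long amountCents) {}
    record Customer(long customerId, String name) {}

    // Hand-rolled hash join: build a lookup table on one side, probe with
    // the other. Unlike a database join, the two underlying reads are
    // separate requests and may observe different states of the data.
    static List<String> joinOrdersToCustomers(List<Order> orders,
                                              List<Customer> customers) {
        Map<Long, String> nameById = new HashMap<>();
        for (Customer c : customers) {
            nameById.put(c.customerId(), c.name());
        }
        List<String> joined = new ArrayList<>();
        for (Order o : orders) {
            String name = nameById.getOrDefault(o.customerId(), "<unknown customer>");
            joined.add(name + " placed order " + o.orderId()
                    + " for " + o.amountCents() + " cents");
        }
        return joined;
    }
}
```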

Beyond ridding RDBMS users of manual sharding, there is another piece the company says needs to be considered, at least from the industry perspective. They pointed to a number of applications, especially real-time personalization in web commerce, personalized treatment via electronic medical records, and smart meter applications, that require the balance of scalability and functionality they’re hoping to provide.
