Language Flags

Translation Disclaimer

HPCwire HPC in the Cloud Digital Manufacturing Report Green Computing Report


February 04, 2013

New Ropes for Scaling the SQL Wall


Surveying the database landscape requires more than one pair of binoculars. Two camps with established roots in enterprise soil—SQL and increasingly, NoSQL—have expanded and encroached on one another’s territory, in part because they are still somewhat reliant on one another to survive. While one boasts features honed over years of development, the other flies the bright flag of scalability.

In the war being raged against big data, however, there might not be a clear victor. In fact, some have made the argument that companies need both camps to scale into a new era and colonize the uncivilized masses of teeming, wild data—much of it galloping into real-time application engines.

One such company keen to make such an argument is Splice Machine, which is fresh off a funding round and ready to ride into the sunset with what it calls “the first SQL-compliant database” designed specifically for big data applications.

The rollout of their Splice SQL Engine was set to coincide with the demand for a massively scalable database that chucks the “compromises” of NoSQL or traditional relational databases. With an HBase backbone supporting big data access without requiring a clunky rewrite of existing SQL applications, the company feels it’s able to address the big problems with big enterprise data struggles; scalability, inflexible schemas, high availability, transaction capabilities and trusty SQL optimizations.

We’ve heard talk about these database limitations before, but in this case, it’s worthwhile to see the bigger picture from the standpoint of a company that teeters on the edge of the SQL/NoSQL bridge. In a recent interview, the Splice Machine CEO and co-founder, Monte Zweben, put forth his own definition of the big data applications his company is targeting. The former NASA Ames AI Deputy Branch Chief and software startup soldier claims that his definition is more specific than general “big data” definitions because it is based on a very particular set of needs from users. Specifically, that many companies already have extensive investments in SQL (everything from a number of existing applications to trained personnel), but are hitting the SQL wall on the data volume and complexity front.

“The NoSQL community threw out the baby with the bath water. They got it right with flexible schemas and distributed, auto-sharded architectures, but it was a mistake to discard SQL,” said Zweben. “The Splice SQL Engine enables companies to get the cost-effective scalability, flexibility and availability their Big Data, mobile and web applications require – while capitalizing on the prevalence of the proven SQL tools and experts that are ubiquitous in the industry.”

The Splice Machine definition of big data application is worth paying attention to since it breaks from the tired old framework we’re all used to hearing in the BD conversation.  They point to the new breed of enterprise apps that require the sharding or distribution of data across a commodity cluster. Zweben says these applications require the ability perform all CRUD (create, read, update and delete) operations, scaling from a few terabytes into petabytes. Just as important, they need to be able to scale to the petabyte level and beyond without losing the all-important features of a time-tested SQL approach.

With that in mind, their focus on the term “SQL-compliant” has a bit more context. They refer to the database features that developers expect from traditional relational databases, including real-time updates, full SQL support, secondary indices, as well as transactional and join capabilities. The goal is to help developers avoid having to develop these features (often sub-optimally) in their own application code while taking advantage of the benefits SQL provides.

For instance, Zweben says that when it comes to real-time updates, analytic databases that require a re-run of their batch ETL to make a single update aren’t appropriate for most real-time applications. Similarly, many users need the ability to create secondary indices on any column in order to run flexible and high performance queries.

The company is seeing that terabyte-scale, read-only analytical applications are more prevalent, the folks they’re speaking with are looking at incredible data growth and are dreading an impending “forklift upgrade” to keep pace, particularly for performance-hungry real-time applications.

With the users they’re working in mind who are up against a SQL wall, there aren’t any truly workable solutions on the level of what they’ve been cooking. And further, they claim that nothing out there is really SQL-compliant while still addressing the brick wall. For instance, they claim that traditional RDBMSs are obviously SQL-compliant, but they often fail to scale past a terabyte without resorting to manual sharding or specialized hardware. The “big data” databases out there can indeed move past the petabyte barrier, but Zweben says like NoSQL databases, they often have poor SQL-compliance because there are large gaps on the transactional, real-time updating and full SQL language support sides. And when it comes to the NoSQL camp, well the name bars any likeness to SQL and further, says Zweben, these do not—contrary to popular opinion—have the ability to scale at the petabyte field.

“For instance,” argued Zweben, “consider Cassandra, probably the most scalable NoSQL database. It has limited SQL compliance, no joins, no transactions, and weak (eventual) consistency.” He also notes that the largest known Cassandra cluster has over 300 TB of data in over 400 machines, a fact that Apache shares on its Cassandra page.

In the end, however, Zweben says it’s not a simple matter of customers choosing SQL over NoSQL. The brains behind Splice Machine says that it’s more like an issue of customers wanting SQL and NoSQL. “Customers mostly want NoSQL for its scalability (and sometimes schema flexibility). However, since customers have huge investments in SQL already—existing applications, BI tools, SQL analysts and SQL developers—they also want the [SQL] capabilities like joins, strong consistency and transactions that are invaluable and very expensive and risky for each developer to implement individually.”

With this in mind, they’re making the case that companies want to tap into the scalability of NoSQL but with the familiarity and reliability of SQL. “Since we’re built on top of a NoSQL database, we’re bringing the best of both worlds,” adds Zweben.

The team is trying to tap into those two worlds in enterprise IT at the moment—Hadoop (and the companion database pieces) and trusty SQL. They claim they are drafting off the market momentum of Hadoop with their platform’s HBase foundations. This makes decent sense, since there are a number of companies that have climbed aboard the Hadoop express and have an existing HBase deployment but want to find a captain-like interface that speaks SQL.

Further, this blend of the two approaches to big data could hold appeal for companies that are tapping into NoSQL databases that are finding themselves re-implementing what Zweben calls a “poor man’s version” of features like transactions and joins in each application. In fact, the company says they’ve been encountering this half-eared attempt far more than they would have expected.

Other than being rid of the manual sharding for the RDBMS users, there is another piece the company says needs considered, at least from the industry perspective. They pointed to a number of applications, especially real-time personalization in web commerce, personalized treatments through electronic medical records and smart meter applications that require the balance between the scalability and functionality that they’re hoping to provide.

Related Articles

Top Funded Big Data Startups

MapR Traces New Routes for HBase

HadoopWorld Special: This Week's Big Data Top Ten

Share Options


Subscribe

» Subscribe to our weekly e-newsletter


Discussion

There are 0 discussion items posted.

 
Cray CS300-LC

Sponsored Links

Sponsored Whitepapers

Best Practices in Big Data Storage - Sponsored by Cleversafe, Cray, DDN, NetApp, & Panasas

05/10/2013 | Cleversafe, Cray, DDN, NetApp, & Panasas

From Wall Street to Hollywood, drug discovery to homeland security, companies and organizations of all sizes and stripes are coming face to face with the challenges – and opportunities – afforded by Big Data. Before anyone can utilize these extraordinary data repositories, however, they must first harness and manage their data stores, and do so utilizing technologies that underscore affordability, security, and scalability.

Download this Whitepaper...

Big Data, Big Brains – Sponsored By NetApp

04/22/2013 | NetApp

Big data has proven to be one of the most promising yet challenging technologies for both government and industry. But, before IT leaders can harness the full potential of big data, there are key issues to address surrounding infrastructure, storage, personnel, and training.
MeriTalk surveyed 17 visionary big data leaders to find out what they see as the big data challenges and opportunities as well as how government can best leverage big data. Download the “Big Data, Big Brains Report”.

Download this Whitepaper...

View the White Paper Library

Sponsored Multimedia

SGI President and CEO, Jorge Titinger, on Big Data

SGI President and CEO, Jorge Titinger, talks about SGI's history and leadership in HPC and how that has converged into Big Data Solutions.

View Multimedia

Cray CS300-AC Cluster Supercomputer Air Cooling Technology Video

The Cray CS300-AC cluster supercomputer offers energy efficient, air-cooled design based on modular, industry-standard platforms featuring the latest processor and network technologies and a wide range of datacenter cooling requirements.

View Multimedia

More Multimedia



Job Bank

Datanami Conferences Ad

Featured Events

May 22-23, 2013
Business Intelligence Innovation Summit
Chicago, IL
United States

June 4-4, 2013
The Economist's Information Forum
San Francisco, CA
United States

June 10-13, 2013
Cloud & Big Data Expo
New York City, NY
United States

June 19-20, 2013
GigaOM Structure
San Francisco, CA
United States

June 26-27, 2013
2013 Hadoop Summit
San Jose, CA
United States

June 26-27, 2013
Big Data World Congress
London
United Kingdom

» View/Search Events

» Post an Event