December 9, 2013

DataStax Puts Big Database in Google’s Cloud

Alex Woodie

When it comes to big databases running in the cloud, Amazon Web Services (AWS) has a virtual lock on the market. However, today Cassandra promoter DataStax and mega datacenter operator Google unveiled a partnership through which the pair will chip away at Amazon’s dominance in cloud-hosted NoSQL databases for big data workloads.

DataStax says it worked with Google engineers to test and validate its DataStax Enterprise NoSQL database running on Google Compute Engine (GCE), Google’s Linux-based infrastructure as a service (IaaS) offering. Google launched the GCE beta in the summer of 2012 to go up against AWS, and announced the general availability of GCE last week.

Engineers for the two companies tested a variety of configurations of DataStax Enterprise (DSE) running on GCE, to ensure the scalability, reliability, and performance of the setup. In one test, they ran DSE on 100 nodes located in two geographically dispersed Google data centers, and trickle-fed data into the distributed database over a period of 72 hours. Other tests were a bit more stringent, and pushed the limits of disk performance under load, as well as simulating the failure of a node. You can read more about the technical aspects of the DSE tests on GCE at the DataStax Developer Blog.

There are engineering as well as marketing components to the partnership between DataStax and Google, says DataStax vice president of business development Dave Kloc. “A big part of our strategy at DataStax is to build partnerships with cloud and infrastructure as a service providers. This announcement with Google Compute Engine is the first initiative in this strategy,” he told Datanami in an interview.

Executives and engineers alike were enthusiastic about the persistent disk capabilities of GCE. “As I understand it, that is a differentiator for them,” Kloc said. “For our engineers doing the testing, it was one of the things they were very encouraged about and they thought was a good competitive advantage for GCE.”

Being able to use persistent disk is an important piece of what a DSE customer would want to see, Kloc said. Because the data survives on disk independently of the instance, administrators can expand a distributed NoSQL database cluster or restart a failed node without having to re-replicate data from other nodes around the world, which eliminates much of the administrative hassle. Persistence is especially critical for large-scale transactional applications, and DataStax recommends that most of its big customers have that capability in their underlying infrastructure.

“One of the advantages of the GCE platform is its use of persistent disks,” DataStax engineer Quentin Conner says in the blog post. “When an instance is terminated the data is still persisted and can be re-connected to a new instance. This gives great flexibility to Cassandra users. For instance, you can upgrade a node to a higher CPU/Memory limit without re-replicating the data or recover from the loss of a node without having to stream all of the data from other nodes in the cluster.”
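The lifecycle Conner describes can be sketched in a few lines of Python. This is a toy model only, not the GCE or Cassandra API: the `PersistentDisk` and `Instance` classes, the `gb_to_stream` helper, and the 500 GB figure used with them are all hypothetical names and numbers chosen for illustration. The point it illustrates is that a disk that outlives its instance leaves a replacement node with nothing to stream from its peers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersistentDisk:
    """Hypothetical stand-in for a disk that outlives the instance using it."""
    name: str
    data_gb: int
    attached_to: Optional[str] = None


@dataclass
class Instance:
    """Hypothetical stand-in for a virtual machine running one database node."""
    name: str
    cpus: int
    disk: Optional[PersistentDisk] = None

    def attach(self, disk: PersistentDisk) -> None:
        # A disk can only be attached to one instance at a time in this model.
        if disk.attached_to is not None:
            raise ValueError(f"{disk.name} is still attached to {disk.attached_to}")
        disk.attached_to = self.name
        self.disk = disk

    def terminate(self) -> Optional[PersistentDisk]:
        """Terminate the instance; the disk and its data survive, detached."""
        disk, self.disk = self.disk, None
        if disk is not None:
            disk.attached_to = None
        return disk


def gb_to_stream(replacement: Instance, node_data_gb: int) -> int:
    """Data the replacement node would still need to fetch from its peers."""
    already_local = replacement.disk.data_gb if replacement.disk else 0
    return max(node_data_gb - already_local, 0)
```

In this model, terminating a 2-CPU node holding a 500 GB disk, then attaching the surviving disk to a new 8-CPU instance, gives `gb_to_stream(new_node, 500) == 0`. A replacement that starts with no disk would have to stream the full 500 GB from other nodes in the cluster.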

Google, obviously, isn’t the only vendor to support persistent disk in an IaaS offering. Amazon, for example, offers data persistence through its Amazon Elastic Block Store (EBS) offering, which supports the availability of data across AWS data centers in different geographical regions for the purpose of providing greater scalability and reliability. Kloc did not want to talk about Amazon’s cloud services, which is understandable considering the freshly minted partnership with what is arguably the most dominant Web 2.0 property the world has ever known. However, Kloc hinted that GCE will likely be able to offer enterprise-level persistence at a lower price than Amazon can.

Most of DataStax’s 300-plus customers run their NoSQL database on premises. However, it has a few customers running in the cloud. One of its biggest customers, Netflix, runs DSE on AWS. For what it’s worth, DataStax does not have a formal partnership in place with Amazon, although that could change. According to Kloc, DataStax has held talks with all of the big IaaS providers recently, including IBM SoftLayer, Microsoft Azure, OpenStack, and Rackspace.

So far, no other IaaS providers have made so much as a dent in the AWS cloud juggernaut. According to Gartner’s latest magical vendor ratings for IaaS, AWS has “more than five times the compute capacity in use than the aggregate total of the other 14 providers” rated in the Magic Quadrant. And no, GCE wasn’t one of the other 14. An outfit called the Synergy Research Group found that AWS increased its quarterly revenue by 55 percent to more than $700 million in the third quarter of 2013, compared to less than $400 million for the combined revenue of Salesforce, IBM, Windows Azure, and Google.

The GA of GCE may be a harbinger of change. “I think the GA of GCE is a demarcation of market eras,” Gartner analyst Lydia Leong wrote in a blog post. “We’re now moving into a second phase of this market, and things only get more interesting from here onwards.” Kloc agrees. “We believe truly…Google being who they are and the way they’ve constructed their Google compute environment, we think they have a really big chance to make a big dent in Amazon’s market share,” he said. 

GCE is just one part of Google’s arsenal of cloud computing products, which also includes Google App Engine, an application development cloud, and various SQL and NoSQL storage options. There is also BigQuery, a big data analytics tool. Now that GCE is up and running, you can expect more NoSQL databases and possibly even some Hadoop environments to run on it, providing much-needed competition to Amazon.

Related Items:

Datastax Seeks to Put NoSQL Clusters on Autopilot

Datastax Gives Startups Free Production Cassandra DB

DataStax Rakes $45 Million; Schemes Growth

Datanami