Struggling Under Multiple Databases? Get Used To It
Programmers are a fickle bunch. One day they’re all using Java, and the next day Python is the hot new thing. While developers inevitably will dabble in different languages, we’re now seeing them do similar things with databases, particularly those of a NoSQL bent. The database explosion benefits developers, but it’s a headache for the admins tasked with managing the stuff.
Time was, you installed the Oracle database and were happy with it, damn it. Oracle defined what you could do with the data, and your developers worked within those bounds. While the department across the hall may have used another database, such as Postgres or DB2, it was not radically different. It may have done some things differently and had its own particular schema, but it was still SQL at the end of the day. Everybody was on the same page.
My, how times have changed. Thanks to the rise of big unstructured data, social media, and the changing nature of Web-based consumer apps, the capabilities and requirements of databases have mutated beyond what relational databases can comfortably handle, and as a result, both the type and number of databases have exploded.
New ‘Systems of Intelligence’
Wikibon analyst George Gilbert says the database proliferation is being driven by new systems of intelligence that are handling big and fast unstructured data. Until about 10 years ago, most databases were relational, and were used to power systems of record, such as ERP systems. These systems were rigid and weren’t built to be changed.
“Just adding a single field to a screen form, equivalent to adding a new column in a [MySQL] table, could reduce the server to thrashing and take 5-8 hours for several million records,” Gilbert writes in a recent series titled “Systems of Intelligence Are Driving Database Proliferation.”
Something shifted around 2005, just as the LAMP stack helped unleash Web 2.0-style apps and Google and Yahoo were exploring new data stores like BigTable and Hadoop. Since then, Gilbert argues, we’ve seen an influx of systems of intelligence designed to “continually evolve and keep track of ever-changing information.” Not surprisingly, these new systems were built on models provided by “the consumer Internet services vendors.”
“The core of Systems of Intelligence is their ability to anticipate and influence consumer interactions in real-time across distribution channels and touch points,” Gilbert writes. “That capability requires profile data about those consumers’ interactions as well as ambient intelligence about what’s going on around them. In order to continually get smarter about the consumers, enterprises have to collect ever more data about them.”
The ever-changing nature of these apps requires databases with flexible schemas. That has powered the growing popularity of MongoDB, which Gilbert dubs “the anti-MySQL.”
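To make the flexible-schema idea concrete, here is a toy sketch in plain Python, with dicts and a list standing in for documents and a collection (this is not MongoDB’s actual API): new fields can appear on new records without any migration of the old ones.

```python
# Toy illustration of schema flexibility: documents in the same
# "collection" need not share the same fields, and adding a field to
# new records requires no table migration. Plain Python dicts stand
# in for a document store here.

users = []  # stands in for a document collection

# Early records carry only a name and an email address.
users.append({"name": "Ada", "email": "ada@example.com"})

# A later record adds a "last_login" field; older documents are untouched.
users.append({"name": "Grace", "email": "grace@example.com",
              "last_login": "2015-07-23T09:00:00Z"})

# Queries simply tolerate the missing field.
recently_active = [u["name"] for u in users if "last_login" in u]
print(recently_active)  # → ['Grace']
```

Contrast that with the relational case Gilbert describes, where the equivalent change means altering a column across every existing row.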
Kurt Mackey, CEO of cloud database management vendor Compose, says developers are driving database proliferation by the development choices they make. “Today you’ll see applications that use three or four different databases, just depending on the features they need to deliver to a customer,” he says.
Developers are increasingly drawn to NoSQL databases to take advantage of compelling new capabilities. “Redis is a really good data Swiss Army knife to go with another database,” says Mackey, whose company was just bought by IBM. “We see an awful lot of people using Redis alongside a canonical data store, so they’ll be using MongoDB or Postgres or even something like DynamoDB on Amazon and putting Redis alongside it.”
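The Redis-alongside-a-canonical-store arrangement Mackey describes is typically implemented as a cache-aside pattern: check the fast store first, fall back to the primary database on a miss, and populate the cache on the way out. A minimal sketch of that flow, using plain dicts in place of a real Redis client and a MongoDB or Postgres connection so it is self-contained:

```python
# Cache-aside sketch: the cache (Redis in practice) sits beside the
# canonical store (MongoDB, Postgres, DynamoDB, etc.). Plain dicts
# stand in for both here.

canonical_store = {"user:1": {"name": "Ada"}}  # stands in for the primary DB
cache = {}                                     # stands in for Redis

def get_user(key):
    # 1. Try the cache first.
    if key in cache:
        return cache[key]
    # 2. On a miss, read the canonical store...
    value = canonical_store.get(key)
    # 3. ...and populate the cache for subsequent reads.
    if value is not None:
        cache[key] = value
    return value

get_user("user:1")        # miss: reads the canonical store, fills the cache
assert "user:1" in cache  # later reads are served from the cache
```

In a real deployment the dicts become a `redis` client (typically a GET, and a SET with a TTL) and a database driver, and writes to the canonical store have to invalidate or refresh the cached copy, which is part of the operational cost discussed below.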
Companies like Oracle are trying to keep up with the NoSQL Joneses by adding features such as support for time-series data, in-memory processing, and JSON data types. However, it’s a losing proposition for the relational database in the long run because of the compromises that must be made to support the new features, Mackey says.
“Something like time-series is tough because you need an entirely different storage mechanism, an entirely different way of thinking about it. It’s architecturally substantial,” he says. “So while you can do time-series and full text search in most of these databases, as a developer you run into these sharp edges very often.”
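The “entirely different storage mechanism” point comes down to access pattern: time-series workloads are append-only streams of timestamped points, read back as aggregations over time ranges, which is a very different shape from row-at-a-time relational updates. A toy sketch of that append-and-aggregate shape in plain Python (purpose-built time-series stores optimize for exactly this):

```python
from collections import defaultdict

# Time-series access pattern in miniature: writes append immutable
# (timestamp, value) points; reads aggregate over time buckets.

points = []  # append-only log of (unix_seconds, value)

def record(ts, value):
    points.append((ts, value))

def avg_per_minute():
    # Group points into one-minute buckets and average each bucket.
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts // 60].append(value)
    return {minute: sum(vs) / len(vs) for minute, vs in buckets.items()}

record(0, 10.0)
record(30, 20.0)   # same minute as the first point
record(65, 30.0)   # next minute
print(avg_per_minute())  # → {0: 15.0, 1: 30.0}
```

A B-tree row store tuned for random updates handles this shape poorly at scale, which is why bolting time-series support onto a relational engine produces the sharp edges Mackey mentions.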
Those sharp edges weren’t such a big deal in the old days, because there just weren’t as many databases flaunting awesome new functionality in developers’ faces. These days, with so many open source NoSQL databases basically giving away their wares, it’s tough for developers to resist.
“If you look at the information you can get from a mobile device these days, such as time-series data or sensor data–all kinds of great stuff that nobody really had access to 10 years ago,” Mackey says. “In the past it was OK because it was so difficult to get another database going. But [not using new databases] definitely is not an ideal development experience. It limits the features and the speed at which you can build applications.”
Paying the Piper
But all that capability coming out of databases today, NoSQL and NewSQL and OldSQL alike, comes at a cost.
“It’s operationally complicated,” Mackey says. “While you might want to add a time-series database for a particular feature, you have to deal with the reality that you may not be able to run the thing. So what can be a quick win for a developer feature-wise can actually be very hard on an organization that doesn’t have the operational chops to run these things.”
Mackey’s solution to this, the Compose pitch as it were, is to acknowledge that individual data silos will probably multiply in the coming years, and to help organizations keep it all in sync. IBM bought Compose in part to get ahold of its lightweight, developer-focused ETL tool that keeps databases synced up without a lot of muss and fuss.
The Compose tool currently supports MongoDB, Postgres, Elasticsearch, and Redis databases, and runs on popular cloud platforms. The company is looking to add support for additional databases (the Neo4j graph database intrigues them) as well as support for IBM’s Bluemix cloud.