eBay: NoSQL and RDBMS Playing Well Together
In another sign that the NoSQL and relational database worlds will end up playing well together, eBay last week revealed details of its data warehouse, which it says is one of the largest Teradata implementations in production.
The story comes courtesy of Matthew Finnegan of Computerworld UK, who reports that at the Financial Information Management (FIMA) event in London last week, Mark Uksusman, senior manager of data architecture at eBay, explained how the auction giant is using both traditional relational database technologies and NoSQL technologies in its operations.
According to the report, eBay has two major Teradata clusters: “One a normal data warehouse for its traditional reporting system, involving very structured data, and the other a bespoke platform, called Singularity, developed for deep analytics and data discovery.” In addition, Finnegan reports that eBay is using open source data software such as Hadoop, MongoDB, and Cassandra.
Claiming to have one of the largest Teradata implementations in the world, Uksusman said that eBay is processing 90PB of data. He noted that while the environment may not be 100 percent optimized, most of that data is being crunched by Teradata, with NoSQL technologies under consideration for certain use cases.
“We are one of the biggest Teradata implementations in the world, we are processing 90PB of data,” Uksusman told the conference audience last week. “But are we optimized 100 percent? Maybe not, and maybe we need to think a bit about optimizing our data warehouse and offload Teradata to Hadoop environments which is more flexible and more developed for data discovery.”
“I don’t want to say that it is a 100 percent right solution [using non-relational databases],” said Uksusman, according to Finnegan. “If you want to talk about secure transactions, you have to ensure data governance and that your records are accurate. [For this] we are still using relational database management systems, Oracle, Teradata and so on.”
“But if you are looking at something around data discovery,” he continued, “you would like to do some very quick processing and analyze information on the fly, and do some analysis of non-structural information, this is why NoSQL technology is there.”
These comments echo a rising theme in the big data universe, one that addresses the question of whether NoSQL technologies such as Hadoop can supplant the traditional RDBMS paradigm thanks to their relative cost curves and their ability to process large amounts of both structured and unstructured data. Increasingly, the answer is: no more than a Phillips head screwdriver can supplant the need for a flat head screwdriver.
We saw this in practice two weeks ago when Facebook analytics boss Ken Rudin told the Strata + Hadoop World audience that big data is more than Hadoop, and shared that Facebook, a company that started with relational technologies and moved heavily toward Hadoop, is now beating a path back to relational databases. In an interview with Enterprise Tech’s Timothy Prickett Morgan, Rudin referred to himself as a “born again SQL fan.”
“Way back when the company was just getting started, Facebook was relational,” Rudin told TPM. “The interesting part was when Facebook got serious and started using Hadoop. And we have found that Hadoop is not optimal for everything – and in retrospect that is an obvious conclusion. But there really was an attitude internally, fed by what is going on in the industry as well, that relational has had its day in the sun and everything now needs to move to Hadoop. It is just faster and better, more flexible with no rigid schemas, and so on. We bought into this quite a bit and it took us very, very far, and we could not have gotten to where we are today if we had stayed purely on relational. But every day we want to look at daily and monthly active users by geography and by the type of phone they are using, and Hadoop – particularly with MapReduce – was designed as a generic parallel processing framework and is not optimized for doing these kinds of queries. When you run these traditional business queries – which are not going away – that is the kind of query that relational was optimized for.”
No vendor in the Hadoop arena ever claimed that its Hadoop distro would be the last word in the enterprise datacenter. Still, whether Hadoop would displace the relational database remained an open question, given the meteoric rise of NoSQL technologies, their relative costs, and the fact that it is cheaper and easier for startups to get going with a non-relational technology, especially as Hadoop has proliferated into the cloud (not to mention the fact that so many tech journalists like to tweak the noses of establishment giants).
Facebook’s return to relational seems to answer the question once and for all: there is no relational database killer, just another tool in the expanding toolkit for an expanding world of data.