Hadoop Grabs the Spotlight at Teradata User Confab
Teradata and its customer are embracing Hadoop, the open source platform that’s helping to redefine the field of big data analytics. But the tension between the analytic old guard and the big data upstart were palpable at Teradata’s big annual user conference held in Anaheim this week.
Teradata built its stellar reputation with an enterprise-class data warehousing platform used by thousands of large companies over the past 30 years. Pick a Fortune 100 company, and chances are very good that it relies on an enterprise data warehouse (EDW) from the San Diego, California company to turn raw data into operational insights.
But lately, the rise of Hadoop and the disruptive forces of open source big data software have taken some of the shine off Teradata (NYSE: TDC), which hasn’t significantly increased revenue since 2012 and whose stock has lost 64 percent of its value from a mid-2012 high.
For better or for worse, Teradata customers are considering the feasibility of offloading some of the work they do on Teradata EDWs to commodity clusters of cheap X64 servers loaded with the Hadoop stack, and Teradata’s prospects are exploring Hadoop as well. At less than $1,000 per TB of storage, Hadoop has a huge cost advantage over the flagship Teradata EDW, which in 2014 had a list price of between $34,000 and $69,000 per TB.
It doesn’t matter that the comparison between the Teradata 6800 series to SQL on Hadoop offerings is apples to oranges, that Hadoop can’t do many of the things that Teradata does so well out of the box, and that Teradata simply doesn’t do some of the things that Hadoop does too; people are looking at doing it anyway.
Driving Hybrid Hadoop
That’s created a fair bit of tension within Teradata, as it simultaneously seeks to embrace the still-emerging Hadoop stack while protecting its core business from it. That tension was on display at the 2015 PARTNERS show this week, which attracted more than 3,500 customers, partners, and employees to the Anaheim Convention Center.
Among the Teradata customers talking about their Hadoop journeys were retail giant Target and Enterprise Holdings, the massive private company behind Enterprise Rent-A-Car and other car rental businesses. The blue chip companies have been longtime Teradata EDW customers, and are hearing the siren call of Hadoop just like everybody else.
Nicholas Polizzi, a system architect with Enterprise Holdings, shared his
Hadoop lessons during a session at the Teradata conference. As Polizzi explained, the company was looking to complement its existing data warehouse environment with a Hadoop implementation in hopes of delivering better business agility.
In addition to doing data discovery on unstructured and semi-structured data, Polizzi also put the seven-node Hadoop Appliance that Enterprise bought from Teradata to use as a storage repository for “cold” data more than a month old. Previously the company stored that data on tapes, but restoring the data from tapes for reporting purposes was tedious and time-consuming, he said.
Enterprise hooked its Hortonworks distribution of Hadoop up to the 6700 series enterprise data warehouse (EDW) that Enterprise uses for reporting. At a technical level, the company utilized the SQL-H capabilities in version 14.1 of the Teradata Database to extract data from Hadoop into the Teradata EDW; this QueryGrid functionality has been upgraded to support bi-directinal data movement and push-down query processing in version 15.
Getting everything running well was a matter of trial and error for Enterprise, as the company ran into a variety of issues, including with data caching, partition pruning, and complex joins. “It’s currently not a direct replacement for an established relational database management system,” Polizzi said. “‘Don’t assume you can take the same data model you have in really good relational database and forklift that to Hadoop.”
Customers are better off bringing the data down raw instead of trying to flatten the data prior to moving it into the EDW. “That’s important to know,” he said. “You go into there with the assumption that can you dive into relational models on it. [But] it’s not efficient at this time. This stuff is changing so fast that may improve later. But this is what we found on our platform.”
Lost in Translation
Target also shared some of its experiences with Hadoop. The company began scoping out its Hadoop plans in the early summer of 2013, and went live on a small cluster about a year later, according to James Vollmer, a lead engineering consultant with Target who presented at this week’s Teradata show.
As Vollmer explained, Target took a slow and methodical approach to developing new solutions on Hadoop, which the company wanted to adopt for ETL offload and data science purposes involving clickstreams and inventory data.
“Things that we take for granted in the Teradata world, you really have to build your own in this world, for now,” Vollmer said. “What we would do in a BI/EDW space [with] point and click tools don’t really translate to what we want to do in open source Hadoop space.”
Instead of rushing into something new and getting tripped up by “shiny object syndrome,” the Target team really tried to “ask the hard questions” and anticipate potential problems, especially in the areas of change tracking, testing, monitoring, alerting, security, and availability, he said.
And whatever you do, don’t overlook data governance, Vollmer warned. “Make sure you have a strategy,” he said. “There’s at least 45 different ways to bring that same data set in. You probably want one.….Make sure you’re prescribing what people should do and how that will work with other things that you want within your framework, such as your metadata set.” Target is participating in the Apache Atlas project to improve metadata management on Hadoop.
But above all, Vollmer advised, if you seek to be agile, you must communicate. “Publish and communicate what you’re doing,” he said. “We kind of kept it internal at first and suffered as a result of that. When we started demonstrating what we’re doing every two weeks, people got it, they understood. And having a showcase is a good morale booster. “
Teradata kicked off the show by making two sizable Hadoop-related announcements. First it launched a new product, called Listener, designed to route Internet of Things (IoT) data into Hadoop. Secondly, it announced it’s running its Aster big data discovery tool right in Hadoop.
“Obviously, we’re embracing Hadoop in our ecosystem,” said Chris Twogood, Teradata’s vice president of product and services marketing, in an interview with Datanami on Wednesday. “We think it makes a lot of sense and a lot of our customers are deploying it.”
However, Hadoop won’t be a good fit for everything a customer might want to do with it, particularly when it comes to operational analytics, Twogood said. “We try to advise them as to what are the issues you’re going to get into,” he said. “Most of the time, it has more to do with getting consistent SLAs on your queries. You’re not going to get it in Hadoop.”
Teradata Labs president Oliver Ratzesberger has some strong opinions on the matter. Before joining Teradata several years ago, Ratzesberger worked at EBay, where he helped design analytic applications that ran on Teradata EDWs and Hadoop clusters.
According to Ratzesberger , Hadoop’s supposed cost and scalability advantages vanish when you start peeling back the layers and seeing what you’ve really got. “We ran Teradata 1000 series systems out there [at EBay] larger than the largest Hadoop cluster you can build that hold five to 10 times the data,” he said. “It sounds great that you can put thousands of nodes together, but at 2,000 nodes, it’s at the end of the road.”
Ratzesberger said EBay runs 90 percent of its analytics on the 1000 series appliances, while using the Hadoop cluster for more specialized algorithms, such as image and text-to-speech processing. The 1000 series is Teradata’s low-end data warehouse appliance that uses large and relatively slow hard drives, compared to the smaller but faster hard drives and SSDs that you find in the flagship 6000 series appliances.
At the end of the day, Teradata’s strategy for Hadoop and similar technologies involves making open source software more palatable to its customers. You saw that theme with Listener, which is based entirely on open source software, as well as the work it’s doing around Presto, the SQL-on-Hadoop offering developed by Facebook that Teradata is adopting.
“Our strategy in a lot of ways is to do the fit and finish around the open source stuff,” Twogood said. “Don’t get me wrong. Open source software is hot. But not every customer in the world wants to go code their own stuff or integrate all this stuff together. We are a technology company, but fundamentally we help customers solve problem with data. A lot of people think about us as just a database technology. That’s how we do it, but what we do is help them solve problems.”