May 8, 2015

How Hadoop Solved BT’s Data Velocity Problem

Alex Woodie

Like most large corporations with millions of customers, BT (British Telecom) has an extensive collection of databases, and is constantly moving data in and out of them. But when data growth maxed out a critical ETL server, it found a solution in a distributed Hadoop system.

With £18 billion (about $30 billion) in revenue last year, BT is one of the largest telecommunications providers in the world, and serves more than 18 million consumers and nearly 3 million businesses in the UK. Its £12.5-billion acquisition of the mobile phone network EE Limited will expand its reach even further.

Several years back, BT ran into a little snag with its data transformation processes for its business customers (the consumer data is managed separately). Data about new UK businesses would flow in from Dun & Bradstreet, and BT would dole it out to the requisite line-of-business systems accordingly.

“Being an old company, we’ve managed to spread our customer data over 12 different significant customer databases,” BT’s chief data architect Phillip Radley said during this week’s Strata + Hadoop World conference in London. “So every night we had a massive ETL job, a billion records that have to be compared and contrasted, to reconcile the updates and make sure that all of the systems are up to date.”

BT chief data architect Phillip Radley

The batch system worked well enough, but eventually the batch window maxed out. In other words, it took 24 hours to process 24 hours’ worth of data, Radley said. Because the process was running on an old ETL software package, on an old server, in an old data center that BT wanted to close, the company started looking to phase out the old stuff in favor of newer replacements.

Initially, BT considered adding a new relational system to take over the ETL work. Then Radley had another idea. “I sat down with some colleagues, and they said, ‘You know what, that thing would work on Hadoop. It’s basically a data velocity problem. We need to process that data faster and increase the volume,’” he said.
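To see why a nightly reconciliation of this kind suits Hadoop, it helps to notice that the work is a group-by-key comparison: bring every system's copy of a customer together, then compare the copies. The sketch below imitates that map, shuffle, and reduce shape in plain Python on a single machine. It is only an illustration, not BT's code (which the article says was written in Java). The system names, fields, sample rows, and the "majority value wins" rule are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows: (source_system, customer_id, address).
# System names, fields, and values are invented for illustration.
RECORDS = [
    ("billing", "C001", "1 High St, Leeds"),
    ("crm", "C001", "1 High St, Leeds"),
    ("orders", "C001", "1 High Street, Leeds"),
    ("billing", "C002", "9 Mill Rd, York"),
    ("crm", "C002", "9 Mill Rd, York"),
]


def map_record(record):
    """Map step: key each row by customer ID so all copies can meet."""
    source, customer_id, address = record
    return customer_id, (source, address)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_customer(values):
    """Reduce step: list the systems whose copy differs from the majority.

    Treating the most common value as correct is an assumption of this
    sketch, not a rule taken from the article.
    """
    counts = defaultdict(int)
    for _, address in values:
        counts[address] += 1
    majority = max(counts, key=counts.get)
    return sorted(source for source, address in values if address != majority)


def reconcile(records):
    """Return {customer_id: [systems needing an update]} for mismatches."""
    grouped = shuffle(map_record(record) for record in records)
    result = {}
    for customer_id, values in grouped.items():
        out_of_date = reduce_customer(values)
        if out_of_date:
            result[customer_id] = out_of_date
    return result


if __name__ == "__main__":
    print(reconcile(RECORDS))
```

Because each customer key is reduced independently of every other key, the reduce step can be spread across many machines, which is what makes the pattern attractive when the input is a billion records rather than five.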

BT contracted with Cloudera to put a production Hadoop cluster together, and replace the batch ETL application with Java-based MapReduce routines. “We went from PowerPoint to production in nine months–pretty good by our standards,” Radley said. “And the net result is we now achieved a velocity increase of a factor of 15. So the data cycle now takes eight hours and we process five times the amount of data.”
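The factor of 15 follows from the two figures Radley quotes, assuming the old cycle is the 24-hour batch window described earlier: the job finishes three times sooner and handles five times the data. A short Python check of that arithmetic (the function name is just for illustration):

```python
OLD_CYCLE_HOURS = 24   # the old batch window: 24 hours for a day's data
NEW_CYCLE_HOURS = 8    # the cycle time Radley quotes for the Hadoop cluster
DATA_MULTIPLIER = 5    # "five times the amount of data"


def throughput_gain(old_hours, new_hours, data_multiplier):
    """Ratio of new to old throughput, where throughput = data / time."""
    return data_multiplier * old_hours / new_hours


if __name__ == "__main__":
    gain = throughput_gain(OLD_CYCLE_HOURS, NEW_CYCLE_HOURS, DATA_MULTIPLIER)
    print(gain)  # 15.0
```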

What’s more, BT implemented its 50-node CDH cluster in its Sheffield data center without hiring a bunch of “Hadoopy” experts, Radley said. “We took some of the world’s best Linux admins and upskilled them to run Hadoop,” he added. “They thought that was great. They got new skills. They’re very happy with that.”

It all seemed well and good, but BT’s CFO was a skeptic who wouldn’t be won over easily. “Our CFO said ‘Big data is big hype. We don’t believe you. Don’t keep coming back in here unless you deliver a return on investment,’” Radley said. So after tabulating costs and benefits, Radley figures his one-year return on investment is in the 200 percent to 250 percent range.

Hadoop may carry a bit of hype, but if it can deliver the goods at the end of the day, who cares? “The business sponsor doesn’t know it runs on Hadoop, and frankly they don’t care,” Radley said. “All they know is now they’re working with today’s data rather than yesterday’s data. And also we’re saving them a lot of money. Putting it on Hadoop was much cheaper than doing it with a [standard] system.”

As the saying goes, no good deed goes unpunished, and soon BT was calling upon its Hadoop cluster to do new things. The first of those is tracing network connections in support of BT’s efforts to sell fast Internet connections (from 40 Mbps to 300+ Mbps). As Radley explained, determining whether a given customer in a given part of the UK has the copper wiring necessary to support the delivery of super-fast Internet is no simple task.

“Whether you can get that service is very dependent on how long the copper is, how good the copper is, how many joints there are in the copper, and it means we have a lot of very detailed network analyses,” Radley said. “You can imagine 24 million homes, individual wire segments connecting each of them together. That’s a lot of data to compare and contrast. You have to get all the inventory and all the performance and test data, and join that up and figure it out.”
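The "join that up" step Radley describes can be pictured as combining an inventory of wire segments with test results, then rolling both up per home. The Python sketch below does that on a toy scale. Every field name and value is hypothetical, and it deliberately stops at building a per-home profile, since the article gives no rule for deciding which lines qualify.

```python
from collections import defaultdict

# Hypothetical inventory rows: (home_id, segment_id, length_m, joints).
INVENTORY = [
    ("H1", "S1", 300, 1),
    ("H1", "S2", 450, 2),
    ("H2", "S3", 1200, 4),
]

# Hypothetical test rows: (segment_id, attenuation_db).
TESTS = [("S1", 3.0), ("S2", 5.5), ("S3", 14.0)]


def line_profiles(inventory, tests):
    """Join segments to their test results and total them per home."""
    test_by_segment = dict(tests)
    profiles = defaultdict(
        lambda: {"length_m": 0, "joints": 0, "attenuation_db": 0.0}
    )
    for home_id, segment_id, length_m, joints in inventory:
        profile = profiles[home_id]
        profile["length_m"] += length_m
        profile["joints"] += joints
        # Assumes every inventoried segment has exactly one test reading.
        profile["attenuation_db"] += test_by_segment[segment_id]
    return dict(profiles)


if __name__ == "__main__":
    for home_id, profile in line_profiles(INVENTORY, TESTS).items():
        print(home_id, profile)
```

At BT's scale the same join would run over segments for some 24 million homes, which is why it lands on a cluster rather than a single server.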

The company–which is working with Hive and has some Spark stuff going too–has similar plans to use Hadoop in support of maintaining quality of service (QoS) for its TV service. What’s more, it’s also counting on Hadoop to assist in the roll-out of its “4G In the Home” initiative, in which BT will essentially place shrunken cell towers directly in customers’ homes.

“That means, instead of managing 70,000 to 80,000 base stations around the UK, we have to manage 5 million tiny base stations, one in every home. That’s an awful lot of telemetry data,” Radley said. “Previously it wouldn’t have been cost effective for us to do that efficiently. But with something like Hadoop, we can [handle] all that telemetry, hundreds of measures every 10 minutes from millions of devices.”
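A rough sense of that telemetry volume can be had from the figures in the quote. The Python sketch below assumes 100 measures per device per interval, the low end of "hundreds", so the result is a floor rather than an estimate of BT's actual load.

```python
DEVICES = 5_000_000          # "5 million tiny base stations"
INTERVAL_MINUTES = 10        # "every 10 minutes"
MEASURES_PER_DEVICE = 100    # assumption: the low end of "hundreds"


def readings_per_day(devices, measures, interval_minutes):
    """Total individual readings collected across all devices in 24 hours."""
    intervals_per_day = (24 * 60) // interval_minutes
    return devices * measures * intervals_per_day


if __name__ == "__main__":
    total = readings_per_day(DEVICES, MEASURES_PER_DEVICE, INTERVAL_MINUTES)
    print(f"{total:,} readings per day")  # 72,000,000,000 readings per day
```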


