How NoSQL Drives Analytic Agility at Nielsen
For many years, Nielsen used a standard relational database to power the big data analytic offerings used by thousands of its customers around the world. But the weight of that infrastructure became a drag on the company, until it finally switched to a NoSQL database that was better suited to the task.
Most people know Nielsen best for its television rating system, and the so-called “Nielsen Families” who have agreed to be part of a massive ongoing TV survey. But in addition to gauging TV (and radio, online, and billboard) viewership, the company also collects, aggregates, and disseminates information about the goods that people buy.
Major consumer packaged goods (CPG) manufacturers like Kraft and Procter & Gamble and retailers like Safeway use the Answers on Demand component of Nielsen’s Global Buy program to figure out what products people are buying. This information is critical in determining not only what products to sell, but how to price and market them too.
The Answers on Demand program is fueled by three primary sources of data:
- Aggregated and anonymized point of sale (POS) data that Nielsen collects from nearly every major retailer in the world;
- Data from retailers’ loyalty card programs;
- And panel data from a group of volunteer customers who scan every UPC barcode on every product they buy.
Obviously, there is a ton of data involved in Answers on Demand. During any given week, this system handles billions of POS records, hundreds of millions of loyalty card records, and millions of UPC scans. Nielsen’s ability to efficiently manage and present this data to customers is a big factor in this program’s success, as well as the success of its customers.
Stretching the Limits of Relational
Darrell Pratt, former principal architect at Nielsen, explained the efforts to simplify the Answers on Demand architecture during a Couchbase conference held in September 2013. During the event, Pratt (who now works at Cars.com) said one of the biggest problems was the way the relational database stored data.
“The relational data model overload was one of our big issues,” Pratt said. “Our database, for something you would think is fairly easy, was 12 tables to define a report. How do you put it back together and get performance out of that?”
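The article doesn’t show Nielsen’s actual schema, but the pattern Pratt describes is easy to see in miniature: a single report definition normalized across several tables must be reassembled with multiple queries or joins, while a document model returns it in one read. A minimal sketch, with entirely hypothetical table and column names:

```python
import json
import sqlite3

# Hypothetical normalized schema: one report definition split across
# multiple tables (Nielsen's real schema reportedly used 12 of them).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE report (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE report_filter (report_id INTEGER, field TEXT, value TEXT);
CREATE TABLE report_column (report_id INTEGER, position INTEGER, metric TEXT);
INSERT INTO report VALUES (1, 'Weekly POS Summary');
INSERT INTO report_filter VALUES (1, 'region', 'US'), (1, 'category', 'beverages');
INSERT INTO report_column VALUES (1, 1, 'units_sold'), (1, 2, 'revenue');
""")

# Relational path: reassembling one report takes a query per table.
name = db.execute("SELECT name FROM report WHERE id = 1").fetchone()[0]
filters = dict(db.execute(
    "SELECT field, value FROM report_filter WHERE report_id = 1"))
columns = [m for (m,) in db.execute(
    "SELECT metric FROM report_column WHERE report_id = 1 ORDER BY position")]
report = {"name": name, "filters": filters, "columns": columns}

# Document path: the same definition lives in one JSON document,
# stored and served as-is with no reassembly step.
doc = json.dumps(report)
assert json.loads(doc) == report
```

The performance question Pratt raises follows directly: every report rendered means repeating that reassembly, where the document model is a single key lookup.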
Besides the complex data schema, a heavy reliance on data transformations was also hurting Nielsen. While the company stored the data in the Oracle database as XML-based CLOBs, everything is JSON from the end-user’s point of view. Nielsen gives its customers extensive power to customize their reports, and all those configurations and filtering criteria are stored in native JSON documents.
“We really wanted to get out of the business of all those data transformations,” he said. “We’re serving up JSON. Why are we breaking it all apart into tables and different columns? It just doesn’t make sense.”
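Pratt’s complaint can be sketched in a few lines: the old path stored XML and had to parse and re-shape it into JSON on every request, while storing JSON natively makes serving a pass-through. A hypothetical illustration of the two paths (the field names are invented):

```python
import json
import xml.etree.ElementTree as ET

# Old path (sketch): report config stored as an XML CLOB, parsed and
# re-marshalled into JSON on every request.
xml_clob = "<config><market>US</market><period>week</period></config>"

def serve_from_xml(clob):
    root = ET.fromstring(clob)                       # parse the CLOB
    obj = {child.tag: child.text for child in root}  # intermediate objects
    return json.dumps(obj)                           # marshal out to JSON

# New path (sketch): the config is already a JSON document; serving it
# requires no transformation at all.
json_doc = '{"market": "US", "period": "week"}'

def serve_from_json(doc):
    return doc

# Both paths produce the same payload; only one does it without work.
assert json.loads(serve_from_xml(xml_clob)) == json.loads(serve_from_json(json_doc))
```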
The product data changes constantly for Nielsen, and keeping on top of those changes was becoming a challenge. For example, if one little characteristic of one product changed (such as changing a bottle size from 11 oz. to 13 oz.), the change could impact tens of millions of reports. To deal with it, Nielsen would kick off full table scans to ensure that the change was fully implemented (or mark the data as suspect so customers could avoid it).
“The complexity of those objects and how much they change causes so much churn,” he said. “You’re re-compiling everything, you’re re-generating everything with little changes to the database. Over time this gets to be way too much of a hassle.”
Running those full table scans was a real chore, especially considering that Pratt’s team typically performed them during system downtime on the weekend. “We kept joking that we have to come up with a new day,” he said. “Saturday and Sunday aren’t generally enough.”
A New Data Architecture
Adding another day to the weekend probably wasn’t going to fly with Nielsen’s CIO, so Pratt looked to other solutions—namely, re-architecting the database layer. The company selected Couchbase Server, a distributed document-based NoSQL database, to replace the Oracle layer. Couchbase’s native JSON support and flexible data schema meant that Nielsen could eliminate most of those data transformations and respond much more quickly to changing data and report attributes.
The fact that Nielsen is dealing with JSON natively in Couchbase is a huge efficiency boost. “We’re not doing all the transformations,” Pratt said. “We’re not building up Java objects here and then marshalling them out to JSON and sending them over the wire. We’re getting rid of stuff there, which is very important.”
Leaving the data in its native format, and having all the user-facing applications call the data in Couchbase and then put it back, has simplified the workflow for Nielsen. “XML to me is horrible. So we’re trying to get very far away from that, by pushing JSON further and further down the stack,” he said.
The new Couchbase-based system runs 50 percent faster than the old Oracle-based system, according to Arvind Jade, the current architecture lead at Nielsen, who was quoted in a recent Baseline Magazine article. “By moving the metadata to Couchbase, we were able to dramatically improve the efficiency of the system and speed data delivery,” Jade said in a Couchbase press release. “We are able to query against the index and target specific documents, something we were not able to do previously.”
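The press release doesn’t show the queries, but the pattern Jade describes—consulting an index to find matching keys, then fetching only those documents rather than scanning everything—can be sketched with a plain dict standing in for the document store (all keys and attribute names here are hypothetical):

```python
import json

# Stand-in for the document store: keys map to JSON report documents.
store = {
    "report::1": json.dumps({"market": "US", "category": "snacks"}),
    "report::2": json.dumps({"market": "EU", "category": "snacks"}),
    "report::3": json.dumps({"market": "US", "category": "dairy"}),
}

# Secondary index over a document attribute, built once and maintained
# as documents change (Couchbase does this with views/indexes).
index_by_market = {}
for key, doc in store.items():
    index_by_market.setdefault(json.loads(doc)["market"], []).append(key)

# Query the index, then fetch only the matching documents -- no full scan.
us_keys = index_by_market["US"]
us_reports = [json.loads(store[k]) for k in sorted(us_keys)]
print(us_reports)  # only report::1 and report::3
```

The contrast with the old system’s weekend-consuming full table scans is the point: a change or query touches only the documents the index identifies.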