October 20, 2014

Trevor Mason and Big Data: Doing What Comes Naturally

Sponsored by MapR Technologies

In conjunction with MapR, Datanami presents IRI with this month’s “Big Data All Star” award.

For Trevor Mason, working with Big Data just came naturally.

Mason is the vice president for Technology Research at IRI, a 30-year-old Chicago-based company that provides information, analytics, business intelligence and domain expertise for the world’s leading CPG, retail and healthcare companies.

“I’ve always had a love of mathematics and proved to be a natural when it came to computer science,” Mason says. “So I combined both disciplines and it has been my interest ever since. I joined IRI 20 years ago to work with Big Data (although it wasn’t called that back then). Today I head up a group that is responsible for forward looking research into tools and systems for processing, analyzing and managing massive amounts of data. Our mission is two-fold: keep technology costs as low as possible while providing our clients with the state-of-the-art analytic and intelligence tools they need to drive their insights.”

Big Data Challenges

Recent challenges facing Mason and his team included a mix of business and technological issues. They wanted to cut costs significantly by reducing mainframe load, and to contain a mainframe support risk that was growing because key mainframe support personnel were about to retire. At the same time, they wanted to build the foundations for a more cost-effective, flexible and expandable data processing and storage environment.

The technical problem was equally challenging. The team wanted to achieve random extraction rates averaging 600,000 records per second, peaking at more than one million records per second, from a 15 TB fact table. This table feeds a large multi-TB downstream client-facing reporting farm. Given IRI’s emphasis on economy, the solution had to be very efficient, using only 16 to 24 nodes.

“We looked at traditional warehouse technologies, but Hadoop was by far the most cost effective solution,” Mason says. “Within Hadoop we investigated all the main distributions and various hardware options before settling on MapR on a Cisco UCS (Unified Computing System) cluster.”

The fact table resides on the mainframe where it is updated and maintained daily. These functions are very complex and proved costly to migrate to the cluster. However, the extraction process, which represents the majority of the current mainframe load, is relatively simple, Mason says.

“The solution was to keep the update and maintenance processes on the mainframe and maintain a synchronized copy on the Hadoop cluster by using our mainframe change logging process,” he notes. “All extraction processes go against the Hadoop cluster, significantly reducing the mainframe load. This met our objective of maximum performance with minimal new development.”
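The article does not say what IRI's change log looks like or how it is applied on the cluster. As a rough illustration only, the idea of keeping a copy in sync by replaying logged changes can be sketched as below; the entry format (`op`, `key`, `value`) and the function name `apply_change_log` are hypothetical, and an in-memory dictionary stands in for the replicated table.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change-log entry: (operation, record key, new value).
Change = Tuple[str, str, int]


def apply_change_log(replica: Dict[str, int], changes: Iterable[Change]) -> Dict[str, int]:
    """Replay logged changes, in order, against a replica of the table."""
    for op, key, value in changes:
        if op == "upsert":
            # Insert a new record or overwrite an existing one.
            replica[key] = value
        elif op == "delete":
            # Remove the record if present; ignore it if already gone.
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


if __name__ == "__main__":
    replica = {"store1|item9": 12}
    log = [
        ("upsert", "store1|item9", 15),
        ("upsert", "store2|item3", 4),
        ("delete", "store1|item9", 0),
    ]
    print(apply_change_log(replica, log))
```

Because the source system remains the only place where updates are computed, the replica needs no update logic of its own beyond replaying the log in order, which is consistent with the "minimal new development" goal Mason describes.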

The team chose MapR to maximize file system performance, facilitate the use of a large number of smaller files, and take full advantage of its NFS capability so files could be sent via FTP from the mainframe directly to the cluster.

Shaking up the System

They also gave their system a real workout. Recalls Mason, “To maximize efficiency we had to see how far we could push the hardware and software before it broke. After several months of pushing the system to its limits, we weeded out several issues, including a bad disk, a bad node, and incorrect OS, network and driver settings. We worked closely with our vendors to root out and correct these issues.”

Overall, he says, the development took about six months, followed by two months of final testing and running in parallel with the regular production processes. He also stressed, “Much kudos go to the IRI engineering team and Zaloni consulting team who worked together to implement all the minute details needed to create the current fully functional production system in only six months.”

To accomplish their ambitious goals, the team took some unique approaches. For instance, the methods they used to organize the data and structure the extraction process allowed them to achieve extraction rates of between two million and three million records per second on a 16-node cluster.

They also developed a way to always have a consistent view of the data used in the extraction process while continuously updating it.
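The article gives no detail on how this consistent view is achieved. One common general pattern, shown here purely as an assumption and not as IRI's method, is to stage each update separately and then publish it in a single step, so that a reader always sees either the old version or the new one in full. The class name `VersionedTable` and its methods are hypothetical.

```python
import threading


class VersionedTable:
    """Readers get an immutable snapshot; writers publish a whole new version."""

    def __init__(self, rows):
        self._lock = threading.Lock()
        self._current = tuple(rows)  # the published, immutable snapshot

    def snapshot(self):
        # Return the currently published version. The tuple is never mutated,
        # so a reader holding it is unaffected by later publishes.
        with self._lock:
            return self._current

    def publish(self, new_rows):
        # Build the new version off to the side, then swap it in at once.
        staged = tuple(new_rows)
        with self._lock:
            self._current = staged


if __name__ == "__main__":
    table = VersionedTable([1, 2])
    view = table.snapshot()      # an extraction starts with this view
    table.publish([1, 2, 3])     # an update lands while the extraction runs
    print(view)                  # the running extraction still sees the old version
    print(table.snapshot())      # new extractions see the updated version
```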

One of the most effective additions to the IRI IT infrastructure was the implementation of Hadoop. Before Hadoop, the technology team relied on the mainframe running 24×7 to process the data in accordance with their customers’ tight timelines. With Hadoop, they have been able to speed up the process while reducing mainframe load. The result: annual savings of more than $1.5 million.

Says Mason, “Hadoop is not only saving us money, it also provides a flexible platform that can easily scale to meet future corporate growth. We can do a lot more in terms of offering our customers unique analytic insights – the Hadoop platform and all its supporting tools allow us to work with large datasets in a highly parallel manner.

“IRI specialized in Big Data before the term became popular – this is not new to us,” he concludes. “Big Data has been our business now for more than 30 years. Our objective is to continue to find ways to collect, process and manage Big Data efficiently so we can provide our clients with leading insights to drive their business growth.”

And finally, when asked what advice he might have for others who would like to become Big Data All Stars, Mason is very clear: “Find and implement efficient and innovative ways to solve critical Big Data processing and management problems that result in tangible value to the company.”

Vendors: MapR

Tags: big data all stars, IRI, mapr
