March 24, 2014

How Dannon Upgraded Its Reporting Culture with Big Data Tooling

Alex Woodie

Like many large companies, Dannon is awash in data. Manufacturing and distribution data. Sales and marketing data. Internal data from SAP, and external data from third-party providers. Historically, getting all this data cleaned and prepped for analysis has taken a lot of time, effort, and money. But about a year ago, Dannon adopted an automated data transformation tool that has the potential to rewrite the company’s relationship with data.

As the North American arm of the €19-billion French food giant Groupe Danone, the Dannon company employs a small army of business analysts whose job is to find actionable pieces of information in myriad data sources. These business analysts are experts in their fields, whether it’s the supply chain behind the Evian water business or the sales and marketing channels that support the organic Stonyfield Farm brand.

While the analysts are well-versed in end-user tools like Excel and QlikView, they don’t have the skills or expertise required to perform the advanced data management and standardization tasks that a successful business intelligence operation requires. Dannon’s IT department has traditionally equipped the company’s business analysts with the tools and technology needed for these tasks, which involved heavy doses of ETL tooling and SQL scripts.

However, this regimented approach was beginning to take a toll on Dannon’s corporate culture, which has eschewed a formal data warehouse in favor of a lighter, more agile data manipulation strategy. In recent years, Dannon CIO Timothy Weaver has implemented an initiative to improve the interaction between business analysts and the company’s IT department.

“Our journey started on a much more fundamental level, which was we were looking to reinvent the way that IT engages with business analysts,” Weaver tells Datanami. “One of the things we needed to do was to find new technology that would enable those business analysts to self-service themselves in many of the areas they traditionally relied on IT to provide that capability.”

Weaver may have found a big part of that self-service goal in Paxata, the developer of automated data transformation tools. Paxata is one of a handful of tech outfits focusing on the big challenges that dirty data poses to business intelligence and analytics projects. The company uses a combination of machine learning algorithms and data visualization techniques to help analysts identify and fix anomalies in their data in a fraction of the time it would normally take.

Dannon adopted Paxata for a pilot project in late 2012, and the software proved its worth in short order. The software has helped streamline Dannon’s sales reporting process, which was previously a big challenge for Dannon because the data comes from different places and there are different business models involved. In some parts of the country, Dannon sells its products directly to retailers, while in others it goes through a wholesale distributor.

“In order to get a full view of what your total sales are to that retailer, you need to bring those two data sets together, and they look very different. They come from different data providers, from different sources, and they’re in very different structured formats,” Weaver says. It’s not so much a big data problem as a fast data problem, he adds.
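The article doesn’t describe Dannon’s actual schemas, but the blending task Weaver describes can be sketched in pandas with invented column names and data: a direct-retail feed and a wholesale feed arrive in different shapes, and the analyst normalizes the wholesale feed to a common schema before totaling sales by retailer.

```python
import pandas as pd

# Hypothetical direct-retail feed (column names and data invented for illustration)
direct = pd.DataFrame({
    "RetailerID": ["R100", "R100", "R200"],
    "SKU": ["YOG-01", "YOG-02", "YOG-01"],
    "Units": [120, 80, 45],
})

# Hypothetical distributor feed: different column names, sales reported in cases
wholesale = pd.DataFrame({
    "retail_acct": ["R100", "R200"],
    "item_code": ["YOG-01", "YOG-02"],
    "cases": [10, 6],
    "units_per_case": [12, 12],
})

# Normalize the wholesale feed to the direct feed's schema
wholesale_norm = pd.DataFrame({
    "RetailerID": wholesale["retail_acct"],
    "SKU": wholesale["item_code"],
    "Units": wholesale["cases"] * wholesale["units_per_case"],
})

# Stack the two feeds and total units by retailer and SKU
total_sales = (
    pd.concat([direct, wholesale_norm], ignore_index=True)
    .groupby(["RetailerID", "SKU"], as_index=False)["Units"]
    .sum()
)
print(total_sales)
```

The hard part in practice is not the concatenation but the normalization step: each provider’s feed needs its own mapping onto the common schema, which is exactly the kind of repetitive prep work the article says analysts previously waited on IT to script.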

The company uses SAP Business Suite to manage internal transactions, but most of the sales data that Dannon analyzes actually comes from third-party data bureaus, including Nielsen and IRI. The company generally assumes that this third-party data is cleaned and free of flaws, but that’s not always the case.

“Nobody has the time and mechanisms in place to really identify that the data isn’t clean. You build in an assumption when you’re buying data like that, that it’s already properly cleaned,” Weaver says. “With Paxata…we’re able to see very quickly where there are discrepancies we need to consider, if they’re valid discrepancies or something we need to go back to the data provider.”
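The article doesn’t say which checks Paxata runs, but the kind of discrepancy screening Weaver describes can be sketched with a few generic validity checks on an invented sample of a syndicated sales feed: exact duplicate rows, negative sales figures, and missing values.

```python
import pandas as pd

# Hypothetical syndicated sales extract, as it might arrive from a data provider
feed = pd.DataFrame({
    "week": ["2014-01", "2014-01", "2014-02", "2014-02", "2014-02"],
    "store": ["S1", "S1", "S1", "S2", "S3"],
    "dollars": [1500.0, 1500.0, -40.0, 980.0, None],
})

# Simple checks an analyst might run before trusting a "pre-cleaned" feed
dupes = feed[feed.duplicated(keep=False)]   # exact duplicate rows
negatives = feed[feed["dollars"] < 0]       # negative sales figures
missing = feed[feed["dollars"].isna()]      # missing values

issues = len(dupes) + len(negatives) + len(missing)
print(f"{issues} suspect rows flagged for review")
```

Flagged rows still need a human judgment call, which matches Weaver’s point: the tool surfaces the discrepancies quickly, and the analyst decides whether each one is valid or something to take back to the provider.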

Weaver knew that Paxata was saving his analysts time, but he didn’t know by exactly how much. So he decided to time it. He found it took about 40 hours under the old model to acquire, analyze, transform, and re-structure the data so it could be analyzed in QlikView. “Then we turned around and did exactly the same thing using the Paxata data preparation platform to do the data side of things, and it took about two hours of effort from beginning to end,” he says.

“One of the great things about the Paxata platform is the fact that it’s a console the user is actively using to bring together data sets in an ad-hoc manner, and they visually get that immediate indicator that something may not be right,” Weaver says. “Instead of having an IT technical person checking data and its validity, it’s the business analyst who actually knows the data and what it should look like.”


Since the initial pilot projects, Dannon has expanded its Paxata adoption. The White Plains, New York-based company makes Paxata available to all of its business analysts, although it doesn’t require them to use it. Instead, it offers training in the new product, in the hopes that the analysts will see some value in adopting it, potentially opening the door to a whole new level of data exploration.

“The way a lot of the analysts are working today is they have a specific question they want to answer and then they try to obtain the data sets which might help them answer that question,” Weaver says. “We’re trying to move the organization past that by saying look, if you have access to the right tools and technologies which would allow you to rapidly answer the ad hoc questions you have, this will free up the time you were previously using to answer those questions to do a lot more data exploration. Instead of defining the questions up front, you can just start exploring the data to see what it tells you in its raw format.”

Paxata isn’t displacing Dannon’s QlikView or Excel tools. It’s not likely to eliminate the need for the occasional one-off SQL query, or to replace all ETL routines. Data quality should be better with Paxata, but nobody expects it to be 100 percent perfect all the time. As Weaver sees it, Paxata is both a timesaver and a stepping stone to doing bigger and better things with data.

“There’s a lot of things we don’t do because we literally don’t have the bandwidth or the time to do, because in the traditional model it would just take too long,” he says. “So we’re able to do things not only much faster than previously, but we’re able to do a lot more things.”
