Neustar Finds the ‘Last Mile’ for Hadoop Analytics
Like many companies in the marketing analytics business, Neustar uses Hadoop to store and process large amounts of highly granular data on behalf of its clients. In fact, the company has used Hadoop for over a decade, a success story in its own right. However, despite the cost advantages the platform provided, Neustar struggled to efficiently turn that big data into insights — until it found a solution from Arcadia Data that bridged the “last mile” from the data lake to customers.
Neustar is a privately held $900-million company that was spun out of Lockheed Martin in 1998 to provide information services to clients across several industries, including telecom providers and entertainment companies. It claims to be the most authoritative identity provider in the U.S., and also serves as the domain name registry for several Internet domains, including .biz, .us, and others.
Neustar also provides marketing analytics services to major global corporations, such as Ford, Macy’s, and AT&T. About 70 Fortune 100 firms rely on Neustar to help them connect the dots between advertising investments and the real-world impact on actual people. This division is based in part on its 2015 acquisition of MarketShare, which had in turn acquired JovianDATA in 2010.
When JovianDATA was founded in 2008, it had the advantage of aligning itself with the start of a great wave of technological progress. Apache Hadoop was just beginning to turn heads with its novel approach to storing and processing large amounts of data on clusters of commodity x86 servers. (It’s something we take for granted now in 2018, but let’s not forget how big this was back then.)
For a startup like JovianDATA, utilizing new technology like Hadoop was a no-brainer. According to Satya Ramachandran, who was one of the founders of JovianDATA and is currently the vice president of engineering for Neustar, the company continues to be a big believer in the Hadoop ecosystem, and is a big user of data science tools like H2O and the language Scala.
While Hadoop provided reliable storage and processing for Neustar, the company faced challenges in other parts of its analytics assembly line. Specifically, as it grew its marketing analytics business from about $6 million in revenue in 2010 to almost $40 million by the end of 2015, it struggled to efficiently convert ever-bigger sets of advertising data into actionable insight for its clients.
Ramachandran explains the situation:
“The challenge was, to create these reports, we had to hire an army of application engineers and then we also had to have a bunch of database administrators who tuned them,” he tells Datanami during an interview at Hortonworks’ DataWorks Summit last month. “This was getting expensive. It would take us two to three weeks to build a report, and then on an ongoing basis we needed to manage the performance as new data comes in.”
For a couple of years, the company managed to get by with the largely manual method, which relied heavily on batch-oriented MapReduce and Hive jobs on the Hadoop cluster, as well as an external Oracle database and the BI/visualization tool from Tableau.
However, each new client to Neustar brought with it 30 to 40 TB in new data, plus a custom data schema to master. The amount of data stored on behalf of customers had grown more than 100x since the outfit’s founding, and was approaching an exponential growth curve, Ramachandran says. Eventually, by about 2012, the scale problem grew to the point where it could no longer be ignored.
The company sought alternative solutions. Frustrated with the need to extract data from its Hadoop cluster (Neustar was using Elastic MapReduce from Amazon Web Services), Ramachandran instead sought a “native” Hadoop solution that could eliminate data movement.
A friend of Ramachandran’s suggested that he look into the BI offering from Arcadia Data. The San Mateo, California company pairs a Hadoop-native OLAP “cube” of pre-aggregated data with a visualization/BI layer that’s designed to query the data very quickly without moving any data off the cluster.
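To make the idea of a pre-aggregated “cube” concrete, here is a minimal sketch in Python. It is not Arcadia’s implementation, which the article does not describe in detail; the sample events, the dimension names, and the functions `build_cube` and `query` are all hypothetical and exist only for illustration. The point it shows is that once totals are computed ahead of time for each combination of dimensions, a report becomes a lookup rather than a scan of the raw data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw ad events: (channel, region, impressions, conversions).
# These values are made up for illustration only.
EVENTS = [
    ("search", "west", 1000, 30),
    ("search", "east", 800, 20),
    ("display", "west", 5000, 25),
    ("display", "east", 4000, 10),
    ("search", "west", 1200, 42),
]
DIMENSIONS = ("channel", "region")


def build_cube(events):
    """Pre-aggregate the measures for every subset of dimensions in one pass."""
    cube = defaultdict(lambda: [0, 0])
    for channel, region, impressions, conversions in events:
        row = {"channel": channel, "region": region}
        for size in range(len(DIMENSIONS) + 1):
            # combinations() keeps DIMENSIONS order, so keys are canonical.
            for dims in combinations(DIMENSIONS, size):
                key = tuple((d, row[d]) for d in dims)
                cell = cube[key]
                cell[0] += impressions
                cell[1] += conversions
    return cube


def query(cube, **filters):
    """Answer a roll-up from the cube without rescanning the raw events."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    impressions, conversions = cube.get(key, (0, 0))
    return {"impressions": impressions, "conversions": conversions}


CUBE = build_cube(EVENTS)
```

Calling `query(CUBE, channel="search")` reads a single precomputed cell instead of touching the five raw events. The trade-off in this toy version is that the number of cells grows with every added dimension, which is why a real system has to be selective about what it pre-aggregates.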
Ramachandran decided to give Arcadia Data a try. Neustar signed up for a three-month proof of concept of Arcadia’s flagship product, Arcadia Enterprise, and the initial results were positive. “We were able to get the report up and running in a few days,” he says. “And our engineering team was not involved. That was the big eye-opener for us.”
In 2014, Neustar bit the bullet and became an Arcadia customer while Arcadia was still in stealth mode. Instead of paying an army of engineers to build the workflows and write the marketing analytics applications for its clients, the company relies on Arcadia Enterprise to calculate all the metrics that go into its clients’ marketing reports and dashboards.
Getting set up in Arcadia still requires a bit of work, including mapping the source data into the Arcadia environment. Getting Neustar’s clients’ data initially mapped and the parameters set up took about three weeks, Ramachandran says.
But once this work was completed, Neustar was able to use this application as the basis for all of its clients’ reports, with only minor tweaks necessary for each new client. This has allowed Neustar to replace an army of engineers creating reports on a one-off basis with a dozen business analysts who assist clients with the creation of the reports and dashboards in the Arcadia environment.
The combination of these data-savvy analysts and the Arcadia software gives Neustar clients the ability to get actionable insights from custom reports and dashboards six times faster than before, according to a case study on the Arcadia Data website.
It’s all about being flexible and adaptable to customer needs, Ramachandran says. “How I’m able to create these applications and deploy them across all of these different customers seamlessly without engineering involvement is amazing,” he says.
The performance aspect of Arcadia is also good, he adds. Every day, Neustar’s clients are utilizing Arcadia Data software to track their own marketing initiatives, such as comparing media performance, analyzing conversion success, or tracking customer journeys. Thanks to the availability of anonymized data that tracks customer behavior at the individual level, Neustar’s clients are able to build highly targeted marketing campaigns that have a higher likelihood of success than the less-precise techniques of old.
Today, when Neustar signs a new client, it relies on Arcadia Enterprise to do the heavy lifting. The average Neustar customer has 150 to 200 concurrent users hitting Arcadia Enterprise running against EMR on AWS, and some customers have far more. So far, Arcadia has proven up to the task.
“That’s a big part of Arcadia: being able to run on large volumes of data,” Ramachandran says. “The other is to run with all these concurrent sessions, which is a big limitation in Hadoop today.”
‘Last Mile’ Problem
This approach to Hadoop-based big data analytics has worked well for Arcadia, which has raised $11.5 million in venture funding since being founded in 2012. The company has since branched out to analyzing data stored in other platforms, including Apache Kafka through its KSQL connector, but Hadoop remains a core focus.
It’s about delivering the “last mile” of connectivity into Hadoop, says Arcadia Data Vice President of Marketing Steve Wooledge. “If you think about it, Hadoop initially was a place where data engineers and data scientists did work,” he says. “The negative hype we’ve seen around Hadoop data lakes has been arguably that nobody thought about that last mile. That’s why we founded Arcadia Data, to deliver the value to the business.”
With petabytes of data under management and a mandate to deliver timely insights to drive clients’ marketing activities, Neustar is under constant pressure. By effectively outsourcing a good chunk of the technological complexity to Arcadia, it found a way to alleviate some of that pressure.
“From a reporting perspective, having Arcadia there means we don’t have to solve the problem,” Ramachandran says. “That problem is already solved by Arcadia.”