June 15, 2015

How TrueCar Uses Hadoop to Deliver Price Transparency

If you’re in the market for a new car, you might be using the car pricing service from TrueCar to figure out how much you should pay. With just a few clicks on the company’s mobile app, you can find out what a car is worth. It all seems very simple, but behind the scenes at TrueCar is a sophisticated big data system powered by Hadoop.

TrueCar was founded about 10 years ago on the premise that price transparency is good for everybody in the car business. It’s obviously good for car buyers, who have historically been on the short end of the information stick in deals for cars, which are typically the second-biggest purchase consumers make (after a home).

But TrueCar also realized that price transparency can be good for car dealers too. Rather than spending time and money on marketing and other activities that ultimately don’t give value to consumers, dealers have an incentive to make the car-shopping process as easy and as straightforward as possible, TrueCar recognized. When people are less afraid of being taken advantage of, they are less hesitant to make decisions and more transactions take place, the company theorized.

While the premise is simple, actually delivering on the promise of pricing transparency in the automotive business is hard. That’s because it requires manipulating data (a lot of data) and doing it quickly enough and accurately enough to ensure that it’s relevant and actionable to all parties in a vehicle transaction.

Russell Foltz-Smith, the general manager and senior vice president of data products at TrueCar, shared his experiences in building the system that delivers pricing transparency during last week’s Hadoop Summit conference in San Jose, California. As Foltz-Smith recalls, his colleagues questioned his desire to invest in Hadoop by attending the Hadoop Summit in 2012.

“I distinctly remember the chuckles I got when I put in the budget to have everybody fly up here and learn all this Hadoop stuff,” he told the audience during his keynote. “They said ‘Russ, who are you kidding? We’re only using about 20 TB of data. We can save that into the different data warehouses we’re using.’”

Russell Foltz-Smith, the general manager and senior vice president of data products at TrueCar.

TrueCar had been running its various systems on a collection of five different data warehouses and more than 200 distinct databases, Foltz-Smith said. It worked well enough, but the technology executive could foresee a day when this new data-centric way of computing would become necessary. “You don’t see where this is going,” he told his colleagues. “The data is going to demand that we think completely differently around how we collect, analyze, and distribute data.”

At Foltz-Smith’s urging, TrueCar jumped into Hadoop with both feet. The executive, who hates proofs of concept (“Pick a real problem. Do not do POCs.”), got the okay to invest in a 2 PB cluster and licensed the Hadoop distribution from Hortonworks, and the company was off and running.

Twenty-four training sessions and several hires later, the company successfully migrated its core Vehicle Intelligence System over to Hadoop.

A Real-Time Vehicle

The VIS is critical to TrueCar’s mission of delivering data transparency. The system is the central hub that stores raw data coming in from thousands of sources; cleans and transforms the data; processes it according to business rules; and distributes it out to consumer-facing applications–or pretty much anybody else who wants it, per its transparency precepts.
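The four stages described above (ingest, clean and transform, apply business rules, distribute) can be pictured with a minimal Python sketch. This is purely illustrative and is not TrueCar's code: the `Record` fields, the positive-price rule, and the `publish` callback are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Record:
    # Hypothetical cleaned record; the field names are assumptions.
    source: str
    vin: str
    price: Optional[float]


def clean(raw: dict) -> Optional[Record]:
    """Normalize one raw record; return None if it has no usable VIN."""
    vin = str(raw.get("vin", "")).strip().upper()
    if not vin:
        return None
    try:
        price = float(raw["price"])
    except (KeyError, TypeError, ValueError):
        price = None  # missing or unparseable price
    return Record(source=str(raw.get("source", "unknown")), vin=vin, price=price)


def passes_rules(rec: Record) -> bool:
    """Hypothetical business rule: keep only records with a positive price."""
    return rec.price is not None and rec.price > 0


def run_pipeline(raw_records: Iterable[dict],
                 publish: Callable[[Record], None]) -> int:
    """Ingest raw records, clean them, apply rules, and distribute the rest.

    Returns the number of records handed to `publish`.
    """
    published = 0
    for raw in raw_records:
        rec = clean(raw)
        if rec is not None and passes_rules(rec):
            publish(rec)
            published += 1
    return published
```

Passing `publish` in as a callback keeps the distribution step separate from the cleaning and rules logic, which mirrors the hub-style design the article describes.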

All sorts of data goes into the VIS, not just new and used car prices obtained from car makers, dealers, third-party data providers, and consumers (TrueCar allows users to self-report what they paid). In addition to pricing, the VIS stores lists of features and fees associated with cars. Each state treats its fees differently, so every record must be carefully deconstructed before it can be put back together.

Unstructured data is also critical, for example for detecting car-buying trends that could show up in the clickstreams (the company uses Apache Spark for machine learning and data manipulation). This information is important to determining the “true value” of the car on the market. It’s also critical to have a picture of the car, Foltz-Smith says. “If there’s no vehicle image, the car doesn’t exist,” he said.

Before moving the VIS to Hadoop, TrueCar was able to update its car information once per day. That was sufficient for giving customers some idea of the market. But since moving to Hadoop, TrueCar has been able to update its database of 200 million cars every 30 minutes, which is just about as close to real-time as you can get in the car business.

Today, TrueCar is processing 65 billion pieces of data every day from more than 12,000 data sources. Thanks to Hadoop, it’s able to price 200 billion separate vehicle configurations every day. The amount of data stored on Hadoop has grown by a factor of 24 over the past 12 months, Foltz-Smith said, while its inventory processing has been sped up by a factor of 72.
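To put those volumes in perspective, the short Python calculation below converts the quoted figures into average per-second rates. It is back-of-the-envelope arithmetic on the numbers in the article only; it assumes the load is spread evenly over time, which real workloads rarely are, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
REFRESH_INTERVAL_MINUTES = 30           # update cadence quoted in the article

pieces_per_day = 65_000_000_000         # "65 billion pieces of data every day"
cars_in_database = 200_000_000          # "database of 200 million cars"

# Average ingest rate if the daily volume arrived at a steady pace.
pieces_per_second = pieces_per_day / SECONDS_PER_DAY

# Average rate needed to touch every car once per refresh window.
cars_per_second = cars_in_database / (REFRESH_INTERVAL_MINUTES * 60)

# Number of refresh cycles per day, versus one per day before Hadoop.
refreshes_per_day = (24 * 60) // REFRESH_INTERVAL_MINUTES

print(f"{pieces_per_second:,.0f} pieces of data per second on average")
print(f"{cars_per_second:,.0f} car records per second per refresh cycle")
print(f"{refreshes_per_day} refreshes per day")
```

Under those assumptions the averages work out to roughly 750,000 pieces of data per second and about 111,000 car records per second, with 48 refresh cycles per day instead of one.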

At 23 cents per gigabyte, Hadoop has succeeded in delivering a cost-effective storage platform. But it’s much more than that for TrueCar, as Foltz-Smith explained.

Automotive Brain

The growth figures are impressive, but they don’t necessarily tell the whole story of how Hadoop will impact the company. According to Foltz-Smith, Hadoop gives TrueCar a platform not only for amassing a huge database of car-related data, but for building intelligent systems that allow users to act on that data.

“The idea is to be the brain of the industry,” he said. “Our goal is to acquire everything, literally every piece of data in the industry, and synthesize it within 15 minutes…. That could be a vehicle, a consumer, a loan, a lease, or an insurance policy. Identification is super important. You can’t be wrong in automotive. The transaction is way too complicated.”


Spark-based machine learning plays a role in TrueCar’s plans for predictive and prescriptive analytics

Once TrueCar has acquired and synthesized the data, its goal is to serve it back up in an easily digestible way. The company is banking on machine learning technologies built with Spark and search capabilities from Elasticsearch to deliver a dynamic end-user experience that changes with the user.

“We are moving to what I would consider a contextually aware intelligent search engine,” Foltz-Smith said. “There will not be time for you to perfectly create a linear experience and linear data set that will deliver perfectly for every person. Instead you kind of have to open it up and let people search through the data and forage for what they need…. You also need to be able to learn from that foraging and push contextually relevant information to people at the time they think they need it.”

TrueCar is at the cutting edge of data processing in the automotive industry. By betting the company’s future on Hadoop, Foltz-Smith realizes that he asked the company to take a chance. (While the technology is solid, recruiting talented people was more difficult than he anticipated, Foltz-Smith said.)

But the way he sees it, the payoff from getting it right and helping to change the industry is worth the risk. “It’s important in the auto industry to lead the way even if it’s super risky,” he said. “I don’t have a budget that I sit there and worry about. What I worry about is the speed at which I can drive features.”
