April 15, 2016

How Intuit Personalizes TurboTax Experiences with Big Data

If you’re like millions of Americans who use TurboTax, you know how the program makes preparing your individual tax return easy—at least compared to actually reading and understanding the 70,000-page tax code. What you may not know is that your particular experience with TurboTax is unique, and that experience is driven by how Intuit crunches big data.

Income taxes aren’t due until April 18, thanks to the Federal Government’s wise decision not to let taxes ruin a perfectly good Friday. If everything goes as planned, Intuit will process billions of transactions over the weekend as people scramble to complete their income tax returns prior to Monday’s deadline.

Each of those transactions represents the culmination of a person’s interactions with TurboTax, which Intuit tracks and analyzes using a big data storage and analytics system. Every time a TurboTax user clicks a field with a mouse, uploads a photo of a W-2, or opens a drop-down box on a touch-screen device, that information is collected and logged for future reference.

In addition to clickstream data, Intuit (NASDAQ: INTU) collects users’ financial data and analyzes it in a multi-layered big data stack. A Hadoop cluster ingests and stores huge volumes of clickstream data and performs the first level of refinement on it.

The aggregated data is then passed into a large data warehouse running the massively parallel processing (MPP) Vertica database software. This is where hundreds of analysts, data scientists, and other Intuit employees use tools like Tableau (NYSE: DATA) to extract insights and power dashboards showing how TurboTax customers are using the software.
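To make the "first level of refinement" concrete, here is a minimal sketch of rolling raw clickstream events up into per-screen counts before they are loaded into a warehouse. The event fields, screen names, and the `aggregate_events` helper are all hypothetical; the article does not describe Intuit's actual schema or jobs, and a real pipeline would run on Hadoop rather than in a single Python process.

```python
from collections import Counter

# Hypothetical raw clickstream events: (user_id, screen, action).
# These names are illustrative only and do not come from Intuit.
events = [
    ("u1", "w2_upload", "click"),
    ("u1", "w2_upload", "upload_photo"),
    ("u1", "deductions", "open_dropdown"),
    ("u2", "w2_upload", "click"),
    ("u2", "deductions", "click"),
    ("u2", "deductions", "click"),
]


def aggregate_events(events):
    """Count how many times each (screen, action) pair occurred."""
    return Counter((screen, action) for _user, screen, action in events)


summary = aggregate_events(events)
for (screen, action), count in sorted(summary.items()):
    print(screen, action, count)
```

The aggregated rows are far smaller than the raw event log, which is the usual reason for doing this kind of roll-up before handing data to analysts.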

Real-Time Learning

Intuit tracks all interactions with its TurboTax products, whether customers are using the smartphone version, the Web-based software as a service (SaaS) version, or the downloadable version for Windows or Mac OS.


Intuit now offers a smart-phone version of TurboTax

Vertica is critical to understanding how TurboTax users interact with the product, says Jeff Healey, the director of product marketing for the Vertica business at Hewlett-Packard Enterprise (NYSE: HPE).

“They’re taking these analytical insights and working closely with engineering teams to customize the software on an individual level as quickly as possible,” Healey tells Datanami. “They’re constantly taking that information and saying ‘How can we streamline these processes?'”

Sometimes, that means identifying potential problem areas. For example, if a large number of TurboTax users who breeze through most of the program suddenly get stuck at a particular point, that anomaly will bubble up through the analysis. After looking at the situation, the Intuit team will implement changes to streamline how the program works.
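One simple way to picture this kind of analysis is to compare how long users typically spend on each screen and flag the screens that stand out. The sketch below is an assumption-laden illustration, not Intuit's method: the screen names, the dwell times, the `find_stuck_screens` helper, and the threshold `factor` are all hypothetical.

```python
from statistics import median

# Hypothetical dwell times in seconds, grouped by screen.
dwell_seconds = {
    "personal_info": [20, 25, 30, 22],
    "w2_entry": [40, 45, 38, 50],
    "stock_sales": [180, 240, 200, 210],
    "review": [30, 35, 28, 33],
}


def find_stuck_screens(dwell_seconds, factor=3.0):
    """Return screens whose median dwell time exceeds `factor` times
    the median of all per-screen medians (an assumed, simple rule)."""
    screen_medians = {s: median(times) for s, times in dwell_seconds.items()}
    baseline = median(screen_medians.values())
    return sorted(s for s, m in screen_medians.items() if m > factor * baseline)


stuck = find_stuck_screens(dwell_seconds)
print(stuck)  # ['stock_sales']
```

Using medians rather than means keeps a handful of users who walked away from the keyboard from distorting the result.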

Vertica lets Intuit create fine-grain analyses based on particular demographics of the TurboTax users, Healey says. “They can understand if the path from step 1 to step 30 is the most clear path for, say, a male in this region who has three children and was divorced and switched jobs, compared to, say, a female with no children who lives in New York City made over $200,000 last year and made a lot of stock transactions.”

In addition to spotting potential trouble spots that keep tripping up TurboTax users, the analytic tools are useful in making the whole tax-preparation process just go a lot faster, according to Joel Minton, the director of data science and engineering for TurboTax, who discussed some of these capabilities last November in an interview with Dana Gardner, the principal analyst at Interarbor Solutions.

“[A]s a customer goes through our application, they may ask us a question about a certain tax situation,” Minton tells Gardner in a Q&A. “When they ask that question, we know a lot more later on down the line about whether that specific issue is causing them grief. If we can bring all of those data sets together so that we know that they asked the question three screens back, and then they’re spending more time on a later screen, we can try to make that experience better, especially in the context of those specific questions that they have.”

Intuit also uses the insights from Vertica and Tableau to feed machine learning models that continually seek to optimize the experience of TurboTax users, no matter what devices they’re using to access the software. For example, the software can help TurboTax users to decide whether to itemize their deductions or not, based on their specific situation. All told, this particular feature is said to save about 2 million hours of tax preparation time each year.
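The underlying arithmetic of the itemize-or-not decision is straightforward, even if Intuit's models for anticipating it are not. The sketch below shows only the basic comparison, not a machine learning model and not TurboTax's logic. The `should_itemize` helper, the deduction categories, and the dollar amounts are hypothetical, and the standard deduction is passed in by the caller because it varies by year and filing status.

```python
def should_itemize(itemized_deductions, standard_deduction):
    """Return (itemize?, itemized total) for a dict of deduction amounts."""
    total = sum(itemized_deductions.values())
    return total > standard_deduction, total


# Hypothetical filer; all amounts are illustrative placeholders.
deductions = {
    "mortgage_interest": 5000,
    "state_and_local_taxes": 3000,
    "charitable_gifts": 1000,
}
itemize, total = should_itemize(deductions, standard_deduction=6300)
print(itemize, total)  # True 9000
```

A predictive model would presumably aim to reach the same answer earlier, before the user has entered every deduction, so that screens that cannot change the outcome can be skipped.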

“We have millions of customers who have slightly different needs based on their unique situations,” Minton tells Gardner. “What we do is try to give them a unique experience that closely matches their background and preferences, and we try to use all of that information that we have to create a streamlined interaction where they can feel like the experience itself is tailored for them.”

A Need for Speed

Intuit adopted Vertica several years ago when its old data warehouse system was unable to meet the performance and concurrency demands that its TurboTax business was creating, especially this time of year. Most of Intuit’s business around TurboTax occurs during the 10-day window prior to tax day, but the old solution wasn’t up to snuff.

Last year, Intuit expanded its on-premise Vertica cluster from 14 nodes to 40 nodes composed of Dell PowerEdge R620 servers, according to an August 2015 story. After some initial setup problems, including a misconfigured BIOS setting, the Vertica cluster has been chewing through data at an impressive clip.

“We moved away from our previous vendor that had some concurrency problems and we moved to HPE Vertica, because it does handle concurrency much better, handles workload management much better, and it allows us to pull all this data,” Minton tells Gardner.

Intuit has grown its data mart by 400 percent compared to the previous data warehousing solution, while shrinking query times by about 40 percent. In 2015, the company had nearly 200 active users, who were submitting up to 65,000 queries per day against the Vertica cluster.

The Vertica system is “head and shoulders over what we had previously,” Minton tells Gardner in the Q&A. “Mostly that’s because during those peak times, when we’re running a lot of traffic through our systems, it’s very easy for all the users to hit the platform at the same time, instead of nobody getting any work done because of the concurrency issues.”

During all of 2015, Vertica helped Intuit generate more than 2,000 individual insights out of its TurboTax data sets, according to Healey. With much more data going through the big data system, and more TurboTax users than ever, there’s an opportunity for Intuit to obtain even more insights from TurboTax users.

“During this time of year, when you have billions of transactions, they can’t have any blips,” Healey tells Datanami. “Because they’ve eliminated speed and performance problems…they can ask anything they want. And they do.”
