January 6, 2015

The Hadoop Behind Autodesk’s Cloud Ambitions

Autodesk is best known for AutoCAD, the powerful tool that architects and engineers use to design stuff on beefy workstations and PCs. But like most enterprise software companies, Autodesk is hearing the call of the cloud in 2015. Thanks to a big data pipeline anchored by Hadoop, the company has a deeper understanding of customer activity, which positions it well for the cloud era.

The transition to a subscription-based business model is well underway at Autodesk. While sales of traditional software licenses for AutoCAD remain the big revenue generator for the San Rafael, California-based company, executives are banking on the cloud to drive sales growth. The shift to cloud is ahead of expectations, judging from the $2.2-billion company’s recent quarterly results.

For Autodesk, adopting a cloud business model means a more constant and predictable revenue stream, which is the dream of all enterprise software CFOs. But it also means a new source of data for understanding how customers are using its products, and that's where Hadoop comes in.

With on-premise software like the traditional AutoCAD offering, getting product-usage data was a hit-or-miss affair, says Charlie Crocker, the business analytics program lead for Autodesk. “We don’t necessarily get a lot of information out about that user,” Crocker says. “Some people send us the information if there’s a crash or if they’re willing to share information about their usage, but it’s not ubiquitous.”

That began changing when Autodesk started hosting AutoCAD and offering other products, such as Autodesk A360, as subscriptions accessible over the Web and from mobile devices. "When people are running things on your server, you see every click. You see how every activity is performing," Crocker says. "You can see the relationships between people who are using multiple products."

A Pipeline for Big Data

Autodesk is harnessing that new source of product-usage data to help maximize revenues for itself and improve the experience for the user. Crocker and his team of data scientists in San Francisco are tasked with building a Hadoop-based big data pipeline that extracts valuable information from all the unstructured data.

That pipeline is based on a core technological underpinning that consists of Amazon's Elastic MapReduce distribution of Hadoop, Kafka, Splunk, Google BigQuery, and Amazon Redshift. On top of that, Autodesk licenses tools from Trifacta, Tableau, and Qlik to help cleanse and analyze data. It's also exploring the benefits that offerings from Platfora, Paxata, and Datameer can provide. All of these tools help solve a piece of the big data puzzle.

The goal of Crocker's big data pipeline is to enable non-technical personnel within Autodesk's sales, marketing, and product development teams to detect patterns and glean insights from massive amounts of data, without requiring low-level programming expertise.

“What we can do with our Hadoop environment is gather all that highly unstructured information and pull it into a common environment, and then we can start to generate value,” Crocker says. “The initial value right now has a lot more to do with monitoring and reporting as well as generating alerts if certain thresholds are not met…But we are starting to do work where we’re building out models against that data set to start doing predictive analysis.”
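The monitoring-and-alerting stage Crocker describes boils down to comparing observed metrics against minimum acceptable values. The following is a minimal sketch of that idea in plain Python, not Autodesk's actual implementation; the metric names and thresholds are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str     # hypothetical metric name, e.g. "daily_active_users"
    minimum: float  # an alert fires when the observed value falls below this


def check_thresholds(observed: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alert message per metric that is missing or below its minimum."""
    alerts = []
    for t in thresholds:
        value = observed.get(t.metric)
        if value is None:
            # A metric that stopped reporting is treated as an alert in its own right.
            alerts.append(f"{t.metric}: no data")
        elif value < t.minimum:
            alerts.append(f"{t.metric}: {value} below minimum {t.minimum}")
    return alerts
```

For example, `check_thresholds({"daily_active_users": 900.0}, [Threshold("daily_active_users", 1000.0)])` yields a single alert for the shortfall. Treating a missing metric as an alert is a design choice: a silent data feed is often a bigger problem than a low number.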

Mining for Data Gold

Autodesk's big data pipeline is not engaging directly with end users at the moment. Instead, it's used by more than 100 analysts within the organization. Autodesk is not strict about what tools analysts use, preferring instead to let people work with the tools they're most comfortable with. Getting all of the unstructured data from clickstreams and log files into a shape that those analysts can comfortably and reliably work with is not a simple task.

One of Crocker’s biggest challenges is transforming unstructured data into information that’s accessible and usable by the analysts in a self-service manner, while at the same time ensuring consistency of the metadata. The aforementioned tools are critical in helping to hammer the big unruly data down into something usable. The Trifacta offering, in particular, has proven quite good for doing the first pass on the big data and eliminating duplicates. Crocker is exploring some of the other tools to see how they can accelerate development and boost productivity.
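To make the "first pass" concrete, the snippet below shows one way to drop duplicate records from raw JSON log lines. It is a plain-Python sketch of the general technique, not how Trifacta works internally; the key fields (`user_id`, `event`, `timestamp`) are hypothetical assumptions about what identifies a unique event.

```python
import json


def dedupe_events(lines, key_fields=("user_id", "event", "timestamp")):
    """Parse JSON log lines and keep only the first record seen for each key.

    Lines that are not valid JSON objects are skipped rather than aborting the pass.
    The key fields are hypothetical; choose whatever identifies a unique event.
    """
    seen = set()
    unique = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip it
        if not isinstance(record, dict):
            continue  # valid JSON but not an object, so it has no fields to key on
        # Serialize the key so that unhashable values (lists, dicts) still work.
        key = json.dumps([record.get(f) for f in key_fields], sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique
```

Keeping the first occurrence preserves the original order of events, which matters if later steps reconstruct sessions from the cleaned stream.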


Charlie Crocker, the business analytics program lead for Autodesk

All told, the company is pumping upwards of 20 million hits from the cloud product site into Hadoop, via the Splunk-to-Hadoop connector. From there, analysts use Trifacta (or Hive or Pig scripts) to aggregate mostly JSON data down into the 50,000-file size range. Once aggregated, the data is then pumped out to several dozen Tableau views, although some analysts prefer QlikView.
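The aggregation step, whether done in Trifacta, Hive, or Pig, amounts to rolling many raw hit records up into a much smaller set of summary rows that a BI tool can read. The Python below is a hedged, in-memory illustration of that roll-up, not Autodesk's job; the `product` and `timestamp` field names are hypothetical assumptions about the hit records.

```python
import json
from collections import Counter


def aggregate_hits(json_lines):
    """Roll raw JSON hit records up into sorted (product, day, hit_count) rows."""
    counts = Counter()
    for line in json_lines:
        record = json.loads(line)
        # Assumes ISO-8601 timestamps such as "2015-01-06T12:34:56Z",
        # so the first 10 characters are the calendar date.
        day = record["timestamp"][:10]
        counts[(record["product"], day)] += 1
    # Sorted rows are straightforward to write out as CSV for Tableau or QlikView.
    return sorted((product, day, n) for (product, day), n in counts.items())
```

At Hadoop scale the same grouping would run as a distributed GROUP BY rather than in a single Python process, but the shape of the output is the same.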

Detecting churn signals is potentially one of the most valuable uses for the new Hadoop-based system. This classic Hadoop use case would not be possible in a traditional on-premise software environment, owing to the lack of insight into product usage, but it fits right into a cloud subscription model. Crocker is exploring which customer actions signal that a customer is likely to churn, and then what actions Autodesk can take to keep that customer from letting the subscription expire.

"There are a lot of ways you can look at engagement and start to understand whether somebody is likely to move on or not," he says. "You can detect patterns in the usage data, and also look at how active that person is in the forums and what their relationship [is] to the broader community."
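A simple way to picture the kind of engagement signal Crocker mentions is a rule that combines a usage drop with community silence and an approaching renewal date. The sketch below is a hypothetical, rule-based heuristic for illustration only; the fields, the 50 percent drop ratio, and the 60-day renewal window are assumptions, and Autodesk's predictive models are not described in detail.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    # All fields are hypothetical per-subscriber features.
    sessions_prev_30d: int
    sessions_last_30d: int
    forum_posts_last_30d: int
    days_to_renewal: int


def churn_risk(e: Engagement, drop_ratio: float = 0.5, renewal_window: int = 60) -> bool:
    """Flag a subscriber whose usage fell sharply, who is quiet in the forums,
    and whose renewal date is within the window."""
    if e.sessions_prev_30d == 0:
        # No baseline: only flag if the subscriber is still completely inactive.
        usage_dropped = e.sessions_last_30d == 0
    else:
        usage_dropped = e.sessions_last_30d < drop_ratio * e.sessions_prev_30d
    quiet_in_community = e.forum_posts_last_30d == 0
    renewal_soon = e.days_to_renewal <= renewal_window
    return usage_dropped and quiet_in_community and renewal_soon
```

A hand-written rule like this is mainly useful as a baseline; a trained model would learn the weights of such signals from historical renewals instead of hard-coding them.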

A Platform for Growth

Crocker is about 1.5 years into the big data project, and already it's paying dividends in terms of what the company knows about its cloud customers. It's helping the company provide better service and value to its customers in a multi-platform environment, he says.

“Our customers are all multi-platform. They have a desktop, laptop, tablet, and phone, and people expect now that all of their products are going to work well together across all of those platforms,” Crocker says. “Hadoop is one of the tools that allows us to understand how people are working across the platforms and our products, and how we can continually make that experience better.”

While there are no current plans to track consumer activity at an individual level, Crocker hopes to eventually present more detailed information about user activity. For example, Autodesk could generate a report that tracks a day in the life of an aggregated, anonymous user.

“Just because we think product X and product Y should be used together, that doesn’t mean that’s where people are finding the most benefit,” Crocker says. “I hope to be able to use this system to dig out those nuggets and find those surprise value propositions that weren’t necessarily obvious without the data.”
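The "day in the life of an aggregated user" report can be sketched as an hourly activity profile averaged over all active users, so that no individual's behavior appears in the output. The code below is a hypothetical illustration of that aggregation; the `(user_id, timestamp, platform)` event shape and the platform labels are assumptions, not Autodesk's schema.

```python
from collections import defaultdict
from datetime import datetime


def day_in_the_life(events):
    """Build an aggregate hourly profile from (user_id, iso_timestamp, platform) events.

    Returns {hour: {platform: average events per active user}}; user IDs are only
    counted, never included in the output.
    """
    counts = defaultdict(lambda: defaultdict(int))
    users = set()
    for user_id, ts, platform in events:
        hour = datetime.fromisoformat(ts).hour  # expects e.g. "2015-01-06T09:15:00"
        counts[hour][platform] += 1
        users.add(user_id)
    n_users = len(users) or 1  # avoid dividing by zero on empty input
    return {
        hour: {platform: c / n_users for platform, c in sorted(by_platform.items())}
        for hour, by_platform in sorted(counts.items())
    }
```

Splitting the profile by platform ties back to Crocker's point about customers moving between desktop, laptop, tablet, and phone over the course of a day.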
