Trifacta Goes Back to the Future with Free ‘Wrangler’
Trifacta hearkened back to its roots in free software with today’s launch of Wrangler, a new data preparation tool for Windows and Mac desktops. The free tool is designed to automate much of the process of cleansing data, in preparation for analysis using Tableau and other visualization tools.
Trifacta’s technology was born out of a joint research project at Stanford University years ago. The goal was to develop software that uses machine learning algorithms and an intuitive GUI to accelerate data cleansing and preparation, the most time-consuming yet least productive aspect of data science.
The software that came out of that effort, called Data Wrangler, was posted to a computer science website, and within six months, it was downloaded more than 30,000 times.
The people behind that effort (including then-Stanford Ph.D. candidate Sean Kandel and his Stanford advisor, Jeffrey Heer) teamed up with UC Berkeley computer science professor Joe Hellerstein to found Trifacta, with the goal of bringing Data Wrangler to the burgeoning commercial market for big data software.
Since then, Trifacta has targeted the biggest data prep problems, which tend to be found on Hadoop. The company’s eponymous product combines an easy-to-use Web browser interface with Apache Spark- and MapReduce-based data transformation routines, enabling big companies to automate much of the work of turning petabytes’ worth of messy, unstructured data sitting in Hadoop into something more useful.
“We’ve done that successfully and have picked up significant momentum and really built a substantial installed base,” says Adam Wilson, who was hired earlier this year to lead Trifacta as its CEO. “But we’ve always felt that part of our mission is to ensure that the people who know the data best can do the work. They’re working on lots of data, not all of which is going to be in Hadoop.”
The company’s commitment to democratizing data wrangling led directly to the creation of Trifacta Wrangler, the free product it unveiled today at the Tableau Software conference in Las Vegas. The software enables users to prep and cleanse data residing on their local file system.
The software includes a Windows or Mac OS X interface that works much like the Web browser-based interface of Trifacta’s full enterprise product. Like the full product, Wrangler uses machine learning algorithms to take a first pass at the data, analyzing it for cleanliness and highlighting values that may not match what the rest of a column suggests.
“We’re inferring some data types,” says Alon Bartur, Trifacta’s principal product manager. “We’re showing users how many values in a column are valid or missing or mismatched, based on those inferred types. We’re giving them a quick view of the distribution of the data–quick checks so people can understand what’s in their data, whether it’s correct, and what other issues” there may be.
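Trifacta has not described how its inference works, but the basic idea Bartur outlines (guess a type per column, then count valid, missing, and mismatched values against that guess) can be illustrated with a minimal Python sketch. It assumes a simple majority-vote rule and three hypothetical type names ("integer", "decimal", "string"); the functions `infer_type` and `profile_column` are illustrative only and are not Trifacta’s implementation or API.

```python
from collections import Counter

# Hypothetical type checkers, ordered from most to least specific.
# "string" accepts anything, so it acts as the fallback type.
def _is_int(v: str) -> bool:
    try:
        int(v)
        return True
    except ValueError:
        return False

def _is_float(v: str) -> bool:
    try:
        float(v)
        return True
    except ValueError:
        return False

CHECKERS = [("integer", _is_int), ("decimal", _is_float), ("string", lambda v: True)]

def is_missing(v) -> bool:
    # Treat None and blank/whitespace-only cells as missing.
    return v is None or str(v).strip() == ""

def infer_type(values, threshold=0.5):
    # Pick the most specific type that more than `threshold` of the
    # non-missing values parse as (an assumed majority-vote rule).
    present = [str(v).strip() for v in values if not is_missing(v)]
    for name, check in CHECKERS:
        if present and sum(check(v) for v in present) / len(present) > threshold:
            return name
    return "string"

def profile_column(values):
    # Count valid / missing / mismatched cells against the inferred type.
    inferred = infer_type(values)
    check = dict(CHECKERS)[inferred]
    counts = Counter()
    for v in values:
        if is_missing(v):
            counts["missing"] += 1
        elif check(str(v).strip()):
            counts["valid"] += 1
        else:
            counts["mismatched"] += 1
    return inferred, {k: counts[k] for k in ("valid", "missing", "mismatched")}

print(profile_column(["42", "17", "", "n/a", "8", None]))
# → ('integer', {'valid': 3, 'missing': 2, 'mismatched': 1})
```

Here three of the four non-empty cells parse as integers, so the column is inferred as "integer" and "n/a" is flagged as mismatched rather than silently treated as text. A real profiler would add more types (dates, booleans, and so on) and the distribution summaries Bartur mentions.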
Like the full product, Trifacta Wrangler gives users a preview of what their transformed data will look like, and lets users tweak and fine-tune how the transformations will actually run to meet their needs. After the data transformation routines are complete, the product returns the cleansed data to the local file system.
The biggest differences between the free Wrangler product and Trifacta’s commercial product (which carries a hefty license fee) are in scale and enterprise features. Wrangler is designed to work at desktop scale, with smaller data sets, not the multi-petabyte data sets commonly found in Hadoop. And it lacks enterprise features found in the full product, including security, collaboration, scheduling, and data lineage features.
The software supports many data formats, including JSON files, CSV files, and logs. Supported output formats include Tableau Data Extract (TDE), which streamlines the process of getting the data into Tableau. (There are many parallels between Trifacta and Tableau, not the least of which is that both emerged from Stanford Visualization Group projects.)
Trifacta Wrangler was almost inevitable, says Will Davis, Trifacta’s director of product marketing.
“There was so much inbound interest to allow them to leverage Trifacta to work on the desktop that we just decided to go ahead and build it,” Davis says. “There’s so much interest in visualization, but all those tools need clean, well-prepared data to be able to work effectively. This is a product that [a] user can interact with to prepare the raw data, then push it to Tableau for end visualization and analysis.”
It’s a coming-home of sorts for Trifacta, says CEO Wilson. “We’re really returning to our roots in terms of providing something to individual end users that allows them to wrangle data that’s on their desktop and allow them to do it for free and giving them an opportunity to see the power of Trifacta for getting clean, structured, enriched data ready for analysis,” he says.
In addition to Tableau, Trifacta Wrangler works with Excel, Qlik, Spotfire, and any number of other viz tools. To download Trifacta Wrangler, go to https://www.trifacta.com/start-wrangling/.