Mapping the Shape of Complex Data with Ayasdi
Machine learning has emerged as the most useful technology for analyzing big and complex data sets. But all too often, it takes a highly skilled data scientist to wield machine learning tools effectively. A company called Ayasdi is positioning a technique it calls Topological Data Analysis as a way to close that machine learning skills gap. And if today’s news is any indication, it’s having tremendous success with this technique.
Ayasdi was founded in 2008 when Gurjeet Singh, a Ph.D. student in mathematics at Stanford University, joined his adviser Gunnar Carlsson to productize the work they’d done on Topological Data Analysis (TDA). Carlsson had been pursuing TDA since the 1970s, and in 2005 he received a $10 million grant from DARPA and the NSF to accelerate the project, on which Ayasdi co-founder Harlan Sexton also worked.
With TDA, the researchers devised a method that uses topology, the mathematical study of shape, to extract insights from data. As a complement to machine learning, TDA lets researchers reduce big, complex data with a large number of dimensions and variables to a smaller, simpler data set with fewer dimensions and variables, without sacrificing its key topological properties.
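Ayasdi’s production TDA engine is proprietary, so the sketch below is not its implementation. It is a toy version of Mapper, the graph-building construction from the academic TDA literature that Singh and Carlsson co-authored, written in plain Python under simplifying assumptions: a user-supplied lens (filter) function, evenly spaced overlapping intervals, and single-linkage clustering at a fixed distance `eps`. The names `mapper_graph`, `single_linkage` and `count_loops`, and all parameter defaults, are hypothetical choices for illustration, not part of any Ayasdi API.

```python
import math
from itertools import combinations


def single_linkage(points, members, eps):
    """Split `members` (point indices) into groups linked by hops of length <= eps."""
    clusters, unvisited = [], set(members)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            j = frontier.pop()
            near = {k for k in unvisited if math.dist(points[j], points[k]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=8, overlap=0.3, eps=0.3):
    """Toy Mapper: returns (nodes, edges); each node is a cluster of point indices."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for i in range(n_intervals):
        # Widen each interval on both sides so neighbouring intervals overlap.
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        members = [j for j, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, members, eps))
    # Clusters that share at least one data point are joined by an edge.
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]]
    return nodes, edges


def count_loops(n_nodes, edges):
    """Independent cycles in a graph: edges - nodes + connected components."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n_nodes
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return len(edges) - n_nodes + components


# Usage: 60 points on a unit circle, using the x-coordinate as the lens.
circle = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60)) for k in range(60)]
nodes, edges = mapper_graph(circle, lens=lambda p: p[0])
print(len(nodes), "nodes,", len(edges), "edges,", count_loops(len(nodes), edges), "loop(s)")
```

Sixty two-dimensional points collapse into a graph with only a handful of nodes that still contains exactly one loop, which is the kind of shape-preserving simplification described above. The same code run on points along a straight segment yields a chain with no loops.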
In this regard, the proprietary TDA technology essentially gives big data practitioners a head start when it comes to extracting insight from unknown data. “You throw data into this machine and it automatically executes a large number of machine learning algorithms against the data and combines them together such that, the first time you look at the picture, you already have something to begin with,” Singh, the Ayasdi CEO, tells Datanami. “It discovers these insights in data automatically without any human intervention.”
If you think that sounds too good to be true, you’re not alone. The world is full of vendors selling all sorts of technology that purports to solve big data problems with the wave of a magic wand. Skepticism is the order of the day when venturing into unknown waters, which this most certainly is.
But here’s the thing: Ayasdi appears to actually do what it claims. Kleiner Perkins Caufield & Byers, the renowned venture capital firm, isn’t in the habit of wasting money on half-baked big data schemes, and it was impressed enough to lead Ayasdi’s $55 million Series C round, which brings the company’s total funding to date to $100 million.
Topological Data Analysis
The core tenet behind TDA is that every set of data (except, presumably, those generated by random character generators) has a shape. Once you figure out the shape, it’s much easier to select the appropriate algorithms to pinpoint the pattern behind the shape. “If you understand the shape of the underlying data, then you don’t have to ask all the queries,” Singh says.
For example, if data behaves linearly, a basic regression algorithm will adequately describe what’s going on. When data appears more scattered, clustering algorithms can help find the best way to divide it into groups. It’s also common to find Cheerio-shaped loops in data sets; the U.S. GDP growth rate over time is an example of looping data, Singh says. Finally, some data sets, such as measurements of lift and drag during an airplane flight, may show up in two dimensions as flares.
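The article doesn’t say how Ayasdi decides that data “behaves linearly.” As a deliberately crude, hypothetical stand-in, the sketch below uses the Pearson correlation coefficient: if |r| clears an arbitrary threshold (0.9 here, an assumption), it fits an ordinary least-squares line; otherwise it flags the data for clustering. Correlation cannot recognize loops or flares (a well-sampled circle has r near 0), which is exactly the gap shape-based analysis is meant to fill. The function name `suggest_model` is illustrative only.

```python
import math


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy, mean_x, mean_y) for paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mx, my


def suggest_model(xs, ys, threshold=0.9):
    """Hypothetical heuristic: regression for near-linear data, clustering otherwise."""
    sxx, syy, sxy, mx, my = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        return ("clustering", None)  # a constant coordinate carries no linear trend
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    if abs(r) >= threshold:
        slope = sxy / sxx  # ordinary least-squares slope
        return ("regression", (slope, my - slope * mx))  # (slope, intercept)
    return ("clustering", None)


# Usage
xs = list(range(10))
print(suggest_model(xs, [2 * x + 1 for x in xs]))  # ('regression', (2.0, 1.0))
print(suggest_model([0, 0, 1, 1, 10, 10, 11, 11], [0, 1, 0, 1, 0, 1, 0, 1]))  # ('clustering', None)
```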
The insight gleaned from TDA helps Ayasdi narrow down the possible explanations for why the data is dispersed the way it is. Suddenly, the problem of “double exponentially worsening queries” (Singh’s term for the twin problems of exponentially growing data and an exponentially growing number of possible queries) doesn’t hurt as much. “You understand everything there is to know about your data, because it’s manifested in this shape,” he says.
Once TDA has given you a peek at the shape of the data, the Ayasdi Core software automatically picks the best machine learning algorithm to explain how the data was generated. “The current set of machine learning algorithms that are commercially available are only able to explore a very small subset of these shapes,” Singh says. “What we have developed at Ayasdi is the ability to quickly access a large number of algorithms, and select the most insightful ones for extracting statistically significant sub-groups, values and anomalies in your data.”
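Singh doesn’t say what criterion Ayasdi Core uses to decide which algorithms are “most insightful.” One common, generic way to choose among candidate models automatically is to fit each on part of the data and keep whichever best predicts a held-out part. The sketch below assumes that criterion, with three toy one-dimensional candidates (a constant mean, a least-squares line, and a 3-nearest-neighbour average) and a fixed every-fourth-point holdout. All names, candidates and split rules here are hypothetical and are not taken from Ayasdi’s product.

```python
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    # Ordinary least squares, written around the means for numerical stability.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    pairs = list(zip(xs, ys))

    def predict(x):
        # Average the targets of the k training points closest to x.
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


CANDIDATES = {"mean": fit_mean, "linear": fit_line, "knn": fit_knn}


def select_model(xs, ys):
    """Return (best_name, holdout_mse_by_name); every 4th point is held out."""
    data = list(zip(xs, ys))
    train = [p for i, p in enumerate(data) if i % 4]
    test = [p for i, p in enumerate(data) if i % 4 == 0]
    tx, ty = [x for x, _ in train], [y for _, y in train]
    scores = {}
    for name, fit in CANDIDATES.items():
        model = fit(tx, ty)
        scores[name] = sum((model(x) - y) ** 2 for x, y in test) / len(test)
    return min(scores, key=scores.get), scores


# Usage
xs = list(range(-10, 11))
print(select_model(xs, [3 * x - 2 for x in xs])[0])  # linear
print(select_model(xs, [x * x for x in xs])[0])      # knn
```

On straight-line data the line wins; on the parabola neither the mean nor the line fits, so the neighbour average wins. A real system would use cross-validation rather than one fixed split; the fixed split here just keeps the example deterministic.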
What’s more, you don’t have to be a machine learning expert to use Ayasdi, Singh says. By comparison, other machine learning software companies may present a library of algorithms, but it’s up to the customers to select the appropriate one. “What that means is the customer has to be at least as smart as the company producing those algorithms to be able to use them,” Singh says. “What we’ve developed is wholly automated. They’re able to just throw their data into the system, and they don’t need to know any of this stuff. It just works.”
Real World Impact
TDA sounds great theoretically, but does it work in the real world? According to some of Ayasdi’s customers, the answer is an emphatic yes.
In addition to announcing the $55 million funding round and 400 percent bookings growth in 2014, the company today publicly named four new customers: the Mercy health system, Citigroup, Lockheed Martin, and Siemens.
Mercy expects to save $100 million over the next three years as a result of the standard care practices that it developed in part with Ayasdi Care, the version of Ayasdi Core that’s tailored to healthcare organizations. When it comes to knee replacement surgeries, for example, Ayasdi was able to isolate the key variables that determine whether the patient will have a strong recovery or stay in the hospital for months. That will save Mercy $1 million right off the bat, and the savings will add up as it creates more standard care practices. “We’re able to plug Ayasdi Care into Mercy’s EMR [electronic medical records system] and automatically discover these very complex clinical pathways from the data,” Singh says.
Lockheed Martin also expects to save more than $100 million as a result of its Ayasdi implementation, which is helping management identify projects that are threatening to “go off the rails.” While Citigroup didn’t put a number on its anticipated savings with Ayasdi, you can bet that it’s of a similar order of magnitude.
“Ayasdi’s big data technology simplifies and accelerates the analysis of thousands of discrete variables and delivers insights that enable Citi to tailor services to specific client needs, operate more efficiently and mitigate risk,” Deborah Hopkins, Chief Innovation Officer of Citi and CEO of Citi Ventures, said in a press release.
Ted Schlein, a general partner at KPCB, says this type of “machine intelligence” technology will be one of the breakthrough innovations that drive productivity over the next decade. “By combining many machine learning algorithms together with topological mathematics and artificial intelligence, Ayasdi developed an entirely new approach that simplifies complex data analysis for large organizations,” Schlein says.
Holding A Machine Learning Edge
Ayasdi Core includes both the traditional supervised algorithms that are commonly used to train and score predictive models and the unsupervised algorithms that are more widely used in data discovery. The company also maintains close ties to the math department at Stanford, “so we keep bringing algorithms into the fold that are hot off the press and not available anywhere else today commercially,” Singh says.
Most of Ayasdi’s customers run the in-memory software on-premises. Ayasdi Core is designed to use HDFS to store data, but it doesn’t run as a Hadoop application, Singh says. Hadoop, apparently, just isn’t fast enough. “The issue is that MapReduce and even Spark end up being just too slow to be able to process this data,” Singh says. “The main issue is the human is the bottleneck. It’s not the processors. Processors are cheap. But if you’re going to employ a data scientist, it’s going to cost you $200,000.”
In that regard, Ayasdi’s competitors are not machine learning software companies, but the data scientists who would otherwise use machine learning technology. “There’s a gap in the market for analytics and enterprise customers are trying to hire their way out of this problem and there just aren’t enough people,” Singh says.
Instead of requiring customers to find a data scientist who’s also a machine learning expert, Ayasdi has done the hard work of hammering unstructured data into a rough shape and then hitting that data with a set of highly targeted machine learning algorithms to pick out the signal. This frees data analysts and business analysts to explore their data more quickly and efficiently. At about $1 million per year, Ayasdi Core is not cheap. But considering the benefits some companies are getting, it’s generating a good return.
“Our software will tell you everything that’s statistically relevant about your data and then it’s up to you to tell if that’s actionable or not,” Singh says. “It sure as hell beats trying to ask a question and hoping to find something useful.”