Venturing Into the Great Unknown with YarcData
Those who practice the data sciences typically are not prone to waxing philosophical about the nature of knowledge. There is usually a clear-cut business goal, and no time for head-in-the-clouds existential questions. The folks at Cray spin-off YarcData are not academics, but thanks to the massive data exploration machine they’ve built, they’re helping to cut data scientists free of the lines that keep their questions tethered close to the ground.
YarcData was carved off from supercomputer maker Cray in early 2012 after Cray had done some work for the U.S. government in the field of data discovery (the details are secret, and you probably know why). YarcData was created to bring that technology to market.
Its Urika line of appliances combines a NoSQL graph database and a shared-memory hardware architecture that scales up to 512TB of RAM, which tells you a lot about the size of datasets YarcData is helping customers to analyze. Customers typically bring their own front-end data visualization tool, such as those from Tableau or TIBCO, to bear on data stored in YarcData’s appliance.
The company has been in business for only about 18 months, but it has already captured customers in the fields of financial services, drug discovery, and cyber security, as well as positive reviews from IT analyst firms. YarcData president Arvind Parthasarathi recently discussed the state of data exploration and the potential for big breakthroughs with Datanami.
“What if I want to find something new for the very first time? The challenge is, I don’t know what to look for. I don’t know what to ask. I don’t know how to get there. That’s because it’s all new,” Parthasarathi says. “That’s the difference between data discovery and more traditional analytics. We’re really focused on the things you don’t know and the questions you’re not asking.”
The universe is an unfathomably large place, and all attempts to understand its nature must, by definition, start somewhere. We can’t change that. Brand new pieces of useful information do not just float down from the heavens. While serendipity undoubtedly has played a role in many of mankind’s greatest breakthroughs, it does require somebody to be there, and to be looking in the right direction at the right time and to have the capacity to notice it.
Nobody can predict when accidental breakthroughs will happen. YarcData’s approach is to grease the wheels for those who are actively pursuing data discovery, and to eliminate the barriers standing between data scientists and the discovery of new information. Creating what is, in effect, a Cray supercomputer with a proprietary ASIC and 512TB of shared memory is a great place to start.
The Urika appliance doesn’t eliminate all barriers–512TB is a lot of data, but it’s still 512TB at the end of the day. But it does open up the spectrum of questions that data scientists can ask of their data, and help to overcome some of the obstacles that are part and parcel of the business intelligence industry.
“If you just take a generic graph database, your first challenge is, how do you partition your data set? And the moment you partition it in a certain way, you’re going to be presupposing what you can find,” Parthasarathi says. The YarcData approach “allows us to go after the problems where we don’t know the relationships in the data. You don’t have to partition the data. You don’t have to lay it out [in a certain way] and you don’t have to presuppose what you can find.”
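To make the partitioning point concrete, here is a minimal Python sketch. The toy graph, the node names, and the two layouts are all hypothetical and are not drawn from YarcData or any customer. It shows only that each choice of partition keeps a different set of relationships local and pushes a different set across the partition boundary, where traversal is typically more expensive.

```python
# Hypothetical toy graph: people, accounts, and a bank (illustrative only).
EDGES = [
    ("alice", "acct1"), ("acct1", "bank_a"), ("bob", "acct2"),
    ("acct2", "bank_a"), ("alice", "bob"), ("bob", "acct1"),
]

# Layout A (assumed): partition by entity type.
BY_TYPE = {"alice": 0, "bob": 0, "acct1": 1, "acct2": 1, "bank_a": 1}

# Layout B (assumed): partition by customer, keeping each person with
# their own account.
BY_CUSTOMER = {"alice": 0, "acct1": 0, "bob": 1, "acct2": 1, "bank_a": 1}


def cut_edges(edges, assignment):
    """Return the edges whose endpoints land in different partitions."""
    return [(u, v) for u, v in edges if assignment[u] != assignment[v]]


cut_by_type = cut_edges(EDGES, BY_TYPE)
cut_by_customer = cut_edges(EDGES, BY_CUSTOMER)

print("cut when partitioned by type:    ", cut_by_type)
print("cut when partitioned by customer:", cut_by_customer)
```

In this toy case both layouts cut the same number of edges, but not the same ones: the by-type layout keeps the person-to-person link local, while the by-customer layout keeps each person next to their own account. Whichever relationships the layout favors are the ones that are cheap to explore, which is the presupposition Parthasarathi describes.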
This approach allows customers to test their hypotheses, and validate or invalidate them more quickly than before. It also frees data scientists to ask questions that may, at first, appear to have a poor chance of being answered, but need to be crossed off the list anyway. “We don’t pay a penalty for following a random thought process or a random hypothesis,” he says.
“That’s our core hypothesis,” Parthasarathi continues. “If we can help you validate 1,000 hypotheses, one of them is going to be right. And that one could be a brand new drug. It could be a new trading strategy. It could be a new fraud pattern. It could be a new terrorist. It could be a new cyber threat. It could be a new customer purchasing behavior.”
Last week, YarcData announced an update to the software component of its Urika appliance. The big news here is the company is taking a standardized approach to supporting the front-end data visualization tools that are necessary for interacting with the data stored in the appliance.
The decision to support W3C standards such as SPARQL and RDF eases the workload for the YarcData factory workers, because there is a plethora of data visualization tools that need to work with the appliance. “We stopped counting front-end tools at 57,” Parthasarathi says. “If we had to go out building point-to-point integrations with each of those 57 tools, we’re going to be sitting here for a long time building integrations.”
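For readers unfamiliar with the standards, RDF models data as subject-predicate-object triples, and SPARQL queries those triples by pattern. The Python sketch below is a toy illustration of that idea only: the `ex:` names, the sample triples, and the `match` helper are hypothetical, and none of it represents the Urika software or its interfaces.

```python
# Hypothetical RDF-style triples (subject, predicate, object).
TRIPLES = [
    ("ex:drugA", "ex:targets", "ex:proteinX"),
    ("ex:drugB", "ex:targets", "ex:proteinX"),
    ("ex:proteinX", "ex:implicatedIn", "ex:diseaseY"),
]


def match(pattern, triples):
    """Match one triple pattern; terms starting with '?' are variables.

    Returns one dict of variable bindings per matching triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind consistently.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly what this SPARQL query asks:
#   SELECT ?drug WHERE { ?drug ex:targets ex:proteinX }
drugs = match(("?drug", "ex:targets", "ex:proteinX"), TRIPLES)
print(drugs)
```

Because the query is expressed against a standard data model rather than a vendor-specific API, any front-end tool that can issue SPARQL can in principle ask the same question, which is the integration saving described above.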
YarcData sells its appliance using a couple of models. Customers opting for the purchase approach can get started with a smaller appliance for about $200,000. As time goes on, they can upgrade to the 512TB monster. Alternatively, customers can rent a Urika appliance, and basically pay YarcData a monthly or quarterly subscription fee.