October 2, 2013

Venturing Into the Great Unknown with YarcData

Alex Woodie

Those who practice the data sciences typically are not prone to waxing philosophical about the nature of knowledge. There is usually a clear-cut business goal, and no time for head-in-the-clouds existential questions. The folks at Cray spin-off YarcData are not academics, but thanks to the massive data exploration machine they’ve built, they’re helping to cut data scientists free of the lines that keep their questions tethered close to the ground.

YarcData was carved off from supercomputer maker Cray in early 2012 after Cray had done some work for the U.S. government in the field of data discovery (the details are secret, and you probably know why). YarcData was created to bring that technology to market.

Its Urika line of appliances combines a NoSQL graph database and a shared-memory hardware architecture that scales up to 512TB of RAM, which tells you a lot about the size of datasets YarcData is helping customers to analyze. Customers typically bring their own front-end data visualization tool, such as those from Tableau or TIBCO, to bear on data stored in YarcData’s appliance.

The company has been in business for only about 18 months, but it has already captured customers in the fields of financial services, drug discovery, and cyber security, as well as positive reviews from IT analyst firms. YarcData president Arvind Parthasarathi recently discussed the state of data exploration and the potential for big breakthroughs with Datanami.

“What if I want to find something new for the very first time? The challenge is, I don’t know what to look for. I don’t know what to ask. I don’t know how to get there. That’s because it’s all new,” Parthasarathi says. “That’s the difference between data discovery and more traditional analytics. We’re really focused on the things you don’t know and the questions you’re not asking.”

The universe is an unfathomably large place, and all attempts to understand its nature must, by definition, start somewhere. We can’t change that. Brand new pieces of useful information do not just float down from the heavens. While serendipity undoubtedly has played a role in many of mankind’s greatest breakthroughs, it does require somebody to be there, and to be looking in the right direction at the right time and to have the capacity to notice it.

Nobody can predict when accidental breakthroughs will happen. YarcData’s approach is to grease the wheels for those who are actively pursuing data discovery, and to eliminate the barriers standing between data scientists and the discovery of new information. Creating what is, in effect, a Cray supercomputer with a proprietary ASIC and 512TB of shared memory is a great place to start.

The Urika appliance doesn’t eliminate all barriers–512TB is a lot of data, but it’s still 512TB at the end of the day. But it does open up the spectrum of questions that data scientists can ask of their data, and help to overcome some of the obstacles that are part and parcel of the business intelligence industry.

“If you just take a generic graph database, your first challenge is, how do you partition your data set? And the moment you partition it in a certain way, you’re going to be presupposing what you can find,” Parthasarathi says. The YarcData approach “allows us to go after the problems where we don’t know the relationships in the data. You don’t have to partition the data. You don’t have to lay it out [in a certain way] and you don’t have to presuppose what you can find.”
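
Parthasarathi's point about partitioning can be illustrated with a toy sketch. The Python below is not YarcData's implementation; the graph, the node names, and the `cross_partition_edges` helper are all hypothetical, invented only to show the general idea. It counts how many relationships end up straddling a partition boundary under different layouts. A layout tuned to the relationships you expect cuts few edges, while the same data laid out for a different question cuts many, and in a distributed system each cut edge would typically mean a hop between machines.

```python
def cross_partition_edges(edges, assignment):
    """Count edges whose two endpoints were placed in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Hypothetical toy graph: two tight clusters joined by one unexpected link.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),  # cluster 1
    ("x", "y"), ("y", "z"), ("x", "z"),  # cluster 2
    ("c", "x"),                          # the relationship nobody anticipated
]

# Layout chosen by someone who already knows the two clusters exist.
by_cluster = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}

# Layout chosen with some other question in mind (assumed, arbitrary split).
by_other = {"a": 0, "b": 1, "c": 0, "x": 1, "y": 0, "z": 1}

# One shared space: every node sits in the same partition.
shared = {node: 0 for node in by_cluster}

for name, layout in [("by_cluster", by_cluster),
                     ("by_other", by_other),
                     ("shared", shared)]:
    print(name, cross_partition_edges(edges, layout))
```

In this sketch the single shared layout never cuts an edge, whichever question is asked, which is roughly the property the shared-memory design described in the article is aiming for.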

This approach allows customers to test their hypotheses, and to validate or invalidate them more quickly than before. It also frees data scientists to ask questions that may, at first, appear to have a poor chance of being answered, but need to be crossed off the list anyway. “We don’t pay a penalty for following a random thought process or a random hypothesis,” he says.

“That’s our core hypothesis,” Parthasarathi continues. “If we can help you validate 1,000 hypotheses, one of them is going to be right. And that one could be a brand new drug. It could be a new trading strategy. It could be a new fraud pattern. It could be a new terrorist. It could be a new cyber threat. It could be a new customer purchasing behavior.”

Last week, YarcData announced an update to the software component of its Urika appliance. The big news here is the company is taking a standardized approach to supporting the front-end data visualization tools that are necessary for interacting with the data stored in the appliance.

The decision to support W3C standards such as SPARQL and RDF eases the workload for YarcData’s own engineers, because there is a plethora of data visualization tools that need to work with the appliance. “We stopped counting front-end tools at 57,” Parthasarathi says. “If we had to go out building point to point integrations with each of those 57 tools, we’re going to be sitting here for a long time building integrations.”

YarcData sells its appliance using a couple of models. Customers opting for the purchase approach can get started with a smaller appliance for about $200,000. As time goes on, they can upgrade to the 512TB monster. Alternatively, customers can rent a Urika appliance, and basically pay YarcData a monthly or quarterly subscription fee.

