June 6, 2012

Why Cray is Clamoring for Your Code

Nicole Hemsoth

Five years ago, one would never have expected to see iconic supercomputer maker Cray setting up shop at a show like the Semantic Technology and Business Conference, but times have changed and the company is now all about data: big data, of course.

The company, which recently formed its YarcData division and introduced some fancy new hardware to focus on big data problems, announced a code contest with over 100k up for grabs.

What’s interesting about the race for the best in big data software is the way that Cray is framing it. The contest is called the YarcData Graph Analytics Challenge, and it will dole out rewards to the “best submissions for solutions of un-partitionable big data graph problems.”

Although graph problems go hand-in-hand with big data analytics, it’s rare for a company to approach the noisy big data space with an emphasis on “graph problems,” a term seldom heard outside of academia (consider the Graph500, for instance).

“Graph databases have a significant role to play in analytic environments, and they can solve problems like relationship discovery that other traditional technologies do not handle easily,” said Philip Howard, Research Director, Bloor Research. 

As Arvind Parthasarathi, President of YarcData, told Datanami this week, there are many reasons why Cray would want to discover innovative software solutions to big data problems. “Many real world problems require fusing very large amounts of data to then analyze and understand the inherent relationships and connections in the data using graphs,” he said.

According to Parthasarathi, part of the reason Cray is paying for prize-winning big data code is to raise awareness of such problems in the industry. He says that in addition to drawing attention to graph problems, the company wants to highlight the increasing adoption of RDF/SPARQL as the industry standard for graph analytics. As he stated, “We hope that the contest will accelerate the development, adoption and continued innovation of RDF/SPARQL, and in the applications that utilize these standards.”

The YarcData lead says that “Graph problems have always been present – but practical means of deriving insight and analytics to these problems at scale and at real time has been the challenge historically.”

Parthasarathi points to the example of a detective who is seeking to infiltrate a drug deal between two known dealers who are planning to meet in a particular city that each has to travel to. The question a detective would like to ask is, “Show me any relationships between known dealers who have bought plane, train, or bus tickets to the same destination in cash where they arrive within 1 week of each other.”

In this example, we do not know the exact relationship between the dealers ahead of the query, or how strong or weak it is, and there may not be any direct connection linking the two people. There may be “non-obvious” intermediate relationships we need to find between them. We don’t know precisely what we are looking for. Parthasarathi says that this kind of questioning, the search for unknown relationships that together make up a big picture, has applications across a wide range of verticals.
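To make the shape of the detective's question concrete, here is a minimal sketch in plain Python. It is not SPARQL, and it is not how YarcData's hardware or software answers such queries. The people, tickets, relationship labels, and function names are all hypothetical toy data invented for illustration. The sketch filters known dealers who paid cash for tickets to the same destination and arrive within a week of each other, then searches a small relationship graph for any chain of intermediate connections between them.

```python
from collections import deque
from datetime import date
from itertools import combinations

# Hypothetical toy data: (person, mode, destination, paid_cash, arrival_date)
TICKETS = [
    ("alice", "plane", "Miami", True, date(2012, 6, 1)),
    ("bob", "bus", "Miami", True, date(2012, 6, 5)),
    ("carol", "train", "Miami", False, date(2012, 6, 3)),
    ("dave", "plane", "Denver", True, date(2012, 6, 2)),
]
KNOWN_DEALERS = {"alice", "bob", "dave"}

# Hypothetical relationship edges, treated as undirected: (a, relation, b)
RELATIONS = [
    ("alice", "shares_phone_with", "erin"),
    ("erin", "sibling_of", "bob"),
    ("dave", "employs", "frank"),
]


def build_adjacency(relations):
    """Map each person to a list of (relation, neighbor) pairs."""
    adj = {}
    for a, rel, b in relations:
        adj.setdefault(a, []).append((rel, b))
        adj.setdefault(b, []).append((rel, a))
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns a list of (from, relation, to) hops, or None."""
    queue = deque([start])
    parent = {start: None}  # node -> (previous node, relation) or None for start
    while queue:
        node = queue.popleft()
        if node == goal:
            hops = []
            while parent[node] is not None:
                prev, rel = parent[node]
                hops.append((prev, rel, node))
                node = prev
            return hops[::-1]
        for rel, nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append(nxt)
    return None


def suspicious_pairs(tickets, dealers, relations, window_days=7):
    """Pairs of dealers with cash tickets to one destination, arriving close
    together, plus the chain of relationships (if any) that links them."""
    adj = build_adjacency(relations)
    cash = [t for t in tickets if t[0] in dealers and t[3]]
    results = []
    for t1, t2 in combinations(cash, 2):
        if t1[0] == t2[0]:
            continue
        same_destination = t1[2] == t2[2]
        close_arrival = abs((t1[4] - t2[4]).days) <= window_days
        if same_destination and close_arrival:
            path = shortest_path(adj, t1[0], t2[0])
            if path is not None:
                results.append((t1[0], t2[0], t1[2], path))
    return results


if __name__ == "__main__":
    for a, b, city, path in suspicious_pairs(TICKETS, KNOWN_DEALERS, RELATIONS):
        print(a, b, city, path)
```

In this toy data the two dealers are not directly connected; the link only appears through a third person, which is the kind of “non-obvious” intermediate relationship described above. On real data the path search can touch any part of the graph, which is one way to see why such problems are hard to partition.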

Parthasarathi says that commercial enterprises need to integrate and correlate their enterprise customer transaction data with real-time trends propagating within online social networks, and do quantitative behavioral science on patterns locked away in data being continually generated by the ubiquitous use of increasingly sensor-rich mobile devices. 

He also notes that there is an increased awareness of graphs in the general market due to concepts and technologies like Facebook’s “Open Graph” and Google’s “Knowledge Graph.” He tells us, “While some IT departments may not have started talking about graphs, we’re seeing an increasing trend of graphs entering the vernacular and graph problems rising in business priorities. We believe graph analytics is an important segment of the overall analytics solution and that it will continue to grow.”

Parthasarathi says Cray has already seen a number of such “big data” graph problems playing out in production across a variety of verticals and applications, and claims that, if successful, the call for big data code will reveal some novel uses of graph analytics. Many application areas for graph problems already exist, however. Among them:

  • Finding patterns of suspicious activities from real-time analysis of dynamically changing data for national security and law enforcement purposes
  • Analysis and searches across the huge body of medical literature and genomic/cancer databases to find previously unknown connections
  • Finding and clustering patients by similarity across longitudinal and history data spanning all events, symptoms, diagnoses, and treatments using thousands of parameters
  • Analysis of the spread of infectious disease to identify key hubs and optimize interventions to slow or halt spread
  • Fraud detection and analytics in a number of use cases, including Medicare and health-care reimbursements, property and casualty insurance, trading compliance, and anti-money laundering

In addition to remarking on the value of graph databases, Bloor Research Director Philip Howard says this contest will be “positive for the overall graph database market, and this contest could help expand the use of RDF and SPARQL as valuable tools for solving big data problems.”

The YarcData Graph Analytics Challenge will officially begin on Tuesday, June 26, 2012, and winners will be announced during a live web event on Dec. 4, 2012. Full contest details, including specific criteria and the contest judges, will be announced on June 26. To pre-register for a contest information packet, please visit the YarcData website. Information packets will be sent out June 26.

For a look back at Cray’s entry into big data, check out this video from the Supercomputing Conference in November 2011, shortly before the formal announcement of the company’s YarcData division.