
September 05, 2012

YarcData Architect on Hadoop’s Fatal Flaw


Systems like Hadoop and MapReduce are great at slicing a problem into many pieces, evaluating each piece separately, and combining the results into a whole, much like an integral in calculus. But what if those little pieces interact with each other constantly, like sections of an ocean?

According to YarcData’s Solutions Architect, James Maltby, Hadoop and MapReduce are less suited to storing such tightly connected graphs than his company’s uRIKA database.

“Many graphs are tightly connected and not easily cut up into small pieces,” said Maltby. “A good example might be a map of genomic networks, which may contain 500 times as many connections as data nodes. Many MapReduce steps are required to solve this problem, and performance suffers. In contrast, uRIKA stores its graph in a large, shared memory pool, and no partitioning is necessary at all.”

Genomics is one of the more complicated and more exciting big data research fields. Medical scientists are working on genomics in the hope of ascertaining precisely where diseases originate. However, the vast number of genes per genome and the many connections those genes make among themselves make genomics a complex big data problem. Slicing that problem severs those all-important connections.
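To make the partitioning point concrete, here is a minimal Python sketch that counts how many edges are severed when a densely connected graph is split across partitions. It assumes a uniformly random graph with 500 edges per node (echoing the ratio Maltby cites) and a simple round-robin node assignment; real genomic networks and real Hadoop partitioners will behave differently, and the function name `cut_fraction` is purely illustrative.

```python
import random

def cut_fraction(num_nodes, edges_per_node, num_parts, seed=0):
    """Return the fraction of edges whose endpoints fall in different partitions.

    Assumption: edges are drawn uniformly at random, which is only a
    stand-in for a "tightly connected" graph with no natural clusters.
    """
    rng = random.Random(seed)
    # Round-robin assignment of nodes to partitions (a stand-in for hashing).
    part = [n % num_parts for n in range(num_nodes)]
    num_edges = num_nodes * edges_per_node
    cut = 0
    for _ in range(num_edges):
        a = rng.randrange(num_nodes)
        b = rng.randrange(num_nodes)
        if part[a] != part[b]:
            cut += 1  # this edge would need cross-partition communication
    return cut / num_edges

if __name__ == "__main__":
    for parts in (2, 8, 64):
        print(parts, "partitions:", round(cut_fraction(1000, 500, parts), 3))
```

Under these assumptions the severed fraction approaches 1 - 1/k for k partitions, so adding partitions pushes nearly every connection across a partition boundary.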

Social networking data is likewise intrinsically interconnected, since people frequently post in reaction to someone else’s post. Relational databases do not represent such data well, according to Maltby. “When the data is irregular or graph-structured, as in complex financial instruments or social network, the relational database becomes unwieldy and performance suffers.”

Explaining how a semantic graph database differs from a relational one, Maltby said, “In a semantic graph database like uRIKA, the joins are implicit and built into the graph structure, so writing complex ‘what-if’ queries is easier, and performance is much improved.”
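A rough illustration of what "implicit joins" can mean: in the Python sketch below, relationships are stored as (subject, predicate, object) triples, and a multi-hop question is answered by following edges rather than by writing one self-join per hop. The data, predicate names, and helper functions are hypothetical and are not uRIKA's API.

```python
from collections import defaultdict

# Hypothetical example data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "dave"),
    ("bob", "posted", "post1"),
    ("carol", "replied_to", "post1"),
]

def build_index(triples):
    """Index triples by (subject, predicate) so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index

def reachable(index, start, predicate, hops):
    """Follow `predicate` edges from `start` for exactly `hops` steps."""
    frontier = {start}
    for _ in range(hops):
        # Each hop replaces what would be one self-join in a relational table.
        frontier = {o for node in frontier
                    for o in index.get((node, predicate), ())}
    return frontier

if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(reachable(idx, "alice", "follows", 2))
```

In a relational schema the same two-hop question would typically join a follows table to itself once per hop, which is where the query text and the cost tend to grow.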

Of course, there exist in-memory semantic graph databases other than uRIKA. Per Maltby, what differentiates uRIKA is its performance, which stems from operating in-memory, and scalability. “uRIKA has the largest scaled sharable memory system in the industry, with up to 512 terabytes of RAM. Typical systems run from 2 to 32 terabytes of RAM.”

Not only does uRIKA reportedly scale to 16 times as much data as its nearest competitor, it also boasts an impressive input/output rate. “uRIKA is highly parallel, working on tens of thousands of parallel threads, working on the problem at the same time. And perhaps most importantly for big data problems, uRIKA has a high-speed I/O system. It’s capable of reading or writing up to 350 terabytes an hour.”
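The quoted figures can be put in perspective with some back-of-the-envelope arithmetic. The short Python sketch below assumes the 16x figure comes from comparing the 512-terabyte maximum with the 32-terabyte upper end of the typical range, and it uses decimal units (1 TB = 1,000 GB); the variable names are illustrative.

```python
# Figures quoted in the article.
MAX_RAM_TB = 512        # largest quoted shared-memory configuration
TYPICAL_MAX_TB = 32     # upper end of the "typical" range
IO_TB_PER_HOUR = 350    # quoted peak read/write rate

# Ratio of the largest configuration to the typical upper bound.
scale_ratio = MAX_RAM_TB / TYPICAL_MAX_TB

# Hours needed to load the full memory pool at the quoted peak rate.
hours_to_fill = MAX_RAM_TB / IO_TB_PER_HOUR

# The same I/O rate expressed per second (assumes 1 TB = 1,000 GB).
gb_per_second = IO_TB_PER_HOUR * 1000 / 3600

if __name__ == "__main__":
    print(f"scale ratio: {scale_ratio:.0f}x")
    print(f"time to fill memory: {hours_to_fill:.2f} hours")
    print(f"I/O rate: {gb_per_second:.1f} GB/s")
```

At the quoted peak rate, filling the largest configuration would take roughly an hour and a half, which corresponds to a little under 100 GB per second.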

As big data problems grow more complex and interconnected, graph databases grow more important. Maltby and YarcData hope their uRIKA system will become the early standard-bearer for semantic graph databases.

Discussion

There is 1 discussion item posted.

Not fatal flaw but performance corner
Submitted by JimMaltby on Sep 6, 2012 @ 7:19 PM EDT


I should have been clearer in my video: I really consider uRiKA and Hadoop to be complementary technologies. For problems with good horizontal scaling (i.e. partitionable), Hadoop offers unmatched price/performance. Several of our customers use Hadoop on clusters for preprocessing of unstructured or otherwise "raw" data into RDF triples for semantic graph analysis. These can be real HPC-class jobs! The point I was making was that for hard-to-partition datasets, a scalable shared memory appliance like uRiKA is more appropriate and offers much higher performance. In the Big Data application space, "scale up" and "scale out" are both necessary and complementary.
