Systems like Hadoop and MapReduce are great at slicing problems into multiple pieces, evaluating each little piece, and plugging them back into the whole accordingly, much like an integral in calculus. But what if those little pieces interact with each other constantly, like sections of an ocean?
According to James Maltby, Solutions Architect at YarcData, Hadoop and MapReduce are less suited to storing such tightly interconnected data as graphs than his company’s uRIKA database.
“Many graphs are tightly connected and not easily cut up into small pieces,” said Maltby. “A good example might be a map of genomic networks, which may contain 500 times as many connections as data nodes. Many MapReduce steps are required to solve this problem, and performance suffers. In contrast, uRIKA stores its graph in a large, shared memory pool, and no partitioning is necessary at all.”
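To make the partitioning point concrete, here is a minimal Python sketch, not taken from YarcData or Hadoop, that measures what fraction of a graph's edges are severed when its nodes are split into two blocks. The toy graphs, the `two_blocks` partitioner, and the `cut_fraction` helper are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def cut_fraction(edges, part_of):
    """Return the fraction of edges whose endpoints fall in different partitions."""
    edges = list(edges)
    crossing = sum(1 for u, v in edges if part_of(u) != part_of(v))
    return crossing / len(edges)


def two_blocks(node):
    """Hypothetical partitioner: nodes 0-3 go to partition 0, nodes 4-7 to partition 1."""
    return node // 4


# An easily partitioned graph: two separate 4-node clusters with no links between them.
clustered = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))

# A tightly connected graph: every one of the 8 nodes is linked to every other.
dense = list(combinations(range(8), 2))

clustered_cut = cut_fraction(clustered, two_blocks)
dense_cut = cut_fraction(dense, two_blocks)

print(f"clustered graph: {clustered_cut:.0%} of edges cross the partition boundary")
print(f"dense graph:     {dense_cut:.0%} of edges cross the partition boundary")
```

In the clustered case no edge crosses the boundary, so each partition can be processed on its own. In the dense case more than half the edges cross it, and every crossing edge implies communication between partitions, which is the cost Maltby describes.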
Genomics is one of the more complicated and more exciting big data research fields. Medical scientists are working on genomics in hopes of ascertaining precisely where diseases originate. However, the vast number of genes per genome and the many connections those genes make among themselves make genomics a complex big data problem. Slicing that problem severs those all-important connections.
Social networking data is likewise intrinsically interconnected, since people frequently post in reaction to someone else’s post. Relational databases do not represent such data well, according to Maltby. “When the data is irregular or graph-structured, as in complex financial instruments or social network, the relational database becomes unwieldy and performance suffers.”
Explaining how a semantic graph database differs from a relational one, Maltby said, “In a semantic graph database like uRIKA, the joins are implicit and built into the graph structure, so writing complex ‘what-if’ queries is easier, and performance is much improved.”
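The idea of joins being built into the graph structure can be sketched in a few lines of plain Python. This is an illustration only, not uRIKA's interface: the triples, the predicate names `posted` and `repliedTo`, and the helper functions are all hypothetical. Each relationship is stored as a (subject, predicate, object) triple, and a multi-step question is answered by following edges rather than by writing an explicit join per step.

```python
from collections import defaultdict

# Hypothetical social-network data as (subject, predicate, object) triples.
triples = [
    ("alice", "posted", "post1"),
    ("bob", "repliedTo", "post1"),
    ("carol", "repliedTo", "post1"),
    ("bob", "posted", "post2"),
    ("dave", "repliedTo", "post2"),
]

# Index the edges in both directions so each hop is a dictionary lookup.
out_edges = defaultdict(set)  # (subject, predicate) -> objects
in_edges = defaultdict(set)   # (predicate, object) -> subjects
for s, p, o in triples:
    out_edges[(s, p)].add(o)
    in_edges[(p, o)].add(s)


def repliers_to(author):
    """People who replied to any post written by `author`."""
    return {
        replier
        for post in out_edges[(author, "posted")]
        for replier in in_edges[("repliedTo", post)]
    }


def reply_chain(author, depth):
    """Everyone reachable from `author` by following reply links up to `depth` hops."""
    frontier, seen = {author}, set()
    for _ in range(depth):
        frontier = {r for a in frontier for r in repliers_to(a)} - seen - {author}
        seen |= frontier
    return seen


print(sorted(repliers_to("alice")))
print(sorted(reply_chain("alice", 2)))
```

In a relational schema, each extra hop in `reply_chain` would typically add another self-join on a posts or replies table. Here, deepening the query only changes the `depth` argument.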
Of course, there are in-memory semantic graph databases other than uRIKA. Per Maltby, what differentiates uRIKA is its scalability and its performance, which stems from operating in memory. “uRIKA has the largest scaled sharable memory system in the industry, with up to 512 terabytes of RAM. Typical systems run from 2 to 32 terabytes of RAM.”
By Maltby’s figures, uRIKA’s 512-terabyte ceiling is 16 times the memory of the largest typical system. The system also boasts an impressive input/output rate. “uRIKA is highly parallel, working on tens of thousands of parallel threads, working on the problem at the same time. And perhaps most importantly for big data problems, uRIKA has a high-speed I/O system. It’s capable of reading or writing up to 350 terabytes an hour.”
As big data problems grow more complex and interconnected, graph databases grow more important. Maltby and YarcData hope their uRIKA system will become the early standard-bearer for semantic graph databases.
Discussion
Not fatal flaw but performance corner
Submitted by JimMaltby on Sep 6, 2012 @ 7:19 PM EDT
I should have been clearer in my video. I really consider uRiKA and Hadoop to be complementary technologies. For problems with good horizontal scaling (i.e. partitionable), Hadoop offers unmatched price/performance. Several of our customers use Hadoop on clusters for preprocessing of unstructured or otherwise "raw" data into RDF triples for semantic graph analysis. These can be real HPC-class jobs! The point I was making was that for hard-to-partition datasets, a scalable shared memory appliance like uRiKA is more appropriate and offers much higher performance. In the Big Data application space, "scale up" and "scale out" are both necessary and complementary.