Follow Datanami:
September 5, 2012

YarcData Architect on Hadoop’s Fatal Flaw

Datanami Staff

Systems like Hadoop and MapReduce are great at slicing problems into multiple pieces, evaluating each little piece, and plugging them back into the whole accordingly, much like an integral in calculus. But what if those little pieces interact with each other constantly, like sections of an ocean?

According to YarcData’s Solutions Architect, James Maltby, Hadoop and MapReduce are less suited to store these graphs than his company’s uRIKA database.

“Many graphs are tightly connected and not easily cut up into small pieces,” said Maltby. “A good example might be a map of genomic networks, which may contain 500 times as many connections as data nodes. Many MapReduce steps are required to solve this problem, and performance suffers. In contrast, uRIKA stores its graph in a large, shared memory pool, and no partitioning is necessary at all.”

Genomics is one of the more complicated and more exciting big data research fields. Medical scientists are working on genomics in hopes to ascertain precisely where diseases originate. However, the vast amount of genes per genome and the many connections those genes make amongst themselves makes genomics a complex big data problem. Slicing that problem severs those all-important connections.

Further, social networking data is intrinsically interconnected as people frequently make posts as a reaction to someone else’s post. Relational databases do not represent such data well, according to Maltby. “When the data is irregular or graph-structured, as in complex financial instruments or social network, the relational database becomes unwieldy and performance suffers.”

“In a semantic graph database like uRIKA,” said Maltby on how the semantic graph database differs from the relational, “the joins are implicit and built into the graph structure, so writing complex ‘what-if’ queries is easier, and performance is much improved.”

Of course, there exist in-memory semantic graph databases other than uRIKA. Per Maltby, what differentiates uRIKA is its performance, which stems from operating in-memory, and scalability. “uRIKA has the largest scaled sharable memory system in the industry, with up to 512 terabytes of RAM. Typical systems run from 2 to 32 terabytes of RAM.”

Not only does uRIKA reportedly scale 16 times more data than its nearest competitor, it also boasts an impressive input/output rate. “uRIKA is highly parallel, working on tens of thousands of parallel threads, working on the problem at the same time. And perhaps most importantly for big data problems, uRIKA has a high-speed I/O system. It’s capable of reading or writing up to 350 terabytes an hour.”

As big data problems grow more complex and interconnected, graph databases grow more important. Maltby and YarcData hope their uRIKA system will become the early standard-bearer for semantic graph databases.

Related Articles

MapReduce Makes Further Inroads in Academia

Study Stacks MySQL, MapReduce and Hive

Six Super-Scale Hadoop Deployments

Datanami