April 16, 2013

Finding SGI’s Needle with UV Big Brain

Ian Armas Foster

Two intuitively simple processes, searching and comparing, require two quite different skill sets when applied to large datasets. By literally moving a haystack around, SGI CTO Eng Lim Goh demonstrated the capabilities of the company’s UV Big Brain to accomplish both tasks. “The UV Brain gives you a massive coherent shared memory in order to carry a huge haystack and it has many processes around that big memory to allow you to do parallel processing of that comparison,” Goh said.

Searching for a needle in a haystack is a common idiom for finding something significant amid a mass of insignificant things. It is therefore not surprising to see a company like SGI invoke the idiom when discussing the UV Big Brain computer.

It is slightly more surprising to see actual hay being moved around to represent Hadoop nodes. As Goh said at the start of his demonstration, “In a Hadoop cluster, you basically start with a huge haystack. Then you divide and conquer, as follows.”

This approach, Goh explained, works best when the nodes do not have to interact with each other. A search for a unique term, for example, fits that description: each node can check its own portion of the data for the term without knowing what the other nodes hold.

“You split the haystack up into multiple smaller haystacks. Each of these smaller haystacks is on a node in a Hadoop cluster…At each node level, you don’t need to talk to your neighbor.” As a result, the workload can be split among the nodes and run in parallel. “Concurrently, each of these nodes are doing the same thing,” Goh said. In the demonstration, this splitting and parallelization was represented by physically dividing the hay. Goh did not appear to find the needle, though in fairness his demonstration time was limited.
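The divide-and-conquer search Goh describes can be sketched in a few lines of Python. This is a toy illustration, not how Hadoop or SGI actually implement it. Threads stand in for cluster nodes, and the function names (`split`, `search_chunk`, `parallel_search`) are hypothetical. The key property is that `search_chunk` only ever looks at its own chunk, so no "node" needs to talk to its neighbor until the results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split(haystack, num_nodes):
    """Divide the haystack into contiguous chunks, one per simulated node.

    Returns (offset, chunk) pairs so matches can be reported as global positions.
    """
    size = max(1, -(-len(haystack) // num_nodes))  # ceiling division, at least 1
    return [(start, haystack[start:start + size])
            for start in range(0, len(haystack), size)]


def search_chunk(offset, chunk, needle):
    """Scan one chunk independently; no data from other chunks is needed."""
    return [offset + i for i, item in enumerate(chunk) if item == needle]


def parallel_search(haystack, needle, num_nodes=4):
    """Search every chunk concurrently, then merge the per-chunk results."""
    chunks = split(haystack, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        futures = [pool.submit(search_chunk, off, chunk, needle)
                   for off, chunk in chunks]
        results = [f.result() for f in futures]
    # Merge step: concatenate hits in chunk order.
    return [pos for hits in results for pos in hits]


haystack = ["hay"] * 1000
haystack[137] = "needle"
haystack[842] = "needle"
print(parallel_search(haystack, "needle"))  # [137, 842]
```

Because of Python's global interpreter lock, threads here illustrate the structure of the work rather than delivering a real speedup; on a Hadoop cluster each chunk would live on, and be scanned by, a separate machine.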

Either way, such an approach would not work as well for comparison queries.

Holding one piece of hay against another and noting the height difference, Goh explained that such a process could be split up among different nodes in the cluster and the same number of operations would happen. However, instead of happening in isolation, those operations reach across the network, since the two pieces being compared may sit on different nodes, putting strain on the connections and potentially creating bottlenecks.

“Every time you reach out, you are stressing that Hadoop network,” Goh explained. That cost does not prevent a machine like UV Big Brain from running such queries. Instead, the system reportedly keeps all of the data within a single node’s coherent shared memory, lessening or in some cases eliminating the network strain.
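The contrast between partitioned and shared-memory comparison can be made concrete with a small counting model. In this hypothetical sketch, straw heights are assigned round-robin to `num_nodes` nodes (an assumed placement, not Hadoop's or SGI's), every pair of straws is compared, and any pair whose members live on different nodes is counted as a remote fetch. The number of comparisons stays the same however the data is split, as Goh noted; only the cross-node traffic changes. Setting `num_nodes=1` stands in for holding the whole haystack in one coherent memory.

```python
from itertools import combinations


def compare_all(straws, num_nodes):
    """Compare the height of every pair of straws under a toy placement model.

    Straw i is assumed to live on node i % num_nodes (round-robin placement).
    Returns (comparisons, remote_fetches, max_height_difference), where
    remote_fetches counts pairs split across nodes, i.e. comparisons that would
    need data sent over the network in this model.
    """
    placement = [i % num_nodes for i in range(len(straws))]
    comparisons = 0
    remote_fetches = 0
    max_diff = 0.0
    for i, j in combinations(range(len(straws)), 2):
        comparisons += 1
        if placement[i] != placement[j]:
            remote_fetches += 1  # the two straws are on different nodes
        max_diff = max(max_diff, abs(straws[i] - straws[j]))
    return comparisons, remote_fetches, max_diff


straws = [10.0, 12.5, 9.0, 11.0, 14.0, 8.5]
print(compare_all(straws, 3))  # (15, 12, 5.5): most pairs cross nodes
print(compare_all(straws, 1))  # (15, 0, 5.5): one shared memory, no network hops
```

This only counts cross-partition pairs; it does not model bandwidth, latency, or the actual memory architecture of UV Big Brain.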
