Big data represents big opportunity. That opportunity lies in solving market inefficiencies, detecting fraud, and advancing scientific research (particularly genomics). Narrowing down one’s big data strategy, however, is a difficult task considering the relative youth of the information explosion.
To address that challenge, SGI hosted a webinar featuring Steve Conway, IDC’s Research Vice President of High Performance Computing, who went through the various use cases where big data can help and already is helping. Bill Mannel, SGI’s Vice President of Product Marketing, then discussed SGI’s big data strategy and how the company’s new big data hardware platform, DataRaptor, fits in.
Among the various use cases, Conway mentioned fraud detection as transitioning into more of a horizontal market. According to Conway, the US Government loses hundreds of billions of dollars to Medicare and Medicaid fraud alone. With current detection policies, only about a billion dollars of that is caught and punished. Using big data analytics to detect fraud at a higher rate, such that even half of all fraud is caught and reversed, would make those programs more sustainable.
Fraud detection also holds interest in the credit card industry, the retail industry and pretty much any industry that enterprising criminals can figure out and take advantage of (which is pretty much all of them). Finding instances of fraud today means spotting irregularities, a task that is difficult for a human to carry out with certainty in the increasingly irregular world in which we live.
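To make "spotting irregularities" concrete, here is a minimal sketch of one common outlier test, the modified z-score based on the median absolute deviation. It is not a method described in the webinar. The function name `flag_irregular`, the sample amounts, and the 3.5 cutoff are illustrative assumptions, and real fraud systems weigh far more signals than a single amount.

```python
from statistics import median

def flag_irregular(amounts, threshold=3.5):
    """Return the indices of amounts that look irregular.

    Hypothetical helper: uses the modified z-score, which measures each
    value's distance from the median in units of the median absolute
    deviation (MAD). The 0.6745 factor and 3.5 cutoff are conventional
    defaults, assumed here rather than taken from the article.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        # At least half the values equal the median; nothing to scale by.
        return []
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]

# Made-up claim amounts in dollars; one is far larger than the rest.
sample = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 980.0, 43.0]
print(flag_irregular(sample))  # → [6]
```

The median is used instead of the mean so that a single extreme value does not distort the baseline it is being compared against.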
From a research standpoint, Conway noted, big data plays a big role in genomics. The marriage between big data and genomics is something covered frequently here, but that doesn’t make the relationship any less remarkable. Conway also pointed to the challenges of the famed Human Genome Project in the late 90’s, which ran for several years and cost a few billion dollars. Today, as a result of advances in big data, that same process requires a few hours and a few thousand dollars. There are few examples that represent the dramatic impact that big data has had over the last ten years quite as well as those in genomics.
Big data also provides big choices, in that one can just as easily venture down the wrong big data path and create as many inefficiencies as one solves. For example, a company focused on mapping and graphing data over a large area may be better off staying away from Hadoop, since batch and parallel processing do not necessarily fit graph functions all that well. Perhaps we need a big data application to help businesses select a big data strategy.
Mannel, in his portion of the presentation, drew several maps representing other companies’ perspectives on big data. For Mannel, they all pretty much boiled down to vendors focusing on what they do best and not much else.
This is a fine approach, but it lends itself to having to deal with more vendors than is perhaps necessary. While India made a point of doing just that when forming its national identification survey, SGI hopes to make itself viable in multiple fields.
By contrast, Mannel drew an intricate map of where SGI stands, starting with the intake of data from cell phone towers and airplanes (obviously not the only sources of data, but two that constitute a diverse representation), coursing through SGI’s Hadoop clusters and now through DataRaptor.
SGI hopes to take advantage of MarkLogic’s search and application-building capabilities with DataRaptor, which, according to the company, is integrated with what it calls “a tuned MarkLogic NoSQL database.” The purpose is ultimately to solve the myriad big data problems presented by Conway in the use case portion of the webinar.
SGI hopes DataRaptor will aid companies in building applications based on the MarkLogic operational database. DataRaptor, according to Mannel, is a “plug in and go” system that can be operational within an hour. Deployments start at five nodes with 80 cores and scale to over 300 cores. The high-end 300-core systems hold 2.6 TB of memory and over 500 TB of storage.
The two models, the ISS3112 and the ISS3124, can emphasize either performance or capacity. Each server is powered by 16 Xeon E5-2600 cores and holds 384 GB of main memory. That capacity is meant to combine with MarkLogic, a NoSQL database intent on carrying out efficient big data queries and helping customers build applications. Those abilities could be enhanced by SGI’s new system.
SGI has a solid name in high performance hardware, but will the MarkLogic angle be open and popular enough to propel the company into big data in a big (enterprise) way?