May 30, 2012

Morgan Stanley Turns to Hadoop

Datanami Staff

When it comes to industries actively adopting Hadoop and big data analytics frameworks, risk management and financial services firms lead the pack in production use cases.

This week multi-billion dollar giant Morgan Stanley revealed how it has implemented Hadoop on a small cluster of aging commodity servers to create a portfolio analysis machine able to handle massive volumes of data from across the firm.

As Tom Groenfeldt reported this week, the financial services giant set out months ago to find a portfolio analysis solution that would overcome the limitations of its traditional database and grid computing infrastructure.

The company’s executive director of IT management, Gary Bhattacharjee, who had some early experience with Hadoop in its infancy, strung together 15 tired old “end of life” boxes, which, as he said, allowed the company to “bring really cheap infrastructure into a framework and install Hadoop and let it run.” This experimental MapReduce and Hadoop environment allowed the company to address petabyte-sized problems that its standard databases weren’t able to tackle.

For projects like portfolio analysis, the ability to tap into a “schema-less design” allowed Bhattacharjee and his team to comb through massive volumes of data and run pattern matching for each and every attribute. The end result was a system that let the team find patterns in data that weren’t recognizable before—a valuable tool for the risky business of portfolio management.
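The article doesn’t describe Morgan Stanley’s actual code, but the idea of pattern matching across every attribute of schema-less records can be sketched as a small MapReduce-style job. Everything below (the sample records, field names, and the frequency threshold) is hypothetical, meant only to illustrate why the absence of a fixed schema matters: the mapper simply iterates whatever keys each record happens to carry.

```python
from collections import defaultdict

# Hypothetical schema-less records: each one carries whatever
# attributes happened to be captured, with no fixed table layout.
records = [
    {"desk": "equities", "region": "EMEA", "rating": "A"},
    {"desk": "equities", "region": "EMEA"},
    {"desk": "fixed_income", "region": "APAC", "rating": "B"},
    {"desk": "equities", "region": "EMEA", "rating": "A"},
]

def map_phase(record):
    """Emit an ((attribute, value), 1) pair for every field present.

    With no schema, we iterate whichever keys each record has,
    so new attributes are picked up automatically."""
    for attr, value in record.items():
        yield (attr, value), 1

def reduce_phase(pairs):
    """Sum the counts per (attribute, value) key, as a reducer would."""
    counts = defaultdict(int)
    for key, n in pairs:
        counts[key] += n
    return dict(counts)

counts = reduce_phase(kv for rec in records for kv in map_phase(rec))

# Frequent (attribute, value) pairs hint at patterns worth drilling into.
frequent = {k: v for k, v in counts.items() if v >= 3}
print(frequent)  # → {('desk', 'equities'): 3, ('region', 'EMEA'): 3}
```

In a real Hadoop deployment the map and reduce phases would run in parallel across the cluster (for example via Hadoop Streaming), but the shape of the computation is the same.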

While Bhattacharjee couldn’t divulge all the specifics, he told Groenfeldt about the log analysis side of the Hadoop operations and talked more generally about the reliability, risks and benefits of using an open source set of tools. While he notes that earlier releases had serious stability issues, “the Hadoop eco-system has exploded in such leaps and bounds. Now there are multiple vendors, including Microsoft and IBM, that take Open Source and certify the code base, so we don’t have to sever relations with our vendors to go with Hadoop. EMC and HP also offer big support.”

The Morgan Stanley IT manager went on to note that “The way it has typically been done for 20 years is that IT asks the business what they want, creates a data structure and writes structured query language, sources the data, conforms it to the table and writes a structured query. Then you give it to them and they often say that is not what they wanted. Since Hadoop stores everything and it is schema-less, I can carve up a record or an output from whatever combination the business wants. If I give the business 10 fields filtered in a certain way and they want another 10, I already have those and can write the code and deliver the results almost on demand.”
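Bhattacharjee’s point, that keeping full raw records makes a new field request just a new projection rather than a new schema, can be illustrated with a minimal sketch. The store, the record fields, and the `carve` helper below are all hypothetical, assuming the common pattern of keeping each record as a raw JSON line in HDFS.

```python
import json

# Hypothetical raw store: every record is kept in full, schema-less,
# so any combination of fields can be carved out later on demand.
raw_store = [
    json.dumps({"ticker": "MS", "qty": 100, "px": 13.5, "desk": "equities"}),
    json.dumps({"ticker": "IBM", "qty": 50, "px": 192.1, "desk": "equities"}),
]

def carve(fields, predicate=lambda rec: True):
    """Return only the requested fields from records matching the filter.

    Because the full record is retained, satisfying a changed request
    means re-projecting the same stored data -- no new table, no reload."""
    out = []
    for line in raw_store:
        rec = json.loads(line)
        if predicate(rec):
            out.append({f: rec.get(f) for f in fields})
    return out

# First request: two fields, filtered one way...
first = carve(["ticker", "qty"], lambda rec: rec["desk"] == "equities")
# ...then the business asks for different fields: same data, new projection.
second = carve(["ticker", "px"])
```

Contrast this with the traditional flow he describes, where each new combination of fields would mean altering a table schema and rewriting the SQL before any data could be delivered.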
