March 9, 2012

IBM Big Data VP Surveys Landscape

Nicole Hemsoth

When it comes to marking the evolution of data and the innumerable tools to tame it that have emerged over the last decade, Anjul Bhambhri has what amounts to a bird’s eye view of the industry.

An electrical engineer by training, Bhambhri now serves as IBM’s Vice President of Big Data Products, a role that taps into her experiences working with IBM’s Optim app and other data management tools at companies like Informix and Sybase.

Known for her work in developing the XML core for IBM’s DB2 offering, she has a rather unique perspective on both the software and hardware sides of the emerging big data ecosystem.

Even with all cynicism about the big data market thrown in, it’s almost impossible not to point to IBM as one of the leading providers of big data solutions, particularly on the software side.

However, as we pointed out during our conversation with Bhambhri, the big data ecosystem is generating a lot of attention for startups and open source initiatives, some of which directly encroach on IBM’s business on price, community support and in other ways.

Bhambhri’s answer to the matter of blooming competition from both startups and open source (and what seems mostly in big data these days to be open source-based startups) was the following:

“I think it makes a lot of sense that solutions are mushrooming, which are leveraging the insights that can be gained from big data. I think for the customers that no longer want to build their own applications and if they can find a solution that fits their needs with maybe some customization, it’s important that those solutions be made available to customers so we are working with a lot of companies in the space who are providing solutions.

IBM has offerings in that space as well, from internally developed solutions to technologies we’ve acquired from Unica, for example, so we are obviously working with both the solution providers who are inside IBM as well as small companies or other companies that are providing big data solutions that we are partnering with that can be run on BigInsights so they can analyze and show the results of the analytics capabilities that we can provide on big data. It makes their solutions more complete and more competitive than if they were not able to analyze this big data, so the answer would be partnering as well as enabling any solutions that IBM itself is building.”

As Bhambhri noted, however, this is nothing new, and everyone in the industry is just getting started on the journey to big data solutions. In her words, “For relational databases, a lot of players providing offerings in this space go through the cycle of what the needs are for structured data. As you can imagine, a lot of that work is also starting for unstructured or semi-structured data.”

Acquisitions on IBM’s part are a continual event and we asked what might be down the road for IBM and the big data division she leads. As she told us, however: “I cannot promise specifically on what we would do from an acquisition standpoint, but we are certainly partnering with a lot of players in the space and really making them successful, by providing our technology and sharing with them what the possibilities are around that. So I can’t promise specifically on what acquisitions we would be doing in this space or not.”

She says that while IBM isn’t missing any crucial pieces of the big data pie early on, this is a new area and, as such, IBM’s approach emphasizes the platform. From our conversation:

“If you look at what we have done recently, you see we are paying attention to all fronts of the big data platform, especially in terms of how we can ingest data from a variety of sources, and also in terms of being able to analyze the data, perform historical analysis and uncover patterns. We are also focusing on providing tooling so that application developers can build new sources of applications and do ad hoc analysis as well as write, debug and deploy applications.”

She pointed to the value of this approach, saying that the company has already worked closely with more than 200 customers over the last couple of years using this platform approach, which looks to meet the needs of everyone from the analyst to the developer to the system admin.

Bhambhri notes, however, that a lot of work is still happening around IBM’s big data solutions and the provision of what she calls big data accelerators. As she told us, IBM has “around 100 sample applications that have been harvested from the work we have done for specific use cases and customers. These have been built into the product so the customer can spend time analyzing as opposed to implementing.”

To highlight this, she says that customers can take these applications as a starting point. That way, if they want to, say, analyze data coming in from social media, they can look at actual IBM data rather than starting from scratch; while it might not be an exact fit, it can provide a reasonable starting point that IBM can continue developing into a more complete solution.

On that note, she pointed to a number of existing IBM products and case studies that highlight the ways big data is being harnessed. At the core of this part of our chat was our discussion of the company’s approach to complex event processing. From the transcript of our discussion:

“What we have that goes a step beyond complex event processing is called InfoSphere Streams and it has the ability to ingest, analyze and ask of very large volumes of structured, semi-structured and unstructured data and the data as it is coming in can be processed and analyzed at microsecond latency.

When you look at the volume there are applications which needed real-time analytics, which we have deployed in the telecommunications sector. We are processing 6 billion calls per day. Those volumes are huge and beyond what any complex event processing system could handle or analyze with that kind of latency. The volume of data is huge and it could be structured or unstructured. The reason the telco wants to analyze this data is because they want to resolve billing disputes and understand customer choice.

We have deployed in the healthcare space in Ontario where we are able to process how those people resolve information coming from exadata, text data, and this could be analyzed. It’s humanly impossible to analyze all this information, but with InfoSphere Streams we are able to analyze it and we are able to predict by seeing a pattern in the data, we were able to predict onset of infection in a newborn baby 24 hours in advance so the doctors could do something about it. This is not just events that are coming in, but it’s a lot of structured, unstructured, semi-structured data that is flowing in and we are able to preprocess all of that and see what patterns are emerging. The same thing has been deployed at the University of Columbia to process information about stroke patients and be able to see if some pattern is emerging which indicates what kind of an action the doctor should be taking.

This has been part of the Smarter Planet initiative. We have done work to improve water quality, to improve traffic condition problems in Stockholm, Zurich and Kyoto so commuters can be told which routes to avoid and city planners can take action based on what information they were getting from this data.”
