July 24, 2012

University Report Unravels Big Data for CIOs

Ian Armas Foster

The University of Oregon’s Robert Brehm (also a software developer for Zoom Software Solutions) published a report called “What CIOs and CTOs Need to Know about Big Data and Data-Intensive Computing.” As stated in the title, the report is aimed at Chief Information and Technology Officers who are looking for answers as to how to move forward in the world of bigger data.

The report is comprehensive, summarizing thirty-two reputable sources along with providing accepted definitions to relevant big data-related terms, from ‘Alert System’ to ‘Visualization.’ The bibliography portion, in which the summaries reside, is split into five sections: things to consider when integrating big data, the relationship between big data and data-intensive computing, big data opportunities, big data limitations, and the cost of big data systems.

Brehm has managed to put together a hefty, readable compendium of the literature on data-intensive enterprise computing, leaving few stones unturned in terms of definitions and explanations. For instance, in one key section, he explains the importance of the role of the CTO and CIO in integrating big data-capable technologies. “In integrating big data as an emerging technology,” he says, “the CIO and CTO are responsible for the deployment of technology to support big data.” He also notes Hadoop’s potential to emerge at the forefront of data-intensive computing, citing a joint paper by three UC Irvine researchers, Vinayak Borkar, Michael Carey, and Chen Li, while observing that the platform still needs major tweaking.

One of the strengths of the piece (and why we are recommending it) is Brehm’s ability to put big data in context for the business intelligence world, a task that has been messy since some argue that big data and BI are separate entities: BI relies on traditional tools, while big data requires a complete rethinking of how enterprise analytics are done.

As Brehm notes, “Traditionally, IT departments have leveraged existing databases to create data warehouses from which analytical reports could be derived for business intelligence.” He also recognizes the need to shift the emphasis from simply amassing large amounts of data to actually using it, citing a 2009 Richard Kouzes paper to make the argument: “data-intensive computing is a revolutionary shift away from data warehousing to applications that are concerned with the rapid processing of data streams to achieve timely and useful business analytics.”

Brehm says that the opportunities big data offers haven’t been realized by many enterprise-level users. Citing a McKinsey Global Institute paper, Brehm writes, “While some business sectors such as technology are ahead of the curve in their usage of big data, other sectors such as health care, manufacturing and the public sector are not yet on board. This is problematic, since big data can be incorporated in business operations to reduce costs and streamline operations.”

While opportunities abound, it is certainly true today that big data’s limitations outstrip its opportunities. Indeed, according to Brehm, three out of every five projects underperform their expectations. One of the biggest challenges is incorporating big data functionality into existing business intelligence operations, as scrapping that infrastructure altogether would be economically unsound. Further, Brehm echoes the common theme that there simply are not enough people trained to run these technologies. Citing a Corporate Executive Board article, he writes, “big data integration can be hampered from being fully integrated in the business if too few decision makers within the business are trained in the use of big data tools such as analytics.”

Arguably, the cost analysis section is one of the most important, given the bottom-line impact all of this talk about big data creates for users. Brehm admits that it is difficult to estimate the cost of a technology that is relatively new and, as a result, constantly shifting. As he writes, “the CIO and the CTO have to rely on rule of thumb, assumptions, and experience in order to provide justification for expenditures in big data. Unfortunately, big data is too new and therefore accurate business financial data is not available.” On the financial side, Brehm references a Business Literacy Institute paper co-written by Karen Berman and Joe Knight.

Brehm also refers frequently to what he calls the ‘Fourth Paradigm,’ a scientific principle in which “scientists are unifying theory, experiment, and simulation to explore data for relationships.” Brehm is interested in exploring the shift from the third paradigm, in which scientists use computers for modeling purposes, to the fourth (the four paradigms concept was introduced by Jim Gray in a 2007 paper included in this report) and how companies and institutions are coping.

Overall, the report is a relatively short guide for executives who may be awash in options for dealing with big data, and it is far easier to digest than scouring the academic presses for insight. What’s interesting here is that while this is, for all intents and purposes, an academic piece, it addresses the challenges of enterprise big data in a realistic way that lets readers take the information to heart and put it in practical context.
