February 13, 2012

Data Guru Highlights Info Management Trends

Robert Gelber

To those who follow news about big trends in big data, the name “Edd Dumbill” has probably cropped up in at least a few searches—if not on the program at any number of data-centric conferences and events. The entrepreneur, coder and author has most recently gathered attention with a thought-provoking video conversation about trends that are gaining steam in big data.

According to Dumbill, boiling code down to its essentials, addressing rapid data processing, taking advantage of shared and cloud resources, visualizing the data those resources spit out, and finding the talent to handle all of these things are top items on the 2012 enterprise data management agenda.

The first area of discussion dealt with the simplification of code and programming in general. Complexity appears to be one of the biggest downsides to big data analysis: low-level programming can be daunting compared to higher-level development tools and enterprise offerings, possibly stunting adoption of analytics applications. Charles Zedlewski, vice president of products for Cloudera, said in an article, “mass adoption of Hadoop will really come once it’s routinely embedded inside applications, which will by definition provide a layer of abstraction that will mask the complexity of Hadoop from the average end user.”
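The abstraction layer Zedlewski describes can be illustrated with a toy sketch (not actual Hadoop code): the map and reduce phases are hidden behind a single high-level function, so the caller never touches the low-level machinery. The function names here are hypothetical, chosen only to mirror the MapReduce pattern.

```python
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) pairs, the way a Hadoop mapper would
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Group by key and sum the counts, the way a Hadoop reducer would
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def word_count(records):
    """High-level entry point: callers never see the map/reduce plumbing."""
    return reduce_phase(map_phase(records))

print(word_count(["big data", "big trends"]))  # {'big': 2, 'data': 1, 'trends': 1}
```

An application embedding analytics would expose only `word_count`-style calls, which is the “mask the complexity” idea in miniature.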

Streaming data processing was another topic of discussion, raising a number of necessary questions about working with massive amounts of data, such as when it is prudent to purge information. This can be quite tricky for anyone used to simply adding a few hard drives or running an off-site backup. Eventually, data administrators will need to prioritize which data must be kept and which will be used only for short-term processing. Without such a policy, space runs out and system storage starts to look like an episode of Hoarders (the TV show about people who never throw anything away).
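A retention policy of the kind described above might be sketched as follows. This is a minimal, hypothetical example — the priority tiers and time windows are assumptions, not anything Dumbill specifies — but it shows the core decision: critical data is kept indefinitely, while short-term data is purged once its window expires.

```python
import time

# Hypothetical tiers: None means "keep forever", otherwise seconds to live
RETENTION_SECONDS = {"critical": None, "short_term": 3600}

def purge(records, now=None):
    """Return only the records still inside their retention window."""
    now = now if now is not None else time.time()
    kept = []
    for rec in records:
        ttl = RETENTION_SECONDS[rec["priority"]]
        if ttl is None or now - rec["created"] <= ttl:
            kept.append(rec)
    return kept

records = [
    {"id": 1, "priority": "critical", "created": 0},
    {"id": 2, "priority": "short_term", "created": 0},
    {"id": 3, "priority": "short_term", "created": 7000},
]
print([r["id"] for r in purge(records, now=7200)])  # [1, 3]
```

Record 2 is dropped because its hour-long window has lapsed; the critical record survives regardless of age.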

On a lighter note, efficient data processing means having the ability to run real-time processes that depend on time-sensitive data. A great example is the ability to send location-aware advertisements to mobile devices in real-time. Depending on the needs of an enterprise, that location data could be stored for a short amount of time or not at all.

Shared resources also came up as an area for growth in the coming year. Most enterprises do not already possess all the external data that could sharpen their analysis, so the idea of groups and businesses sharing and updating community data resources has a good amount of potential for those attempting to flex their analytic muscle.

For example, traffic-reporting databases can pull data from street cams and gather accident information from law enforcement. With that public information, a retailer could automatically determine when traffic is heaviest near a brick-and-mortar store and purchase advertisements within a 5-mile radius during those times.
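The 5-mile-radius check at the heart of that scenario is a standard great-circle distance calculation. The sketch below uses the haversine formula; the store and device coordinates are made up for illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3959.0

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def within_ad_radius(store, device, radius_miles=5.0):
    """Decide whether a device is close enough to a store to be served an ad."""
    return haversine_miles(*store, *device) <= radius_miles

# Hypothetical coordinates: a San Francisco storefront and two devices
store = (37.7749, -122.4194)
print(within_ad_radius(store, (37.7800, -122.4100)))  # True  (under a mile away)
print(within_ad_radius(store, (37.3382, -121.8863)))  # False (San Jose, roughly 40 miles)
```

In a real-time pipeline this predicate would run against a stream of location events, and — per the article's point about retention — the events themselves could be discarded immediately after the check.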

For businesses attempting to harness this new flood of information, creating data science teams may be the route for adding advanced analytics to their arsenal. However, since an analytics team would not fit a typical corporate IT or accounting role, changes in strategy may be required. Some immediate concerns that come to mind include the types of security access required (financial information, customer data, etc.) and the presentation of findings.

Speaking of presentations, this leads to Dumbill’s final trend: visualization. A lot of work goes into gathering, storing and analyzing data. Working out those technical details can be very difficult, but what’s worse is spending all that time and effort to produce a report with little or no graphical representation.

This is a lesson the computer industry at large has already learned. The GUI was a monumental success because it finally let users see and interact with their data in a way that felt more human. Without the experience of actually seeing the data represented in a variety of simple, malleable interfaces, the truly important findings of an analysis may be lost on a confused board member.