How To Not Get Overwhelmed by Big Data
Big data, by its very nature, is overwhelming. Like a tsunami, the deluge of data we’re all experiencing (a Datanami, if you will) is a massive force that cannot be controlled. But by starting small and finding out what works for you, it’s possible to start harnessing today’s massive data flows and prepare yourself for what comes next: even bigger data.
Big data technologies like Hadoop, Spark, and NoSQL databases are causing organizations to rethink how they do enterprise IT. The possibilities of big data analytics, in particular, seem tantalizing and game-changing. However, while CIOs and IT managers understand that big data technologies are evolving at a fast pace right now, they’re still unsure how best to get started.
Here are three expert tips on how best to get started with big data:
Start with Small Data
Some customers are bewildered by big data and the assortment of related tools, says Vik Mehta, CEO of VastEdge Solutions, a Silicon Valley firm that provides data analytics services. “I think customers don’t have a clue about what big data means,” Mehta says. “Big data is something everybody wants to do. Companies are just trying to understand what it means, how can it help them, and what do they need to do to implement big data.”
Mehta’s firm has been involved in some significant big data projects, including a Hadoop project at Honeywell’s Life Safety division. VastEdge developed an application that stores and analyzes data pulled from sensors that Honeywell’s customers deploy in the field, such as smoke detectors in airports and radiation detectors in uranium mines. The application can analyze millions of events and send an alert if a reading deviates significantly or suddenly from historical averages.
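To make the alerting idea concrete, here is a minimal Python sketch of deviation-based alerting on a single sensor stream. It is not VastEdge’s implementation, which isn’t described here; the class name `DeviationAlerter`, the rolling window, and the three-standard-deviation threshold are illustrative assumptions. “Significant” is modeled as a z-score against a rolling average, and “sudden” as an optional cap on the step change from the previous reading.

```python
from collections import deque
from statistics import mean, pstdev


class DeviationAlerter:
    """Hypothetical sketch: flag readings that stray from recent history."""

    def __init__(self, window=50, z_threshold=3.0, min_history=10, max_jump=None):
        self.history = deque(maxlen=window)  # rolling window of past readings
        self.z_threshold = z_threshold      # "significant" = this many std devs from the mean
        self.min_history = min_history      # don't judge until a baseline exists
        self.max_jump = max_jump            # "sudden" = step change vs. previous reading (optional)

    def observe(self, reading):
        """Return True if this reading should raise an alert, then add it to history."""
        alert = False
        # Sudden change: compare against the immediately preceding reading.
        if self.max_jump is not None and self.history:
            if abs(reading - self.history[-1]) > self.max_jump:
                alert = True
        # Significant deviation: compare against the rolling average.
        if len(self.history) >= self.min_history:
            avg = mean(self.history)
            sd = pstdev(self.history)
            if sd == 0:
                # Perfectly flat baseline: any change at all counts as a deviation.
                if reading != avg:
                    alert = True
            elif abs(reading - avg) / sd > self.z_threshold:
                alert = True
        self.history.append(reading)
        return alert


# Hypothetical readings: a stable baseline followed by one spike.
alerter = DeviationAlerter(window=20, z_threshold=3.0, min_history=5)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0]
print([alerter.observe(r) for r in readings])
# → [False, False, False, False, False, False, True]
```

A production system would presumably keep one baseline per sensor and tune the window and threshold per device type; proving the approach on one small stream first is in the spirit of Mehta’s advice.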
Real-time exception alerting is one of the best ways to make use of big data, but don’t set your big data sights too high at first. Instead, Mehta recommends that customers develop their big data analytic techniques on smaller, less imposing data streams. Only after they’ve shown a return on investment (ROI) with the small data should they move on to more challenging projects.
Mehta cautions prospective big data users not to get caught up in the hype. At a recent industry conference, Mehta was taken aback by an IBM claim that it could have a big data application up and running for a customer in four months, given that implementations of Cognos, IBM’s business intelligence suite, typically take years.
Instead, customers should take the long view. “Big data is not something you get value out of right away,” Mehta says. “It’s an ongoing process and you keep on fine tuning the algorithms to get the maximum value of it.”
Open Your Eyes
It would be a mistake to look at big data analytics the same way you looked at traditional business intelligence and reporting applications, says Judith Hurwitz, president and CEO of Hurwitz & Associates, a technology strategy consulting firm. Instead, big data newbies should explore the data and let the data speak to them.
“One of the big differences between big data and traditional approaches is that you’re not saying ‘I want the answer to the following question.’ You’re not doing that,” Hurwitz says. “You’re doing much more ‘I want to understand where this data is going to lead me.’ It’s much more related to that.”
Because of the open-ended nature of big data exploration, it’s more difficult to know where you are and where you’ll be going with your big data analytics project. That’s why visualization tools are so important for helping people to detect and understand the patterns that exist in big data sets.
“If you’re looking at massive amount of data and you’re looking for a pattern, you can’t say ‘Let me look at the answer.’ You can’t do that. It doesn’t work that way,” Hurwitz says. “So you need data visualization tools that let you see. ‘Oh look, this area has a patch of dark. That means there’s a lot of that pattern right over there.’”
At the end of the day, big data is incomprehensible. If you’re dealing with 1PB of data, it’s the equivalent of four times all the content in the U.S. Library of Congress. It all comes back to bringing big data down to human scale.
“When we talk about big data, we’re not talking about big data,” Hurwitz says. “In reality, you can’t do anything with big data. There’s too much. What you actually want is small data. What you actually want to do is to be able to reduce massive amount of data to the things that are important to you.”
Explore External Data Sources
After you’ve started using big data-style analytic techniques to glean insight from smaller pieces of data, you can start getting a taste of what’s to come by exploiting bigger and more diverse data sources. Eventually you’ll want to start tapping into other sources of data to enrich your primary sources. Obviously the complexity level goes up here, but so do the potential rewards.
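Enrichment usually comes down to joining an external feed to your own records on a shared key, such as a date, a location, or a company identifier. The sketch below is a plain-Python left join under that assumption; the function name `enrich`, the `ext_` prefix, and the sample sales and weather rows are all hypothetical, and real feeds would first need cleaning and key normalization.

```python
def enrich(primary_rows, external_rows, key, prefix="ext_"):
    """Left-join an external feed onto primary records by a shared key.

    Every primary row is kept; matching external fields are copied in under
    a prefix so they are less likely to clash with the primary data's columns.
    """
    # Index the external feed by key (if the feed has duplicate keys, the last row wins).
    lookup = {row[key]: row for row in external_rows}
    enriched = []
    for row in primary_rows:
        merged = dict(row)  # copy so the caller's records are not mutated
        match = lookup.get(row[key])
        if match is not None:
            for field, value in match.items():
                if field != key:
                    merged[prefix + field] = value
        enriched.append(merged)
    return enriched


# Hypothetical primary data (daily unit sales) and external data (daily high temperature).
sales = [
    {"date": "2024-05-01", "units": 120},
    {"date": "2024-05-02", "units": 95},
]
weather = [
    {"date": "2024-05-01", "high_f": 68},
]
for row in enrich(sales, weather, key="date"):
    print(row)
```

Keeping every primary row (a left join rather than an inner join) means gaps in the external feed show up as missing fields instead of silently dropping your own data.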
The availability of external data sources is one of the most important aspects of the big data revolution. Ten years ago, there were basically two sources of external data with open APIs that organizations could easily incorporate into their own analyses, according to Sharmila Mulligan, CEO of ClearStory Data, a data analytics vendor that uses Apache Hadoop and Apache Spark to help clients harmonize multiple data sets.
Today, the number is greater than 10,000, Mulligan says. Whether it’s weather data from the National Weather Service, clinical data from universities, reviews from Yelp, Dun & Bradstreet company profiles, consumer behavior data from Nielsen, or the entire Twitter firehose, the number of external data sources is expanding rapidly.
Not all of this data is useful, and its cleanliness varies. Social media data, in particular, is rife with petabytes of stuff that’s worthless to most people. But when you consider that the biggest companies in the world are increasingly harmonizing upwards of a dozen different data feeds (the median is about seven, Mulligan says), you begin to understand where the big data battle lines are being drawn in the corporate world.
But it’s never about the data sources themselves, Hurwitz cautions. “There’s a lot of data that everyone can get,” she says. “It’s how you correlate it and how you put the pieces together, how you understand the context of the data… and how you relate it to what problem you’re solving.”
Big data can be overwhelming at times. The level of activity and hype leading up to this week’s Strata + Hadoop World show in New York City is a prime example of that. But by finding success with smaller projects, keeping an open mind about what the data is telling you, and gradually layering in bigger external data sources, you can start to reap the rewards of big data analytics.