December 12, 2016

Can’t Get Satisfaction from Hadoop? Try Analytic Ops

Ron Bodkin


When Mick Jagger sang that he “can’t get no satisfaction,” he could have been referring to Hadoop. A significant disconnect between big data vision and execution is leaving many people (and their organizations) unsatisfied with their Hadoop experience.

One of the big challenges with Hadoop is just getting it going. It can be difficult to go from having an idea and getting answers in an experimental setting to turning those answers into a data product that’s used by the rest of the company. As a result, the vision many people have for big data never seems to materialize. In some cases, organizations assume Hadoop just isn’t right for them, and they look to other technologies to solve their problems.

Another problem we see is that people fail to recognize how early they are in the process of understanding big data. They are experimenting to see if big data has any value to the business. When the interesting idea that kick-started their big data experimentation fails to deliver immediate business value, they give up.

It’s worth keeping in mind that the process of finding something that’s not just interesting, but will also have broader value, may take longer than expected. For this reason, it’s advisable to stay the course. Don’t give up too soon.

Hadoop Hang-Ups

Yet another mistake organizations make is assuming the big data exploratory environment will work as an operationally stable environment once in production. The people responsible for supporting experimentation by a limited number of expert users may not have the skills needed to bring a data product into a stable infrastructure that effectively serves a larger number of users.

Creating a production environment for a data product requires specialized Hadoop skills around scaling, managing workloads, optimization, and tuning. And it requires an agile operations mindset to automate data ingestion, wrangling, and model outputs to support productive exploration. The process of moving from pilot to production needs special care to ensure that satisfaction with Hadoop doesn’t take a nosedive.

In terms of designing a data product, in our experience, business users don’t care whether they’re using big data or not. Many organizations get stuck on making Hadoop visible to justify their efforts. Doing so can actually complicate processes and require business users to learn a new way of doing things. It’s often better to provide data the way people like to consume it with familiar interfaces rather than forcing them to learn a new technology. This will help ensure adoption and allow you to see a return on your efforts sooner.

Analytic Ops

One key to turning a steady stream of good ideas into business value is to create a bridge between data scientists, who come up with good ideas, and the operations team running the environment. The bridge I’m speaking of is Analytic Ops: an approach to continuous delivery of analytics results that requires close cross-functional collaboration.

Analytic Ops is based on cross-functional teams that include data scientists along with engineers who understand math and machine learning, business experts, DevOps practitioners and application engineers. Analytic Ops uses agile processes to create features for modeling, simulate models, deploy and test them live, and monitor them in production. As an agile process, it relies on automation and a commitment to regular processes to test, deploy, retrain and rescore models, all with automatic monitoring to catch issues.
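
To make the "automatic monitoring" idea concrete, here is a minimal sketch of a check that could run on a schedule and flag a model for retraining. It is an illustration only, not a description of any particular product or of the author's own tooling: the names `check_model` and `MonitorResult`, the use of accuracy as the metric, and the 0.05 tolerance are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MonitorResult:
    live_accuracy: float
    needs_retrain: bool


def check_model(predictions, actuals, baseline_accuracy, tolerance=0.05):
    """Compare live accuracy with the accuracy measured at deployment time.

    All names and the default tolerance are hypothetical, for illustration.
    """
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal in length")
    # Fraction of live predictions that matched the observed outcome.
    live_accuracy = mean(1.0 if p == a else 0.0 for p, a in zip(predictions, actuals))
    # Flag a retrain when live accuracy falls more than `tolerance` below baseline.
    needs_retrain = live_accuracy < baseline_accuracy - tolerance
    return MonitorResult(live_accuracy, needs_retrain)


if __name__ == "__main__":
    result = check_model([1, 0, 1, 1], [1, 0, 0, 0], baseline_accuracy=0.9)
    print(result)
```

In practice a team would likely track more than one metric and feed the flag into its deployment pipeline, but even a simple gate like this turns "monitor them in production" into a regular, automated step rather than a manual review.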

Finally, realize that it will take effort to put Analytic Ops in place and alleviate some of the current frustration with Hadoop. Much of the dissatisfaction is due to people losing patience when results aren’t immediately realized.

For example, just 18% of organizations are using Spark in production for advanced analytics, according to the 2016 Spark User Survey. People are still in the process of understanding these emerging technologies and exploring their data. If this is the case for your organization, take a more deliberate and strategic approach, investing time to determine your most important use cases. Don’t forget that the vision people had for the data warehouse was not fully realized right away, either.

Bear in mind that you’re looking to build a capability for generating thousands of insights over time and for building data-driven products that automate good ideas in the business. To do this, you’ll need to put in place not just the right technology, but the right people: a business expert who oversees the vision you’re trying to achieve, one or more data scientists who uncover insights and build and iterate models, an engineer who knows how to build production software for broader use, and a DevOps engineer who knows how to automate and continuously deploy models into production.

It takes time to develop a team and a process for repeatable success with big data analytics, but it’s time well spent, an investment that will ultimately bring you that all-too-often elusive satisfaction with Hadoop.

About the author: Ron Bodkin is the president and founder of Think Big Analytics, a Teradata company. Ron founded Think Big to help companies realize measurable value from Big Data.
