June 30, 2016

8 Tips for Achieving ROI with Your Data Lake


Would you spend millions of dollars building a data lake if you knew you wouldn’t get your money back? Of course not. But all too often, organizations embarking on big data projects don’t do what it takes to achieve a return on investment.

Here are eight ways you can help your own cause and earn a return on your data lake investment.

1. Start Small, Be Targeted

You may have grandiose ambitions to use big data analytics to transform your organization into a digital powerhouse, and that’s great. Big things are possible with big data. But the truth is that most successful big data teams start small, with a single project firmly grounded in solving an actual problem that affects your organization.

“You can get value straight away if you create your first use case to solve a problem you actually have,” Edd Dumbill, vice president of strategy at Silicon Valley Data Science, says in a recent Hadooponomics podcast with Blue Hill Research’s James Haight. “And this is the pattern if you look at successful adoptions of big data. This is absolutely the pattern they’ve had.”

Getting that first success under your belt is probably the single most important thing you can do if you’re just starting out in big data. This advice has been offered to Datanami by numerous big data experts, and it’s pertinent advice that ought to be heeded.

2. Explore New Data



Don’t be afraid to venture out of your data comfort zone and seek out fresh new data sets that could have an impact on your organization. There’s a huge variety of data out there today, and a lot of it is literally floating by on the Internet, just waiting for you to scoop it up and do something creative with it.

Whether it’s social media data plucked from the fire hose, geographic travel data bought from smartphone operators, or demographic data acquired from data brokers, there’s a veritable treasure trove of data to explore. The combination of your existing customer, product, or sales data with this outside data can be used to improve some aspect of your business.

It’s worth mentioning that most big data projects involve some form of outside data, and this is why the data lake phenomenon is growing. These diverse data sets come in all shapes and sizes, which is why a flexible platform like Hadoop is often used to do the initial data collection and the first phase of transformation.

3. Use Free Software Where You Can

It’s true that data lakes aren’t free. By definition, they store a lot of data, and storage costs money. But that doesn’t mean that software should eat your entire budget. In fact, many of the most popular big data tools are open source, and that means they’re free.

Vivian Zhang, CTO and founder of the NYC Data Science Academy, advises new data science practitioners to look for deals where they can. “Big data newbies should start with an open data science technology stack such as Hadoop, Spark and etc.,” she tells Datanami via email. “It is free of charge and will help them get up to speed.”

As the newbies find their footing in the big data world, they can look to commercial solutions to support their production big data operations. “When it is necessary, they can then move to commercial solutions such as Transwarp, Cloudera, MapR, etc.,” Zhang says. “There are a lot of mature solutions for different industries which demonstrate business value and accountable return on investment.”

4. Train Your People

There are three indispensable elements to every data science project: data, technology, and people. Like the veritable three-legged stool, if you’re missing any one of these elements, your project won’t hold up. But getting your people trained with the appropriate skills is probably one of the most difficult aspects of your big data journey.

So what should you learn? That’s tough to say. At the top of the heap are data scientists, who typically have an advanced degree in a field like mathematics or statistics and a strong grasp of machine learning techniques. Those using tools like Spark are often proficient in a programming language like Scala or Python; SAS and R are also widely used in the data science field.

While most data scientists will want a post-graduate degree, you may be able to obtain much of what you need to know through online training or, even better, by attending one of . And it’s worth mentioning that, if you’re a business or data analyst, knowing how to use a tool like Tableau, Qlik, or Spotfire can take you quite far.

5. Don’t Underestimate the Difficulty

There’s a widespread perception that you can just load a bunch of data into an analytic repository, press a button, and voila! Out pour amazing and transformative insights, turning you overnight into a big data rock star.

The truth is, data science is hard. The insights are there to be had, and yes, you can become an algorithm-waving rock star and the toast of your department. But extracting those actionable nuggets of gold requires the combination of a data scientist’s skill and an executive’s understanding of how the business works.

Thinking like a business person helps, says Dumbill. “You have to understand a little bit about how to actually interface the exploration and exploitation of your data into solving a business problem,” he says in the Hadooponomics podcast. “So right now, this is probably manifested most clearly in the trend of talking about data lakes, with large repositories where you can unify data that was previously siloed. That is a good thing, right? But there’s no way in which, if you invest several million dollars in a year in that, putting the data together, that that will result in immediate business benefit.”

6. Keep Calm and Calculate ROI



Don’t expect to get an immediate ROI from your big data lake. While you shouldn’t go into a big data project with a completely blank slate, you should give yourself room to explore your data and see what it tells you. That takes time and patience.

But eventually, your investments will need to bear fruit if you expect your CFO to continue signing off on the big data project. That means you should figure out how to calculate the ROI. There are several recognized methods for figuring this out, but basically it’s the net gain (the dollar amount of return you get minus the money invested), usually expressed as a percentage of the money invested.

Don’t be afraid to reach out for help with the calculations, Zhang says. “C-suite should absolutely calculate the ROI and encourage their technical team to fully utilize open data science solutions,” she says.
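
The basic calculation can be sketched in a few lines of Python. This is only a minimal illustration, not one of the recognized methods: the function name simple_roi and the dollar figures are hypothetical, and it assumes ROI is expressed as the net gain divided by the amount invested.

```python
def simple_roi(total_return: float, total_invested: float) -> float:
    """Return ROI as a fraction: (return - investment) / investment."""
    if total_invested <= 0:
        raise ValueError("total_invested must be positive")
    net_gain = total_return - total_invested
    return net_gain / total_invested


# Hypothetical figures, for illustration only.
invested = 2_000_000.0  # money spent on the data lake
returned = 2_500_000.0  # dollar value attributed to the project

print(f"Net gain: ${returned - invested:,.0f}")
print(f"ROI: {simple_roi(returned, invested):.1%}")
```

A result above zero means the project has earned back more than it cost; a negative result means it has not yet paid for itself.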

7. Challenge Your Quants

So you landed a hard-to-find data scientist and brought her into the team. Congratulations! Now that you’ve found a unicorn, you might be tempted to coddle her and appease her so she won’t get scared and flee for new ground.

Here’s some advice: Don’t. That management technique actually could backfire, according to Dumbill, who founded the Strata conference for O’Reilly Media before it merged with Cloudera’s Hadoop World.

“If you want to have a transformational data science team, they need to be challenged by new work,” Dumbill says in that podcast. “They need to be constantly stimulated and be able to solve really difficult problems.”

8. Get Buy In from Executives

(Monkey Business Images/Shutterstock)


As you can tell, there’s a lot that goes into building a successful data lake, and a lot goes into building a successful data science team. But it could all be for naught if you don’t follow this last piece of advice.

According to Dumbill and many other big data gurus, getting support from the owners or executives in charge is an absolute must. “When you’re talking about changing the decision making abilities of a company as a whole, you have to have executives buy-in,” he says.

While the C-suite, including CFOs, COOs, and of course CEOs, has been trained to think of the IT department as a cost center to be managed (and probably cut to the bone or even outsourced if possible), data science is a different animal entirely. Ideally, your data analytic projects are tackling problems that are strategic to the company, and that means they should get a little more attention.

“The worst kind of arrangement you can have is a siloed black box of analysts and data scientists who get requests shouted over the wall and then have to throw reports back over the wall later on,” Dumbill says.

What’s your advice for achieving big data success? We’d love to hear from you. You can share your expertise or comments with us at [email protected]

