Strata Speakers Drop Clues on Winning with Data Science
The rapid pace of technological innovation is giving organizations powerful new capabilities in the field of data science. These advances are lowering the barrier to entry and supercharging data science capabilities for organizations around the world. With the playing field leveled somewhat, we look to the Strata Data Conference this week for clues on what will separate the data science winners from the losers.
Hilary Mason, head of Fast Forward Labs at Cloudera, made several recommendations for how to find success in data science during her opening keynote address Wednesday morning. She began by remarking on how quickly the field of big data has changed.
“There are things we can do today that we could not do at all five years ago. But more importantly than that, there are things that are cheap today that were unaffordable five years ago,” Mason said. “This may be because of economic constraints and the cost of CPUs or GPUs. But it’s that change in accessibility that is really empowering, and the way we use those tools in our practice is changing as quickly as the technology changes.”
However, even though tremendous computing capability and great new data science tools and techniques are available to us, that doesn’t mean we should assume the combination of data science, big compute, and big data will automatically solve all problems.
“That means that, if they’re not technology problems, you can’t solve them without solving everything around the technology,” Mason said. “It might be people, process, organizational structure, collaboration, goals, and investment strategies.”
Mason is a big advocate of data products. As a data scientist at Fast Forward Labs and now working for Cloudera, she takes pleasure in well-built data products. One of her favorites is Google Maps, she told the Strata audience. Why is Google Maps so impressive? “It’s boring!” she declared.
Building a data product as successfully boring as Google Maps requires an abundance of data science skill and processing power, she conceded. But more than that, it requires finding a place in the world, and a context, where the product can fit in and have an impact.
“Success requires both technology and practice,” Mason said. “Even the people who are best in the world at this still struggle to get it perfect. Doing this well requires both deep technical understanding and good expertise around product and practice.”
Mason is definitely bullish on the potential for data science and machine learning tools and technology to improve the design and usefulness of data products. She said this is the most exciting time for such tools, and that the next two years will bring even greater innovation. “The hype has died,” she said. “We’re able to do really interesting, effective work now.”
But there’s a catch, a “technological trick,” that will make it hard to find optimal data solutions, Mason warned. The problem, which she has talked about before, is that you can’t just go out and buy a shrink-wrapped product that will make you a winner. The reason is that each organization’s data products need to be so closely tailored to its own business that no vendor could build them.
What’s more, the data scientists can’t be the only source of inspiration. “You should own that work,” she advised. “The people who are best positioned to recognize the opportunities are often not the data scientists, but the people who own the products, who run the business. It’s your job, too.”
Information (Deficit) Economy
Anoop Darwar, a senior vice president at MapR, was also bullish on the potential for data to make big changes. But he pointed out an interesting dilemma that could impact your data science work.
He cited the 1971 work of American economist Herbert A. Simon, who stated that “in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes.”
Darwar postulated a corollary to Simon’s idea. He said that, despite the availability of huge amounts of new data, we aren’t necessarily making better decisions today than we did five years ago. This is due to a counterintuitive effect of data growth, which he termed the “information deficit economy.”
“Every week there’s a new algorithm,” he said during his keynote address Thursday. “There’s an abundance of machine learning. But it’s also created a deficit for data scientists, and also for data logistics.”
Darwar offered three secrets to breaking out of this information deficit economy. The first tip is this: “Love the problem, not the solution.”
“If you search for what’s happening in big data or what’s happening with tools, you’ll be amazed at the amount of information that you’re going to get,” he said, “and you again will be beginning the attention deficit. So instead of focusing on the tools, focus on the problem you want to solve.”
The second secret was to invest in a data fabric (which, not coincidentally, is what MapR now sells). Storing data in multiple silos may be easy, but it leads to problems down the road.
His third piece of advice was to inject intelligence into operations. “You can take the insight and leverage it into the operations before the insight vanishes,” he said. “It’s an arbitrage.”
Eric Colson, the chief algorithms officer at Stitch Fix, had some great insights into the nature of competition. His chief piece of advice was for companies to take a conscious and deliberate approach toward differentiating their businesses. You must think big and avoid falling into patterned thinking, he said.
“Changes in technology have made new data available, and the company that better leverages that data is going to have an advantage,” he said. “If you’re going to do this — differentiate through data science — it cannot be business as usual. You cannot do the same old company, the same old work structure, the same old goals and processes. Things are going to have to get different. Specifically, we have to rethink the role that data science plays in an organization and we’re going to have to create an environment in which it can thrive.”
Colson is adamant that creativity with data will come from the data scientist. But to enable the data scientist to be creative, the CEO must first create the right preconditions. To demonstrate this, he used an evolutionary quirk in the African giraffe as an example. While the animal appears to be elegantly designed to excel in its specific niche, eating leaves high off trees, it in fact suffers from a design flaw.
As the African giraffe evolved, its neck got longer. However, because the animal’s recurrent laryngeal nerve loops around the aorta, as it does in all mammals, the nerve has to run from the brain all the way down the neck, loop under the aorta near the heart, and travel back up to the larynx. The result is a 15-foot nerve that doesn’t work so well.
“As a result giraffes don’t make much sound,” Colson said. “This isn’t optimal and it can’t be fixed through evolution. Framing sets the source for initial design and all future evolution. Once framed, typically only incremental changes are possible.”
Just like the giraffe with its semi-functional laryngeal nerve, data scientists can be relegated to subpar work if they’re not allowed to set the frame of reference from the beginning, Colson said. For example, you may tell your data scientist to go optimize something, yet unconsciously restrict her by telling her not to deviate much from the past, he said.
“This is constraining,” he said. “They become invisible to us because we’re so used to it, but they are there and they are constraining. What we thought was optimal may not be optimal at all. We’re stuck in the plane, optimizing, doing incremental improvements when we could have been up there at the top of the curve.”
To avoid this trap, be careful about what you ask your data scientists to do. “The point is, don’t ask your data scientist to optimize something,” Colson said. “Instead ask them to frame the problem. Proper framing can give you great intuition into how things work, both how they work now and how they might work in the future. And you’ll be far more likely to help them make both incremental improvements and step functions.”