Data Science Success All About the Models, Domino Says
Forget about being “data driven.” What you really want to be is “model driven,” according to the CEO of Domino Data Lab, which today unveiled its new vision for elevating the predictive model as the single most important asset driving success in data science organizations.
Nick Elprin co-founded Domino Data Lab with two colleagues, Chris Yang and Matthew Granade, at the height of the big data boom in 2013. With experience working as quants in the financial services industry, the founders were eager to build a platform that could help organizations build systems to leverage their data to gain a competitive edge, no matter the industry.
At first, Domino focused on lowering the barrier that kept data scientists from using parallel computing infrastructure. At some point, it started calling its product a “data science platform” that eased the development and deployment of predictive models that employ machine learning algorithms. Greasing the wheels for collaboration and ensuring the reproducibility of models among data scientists was the next big thing for Domino.
Somewhere along the way, the company’s leaders felt the message and the goals were missing the mark. They witnessed how companies in the wider world were not getting good returns on their data science investments, despite hiring expensive data scientists who could wield the latest algorithms on the massive amounts of data those companies had collected.
So about a year ago, the leadership took a step back to think about what was going wrong. An answer began to emerge, and it revolved around models. “My assessment is they’re often missing the key observation of what makes data science so different and so powerful,” Elprin told Datanami. “Our view on that is a notion of the model as the core asset, the principal output of data science work.”
It’s the treatment of models – not how data, software, or hardware are utilized – that differentiates the organizations that are the most successful at data science from the ones that are spinning their wheels, according to Elprin. The CEO cited a 2017 McKinsey survey that found only 20% of companies were making extensive use of models, but that those companies were seeing profits 10% higher than companies that were not.
“What’s going to become clear over the next few years is being model-driven, as opposed to being data-driven, is going to become the new currency of competitive advantage,” he said. “Companies that are successful in getting value out of data science treat models as a new type of business asset.”
With that insight in mind, Domino went back and looked for patterns in how its own customers were putting models to work. The company identified approaches that work for data science and could become best practices, as well as patterns of behavior that ultimately led to failure.
A key insight emerged: the most successful companies treat models very differently than how they treat data and software. “The way that they build models, develop models, deploy models, manage and have governance around models, and the way they create technology infrastructure systems to support models — those are very different than what they’ve done for past eras around setting up systems for data or setting up systems for software,” Elprin said.
That led Domino to identify three critical properties of models that make them distinct from other types of business assets.
First, the raw materials that data scientists use to create models are different from other business assets. “Models require computationally intensive algorithms and that’s why you see this growing need for elastic scalable compute, for specialized hardware like GPUs,” he said. “Those are things that software engineering teams don’t normally need, and also that data platform and data teams don’t really require.”
Another critical raw material for model development is the open source ecosystem. “There are new tools, new packages, or updated packages coming out every day around, especially around Python and R,” Elprin said. “If a company is trying to compete and have the best most innovative models, they need ways to have data scientists have access to that very rapidly evolving ecosystem without stifling their flexibility.”
The second property that differentiates models from other types of business assets is the process by which they are built. “Models are inherently built in a research process, and a research process is experimental, emergent, and exploratory, and that’s very different from software and it’s very different from data ingest systems,” Elprin said.
A data science team developing models might try hundreds of ideas before finding one that works, and that’s just fine, but it does create different requirements. “Teams developing models need different capabilities to facilitate rapid experimentation and rapid exploration so they can drive breakthroughs,” Elprin said. “In software, it’s about de-risking and driving to clarity of requirements. In models, it’s about rapid experimentation, trying many ideas as fast as possible.”
The third critical property of models to keep in mind is how they behave. In software engineering, there is typically a specification that developers aim for, and tests that can confirm whether the spec has been met. There’s nothing of the sort when building predictive models.
“Models are probabilistic. They don’t have a correct answer. They just have better or worse answers when they’re alive in the real world,” Elprin said. “What that means is organizations need new ways for quality control, monitoring, governing, and reviewing models to ensure safety and expected behavior.”
When you add it all up, the requirements for succeeding with models point to the need for a dedicated discipline, Elprin said. The current product category that encompasses this idea – the data science platform – is insufficient for the emerging requirements around data science models. “Either how people define data science platform needs to change or I think people are going to talk about something much broader, like a model management platform,” he said.
Domino is talking about its new Model Management framework today at Rev, its annual conference that’s taking place this week in San Francisco. While the company hasn’t yet shipped any new bits that improve the management of models in its flagship data science platform, conference attendees will see demos of new functionality that’s coming down the pike, Elprin said. “We have a bunch of stuff that’s in development,” he said.
For now, Domino’s framework is mostly words and ideas, but over time it will be fleshed out with real capabilities that let organizations treat models as the critical business assets that Domino leaders believe they are. “The next era of winners in business are going to be companies that figure out how to be model driven, how to put models at the heart of their business,” Elprin said. “That’s what model management is all about.”