Do You Need a Chief Data Scientist?
Data scientists are modern-day wizards who can turn digital coal into virtual diamonds. But data scientists are unique individuals with special talents, and organizations risk squandering those gifts if data scientists are managed like any other employee. Some organizations are finding that the individual best suited to manage data scientists is another data scientist: the Chief Data Scientist.
To get the lowdown on this emerging job description, Datanami spoke with Ira Cohen, the co-founder and CTO of Anodot, which develops machine learning-based tools for understanding time-series data. Cohen wears many hats at Anodot (VP of fish care, anyone?), and one of them is that of Chief Data Scientist.
There is something about developing machine learning that sets it apart from other software development projects, Cohen says, and that uniqueness extends to the people who develop it.
“The reason why you need a Chief Data Scientist in the first place is you need somebody who can bridge the gap between management and [the data scientists], and what machine learning can do and cannot do,” Cohen says. “You need somebody who understands what it is in a deeper form than a CTO, who might have broader knowledge of a lot of things, but not necessarily machine learning.”
Machine learning is a powerful tool, particularly when backed by large amounts of data. But the path from big data brainstorm to machine learning nirvana is not necessarily a straight one, and it often takes a trusted hand to navigate the data science team around potential pitfalls.
“You need to understand that data is the fuel of what you’re creating, and understand the non-deterministic risk of developing these capabilities,” Cohen says. “There’s a gap between what people expect and what can actually be achieved [with machine learning], or the risks of whether you can achieve it or not. And the Chief Data Scientist is the one who can bridge that gap.”
Going Down Rabbit Holes
Data scientists are researchers at heart, and they need to be given space to explore different problem sets and possible data-driven solutions. But there's a fine line between giving data scientists enough room to find novel solutions and wasting the precious resource that is a data scientist's time. It's up to the Chief Data Scientist to enforce that line, Cohen says.
“When you’re a researcher, it’s very easy for you to go down rabbit holes. You go down lots of rabbit holes,” Cohen says. “I’ve had to pull my people out of rabbit holes. This is part of what we do. ‘You’ve done enough, pull out. If we have time, we’ll go down that rabbit hole again. But let’s move to the next hole.’”
Too many organizations let their data scientists spend too much time with their heads buried in rabbit holes that end up petering out, Cohen says. Striking the right balance between letting data scientists explore these holes and selecting which holes are worth exploring is one of the main duties of the Chief Data Scientist.
“On the one hand, you need to be patient because you do need to let people go down some of these holes, because magical things happen when you do,” Cohen says. “When you start something new, you don’t know the solution, so you have to do research. That’s the way you develop these capabilities. You try out a lot of things.”
On the other hand, there can be a lot of time and money wasted going down rabbit holes. “You can’t imagine how many times I’ve heard from people who hired a data science team and PhD level experts, and after a year, they have nothing to show for what they did, because they went down lots of rabbit holes and tried to make everything perfect and never achieved results,” Cohen says.
The devil is in the details when it comes to applying machine learning correctly. Depending on the specific use case, dramatically different machine learning capabilities may need to be brought to bear. Sorting through these details and figuring out how best to attack the problem often requires someone with experience in the matter – and that someone is often the Chief Data Scientist.
Take churn prediction, for example. Predicting which customers are likely to stop being customers is one of the most common big data use cases. We’ve been writing about churn predictions for the better part of a decade here at Datanami. But there are a variety of ways to chop this problem up, and there are different levels of technical difficulty associated with each one.
“If you think you can prevent the churn in real time…that’s one product. And we have a whole set of requirements” to attack that problem, Cohen says. “Or maybe you think you should predict potential churn for each customer once a month. That’s a completely different product with a completely different set of machine learning requirements.”
If you had a Chief Data Scientist on the payroll, you would task her with understanding the scope of the data science work, and which approaches might work and which ones probably won’t. She would understand the technical and business tradeoffs in taking one approach over the other.
“When we talk to a CTO, where all he knows is some blogs he read and some higher-level sound bites, the problem may seem simpler to him and he might think, ‘Oh, I can get my data scientists to work on this and they’ll be able to do something similar,’” Cohen says. “Having somebody who understands the material deeply, including what the capabilities are and how to evaluate capabilities and so on and so forth – that really makes the difference.”
Build Vs. Buy
The specific role that a Chief Data Scientist plays depends on how the organization is applying data science, and where it falls on the build-versus-buy spectrum. Here, it's important to differentiate between an organization that is creating a for-sale product or service with machine learning as a core feature, and one that is looking to use machine learning or data science capabilities in a product or service that's used internally.
Anodot, which creates and sells software that uses machine learning models to analyze time-series data, is a good example of an organization building an external product with machine learning as a core feature. Cohen leads a team of data scientists in building all of the machine learning capabilities available in the Anodot product.
On the other hand, there are organizations that use machine learning to create products that are used internally, or that rely on outside data science services. These organizations face tough questions about whether to build those capabilities themselves or buy them, and the Chief Data Scientist, with her deep experience, is best equipped to answer them, Cohen says.
“I think companies should build it themselves if they’re going to sell it, or if it’s a mission critical application,” Cohen says. “But it has to be mission critical. Otherwise, why bother?”
Experience Matters with Ethics
It’s critical that a Chief Data Scientist has experience with machine learning, either building real-world systems or thinking deeply about them as a researcher, Cohen says. You wouldn’t hire a Chief Technology Officer who wasn’t well-versed in technology, and the same goes for a Chief Data Scientist.
That doesn't necessarily mean the Chief Data Scientist has their hands on the keyboard, however. "In principle, the role is a manager," Cohen says. "But a lot of times, just like you see CTOs coding and other C-level engineering types still doing stuff with their keyboards, you would see that as well. It's important, but it's not critical. His or her main role is not developing stuff. It is to manage and apprise."
Navigating ethical concerns around machine learning is another part of the job. Just as the Chief Data Scientist will be called on to explain to the C-suite what machine learning is and is not capable of, the C-suite will rely on the Chief Data Scientist to make critical decisions about what is and is not an ethical use of machine learning.
Every data scientist should be aware of concerns around ethics, Cohen says, but the Chief Data Scientist will be relied upon to make the tough call when the rubber meets the road.
“The ethics part has to be driven by the Chief Data Scientist, because again, he or she understands the deeper capabilities and doesn’t get fooled by bul****t ideas,” he says. “Sometimes, if you don’t understand it, you might think that it’s doing a lot more than it was doing, while it’s actually quite benign. But if you know deeper, you can evaluate whether it’s doing something unethical or not.”
This experience is even more critical when using deep learning algorithms, which may not offer the same level of explainability as more traditional machine learning algorithms. Just because you don't know exactly how a black box algorithm arrived at a recommendation does not mean that it can't be used ethically, Cohen says.
“It’s really how you use it, what you feed it as inputs, and what outputs you expect from it–that’s what determines whether it would be ethical or not ethical, even if you don’t understand the insides of it completely or how it’s doing what it’s doing,” he says.
“In some cases, you do need to know, and that would also lead the Chief Data Scientist to say we cannot use these capabilities or these algorithms, we have to use these other algorithms that will allow us to go into the model and understand why it’s doing what it’s doing,” he continues. “You need that understanding.”
Not every organization that is adopting data science needs a Chief Data Scientist. But as the size and scope of data science activities increase, the benefits of having an experienced hand guiding the data science activities and providing good advice to the C-suite will become more apparent.