Don’t Overlook the Importance of Data Stewards
Data scientists may be the rock stars of big data, and data engineers currently are in high demand. But companies that are serious about creating a winning data strategy should carefully consider what a well-trained data steward can bring to their organizations.
At a high level, data stewards are individuals who are responsible for ensuring that an organization’s data is managed and ready to use. They’re in charge of making sure data is clean, defined, and able to be used for downstream analytics or even AI use cases.
According to Monica Richter, the chief data and analytics officer at Dun & Bradstreet, the data steward role can be broadly defined by five main responsibilities.
As Richter explains:
- “Stewards should help you know what you have. They help you to inventory the data.
- They should help you know what it means. They need to be able to define the data.
- They should help you to know what databases are in your data lake. They have to be able to locate where to find the data.
- They have to help you know whether you can trust the data, the data quality, where it’s sourced from.
- Last but not least, they have to know what you can use it for. These are the data rights around it. These are potentially data regulations that might be important for that data.”
Of course, not all data stewards are alike; they bring different skills and have different responsibilities. Some data stewards work predominantly with corporate data and may use a graph database to establish links between different groups. Others work more closely on data quality and are experts at using R or Python to build data cleansing routines.
Analise Polsky, a business solutions manager and thought leader with SAS, offers another view of data stewards. “To me, data stewards are the referees of the data world,” Polsky says in a video. “They have to know how the game is played. They have to understand the data itself.”
One important aspect of data stewardship is that the position can be a function of both business and IT. “They kind of sit in both worlds,” Polsky says. “Often what we really want is to see data stewards sit between data governance and data management.”
Stewardship of the Data
There aren’t as many job openings for data stewards as there are for other data roles, like data scientists, data engineers, and data analysts. While the responsibilities of data stewards have been defined for over a decade, there does not seem to be as much visibility into the data stewardship role, and that lack of visibility could impact demand for the position.
A quick search on the job site Glassdoor reveals only about 1,000 available data steward positions across the U.S., compared to about 29,000 for data scientists. At another job site, Indeed, there were about 7,000 open positions for data stewards, compared to about 33,000 for data scientists.
While demand for people with the “data steward” title trails demand for other popular positions, that doesn’t mean organizations aren’t employing data stewards. In many instances, a person performs the tasks of a data steward but holds some other title.
“Most organizations have data stewards,” Polsky says. “You have the go-to people for your data, and they’re often your de facto data stewards. What we want to do is really formalize the role and say, ‘Here are the key responsibilities and here’s the role we want them to have.’”
D&B’s Richter, who oversees 16 inventory data stewards and 14 corporate data stewards, sees a similar trend. Companies in the information services industry, like D&B, are much more likely to have formalized data steward roles. Companies in adjacent and heavily regulated industries, like financial services, are also much more likely to have data steward positions.
“But if you’re in sports equipment, you might be using a data steward more for your corporate data,” Richter tells Datanami. “They’re helping you with your customer data, perhaps, or the services that you have. But they don’t have the need like Dun & Bradstreet would have for our inventory data, or they might be using more aggregate roles, where somebody might be doing partly data stewardship and partly doing data operations or data quality. They might have more of a combined role.”
Flexibility is a key requirement for anyone doing data steward work, whether or not they hold a formal data steward title. According to Richter, one of the most important aspects of the position is the ability to play both offense and defense.
That’s because a data steward must not only defend against the regulatory and reputational risks associated with mishandling sensitive data, but must also use her data management skills to turn that data into a competitive advantage for her company.
“For me, it’s two sides of a very important coin,” Richter says. “You must be able to mitigate the risks and have policies in place for well-defined data. But you must be thinking about how to liberate the insight of that data, and you must have the quality and you must have the actual utility of the data still front and center.”
Data stewards often shift between offensive and defensive alignments depending on the organization’s current priorities. For example, when D&B was deep into General Data Protection Regulation (GDPR) remediation, its data stewards were in a defensive alignment.
“When we were doing the GDPR work, those stewards were, all across the board, heavily on the left side of the railroad track, on the risk-mitigating aspect of their stewardship role,” Richter says. “And now we’re GDPR compliant… we’re really looking at that monetization, the right-hand track.”
As an organization gets a handle on its data and the overall data environment matures, its aspirations also naturally move from a defensive-oriented position to a more offensive one. The data steward’s role also progresses with time, Richter says.
“When you have it managed and it’s defined and you have those five knowledge areas in place,” she says, “the stewards start to work much more closely with the business unit in the creative space and the innovation space about, now, what can we do with the assets.”
A Passion for the Data
The relationship between data scientists and data engineers has been well-explored. But the potentially successful connection between data scientists and data stewards shouldn’t be overlooked, Richter says.
“The data scientist can be brought to their knees if the data is not stewarded well,” the D&B CDAO says. “You’re not utilizing the full capacity of your data scientist unless you have the data stewards already getting the data ready across these five knowledge areas.”
If the data steward is doing his job to catalog and define the data and ensure its quality, then the data scientist’s job is made much easier. At D&B, data stewards and data scientists often work in tandem, Richter says.
“[We] normally would be thinking about this as a buddy system, when you have a stewardship network that is really utilizing its power to get the data in the best shape possible,” she says. “The data scientist comes in and has a much easier time, and higher quality output, because of that fabric of a buddy system between a great data steward and a great data scientist.”
Depending on their specific role, some data stewards may be more technically inclined than others. If they’re using tools to build data pipelines, they may even be in a hybrid data engineer/data steward role. The qualifications for succeeding as a data steward depend on the specifics of the job.
But according to Richter, the single most important requirement for succeeding in data stewardship is having a passion for the data. “It’s a multi-faceted individual who, first and foremost, has a passion for the data,” she says. “They must come in as a data-inspired individual. They must be inherently curious about what we are actually searching for.”
Once folks have mastered the data steward’s role, they have several options in terms of their career. They could upskill themselves into data scientists. Or they could follow a more management-oriented track and potentially become chief data officers.