The 3 Roles Needed for the Modern Data Team
According to the McKinsey Global Institute, data-driven organizations are 23 times more likely to acquire customers, six times as likely to retain those customers, and 19 times as likely to be profitable as a result. As more companies are working to reap the benefits of becoming data-driven, the demand for effective data teams becomes more prominent. But while some companies’ data teams run like well-oiled machines, the majority of enterprise organizations are still figuring out how to get their data teams up and running.
One of the most common mistakes is not properly defining and differentiating the roles and responsibilities of data team members. Let’s take a look at how you can create the foundation for a data-driven culture — starting with your core team.
Defining the Roles
At the core of your team should be a data engineer, data analyst and data scientist. Early on you might have one person filling more than one of these roles. While the titles sound similar — and many job listings contribute to this ambiguity — there are some important distinctions between each:
Data engineer: Your data engineer — sometimes called an ETL (extract, transform, load) engineer — is responsible for moving and propagating access to data. Rather than analyze and interpret the data, their chief mandate is piping it to the right places.
Data analyst: Your data analyst should be focused on answering business questions using data. They know SQL and may be comfortable in a few other languages such as Python or R. This person effectively serves as the bridge between data and business insights. Generally, this is a person who holds an advanced math or physics degree and exhibits an abnormal amount of intellectual curiosity and skepticism.
Data scientist: Oh, the data scientist. Much like big data, data science is the buzzword of the decade. The term is often misused: companies frequently cite the need for a data scientist when what they’re really looking for is an analyst. Still, this role has a specific purpose: to build predictive models and automated classifications from your existing data to help guide future decisions and predict outcomes. This person should have a strong background in statistics and enough coding chops to implement those math functions in their analytics.
How These Roles Work Together
To put it succinctly: Your data engineer reliably pipes data to a central location so your data analyst can answer existing business questions and discover opportunities for questions not yet asked. Your data scientist is then tagged in when a greater level of depth is required. The analyst hands the data scientist a de-normalized (i.e., wide), cleaned data set, along with the definitions and nuanced explanations of every dimension and the relevant business constraints.
Let’s look at a real-world example: I worked with a customer who was looking into the profitability of their platform. Previously, the cost and revenue sides of the business were managed by separate teams and only joined at a very low level of granularity in VLOOKUPs living on the finance team’s desktops. The data was there, but nobody knew where to look for it or how to use it.
Before we made any programmatic or process changes, the existing data, when joined against calculated COGS (cost of goods sold), showed that a cohort of customers on a top-tier plan was actually costing the business over $30K per month! This prompted some further digging and surfaced a few scammers on the platform, who were expeditiously investigated and banned. This easy win was made possible by an analyst’s curiosity and by data that had been collected by the engineer who wrote the production code years prior.
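To make the join concrete, here is a minimal sketch of lining up revenue against COGS per customer and rolling the margin up by plan. Everything in it is an assumption for illustration: the customer IDs, plan names, dollar figures, and the function names margin_by_plan and unprofitable_customers are hypothetical, and the real analysis was presumably done in SQL or a BI tool rather than in Python.

```python
from collections import defaultdict

# Hypothetical monthly figures; the article does not give the real data.
revenue = [  # (customer_id, plan, monthly_revenue)
    ("c1", "top", 500.0),
    ("c2", "top", 500.0),
    ("c3", "basic", 50.0),
    ("c4", "basic", 50.0),
]
cogs = {"c1": 120.0, "c2": 2400.0, "c3": 10.0, "c4": 15.0}  # customer_id -> monthly COGS


def margin_by_plan(revenue_rows, cogs_by_customer):
    """Join revenue to COGS on customer_id and sum the margin per plan."""
    totals = defaultdict(float)
    for customer_id, plan, monthly_revenue in revenue_rows:
        # Customers with no recorded cost are treated as zero-cost (an assumption).
        cost = cogs_by_customer.get(customer_id, 0.0)
        totals[plan] += monthly_revenue - cost
    return dict(totals)


def unprofitable_customers(revenue_rows, cogs_by_customer):
    """Return (customer_id, margin) pairs with negative margin, worst first."""
    losses = [
        (cid, rev - cogs_by_customer.get(cid, 0.0))
        for cid, _plan, rev in revenue_rows
        if rev < cogs_by_customer.get(cid, 0.0)
    ]
    return sorted(losses, key=lambda pair: pair[1])


print(margin_by_plan(revenue, cogs))
print(unprofitable_customers(revenue, cogs))
```

In this toy data the top-tier plan looks healthy on revenue alone, but the joined view shows that a single customer drags the whole cohort into the red, which is the kind of signal that would prompt further digging.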
This win created an appetite for testing the optimal price point for legacy customers on grandfathered plans. The data team was up to bat again. Some immediate questions surfaced:
- How could we design a test that would reduce risk, while informing action?
- What data needed to be tracked?
- What cohorts of customers should be excluded?
- Which cohorts of customers needed to be prioritized?
- What backup plan existed to de-escalate any hairy situations?
I approached the problem using the data we had already collected. However, this data was purely descriptive. Who was paying how much? What discounts had been offered? How many customers could be affected? We needed an experiment that would track the response to a change in the ecosystem. We were looking for a price elasticity curve.
Until now, this was a one-man show, with the business operations and data analyst leading the charge. As complexity increased, an engineer was tapped in to alter processes and add nuanced tracking. We needed more customer service interaction data (calls, chats, and emails to customer support, along with sentiment) and more usage data (sessions, logins, and cancellation attempts). The engineer-analyst power duo set up a cross-functional price-change test that included a customer hotline, a marketing-approved email blast, a product change, and a finance-approved revenue risk. Operational excellence was ensured by the head of customer success and several dry runs of the rollout. Though a data scientist was consulted for a quick gut-check, he wasn’t really involved until the data collection portion was completed.
Once the data had been collected, cleaned, and joined with relevant metadata, the data scientist was tapped in: What was the optimal tradeoff of average order value (AOV) versus conversion / churn? The usual complaints arose: the sample size isn’t large enough; there are too many dimensions; the data may include biases because of the collection mechanism and escape hatches. Despite these objections, a fairly accurate model was produced, and action was then taken to help generate more revenue.
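For readers unfamiliar with the idea of a price elasticity curve, the sketch below shows one simple way to read it off test results. It is not the model the data scientist built, which the story does not describe. It uses the standard midpoint (arc) elasticity formula on made-up observations, and the price points, customer counts, and function names are all hypothetical.

```python
# Hypothetical test results: (monthly price, customers who stayed at that price).
observations = [(20.0, 1000), (25.0, 900), (30.0, 700), (35.0, 450)]


def arc_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) price elasticity of demand between two observations."""
    pct_quantity = (q2 - q1) / ((q1 + q2) / 2)
    pct_price = (p2 - p1) / ((p1 + p2) / 2)
    return pct_quantity / pct_price


def elasticity_curve(points):
    """Elasticity for each pair of adjacent price points, ordered by price."""
    ordered = sorted(points)
    return [
        (p1, p2, arc_elasticity(p1, q1, p2, q2))
        for (p1, q1), (p2, q2) in zip(ordered, ordered[1:])
    ]


def best_tested_price(points):
    """Tested price with the highest revenue (price times retained customers)."""
    return max(points, key=lambda point: point[0] * point[1])[0]


for low, high, elasticity in elasticity_curve(observations):
    print(f"{low:.0f} -> {high:.0f}: elasticity {elasticity:.2f}")
print("Highest-revenue tested price:", best_tested_price(observations))
```

An elasticity between 0 and -1 means a price increase loses proportionally fewer customers than it gains in price, so revenue rises; below -1 the opposite holds. A real analysis would also have to deal with the objections raised above, such as small samples and biased collection, which this sketch ignores.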
Nurturing an Effective Team
Given that these roles are so dependent on one another and there’s a somewhat linear relationship, should the data engineer manage the group, or should the data scientist manage them? Neither, actually. In my experience, bad things happen when one of these roles leads the others. The group should be flat, and ideally will report into a manager with some sort of technical background — the kind who can understand and articulate the tradeoffs between technical constraints and business needs.
Not all companies will be in a position where it makes sense, either strategically or financially, to bring all of these roles on board at the same time. If you’re one of these companies, you’ll probably want to focus on the engineer and analyst first. In most cases, a data scientist will rely on both of them to do the job. You’ll find that an engineer and analyst can go a long way, including setting up dashboards and tracking, and detecting problems and opportunities.
Of course, once you start looking deeper, you’ll want to bring in a data scientist. This is the transition from “What marketing channels are my best performers?” to “What is the spend level of diminishing returns per marketing channel, including halo effect?” Data science projects tend to have longer investigation and delivery times, with lower success rates but higher returns when they do succeed. While you can expect an analyst to turn around many reports in a day or two, a data scientist may need a few weeks to dig into a given problem.
Once you have your power team delivering, you need to immediately start thinking about their promotion path. When you were hiring, you hired for curious, skeptical opportunists; these are your proverbial truffle-hunters. They generate a ton of value, but are also in high demand. Lately there have been tons of analytics and data science bootcamp graduates, but not many analysts and data scientists who can execute on business-critical tasks and run cross-functional projects.
Modeling this technical team’s career path after those designed for engineers is a natural adaptation. Early on, these folks can decide whether they would prefer to go down an individual contributor or management path, and this choice should hinder neither their salary potential nor their growth. Historically, higher levels of the analytics career ladder have been dubbed “Manager” and “Director” whether or not they managed direct reports. This primarily reflected their proficiency in leading cross-functional teams and owning outcomes-based goals / arms of the business. Unlike the data science / data analyst title conflation, a conflation in rank seems mostly harmless.
Now you have a high-value team delivering to their full capacity and being appropriately recognized for their contribution. What’s next? Requests are going to start flooding in. You’re going to need to scale, fast. And sadly, the cross-role unicorns you originally staffed your team with are few and hard to find. This is the time for specialization.
One of my managers used to say, “We love scientists, not science projects.” What he was alluding to was the propensity of highly technical and intellectually curious employees to go off on long-running tangents. Mitigating this is easier with project managers on the team. The project manager will be responsible for interfacing with requesters whose needs are unclear, documenting project plans, cleaning up tickets, teaching data literacy and facilitating the efficient growth of the team, all while allowing your newly specialized roles to perform at their full capacity. What may have started as a three-person team tracking requests in a spreadsheet will quickly evolve into a technical arm of your company, along with the ticketing and issue-tracking you would expect from any other engineering organization.
Propagating a Data-Driven Organization
When you have just one data engineer, data analyst and data scientist, it’s easy to understand how they’ll fit into and serve the larger organization. What about when you have 20 of each and some support PMs (a problem I’m sure many of you wish to have one day!)?
Even when your team is just three, you’ll find that your business users do not share an equal appetite for data. Some departments will have very few requests (which can be troubling), while others will always have dozens of requests in the queue (many of which may not lead to any action after receiving the data), and will be very vocal in making sure their work gets serviced. Typically you’ll find these requesters sit in sales or marketing, functions that tend to be closer to revenue or spend.
As your team grows, you should continue to pool resources, at least at first. Using this approach, your team can stay central while rotating between business units as requests are made. There are two big advantages to this: 1) it keeps your analytics and reporting structure and definitions standardized, or at least limits the opportunities for silos to emerge; 2) letting your team rotate exposes them to different types of business questions, helping them build relationships and develop a holistic understanding of the business.
Over time, members of your team will start to specialize, and some of your more active business units — or underserved parts of the organization like sales operations — will need dedicated resources. At this point, some data teams will start to shard; while there’s still a central analytics department, individual team members may end up embedded directly within these business units.
When this inflection point occurs, it’s important to think through how your team’s personal development and review cycles will be managed going forward, along with how to maintain data standards. The last thing you want to end up with is recalculated fields, e.g., sales and marketing maintaining different definitions of a lead.
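One way to picture a maintained data standard is a single, centrally owned definition that every embedded analyst reuses instead of recalculating locally. The sketch below is only an illustration of that idea: the Contact fields, the rule inside is_lead, and the report dictionaries are hypothetical, and in practice such a definition would more likely live in a shared SQL view or metrics layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    email: str
    requested_demo: bool
    is_existing_customer: bool


def is_lead(contact: Contact) -> bool:
    """The one shared definition of a lead. The rule itself is hypothetical."""
    return contact.requested_demo and not contact.is_existing_customer


def count_leads(contacts) -> int:
    """Count leads using the shared definition rather than a local copy."""
    return sum(1 for contact in contacts if is_lead(contact))


contacts = [
    Contact("a@example.com", requested_demo=True, is_existing_customer=False),
    Contact("b@example.com", requested_demo=True, is_existing_customer=True),
    Contact("c@example.com", requested_demo=False, is_existing_customer=False),
]

# Sales and marketing reports both call count_leads, so their totals agree.
sales_report = {"leads": count_leads(contacts)}
marketing_report = {"leads": count_leads(contacts)}
print(sales_report, marketing_report)
```

If the business later changes what counts as a lead, the change happens in one place and both reports move together.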
Establishing a data analytics function is an investment in your company’s future, and not a decision to rush into. Be sure you have a plan for hiring, growth and functional immersion to get the most benefit from this team.
About the author: Leon Tchikindas has been working in analytics for more than seven years, helping businesses grow sales revenue, increase product engagement, and amplify marketing ROI. He currently serves as Head of Analytics and Business Operations at Periscope Data where he has empowered more than 800 companies in building a data-driven culture.