What Is An Analytics Engineer and When Do You Need One?
A new persona is starting to make the rounds in the big data field. It’s called an analytics engineer, and depending on your data workflow and the size of your team, it could help you speed up your advanced analytics efforts.
Achieving success with big data is usually the result of a team effort. It’s hardly ever a one-man or a one-woman show. But as data changes and technology improves, the roles that people play in the big data game also shift.
That is the dynamic we’re seeing now with the rise of a new big data persona called the analytics engineer. According to Anna Filippova, the director of community and data at dbt Labs, an analytics engineer is somebody who organizes the data warehouse so other people can query it easily.
“The idea behind an analytics engineer is a recognition that it’s important for a data team to have someone who’s focused on creating meaning and structure out of data,” Filippova says. “It’s producing data almost as a product, defining core tables in a company that should be very high quality that everybody should know how to use, coaching sessions, and teaching people how to work with SQL, how to work with those data sets — things like that.”
In other words, the analytics engineer role emerged when it became evident that dbt was automating much of the work that the data engineer previously did manually or by writing scripts, according to Filippova.
“They call themselves analytics engineers because they’re basically applying software engineering best practices to the art of analytics,” she says.
A quick search of job boards at Indeed and Monster does not show a large number of analytics engineer jobs open at the moment. In some cases, the search engines returned results for data engineering jobs. To some extent, dbt Labs is leading the curve here.
Filippova came to the analytics engineering profession by a circuitous route. Before joining dbt Labs, she was working on a data research team at GitHub, and became frustrated with the haphazard way the data integration tasks were being conducted.
“I loved helping people make decisions,” she tells Datanami, “but I was one of those people who realized that it was really hard to do that when all of your data is incredibly messy, and I can see everyone making copies of each others’ scripts and doing things really, really inefficiently.”
So she took things into her own hands. She went to her manager and said she’d like to spend time organizing the various data transformation scripts people were using in a bid to improve the efficiency of the data analyst team. Her boss agreed, and thus was born the analytics engineering team at GitHub. And when somebody sent her an article that described what she was doing as analytics engineering, she accepted the title. Eventually, she decided to go to work for the company doing the most to enable analytics engineers, and that’s how she ended up at dbt Labs.
Many analytics engineers use dbt to perform data transformation tasks, she says. The company formerly known as Fishtown Analytics, as well as the dbt community, recommends starting a data team by hiring an analytics engineer, “and then do a fast follow by hiring an analyst, as opposed to a data engineer,” she says.
Now that the modern data stack is automating so much of the data integration work that was previously done manually, the data engineer’s job description is starting to change. In her previous job, data engineers were more focused on keeping the on-prem systems running. They largely left the data modeling to the analytics engineers.
“They were mostly carrying pagers and making sure that things didn’t collapse,” Filippova says of the data engineers at GitHub. “They were also far from what the business needed, problems that the business had, so it was difficult to go out and build a data model that would solve for that.”
Identifying oneself as an analytics engineer “is usually synonymous with being a dbt user,” Filippova says, though not always.
The tool formerly known as Data Build Tool certainly is popular. In a year, its Slack community has grown from 15,000 members to more than 32,000. The Philadelphia, Pennsylvania-based company was valued at more than $4 billion earlier this year following its $222 million Series D funding round.
The unlimited and affordable nature of cloud object storage has kicked off a tidal wave of data movement to the cloud – a data tsunami, if you will. The dbt tool has solidified itself as a key component of an emerging data stack built around cloud data warehouses. Other members include ELT tools like Fivetran, Airbyte, and Matillion, which help to extract data from source systems and load it into cloud data warehouses, with dbt serving as the transformation layer via automated SQL scripts developed using Jinja, a common templating language in the Python ecosystem.
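The load-then-transform pattern described above can be sketched with nothing more than Python’s built-in sqlite3 module. This is a toy illustration of the idea, not dbt itself: the table and column names (`raw_orders`, `fct_orders`) are hypothetical, and in practice the transform step would be a Jinja-templated SQL model that dbt compiles and runs against a cloud warehouse.

```python
import sqlite3

# Stand-in "warehouse" for the sketch.
conn = sqlite3.connect(":memory:")

# "Load" step: raw records land in the warehouse as-is, mess and all.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1050, "complete"), (2, 2000, "cancelled"), (3, 4500, "complete")],
)

# "Transform" step: a SQL model defines a clean, analyst-ready core table --
# the kind of work an analytics engineer codifies and dbt automates.
conn.execute("""
    CREATE TABLE fct_orders AS
    SELECT id, amount_cents / 100.0 AS amount_usd
    FROM raw_orders
    WHERE status = 'complete'
""")

rows = conn.execute("SELECT id, amount_usd FROM fct_orders ORDER BY id").fetchall()
print(rows)  # -> [(1, 10.5), (3, 45.0)]
```

The point of the pattern is that analysts query `fct_orders`, the curated table, rather than each writing their own filtering and conversion logic against the raw data.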
This setup is helping organizations not only to move huge amounts of data into the warehouse for analysis, but also to make it easier for analysts to get more out of the data they have moved. That’s the role of the analytics engineer.
“For a long time people used to [say], the more data you have the better your insights will be. Just throw more data at the problem. It will be fine,” Filippova says. “And it turns out it matters what kind of data, and it turns out it matters how clean that data is and how well-structured it is.
“Over time, more and more folks emerged that really cared about structuring and presenting data to the rest of the company in a way that can be much more useful,” she continues. “It was a recognition that people were doing a lot of duplicate work, that people were not using data to the best of its potential. And eventually those folks started calling themselves analytics engineers.”