Inside AutoTrader UK’s Data Observability Pipeline
In the course of shifting its analytics estate to the cloud, AutoTrader UK has adopted many new tools and technologies, including BigQuery, Looker, and dbt, which have helped to democratize data access among users. Along the way, the company slipped a data observability solution into the stream to ensure that its data doesn’t slide off the road.
AutoTrader UK started life in 1975 as a magazine publisher for classified advertisements for cars, trucks, and other vehicles. For decades, whether you were buying or selling a new or used vehicle, AutoTrader UK (which is a separate entity from its U.S. counterpart) was where you turned to tap into the marketplace.
Over the years, the Manchester-based company has retained its position as the largest marketplace for cars, but its business model has changed substantially. For example, the print publication is no more, and all the listings are now posted online. The move online has been good for the publicly traded company, which recorded £368.9 million ($496 million at today’s exchange rates) in revenue last year and is a constituent of the FTSE 100 index.
The company, which employs about 1,000 people, has also embarked upon a technology overhaul, including migrating away from an Oracle-based data warehouse that users queried with IBM Cognos BI tools. According to Ed Kent, principal developer in AutoTrader UK’s platform engineering team, the migration is all part of the modernization process.
“AutoTrader UK has had an aspiration for a while now to become fully cloud-based,” Kent says. “We want to decommission our on-premise systems and we’ve been at it for a couple of years. One of the big things remaining was the warehousing.”
The company elected to move the warehouse to Google Cloud’s BigQuery, and to adopt Looker as the primary BI and visualization tool that employees use to access it (Google acquired Looker for $2.6 billion in 2019, you will recall). It also brought in dbt, or Data Build Tool, a popular tool for automating the transformation step of extract, load, and transform (ELT) pipelines.
One of the goals in overhauling the analytics estate was to enable more self-service on the part of AutoTrader UK’s internal and external users, Kent says. Before the transformation began five years ago, getting a new view of the data or a new dashboard or report would have required quite a bit of work.
“We had a centralized data team, and if you wanted some new report built, you would go to that data team,” Kent says. “You would explain what you wanted. They would handle everything from ingesting the data, modeling it, transforming it, building out the reports. And then they’d let you know when it was done.”
That approach no longer cuts it for AutoTrader UK, which, like many companies, is attempting to put data front and center in many more decisions than in the past. That is especially true of the company’s finance team, which was a big user of the data warehouse and the BI tools.
“The problem there is, it doesn’t scale,” Kent tells Datanami. “Everyone wants something based off data these days. Everything we do is data-driven. It’s got to have some backing based on real world data. And it simply doesn’t scale to have this one team that handles everything centrally.”
AutoTrader UK relied on new technology to help it build a more decentralized data estate. The combination of Looker’s data modeling language, LookML, and dbt was instrumental in helping the company break its dependence on data centralization.
The dbt tool is used to automate the transformation jobs that run periodically against data after it has been extracted from source systems and loaded into BigQuery. “In dbt, I define a data model, which is basically a SQL statement, that defines how that table should be populated on the next run of dbt,” Kent says.
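The core idea Kent describes can be sketched in a few lines: a model is just a SQL SELECT statement, and each run rebuilds the target table from it. This is a toy illustration using Python’s built-in sqlite3 module, not dbt itself; the table and column names (`raw_orders`, `customer_summary`) are invented for the example.

```python
import sqlite3

# A dbt-style "model": a SQL SELECT that defines how the target table
# should be populated on each run. (Toy sketch, not real dbt; names
# are hypothetical.)
MODEL_NAME = "customer_summary"
MODEL_SQL = """
    SELECT customer_id, COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY customer_id
"""

def run_model(conn: sqlite3.Connection, name: str, sql: str) -> None:
    """Materialize the model as a table, as a scheduled run would."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {sql}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer_id INTEGER)")
conn.executemany("INSERT INTO raw_orders VALUES (?)", [(1,), (1,), (2,)])
run_model(conn, MODEL_NAME, MODEL_SQL)
print(conn.execute("SELECT * FROM customer_summary ORDER BY customer_id").fetchall())
# [(1, 2), (2, 1)]
```

In real dbt the SELECT lives in its own file and the tool handles dependency ordering and materialization against the warehouse, but the declarative shape is the same.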
The company also has a team of data analysts who are shaping the data with LookML once it lands in BigQuery. This abstraction layer is critical to expanding access to data, Kent says.
“Once you’ve written the LookML, someone who’s less data-savvy can, in theory, go in and self-serve and they can start interrogating the data, asking questions, getting to know the complexities of what’s lying under the hood,” Kent says. “The way it’s presented means they can, in theory, self-serve what they need without having to go to an analyst.”
While more automation and more abstraction expand the pool of potential users and take the burden off the data team, they also bring more chances for data to go off the rails or fall through the cracks. That is why Kent and the platform engineering team decided to bring the data observability solution from Monte Carlo into the picture.
“We had this proliferation of models, but with no real governance around it,” Kent says. “[We had] this vast, sprawling estate of models, and trying to retrofit hard-coded rules around data observability was really difficult.”
For example, if a customer data table that was designed to have one row per customer suddenly started having two rows per customer, that would indicate something has gone awry, Kent says. Or if one of the categories that each customer is attached to suddenly changes, that could be another indication of a problem.
“I could say, ‘I know this table should update every 24 hours. I know it should always have 10,000 rows in it.’ I can kind of manually write out rules like that,” Kent says. “That’s fine if I’ve got 10 or 20 models. If I’ve got several hundred, it becomes a lot harder.”
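The hand-coded rules Kent describes might look something like the following sketch, covering the duplicate-customer-row example along with freshness and row-count checks. Everything here is invented for illustration (table names, the 24-hour and 10,000-row figures come from Kent’s hypothetical); the point is that every entry has to be written and maintained by hand, which is workable for 10 or 20 models but not for several hundred.

```python
import datetime

# Hand-written observability rules, one entry per model.
# table_name: (max_age_hours, expected_row_count) -- all hypothetical.
HARD_CODED_RULES = {
    "customers": (24, 10_000),
    "orders": (1, 250_000),
}

def check_table(name, last_updated, row_count, now):
    """Return a list of rule violations for one table."""
    max_age_hours, expected_rows = HARD_CODED_RULES[name]
    problems = []
    age = (now - last_updated).total_seconds() / 3600
    if age > max_age_hours:
        problems.append(f"{name}: stale ({age:.0f}h old, limit {max_age_hours}h)")
    if row_count != expected_rows:
        problems.append(f"{name}: row count {row_count}, expected {expected_rows}")
    return problems

def duplicate_customers(customer_ids):
    """The one-row-per-customer check from the example above."""
    seen, dupes = set(), set()
    for cid in customer_ids:
        if cid in seen:
            dupes.add(cid)
        seen.add(cid)
    return sorted(dupes)

now = datetime.datetime(2021, 6, 1, 12, 0)
print(check_table("customers", datetime.datetime(2021, 5, 30, 12, 0), 10_000, now))
# ['customers: stale (48h old, limit 24h)']
print(duplicate_customers([101, 102, 102, 103]))
# [102]
```

Retrofitting a dictionary like this across a sprawling estate of models is exactly the maintenance burden a data observability platform is meant to remove.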
Monte Carlo’s data observability solution takes well-worn concepts from the DevOps and SRE (site reliability engineering) disciplines and brings them to data, CEO and co-founder Barr Moses told Datanami earlier this year.
The Monte Carlo solution is based around what Moses dubs the five pillars of observability: freshness, or the timeliness of the data; volume, or the completeness of the data; distribution, which measures the consistency of data at the field level; schema, relating to the structure of fields and tables; and lineage, a change-log of the data. If the software detects unexpected changes along any of these dimensions, it will generate an alert.
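To see how this differs from the hard-coded approach, consider the volume pillar: rather than a fixed threshold per table, a monitor can learn a baseline from recent history and flag outliers. The sketch below is an invented toy illustration of that idea (a simple z-score on daily row counts), not a description of Monte Carlo’s actual method; the history figures are made up.

```python
import statistics

def volume_alert(history, latest, z_threshold=3.0):
    """Flag the latest row count if it is an outlier vs. recent history.

    Toy volume-anomaly check: learn mean/stdev from history instead of
    hand-coding an expected count per table.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

daily_row_counts = [10_050, 9_980, 10_020, 10_010, 9_990]  # invented history
print(volume_alert(daily_row_counts, 10_015))  # normal day -> False
print(volume_alert(daily_row_counts, 4_800))   # sudden drop -> True
```

Because the baseline is derived from the data itself, the same monitor can be applied to hundreds of tables without anyone writing a rule per table.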
AutoTrader UK adopted Monte Carlo near the end of 2020, and has been relying on it to keep an eye on the data flowing into its analytics solutions. According to Kent, the software is flagging about 10 items per week. “Of those, some are genuine errors, some are false positives, some are interesting, but…not necessarily the fault of the data as such,” he says. “Some of this stuff may have gone unnoticed.”
With more users involved in data transformations via dbt and self-serving dashboards and reports via Looker, Monte Carlo serves as a sort of safety net to prevent errors from creeping into the pipelines. That’s been a real benefit for AutoTrader UK.
“We’re trying to move to this decentralized model…to provide relatively easy-to-use platform capabilities for people to build out their own data models,” Kent says. “Monte Carlo fits into that strategy quite nicely, so we can provide data observability capability as a platform-level capability rather than each team having to go manually implement something themselves.”