dbt Labs Tackles Data Project Complexity with Mesh at Coalesce
Adoption of dbt has skyrocketed since it was launched just seven years ago, and today, more than 30,000 organizations use the open source software in production. The complexity of data projects has also increased, with 5% of dbt users managing more than 5,000 models. To address the threat posed by that growing complexity, dbt Labs today launched a new paradigm called dbt Mesh at its annual user conference in San Diego, California.
About a year ago, dbt Labs CEO Tristan Handy wanted to make a change to the company’s internal analytics, so he fired up dbt to initialize his development environment. He was a bit surprised by what he found.
“First thing I noticed, holy s&%t, there’s a lot more models than there used to be! It’s been a while since I’ve been in the dbt code,” he said during his keynote address today at the Hilton Bayfront hotel on San Diego Bay. dbt Labs itself has more than 1,000 models in its project, which puts it in the middle of the pack in terms of complexity.
The second thing Handy noticed was that getting up and running in this new, more complicated code base was harder. “In a lot of ways, developing this larger project felt exactly the opposite of what running dbt code in the early days felt like,” he said. “It felt slow, unwieldy, unpleasant. I didn’t love it.”
Handy poked around and discovered that many dbt projects had grown considerably. In the beginning, a large dbt project involved maybe 150 models, the core unit of work in dbt. Instead of building transformation pipelines, as with traditional ETL tools, dbt users define the data transformations they want to execute in a data model, and that model is then executed to output a single database table.
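To make the "one model, one table" idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not dbt's implementation; the `run_model` helper, the `raw_orders` table, and the `customer_totals` model name are all hypothetical and exist only for illustration.

```python
import sqlite3


def run_model(conn: sqlite3.Connection, name: str, select_sql: str) -> None:
    """Materialize one hypothetical 'model': a SELECT statement becomes a table."""
    # The model name is trusted here because this is only a toy illustration.
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")

# Hypothetical raw source data, standing in for a warehouse table.
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "acme", 20.0), (2, "acme", 5.0), (3, "zenith", 12.5)],
)

# The 'model' is just the transformation, written as a SELECT.
run_model(
    conn,
    "customer_totals",
    "SELECT customer, SUM(amount) AS total FROM raw_orders GROUP BY customer",
)

rows = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(rows)
```

The analyst writes only the SELECT; the tooling decides how the result is created in the database, which is the division of labor the paragraph above describes.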
But over the years, the number of models grew. A few years ago, 500 models was a large dbt project, but today, 1,000 models isn’t uncommon. According to Handy, 5% of dbt users have more than 5,000 models in play. Handy had happened upon a growing problem and was determined to find a fix.
“The size of the dbt code base isn’t, in and of itself, a problem,” Handy said. “Software engineers maintain larger, more sophisticated code bases all the time. It’s not how many lines of code. It’s complexity. The question is how do we flatten this line? How do we make our dbt investment scale without having complexity spiral out of control?”
The complexity problem is a product, in part, of dbt Labs’ massive and sudden success. Handy and his co-founder created dbt as an abstraction layer atop SQL to address the data transformation needs of Fishtown Analytics’ consulting clients, who wanted to analyze data in AWS Redshift. Enabling data analysts to do the data transformation and prep work that would normally have required a skilled data engineer has worked out well.
“The initial ideas behind dbt apparently resonated,” Handy said. “I never would have anticipated seven years ago that dbt has shaped what it looks like to do data work today.”
The reward for democratizing data access among smaller companies that didn’t have a small army of data engineers apparently was being invited by bigger companies to do more and tougher data work. Handy found a solution to this problem in the same place that all of dbt’s best ideas come from: “Just steal it from software engineers,” he said.
“Software engineers have tools to do exactly this,” Handy said. “With the right software architecture, more mature systems are actually easier to work with, not harder. They have great APIs, great documentation, and great developer experience.”
dbt Labs has been working on this problem for the last year, and today rolled out the product of that work: dbt Mesh.
“dbt Mesh isn’t a single feature,” Handy said. “It’s a way of architecting your dbt DAGs [directed acyclic graphs] and workflows. It enables decentralized ownership without losing visibility and control. It borrows ideas from, but is not the same as, data mesh.”
Originally spearheaded by Zhamak Dehghani, data mesh is a data paradigm that enables teams of data professionals to work independently, but in a federated manner. Data isn’t centralized in a data mesh, but the core data management and data governance principles are shared across teams. Handy said he has gotten to know Dehghani over the years and appreciates her insights into data management and data access. Both are 2022 Datanami People to Watch.
In dbt Labs’ implementation of a mesh, contributors openly declare the interfaces between them inside dbt, and those interfaces determine model access levels, model contracts, and model versions. dbt Mesh is designed to “natively support dependencies across projects, which allows each domain team to own their own data products,” the company says.
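As a rough illustration of how declared access levels could gate cross-project dependencies, the following Python sketch models the idea. It is not dbt's code or API. The `Model` class and `can_reference` function are hypothetical, and the three level names and their rules are a simplified reading of dbt's access modifiers rather than anything stated by the company in this announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a dbt model and its declared interface."""
    name: str
    project: str
    group: str
    access: str = "protected"  # assumed levels: "private", "protected", "public"


def can_reference(consumer: Model, producer: Model) -> bool:
    """Return True if `consumer` may depend on `producer` (simplified rules)."""
    if producer.access == "public":
        # Public models form the interface exposed to other projects.
        return True
    if producer.access == "protected":
        # Protected models are usable anywhere inside the owning project.
        return consumer.project == producer.project
    if producer.access == "private":
        # Private models are usable only by the owning group in that project.
        return (
            consumer.project == producer.project
            and consumer.group == producer.group
        )
    raise ValueError(f"unknown access level: {producer.access!r}")


# Hypothetical models owned by two domain teams.
orders = Model("orders", project="finance", group="billing", access="public")
ledger_staging = Model("ledger_staging", project="finance", group="billing", access="private")
campaign_roi = Model("campaign_roi", project="marketing", group="growth")

print(can_reference(campaign_roi, orders))          # cross-project, public producer
print(can_reference(campaign_roi, ledger_staging))  # cross-project, private producer
```

Under these assumed rules, the marketing team can build on the finance team's published `orders` model while finance keeps its staging work internal, which is the kind of ownership boundary the mesh approach aims for.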
Ultimately, the mesh concept will “democratize ownership by allowing every team to own and contribute to their own data products instead of requiring a single, monolithic dbt project for the entire organization,” dbt Labs says. dbt Mesh is available now as a preview within dbt Cloud.
The Philadelphia, Pennsylvania company made several other announcements at Coalesce 2023 today.
For starters, it launched dbt Explorer, which is essentially a data catalog for the contents of dbt projects. The company says dbt Explorer fits into the data mesh paradigm by making it easier for distributed teams of data analysts, or analytics engineers, to discover and understand their data and dbt assets.
It also launched Cloud CLI, a command line interface that complements the graphical IDE that’s accessible from a browser. The CLI is targeted at more advanced users who want access to dbt Cloud capabilities from the comfort and convenience of their preferred terminal interface or IDE. Cloud CLI allows users to access their data teams’ dbt assets without the hassle and inconvenience that comes with configuring, authenticating, and maintaining a local version of dbt.
Lastly, the company announced the general availability of its semantic layer, which enables organizations to centrally define their business metrics in dbt and then query them from any integrated BI and analytics tools, including Tableau, Google Sheets, Hex, and Mode. The semantic layer is a result of dbt Labs’ acquisition of Transform in February.