It’s at once simple and complex, and it’s rapidly gaining popularity. Five years in the making, the release of dbt Core v1.0 in late 2021 was a major milestone for the product.
The maintainers of dbt (data build tool), dbt Labs, wanted to signal to the world that dbt Core had reached the point where its stability and maturity should inspire confidence for companies to depend on it for their data infrastructure.
The company has the numbers to back up that claim: 150+ contributors, 5,000+ commits, and 8,000+ projects running dbt every single week!
As explained by the company itself, dbt is a development framework that “enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.”
As of this writing there are three main ways to run dbt:
The main idea is that dbt manages your transformations in an ELT stack. At a high level, ELT is the process by which you:
1. Extract data from a source system
2. Load the data as-is into a (raw/source/staging) schema in your data warehouse
3. Transform the data inside the data warehouse using SQL
For example, from our own Onebridge data warehouse, here is an example of some transformation logic:
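The original Onebridge example isn’t reproduced here, but as a rough sketch (the table, column, and model names below are invented for illustration), a dbt model is simply a SQL SELECT statement saved as a `.sql` file in your project:

```sql
-- models/staging/stg_customers.sql  (hypothetical model)
-- The config block tells dbt how to materialize this model;
-- change 'view' to 'table' to persist the results.
{{ config(materialized='view') }}

select
    c.customer_id,
    c.first_name || ' ' || c.last_name as customer_name,  -- renaming / derivation
    o.order_count                                         -- business logic
from {{ source('raw', 'customers') }} as c                -- raw/source schema
left join {{ ref('stg_order_counts') }} as o              -- another dbt model
    on o.customer_id = c.customer_id
where c.is_deleted = false                                -- filtering
```

The `{{ source(...) }}` and `{{ ref(...) }}` Jinja functions are how dbt wires models together: dbt resolves them to real table names per environment and uses them to build the dependency graph that determines run order.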
We use SQL to do data transformation tasks: renaming, filtering data, joining tables, and creating business logic. In dbt speak, this is a “model.” You can instruct dbt to turn this statement into a table or view.
A common comment I hear is, “It just executes SQL. What’s so special about that?” For a single model, dbt can seem like overkill. Where it shines is when you have hundreds of models spread across multiple environments: the ability to organize, manage, test, and document those SQL transformations is what makes dbt as good as they say.
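That organization shows up directly in dbt’s command line: a handful of commands build, test, and document every model in a project. A minimal session with dbt Core might look like this (assumes a configured project and warehouse connection; `stg_customers` is a hypothetical model name):

```shell
# Build every model in the project: dbt compiles the Jinja + SQL
# and executes it against the configured warehouse
dbt run

# Build only one model and everything downstream of it
dbt run --select stg_customers+

# Execute the data tests defined for your models
dbt test

# Generate and locally serve the project's documentation site
dbt docs generate
dbt docs serve
```

Because dbt knows the dependency graph between models, `dbt run` always executes them in the correct order, and selectors like `stg_customers+` let you rebuild just the affected slice of a large project.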
In future articles, we’ll show how Onebridge is utilizing dbt to manage our own data warehouse. Stay tuned.