dbt vs Airflow: These Tools Are Not Competitors
dbt transforms data inside your warehouse. Airflow orchestrates workflows across systems. You almost certainly need both.
If you're choosing one tool to start a data stack, dbt is the more impactful choice for analytics engineers. It handles the transformation layer — which is where most data work happens — with excellent developer experience, testing, documentation, and lineage tracking built in. Airflow is essential too, but dbt addresses a more universal pain point.
What Each Tool Actually Does
dbt (data build tool) runs SQL transformations inside your data warehouse. You write SELECT statements; dbt wraps them in the appropriate DDL (such as CREATE VIEW or CREATE TABLE AS, depending on the model's materialization), manages dependencies, and runs tests. It's primarily SQL-based, so no Python is required, though Python models are available on some warehouses for transformations that are awkward in SQL.
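To make the "you write SELECT, dbt writes the DDL" split concrete, here is a minimal sketch in Python. The function compile_model and its materialization names are hypothetical illustrations, not dbt's code. Real dbt materializations are Jinja macros whose SQL varies by warehouse adapter, and the CREATE OR REPLACE syntax below is Snowflake-style.

```python
# Hypothetical sketch: wrapping a model's bare SELECT in DDL, the way a
# dbt-style tool does. Not dbt's implementation; real materializations are
# adapter-specific Jinja macros. CREATE OR REPLACE is Snowflake-style syntax.

def compile_model(schema: str, name: str, select_sql: str, materialization: str = "view") -> str:
    """Return the DDL statement that materializes select_sql as schema.name."""
    target = f"{schema}.{name}"
    body = select_sql.strip().rstrip(";")  # the model file is just a SELECT
    if materialization == "view":
        return f"create or replace view {target} as (\n{body}\n)"
    if materialization == "table":
        return f"create or replace table {target} as (\n{body}\n)"
    raise ValueError(f"unsupported materialization: {materialization}")


# The author only writes the SELECT; the DDL around it is generated.
print(compile_model("analytics", "stg_users", "select id, email from raw.users;", "table"))
```

Because the model stays a plain SELECT, switching it from a view to a table is a configuration change rather than a rewrite.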
Airflow is a workflow orchestrator. It schedules and runs tasks across any system: databases, APIs, cloud services, machine learning pipelines, dbt runs. Workflows are defined in Python as Directed Acyclic Graphs (DAGs) of tasks; a task can be a Python function, a shell command, or a call to an external system.
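The "acyclic" part of a DAG is what guarantees a scheduler can always find a valid run order. The sketch below uses Python's standard-library graphlib rather than Airflow, and the task names are hypothetical; in Airflow you would declare the same edges between operators with the >> operator.

```python
# Hypothetical sketch of what "Directed Acyclic Graph" means for a scheduler.
# Plain Python, not Airflow's API: each key is a task, each value is the set
# of upstream tasks it depends on.
from graphlib import CycleError, TopologicalSorter

pipeline = {
    "extract_orders_api": set(),
    "load_to_warehouse": {"extract_orders_api"},
    "dbt_run": {"load_to_warehouse"},
    "train_churn_model": {"dbt_run"},
    "notify_slack": {"dbt_run", "train_churn_model"},
}

def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid run order, or raise CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())

print(execution_order(pipeline))

# A cycle (a waits on b, b waits on a) is rejected up front,
# so a valid DAG can never deadlock waiting on itself.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected")
```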
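The confusion comes from the fact that both "do data pipelines", but at different levels. dbt does the T in ELT: it transforms data that has already been loaded into the warehouse. Airflow runs or triggers the E and L steps and orchestrates everything, including the T (for example, by running dbt).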
dbt's Killer Features
The ref() function: SELECT * FROM {{ ref('users') }} resolves to the right table or view in the right schema for the current environment, determines dependency order, and lets dbt run independent models in parallel. You never declare the dependency graph by hand; dbt builds it from your ref() calls.
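A rough sketch of what ref() does at compile time, in plain Python. This is an illustration under simplifying assumptions, not dbt's implementation: dbt renders Jinja properly and builds its graph from the whole project, while this uses a regex, and the model and schema names are hypothetical.

```python
# Hypothetical sketch of ref() resolution and parallel scheduling.
# Not dbt's implementation: a regex stands in for real Jinja rendering.
import re
from graphlib import TopologicalSorter

REF = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")

models = {
    "stg_users": "select * from raw.users",
    "stg_orders": "select * from raw.orders",
    "fct_orders": "select o.*, u.email from {{ ref('stg_orders') }} o "
                  "join {{ ref('stg_users') }} u on o.user_id = u.id",
}

def compile_refs(sql: str, schema: str) -> tuple[str, set[str]]:
    """Replace each ref('x') with schema.x and return the upstream models found."""
    deps = set(REF.findall(sql))
    compiled = REF.sub(lambda m: f"{schema}.{m.group(1)}", sql)
    return compiled, deps

def parallel_batches(models: dict[str, str], schema: str) -> list[list[str]]:
    """Group models into batches; models in the same batch can run concurrently."""
    graph = {name: compile_refs(sql, schema)[1] for name, sql in models.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose parents are done
        batches.append(ready)
        sorter.done(*ready)
    return batches

print(compile_refs(models["fct_orders"], "analytics_dev")[0])
print(parallel_batches(models, "analytics_dev"))
```

Because the schema is a parameter, the same model SQL compiles against a development schema locally and a production schema in the scheduled run. In real dbt, the threads setting caps how many models from a batch run at once.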
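Built-in testing: unique, not_null, accepted_values, relationships. Each test takes a few lines of YAML. When you run dbt build, a failing test stops the models downstream of it from being built, and a failing test in CI can block a merge before bad data reaches production. For data quality, this is table stakes.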
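In a project, these tests are declared in a model's YAML file (for example, listing unique and not_null under a column). Under the hood, dbt compiles each test into a SQL query that selects failing records, and the test passes when that query returns nothing. The sketch below mirrors that pass/fail logic over in-memory Python rows; the check_* function names and the sample data are hypothetical.

```python
# Hypothetical sketch of the pass/fail logic behind dbt's four built-in generic
# tests, over in-memory rows. Real dbt compiles each test to a SQL query that
# selects failing records; like those queries, these checks ignore nulls
# everywhere except not_null.
from collections import Counter

def check_unique(rows, column):
    """Failing values: non-null values that appear more than once."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return sorted(value for value, n in counts.items() if n > 1)

def check_not_null(rows, column):
    """Failing rows: rows where the column is null."""
    return [r for r in rows if r[column] is None]

def check_accepted_values(rows, column, values):
    """Failing values: non-null values outside the accepted list."""
    return sorted({r[column] for r in rows if r[column] is not None and r[column] not in values})

def check_relationships(rows, column, parent_rows, parent_column):
    """Failing rows: non-null keys with no matching key in the parent table."""
    parent_keys = {p[parent_column] for p in parent_rows}
    return [r for r in rows if r[column] is not None and r[column] not in parent_keys]

users = [{"id": 1}, {"id": 2}]
orders = [
    {"order_id": 10, "user_id": 1, "status": "shipped"},
    {"order_id": 11, "user_id": 3, "status": "pending"},   # user 3 does not exist
    {"order_id": 11, "user_id": None, "status": "lost"},   # duplicate id, null user, bad status
]

results = {
    "not_null(order_id)": check_not_null(orders, "order_id"),
    "unique(order_id)": check_unique(orders, "order_id"),
    "not_null(user_id)": check_not_null(orders, "user_id"),
    "accepted_values(status)": check_accepted_values(orders, "status", ["shipped", "pending", "returned"]),
    "relationships(user_id -> users.id)": check_relationships(orders, "user_id", users, "id"),
}
for name, failures in results.items():
    print(f"{'FAIL' if failures else 'PASS'} {name}: {failures}")
```

With the default error severity, dbt build treats a failure like these as a reason to skip every model downstream of the tested one.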
Documentation: dbt docs generate builds a data catalog from your model descriptions and the dependency graph, and dbt docs serve hosts it locally. Any analyst on your team can look up what fct_orders means and where its data comes from.
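The "where the data comes from" half of the docs site is essentially a graph walk. A minimal sketch, assuming a hand-written parent map (dbt derives the real one from ref() and source() calls); the model names, source names, and description are hypothetical.

```python
# Hypothetical sketch: answering "where does fct_orders come from?" by walking
# a parent map. dbt builds the real map from ref() and source() calls; this
# one is hard-coded for illustration.

parents = {
    "fct_orders": {"stg_orders", "stg_users"},
    "stg_orders": {"source:shop.orders"},
    "stg_users": {"source:shop.users"},
}

def upstream(node: str, graph: dict[str, set[str]]) -> set[str]:
    """Return every ancestor of node, not including node itself."""
    seen: set[str] = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))  # sources have no parents
    return seen

catalog_entry = {
    "name": "fct_orders",
    "description": "One row per order, enriched with the customer's email.",  # hypothetical
    "upstream": sorted(upstream("fct_orders", parents)),
}
print(catalog_entry)
```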
dbt Cloud ($50/seat/month) adds scheduled runs, a built-in IDE, and CI/CD integration. dbt Core is free, open source, and runs anywhere you can install a Python package.
Airflow's Complexity Tax
Airflow is powerful and also genuinely complex to operate. The scheduler, web server, and workers run as separate components. Managing Python package dependencies when different DAGs need conflicting libraries is painful. Backfills (re-running historical data) require understanding execution_date semantics (renamed logical_date in Airflow 2.2), which confuse even experienced users.
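The execution_date confusion usually comes down to one rule: for an interval-based schedule, a run's date labels the start of the interval it processes, and the run fires only after that interval ends. A plain-Python sketch of that rule for a daily schedule, assuming Airflow 2.x's classic cron data-interval behaviour; daily_runs is a hypothetical helper, not an Airflow function.

```python
# Plain-Python sketch of Airflow 2.x interval semantics for a daily schedule
# (no Airflow import). Assumes the classic cron data-interval behaviour:
# logical_date (formerly execution_date) marks the START of the interval a run
# processes, and the run is only triggered once that interval has ended.
from datetime import datetime, timedelta

def daily_runs(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Return (logical_date, data_interval_end) for each daily run in [start, end)."""
    runs = []
    logical_date = start
    while logical_date < end:
        interval_end = logical_date + timedelta(days=1)
        runs.append((logical_date, interval_end))
        logical_date = interval_end
    return runs

# A three-day backfill.
for logical_date, interval_end in daily_runs(datetime(2024, 1, 1), datetime(2024, 1, 4)):
    print(
        f"logical_date={logical_date:%Y-%m-%d} "
        f"processes [{logical_date:%Y-%m-%d}, {interval_end:%Y-%m-%d}) "
        f"and is triggered after {interval_end:%Y-%m-%d %H:%M}"
    )
```

So the run labelled 2024-01-01 actually starts on 2024-01-02, which is exactly the offset that trips people up when they reason about backfills.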
Airflow 2.x improved significantly, but the operational burden is real. You need someone who understands it to maintain it in production.
Managed Airflow services: Astronomer ($0.40/vCPU/hour), Amazon MWAA (Managed Workflows for Apache Airflow), and Google Cloud Composer. These shift the operational burden to a vendor, but they cost money.
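A back-of-the-envelope calculation shows what "cost money" can mean in practice. The sketch uses the Astronomer rate quoted above; the 4-vCPU, always-on deployment is a hypothetical assumption, and real bills depend on plan, region, and usage.

```python
# Back-of-the-envelope monthly cost for managed Airflow billed per vCPU-hour.
# Assumptions (hypothetical): a 4-vCPU deployment running around the clock,
# at the $0.40/vCPU/hour rate quoted in the text.
HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours on average

def monthly_cost(vcpus: int, rate_per_vcpu_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Return the monthly cost in dollars, rounded to cents."""
    return round(vcpus * rate_per_vcpu_hour * hours, 2)

print(f"${monthly_cost(4, 0.40):,.2f} per month")
```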
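Alternatives worth knowing: Prefect and Dagster are modern orchestrators that compete with Airflow directly, offering a smoother developer experience and more Pythonic workflow definitions (plain decorated functions in Prefect, software-defined assets in Dagster).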
Running Them Together
A typical production data stack: Airflow orchestrates everything (ingest data, trigger dbt runs, notify on failure, run ML models). dbt handles the SQL transformation layer that Airflow calls into.
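The "notify on failure" part of that picture can be sketched in plain Python: run tasks in dependency order, alert when one fails, and skip everything downstream of it. The task names and the run_pipeline helper are hypothetical; in Airflow the alert would typically hang off a task's or DAG's on_failure_callback, and the skipped tasks would show the upstream_failed state.

```python
# Hypothetical sketch of an orchestrator's failure handling, in plain Python.
# Tasks run in dependency order; a failing task triggers a notification and
# every task downstream of it is marked upstream_failed instead of running.
from graphlib import TopologicalSorter
from typing import Callable

def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    upstream: dict[str, set[str]],
    notify: Callable[[str, Exception], None],
) -> dict[str, str]:
    """Run each task once its upstream tasks succeeded; return every task's final state."""
    state: dict[str, str] = {}
    for name in TopologicalSorter(upstream).static_order():
        if any(state.get(dep) != "success" for dep in upstream.get(name, ())):
            state[name] = "upstream_failed"
            continue
        try:
            tasks[name]()
            state[name] = "success"
        except Exception as exc:  # any task error counts as a failure
            state[name] = "failed"
            notify(name, exc)
    return state

def ingest() -> None:
    pass  # e.g. trigger an ingestion sync

def dbt_build() -> None:
    raise RuntimeError("not_null test on fct_orders.user_id failed")

def train_model() -> None:
    pass  # never runs: its upstream failed

alerts: list[str] = []
state = run_pipeline(
    tasks={"ingest": ingest, "dbt_build": dbt_build, "train_model": train_model},
    upstream={"ingest": set(), "dbt_build": {"ingest"}, "train_model": {"dbt_build"}},
    notify=lambda task, exc: alerts.append(f"{task} failed: {exc}"),
)
print(state)
print(alerts)
```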
Astronomer's open-source Cosmos package (astronomer-cosmos) makes this clean: its DbtTaskGroup renders your dbt project as an Airflow task group, with each dbt model as a separate task that can be individually retried or inspected.
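Why one task per model helps can be shown with a small plain-Python sketch: each model gets its own retry budget, so a transient failure re-runs only that model. The helper names and the simulated timeout are hypothetical, and this is not the API of any real Airflow or dbt package; in Airflow, each task's budget would come from its standard retries argument.

```python
# Hypothetical sketch of "one task per dbt model": each model runs as its own
# task with its own retry budget, so a transient failure re-runs only that
# model. Plain Python, not the API of any Airflow or dbt package.
from graphlib import TopologicalSorter
from typing import Callable

def run_models(upstream: dict[str, set[str]], run_model: Callable[[str], None], retries: int = 2) -> dict[str, int]:
    """Run models in dependency order; return how many attempts each model needed."""
    attempts: dict[str, int] = {}
    for model in TopologicalSorter(upstream).static_order():
        for attempt in range(1, retries + 2):  # first try plus `retries` retries
            attempts[model] = attempt
            try:
                run_model(model)
                break
            except RuntimeError:
                if attempt == retries + 1:
                    raise  # retry budget exhausted: fail this task
    return attempts

calls: list[str] = []
transient_failures = {"fct_orders": 1}  # hypothetical: fails once, e.g. a warehouse timeout

def run_model(model: str) -> None:
    calls.append(model)
    if transient_failures.get(model, 0) > 0:
        transient_failures[model] -= 1
        raise RuntimeError(f"transient error in {model}")

attempts = run_models(
    {"stg_orders": set(), "stg_users": set(), "fct_orders": {"stg_orders", "stg_users"}},
    run_model,
)
print(attempts)
print(calls)
```

The staging models run once and only fct_orders is attempted twice. If the whole project ran inside a single task, the same timeout would mean retrying that task as a whole.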
If your team is small: dbt Cloud handles scheduling and CI/CD for the transformation layer. You might not need Airflow at all if your data sources support simple managed ingestion (Fivetran, Airbyte) and your transformations are pure SQL.
Quick Comparison
| Factor | dbt | Airflow |
|---|---|---|
| Primary Role | SQL transformation layer | Workflow orchestration |
| Language | SQL (Jinja-templated) | Python |
| Developer Experience | Excellent | Complex |
| Operational Complexity | Low | High |
| Cross-system Orchestration | No | Yes |
| Data Quality Testing | Built-in, first-class | You build it yourself |
| Data Lineage/Docs | Automatic | Not built-in |
The Verdict
Use dbt if: You need to transform data inside a warehouse. Every analytics team needs dbt or something like it. This is table stakes.
Use Airflow if: You need to orchestrate workflows across multiple systems — ingestion, transformation, ML training, notifications. Airflow is the Swiss Army knife.
Consider: Use both. dbt for the SQL transformation layer, Airflow (or Prefect/Dagster) to orchestrate it all. They're complementary, not competitive.
Disagree? nice@nicepick.dev