It doesn’t matter if it’s a stored procedure, a pandas pipeline, or a DAG full of Python operators. The real problem is the same across all of them: the whole thing was built to run, not to be looked inside.
April 2026
What people think
Scripts work. That’s the thing. Most teams have been running them for years without serious issues. The pipeline runs every night, the data lands, dashboards update. The standard argument for moving away is usually about the code. It’s messy, not in version control, hard to test.
All of that’s true. But it frames the problem as a code cleanliness thing. Like if you just reorganized the logic and added some Git commits, you’d be fine.
There’s another dimension to it. If you were leaving a list of chores for someone, you’d write do the dishes and do the laundry. Not a 50-step breakdown of every micro-action involved. The high-level tasks are readable. Someone new can follow them. But hand them a dense multi-page list that breaks every action down to its individual components, and then multiply that across a large house, or a whole maid service running dozens of houses. The overhead of understanding what’s happening swamps the actual work. That’s what a script ends up feeling like at scale. Not just messy code. A wall of sequential instructions where you have to read all of it before any of it makes sense.
The actual problem is different. It’s not in the code at all.
The tension
The real problem shows up when something breaks. More specifically: what happens right after it breaks.
Whatever form your script takes — SQL procedure, Python file, DAG — it runs a series of steps, produces a final output, and ends. The intermediate state lives only as long as the process does. When the session ends or the process crashes, it’s gone. There’s nothing left to look at. To debug, you start the whole thing over from the top.
Think about household chores. You’re doing laundry, dishes, vacuuming, and cooking. You open the dryer at the end and your clothes are still damp. You wouldn’t respond by going back to recook dinner, rewash the dishes, and revacuum. A problem in one place doesn’t erase everything else. But that’s exactly what a script forces you to do. The process ends, the intermediate state vanishes, and a failure at step 4 wipes out steps 1, 2, and 3 right along with it. Every debug attempt is a full restart.
One failure wipes out every completed step, including the ones that worked fine.
```sql
CREATE OR REPLACE PROCEDURE build_customer_order_summary() AS $$
BEGIN
    -- Step 1: clean customers
    CREATE TEMP TABLE tmp_customers AS               -- ⚠ temp: gone when session ends
    SELECT customer_id,
           TRIM(LOWER(email)) AS email,
           COALESCE(first_name, 'Unknown') AS first_name,
           COALESCE(last_name, 'Unknown') AS last_name
    FROM raw.customers;                              -- ⚠ hardcoded schema: rename breaks silently

    -- Step 2: filter orders
    CREATE TEMP TABLE tmp_orders AS                  -- ⚠ temp: same problem
    SELECT order_id, customer_id, order_date, total_amount
    FROM raw.orders                                  -- ⚠ hardcoded schema
    WHERE status != 'cancelled';

    -- Step 3: filter payments
    CREATE TEMP TABLE tmp_payments AS                -- ⚠ temp: same problem
    SELECT payment_id, order_id, amount AS payment_amount
    FROM raw.payments                                -- ⚠ hardcoded schema
    WHERE amount > 0;

    -- Step 4: join orders and payments
    CREATE TEMP TABLE tmp_orders_with_payments AS    -- ⚡ must wait for steps 2 & 3: serial only
    SELECT o.*, SUM(p.payment_amount) AS total_paid
    FROM tmp_orders o
    LEFT JOIN tmp_payments p ON o.order_id = p.order_id
    GROUP BY o.order_id, o.customer_id, o.order_date, o.total_amount;

    -- Step 5: final aggregate
    CREATE OR REPLACE TABLE analytics.fct_customer_order_summary AS  -- ⚠ full overwrite every run
    SELECT c.customer_id,
           c.email,
           COUNT(op.order_id) AS lifetime_orders,
           COALESCE(SUM(op.total_amount), 0) AS lifetime_revenue
    FROM tmp_customers c
    LEFT JOIN tmp_orders_with_payments op ON c.customer_id = op.customer_id
    GROUP BY c.customer_id, c.email;

    -- Session ends. All four temp tables dropped. Nothing to inspect. ⚠ debug from scratch
END;
$$ LANGUAGE plpgsql;
```
One file. Five steps. Whether it succeeds or fails, the session ends and there’s nothing left to look at.
```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://...")

# Step 1: pull data to this machine       ⚠ data transfer grows with table size
raw_customers = pd.read_sql("SELECT * FROM raw.customers", engine)  # ⚠ hardcoded schema
raw_orders = pd.read_sql("SELECT * FROM raw.orders", engine)        # ⚠ hardcoded schema
raw_payments = pd.read_sql("SELECT * FROM raw.payments", engine)    # ⚠ hardcoded schema

# Step 2: clean in memory                 ⚡ DataFrame: gone if process crashes
clean_customers = raw_customers.copy()
clean_customers["email"] = clean_customers["email"].str.strip().str.lower()
clean_customers["first_name"] = clean_customers["first_name"].fillna("Unknown")
clean_customers["last_name"] = clean_customers["last_name"].fillna("Unknown")

# Step 3: filter orders in memory         ⚡ not queryable after crash
clean_orders = (raw_orders
    .loc[raw_orders["status"] != "cancelled",
         ["order_id", "customer_id", "order_date", "total_amount"]])

# Step 4: filter payments in memory       ⚡ serial: must wait for step 3
clean_payments = (raw_payments
    .loc[raw_payments["amount"] > 0]
    .rename(columns={"amount": "payment_amount"})
    [["payment_id", "order_id", "payment_amount"]])

# Step 5: join
pmt_by_order = clean_payments.groupby("order_id", as_index=False)["payment_amount"].sum()
orders_w_pmts = clean_orders.merge(pmt_by_order, on="order_id", how="left")

# Step 6: aggregate and write back        ⚠ full replace every run
summary = (clean_customers
    .merge(orders_w_pmts, on="customer_id", how="left")
    .groupby(["customer_id", "email"], as_index=False)
    .agg(lifetime_orders=("order_id", "count"),
         lifetime_revenue=("total_amount", "sum")))

summary.to_sql("fct_customer_order_summary", engine,  # ⚠ data travels back to warehouse
               schema="analytics", if_exists="replace", index=False)

# Process ends. All DataFrames dropped. Nothing survives a crash.
```
One file. Data travels to your machine, gets transformed in memory, travels back. Nothing survives if the process dies mid-run.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def run_customers(**ctx):
    PostgresHook("warehouse").run("""
        CREATE OR REPLACE TABLE staging.customers AS
        SELECT customer_id,
               TRIM(LOWER(email)) AS email,
               COALESCE(first_name, 'Unknown') AS first_name
        FROM raw.customers;           -- ⚠ hardcoded schema in each function
    """)


def run_orders(**ctx):
    PostgresHook("warehouse").run("""
        CREATE OR REPLACE TABLE staging.orders AS
        SELECT order_id, customer_id, order_date, total_amount
        FROM raw.orders
        WHERE status != 'cancelled';
    """)


def run_payments(**ctx):
    PostgresHook("warehouse").run("""
        CREATE OR REPLACE TABLE staging.payments AS
        SELECT payment_id, order_id, amount AS payment_amount
        FROM raw.payments
        WHERE amount > 0;
    """)


def run_orders_with_payments(**ctx):
    PostgresHook("warehouse").run("""
        CREATE OR REPLACE TABLE intermediate.orders_with_payments AS
        SELECT o.*, SUM(p.payment_amount) AS total_paid  -- ⚡ deps declared in DAG, not in this SQL
        FROM staging.orders o
        LEFT JOIN staging.payments p ON o.order_id = p.order_id
        GROUP BY o.order_id, o.customer_id, o.order_date, o.total_amount;
    """)


def run_summary(**ctx):
    PostgresHook("warehouse").run("""
        CREATE OR REPLACE TABLE analytics.fct_customer_order_summary AS
        SELECT c.customer_id,
               c.email,
               COUNT(op.order_id) AS lifetime_orders,
               COALESCE(SUM(op.total_amount), 0) AS lifetime_revenue
        FROM staging.customers c
        LEFT JOIN intermediate.orders_with_payments op ON c.customer_id = op.customer_id
        GROUP BY c.customer_id, c.email;
    """)


with DAG("customer_order_summary",
         start_date=datetime(2024, 1, 1),
         schedule="@daily") as dag:
    t1 = PythonOperator(task_id="stg_customers", python_callable=run_customers)
    t2 = PythonOperator(task_id="stg_orders", python_callable=run_orders)
    t3 = PythonOperator(task_id="stg_payments", python_callable=run_payments)
    t4 = PythonOperator(task_id="int_orders_w_payments",
                        python_callable=run_orders_with_payments)
    t5 = PythonOperator(task_id="fct_order_summary", python_callable=run_summary)

    [t1, t2, t3] >> t4 >> t5  # ⚠ lineage lives here, not inside the SQL
```
Looks structured. But the dependencies live in the DAG definition, not the SQL. Add a new model and you update two places. The SQL still uses hardcoded table names that a schema rename can silently break.
The obvious fix
The natural response depends on where you’re starting. If you’re on stored procedures, you break them into smaller procedures and chain them together — each one writes a real table. If you’re on a Python script, you add Airflow and turn each step into a task. If you’re already on Airflow, you add more structure: better retry logic, SLAs, sensors.
All of those help. But they leave the harder problems untouched.
Think about dividing chores among housemates. Person 1 does laundry, Person 2 folds, Person 3 cooks, Person 4 does dishes. Each person finishes their job and leaves the result. That’s an improvement over one person doing everything in one uninterruptible session. But someone still has to text the group chat: “Person 2, don’t start folding until Person 1 is done with the washer.” That coordination — the order, the dependencies — lives in the group chat. Not in the work.
That group chat is your orchestration layer — whether it’s a shell script chaining stored procs, an Airflow DAG, or a [t1, t2, t3] >> t4 >> t5 definition at the bottom of your DAG file. The dependencies between steps live outside the SQL. If you add a new model, you update two places. Nobody looking at any individual query can tell what it depends on or what depends on it.
There’s also still no shared definition. If two teams both need clean orders data, they each write their own version. Over time the logic drifts. And there’s still no testing — each step assumes the previous one’s output was correct.
ref() eliminates the group chat. Each dependency is declared in the code itself, so there's no separate coordination layer to maintain. The order emerges from the work.
What actually changes
The SQL barely changes. That’s not really what dbt does. What changes is what’s still there after the pipeline finishes running.
Instead of temp tables or in-memory DataFrames that disappear, each step becomes a real table in your warehouse. Hardcoded schema names become source(). Implicit ordering becomes explicit dependencies with ref().
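Concretely, the orders steps from the examples above might become model files like these. This is a sketch: the file paths, source name, and model names are assumptions, not from a real project, but `{{ source() }}` and `{{ ref() }}` are dbt's actual Jinja functions.

```sql
-- models/staging/stg_orders.sql
SELECT order_id, customer_id, order_date, total_amount
FROM {{ source('raw', 'orders') }}        -- schema lives in sources.yml, not here
WHERE status != 'cancelled'

-- models/intermediate/int_orders_with_payments.sql
SELECT o.*, SUM(p.payment_amount) AS total_paid
FROM {{ ref('stg_orders') }} o            -- dependency declared where it's used
LEFT JOIN {{ ref('stg_payments') }} p ON o.order_id = p.order_id
GROUP BY o.order_id, o.customer_id, o.order_date, o.total_amount
```

Each file materializes as a real table (or view), and each `ref()` is both a table reference and a dependency declaration.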
What emerges
The script had this exact structure inside it the whole time. The same dependencies existed — they just weren’t visible anywhere. Every ref() call you write is a dependency declaration. dbt reads them all and figures out the execution order. You don’t define the graph. It comes out of the code.
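To make "the graph comes out of the code" concrete, here's a minimal sketch of the idea, not dbt's actual implementation: scan each model's SQL for ref() calls, treat each call as an edge, and let a topological sort produce the execution order. The model contents are stand-ins.

```python
import re
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Stand-in model files: the only dependency information is the ref() calls.
models = {
    "stg_customers": "SELECT ... FROM {{ source('raw', 'customers') }}",
    "stg_orders": "SELECT ... FROM {{ source('raw', 'orders') }}",
    "stg_payments": "SELECT ... FROM {{ source('raw', 'payments') }}",
    "int_orders_with_payments":
        "SELECT ... FROM {{ ref('stg_orders') }} o "
        "LEFT JOIN {{ ref('stg_payments') }} p ON ...",
    "fct_customer_order_summary":
        "SELECT ... FROM {{ ref('stg_customers') }} c "
        "LEFT JOIN {{ ref('int_orders_with_payments') }} op ON ...",
}

# Every ref() call is a dependency declaration; collect them per model.
deps = {name: set(re.findall(r"ref\('(\w+)'\)", sql))
        for name, sql in models.items()}

# The execution order falls out of the declarations: staging models
# first (in any order), then intermediate, then the fact table.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Nobody wrote the ordering down anywhere; it was recovered from the code.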
All of this existed inside the script. It just wasn’t visible.
The consequence
Once the dependencies are explicit, dbt knows which models don’t depend on each other. So it runs them at the same time. The script couldn’t do that. It ran everything in order, one step after another, because there was no other option.
Think of it like chores. You wouldn’t start the washing machine, wait for it to finish, move the clothes to the dryer, wait for that to finish, and only then start cooking. Some of those things are related, but most aren’t. You’d run the independent ones in parallel. That’s exactly what the lineage makes visible.
The dryer waits on the washer, but nothing about cooking waits on laundry. Both chains run at the same time.
Serial, one step after another: total time 5 units. Independent chains in parallel: total time 3 units.
At real scale (hundreds of models), this compounds. A batch of 40 independent staging models runs in roughly the time it takes to run one.
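The speedup can be read straight off the graph. A small sketch, using the five-model example and assuming each model takes one unit of time: a model's "wave" is one more than the latest wave among its dependencies, serial time is the number of models, and parallel time is the number of waves.

```python
# Dependencies as declared by the ref() calls in the example pipeline
# (model -> the models it depends on; names assumed from the article).
deps = {
    "stg_customers": set(),
    "stg_orders": set(),
    "stg_payments": set(),
    "int_orders_with_payments": {"stg_orders", "stg_payments"},
    "fct_customer_order_summary": {"stg_customers", "int_orders_with_payments"},
}

waves: dict[str, int] = {}

def wave(model: str) -> int:
    """A model's wave: 1 + the latest wave among its dependencies."""
    if model not in waves:
        waves[model] = 1 + max((wave(d) for d in deps[model]), default=0)
    return waves[model]

for m in deps:
    wave(m)

serial_time = len(deps)              # run everything one after another
parallel_time = max(waves.values())  # each wave runs concurrently, one unit each
print(f"serial={serial_time} parallel={parallel_time}")  # serial=5 parallel=3
```

The three staging models share wave 1, which is exactly the "40 independent staging models in roughly the time of one" effect at larger scale.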
When it breaks
This is where it actually matters. Not when things go well. When they break.
Step 4 failed. The temp tables or DataFrames from steps 1–3 died with the process. To figure out what happened you have to re-run the whole thing from the top. Every single debug attempt is a full re-run.
stg_orders and stg_payments are still in the warehouse. Run dbt run -s int_orders_with_payments to re-execute just the thing that failed.
Add debug statements, re-run everything, check the counts. Multiple cycles before you find where the rows got out of control. The intermediate state is gone — you have to recreate it every time.
Query stg_orders. How many rows? Query int_orders_with_payments. How many? The difference tells you exactly where the problem is. No re-run needed.
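Because the intermediate models persist, that check is a plain query. A sketch, assuming the models materialize under an analytics schema (the schema name is an assumption):

```sql
-- Compare row counts across the persisted steps to localize the fan-out.
SELECT 'stg_orders' AS model, COUNT(*) AS n FROM analytics.stg_orders
UNION ALL
SELECT 'int_orders_with_payments', COUNT(*) FROM analytics.int_orders_with_payments;
```

If the second count is wildly larger than the first, the join in that one model is the culprit.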
The script fails or silently returns NULLs. You trace back through the file to figure out which step got hit. If you have Airflow, you check three files instead of one. Then you hope nothing downstream was depending on it.
stg_payments is the only model that touches raw.payments. Fix it there. Everything downstream picks up the change automatically.
The stg_ layer is an isolation boundary: raw schema changes only affect one file.

Nothing survived from the last run. You have to re-run the whole thing just to reproduce the failure. Then add logging. Re-run again. You're 45 minutes in before you've even started fixing anything.
The run logs tell you exactly which model failed and why. The intermediate models are still in the warehouse. Fix the issue, re-run just that model and what depends on it.
Before it breaks
All the scenarios above start the same way: someone discovers a problem after the fact. A dashboard is broken. The row count is off. An alert fires at 3am. That’s the equivalent of opening the dryer and finding your clothes still wet. You only know because you went to use the output.
Most data workflows don’t have testing built in. So problems surface downstream, at the wrong time, in ways that affect real decisions. You come home hungry and don’t know the refrigerator broke, so you spend money on takeout, and then discover you also have to fix the fridge. Two problems instead of one, because the first one wasn’t visible when it would have been cheapest to catch.
dbt tests are that earlier detection point. A freshness check, a not-null assertion, a referential integrity test. These run before the data reaches the dashboard. If something upstream is wrong, it surfaces at the source, not in the numbers your stakeholders are using to make decisions.
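In dbt, those checks are a few lines of YAML next to the models. A sketch: the YAML keys are dbt's real test and freshness config, but the model, column, and timestamp-column names are assumptions from the running example.

```yaml
# models/staging/schema.yml
version: 2

sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: loaded_at        # assumes an ingestion timestamp column
        freshness:
          warn_after: {count: 24, period: hour}   # freshness check

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null                      # not-null assertion
      - name: customer_id
        tests:
          - relationships:                # referential integrity test
              to: ref('stg_customers')
              field: customer_id
```

`dbt test` runs these against the warehouse, so a broken upstream surfaces as a failed test, not a wrong number on a dashboard.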
One broken fridge cascades into five problems without visibility. Tests catch it at the source, before anything downstream is affected.
The point
The script is not your enemy.
The invisible pipeline is. The one that runs and disappears. Whether it’s a stored procedure that drops its temp tables, a Python script that loses its DataFrames, or an Airflow DAG whose tasks know their order but not their lineage — the problem is the same: debugging means starting over.
dbt fixes that not by being a better query runner. The SQL is basically the same. It fixes it by making every step a real table that sticks around. The pipeline runs. The tables survive. When something breaks, you start from evidence, not from memory.
When you’re evaluating any data pipeline, there’s one question worth asking: what does it leave behind when it runs?