An interactive walkthrough. Follow the steps; each one builds on the last so you can see exactly what changes and why.
```sql
-- ================================================================
-- Stored Procedure: build_customer_order_summary
-- Runs nightly. Touch with care.
-- ================================================================
CREATE OR REPLACE PROCEDURE build_customer_order_summary() AS $$
BEGIN
    -- Step 1: clean customers
    CREATE TEMP TABLE tmp_customers AS      -- temp: gone when session ends, nothing to inspect on failure
    SELECT
        customer_id,
        TRIM(LOWER(email)) AS email,
        COALESCE(first_name, 'Unknown') AS first_name,
        COALESCE(last_name, 'Unknown') AS last_name,
        CASE WHEN region IS NULL THEN 'Unassigned' ELSE UPPER(region) END AS region,
        created_at
    FROM raw.customers;                     -- hardcoded schema: a rename silently breaks this

    -- Step 2: clean orders
    CREATE TEMP TABLE tmp_orders AS         -- temp: same problem
    SELECT order_id, customer_id, order_date, status, total_amount
    FROM raw.orders
    WHERE order_id IS NOT NULL
      AND status != 'cancelled';

    -- Step 3: clean payments
    CREATE TEMP TABLE tmp_payments AS       -- temp: same problem
    SELECT
        payment_id,
        order_id,
        payment_method,
        amount AS payment_amount,
        created_at AS payment_date
    FROM raw.payments
    WHERE payment_id IS NOT NULL
      AND amount > 0;

    -- Step 4: join orders + payments
    CREATE TEMP TABLE tmp_orders_with_payments AS   -- must wait for steps 2 & 3: no parallelism possible
    SELECT
        o.order_id,
        o.customer_id,
        o.order_date,
        o.status,
        o.total_amount,
        SUM(p.payment_amount) AS total_paid,
        COUNT(p.payment_id) AS payment_count,
        o.total_amount - COALESCE(SUM(p.payment_amount), 0) AS outstanding_balance
    FROM tmp_orders o
    LEFT JOIN tmp_payments p ON o.order_id = p.order_id
    GROUP BY o.order_id, o.customer_id, o.order_date, o.status, o.total_amount;

    -- Step 5: final output (full overwrite every run: no incremental option)
    DROP TABLE IF EXISTS analytics.fct_customer_order_summary;
    CREATE TABLE analytics.fct_customer_order_summary AS
    SELECT
        c.customer_id,
        c.email,
        c.first_name || ' ' || c.last_name AS full_name,
        c.region,
        COUNT(op.order_id) AS lifetime_orders,
        COALESCE(SUM(op.total_amount), 0) AS lifetime_revenue,
        COALESCE(SUM(op.total_paid), 0) AS lifetime_payments,
        MAX(op.order_date) AS most_recent_order_date
    FROM tmp_customers c
    LEFT JOIN tmp_orders_with_payments op ON c.customer_id = op.customer_id
    GROUP BY c.customer_id, c.email, full_name, c.region;

    -- Proc ends. All 4 temp tables are dropped automatically.
    -- If step 5 failed, no intermediate state survives to inspect: debug from scratch.
END;
$$ LANGUAGE plpgsql;
```
```sql
CREATE TEMP TABLE tmp_customers AS   -- temp: gone on session end
SELECT
    customer_id,
    TRIM(LOWER(email)) AS email,
    COALESCE(first_name, 'Unknown') AS first_name,
    COALESCE(last_name, 'Unknown') AS last_name,
    CASE WHEN region IS NULL THEN 'Unassigned' ELSE UPPER(region) END AS region,
    created_at
FROM raw.customers;                  -- hardcoded schema name
```
```sql
-- No CREATE. No TEMP. Just SELECT.
SELECT
    customer_id,
    TRIM(LOWER(email)) AS email,
    COALESCE(first_name, 'Unknown') AS first_name,
    COALESCE(last_name, 'Unknown') AS last_name,
    CASE WHEN region IS NULL THEN 'Unassigned' ELSE UPPER(region) END AS region,
    created_at
FROM {{ source('raw', 'customers') }}
```
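The source() call resolves against a declaration in a YAML file. A minimal sketch; the file path, freshness thresholds, and loaded_at_field are illustrative assumptions, not from the original:

```yaml
# models/staging/sources.yml (illustrative path)
version: 2

sources:
  - name: raw                  # matches source('raw', ...)
    schema: raw                # the actual schema in the warehouse
    loaded_at_field: created_at
    freshness:                 # alert if raw data stops arriving
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customers        # source('raw', 'customers')
      - name: orders
      - name: payments
```

With this in place, `dbt source freshness` checks all three tables against the thresholds in one command.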
CREATE TEMP TABLE is gone; dbt decides whether to build a table or a view based on config. raw.customers becomes source('raw', 'customers'), so dbt knows this is where raw data enters the pipeline and can test it, monitor freshness, and alert you if it stops arriving. The result is a real, persistent, queryable table in your warehouse, not a session-scoped temp that vanishes.

```sql
CREATE TEMP TABLE tmp_orders AS   -- temp: gone on session end
SELECT order_id, customer_id, order_date, status, total_amount
FROM raw.orders                   -- hardcoded schema name
WHERE order_id IS NOT NULL
  AND status != 'cancelled';
```
```sql
SELECT order_id, customer_id, order_date, status, total_amount
FROM {{ source('raw', 'orders') }}
WHERE order_id IS NOT NULL
  AND status != 'cancelled'
```
Any other model can now call ref('stg_orders') directly. In the stored-proc world, each team copy-pastes tmp_orders into their own proc, and now "what is a valid order" is defined in two places.

```sql
CREATE TEMP TABLE tmp_payments AS   -- temp: gone on session end
SELECT
    payment_id,
    order_id,
    payment_method,
    amount AS payment_amount,
    created_at AS payment_date
FROM raw.payments                   -- hardcoded schema name
WHERE payment_id IS NOT NULL
  AND amount > 0;
```
```sql
SELECT
    payment_id,
    order_id,
    payment_method,
    amount AS payment_amount,
    created_at AS payment_date
FROM {{ source('raw', 'payments') }}
WHERE payment_id IS NOT NULL
  AND amount > 0
```
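The WHERE clauses guard against bad rows at build time; in dbt you can also declare those expectations as tests that run on every build. A sketch, with an assumed file name, and the range check assumes the dbt_utils package is installed:

```yaml
# models/staging/schema.yml (illustrative path)
version: 2

models:
  - name: stg_payments
    columns:
      - name: payment_id
        tests:
          - not_null
          - unique
      - name: payment_amount
        tests:
          - dbt_utils.accepted_range:   # requires the dbt_utils package
              min_value: 0
              inclusive: false
```

`dbt test -s stg_payments` then fails loudly if a null ID or non-positive amount slips through, instead of silently filtering it.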
```sql
CREATE TEMP TABLE tmp_orders_with_payments AS   -- must wait for steps 2 & 3: enforced by you, not the system
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    o.total_amount,
    SUM(p.payment_amount) AS total_paid,
    o.total_amount - COALESCE(SUM(p.payment_amount), 0) AS outstanding_balance
FROM tmp_orders o
LEFT JOIN tmp_payments p ON o.order_id = p.order_id
GROUP BY o.order_id, o.customer_id, o.order_date, o.total_amount;
```
```sql
SELECT
    o.order_id,
    o.customer_id,
    o.order_date,
    o.total_amount,
    SUM(p.payment_amount) AS total_paid,
    o.total_amount - COALESCE(SUM(p.payment_amount), 0) AS outstanding_balance
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_payments') }} p ON o.order_id = p.order_id
GROUP BY o.order_id, o.customer_id, o.order_date, o.total_amount
```
tmp_orders and tmp_payments become ref('stg_orders') and ref('stg_payments'). Each ref() declares the dependency, so dbt knows to run staging first, and it resolves the correct table name in any environment (dev, staging, or prod) automatically.

```sql
DROP TABLE IF EXISTS analytics.fct_customer_order_summary;   -- full overwrite every run
CREATE TABLE analytics.fct_customer_order_summary AS
SELECT
    c.customer_id,
    c.email,
    c.first_name || ' ' || c.last_name AS full_name,
    c.region,
    COUNT(op.order_id) AS lifetime_orders,
    COALESCE(SUM(op.total_amount), 0) AS lifetime_revenue,
    MAX(op.order_date) AS most_recent_order_date
FROM tmp_customers c
LEFT JOIN tmp_orders_with_payments op ON c.customer_id = op.customer_id
GROUP BY c.customer_id, c.email, full_name, c.region;

-- Session ends. All 4 temp tables are dropped. Nothing to inspect: debug from scratch.
```
```sql
-- Materialization strategy (table, incremental, view) is config, not SQL.
SELECT
    c.customer_id,
    c.email,
    c.first_name || ' ' || c.last_name AS full_name,
    c.region,
    COUNT(op.order_id) AS lifetime_orders,
    COALESCE(SUM(op.total_amount), 0) AS lifetime_revenue,
    MAX(op.order_date) AS most_recent_order_date
FROM {{ ref('stg_customers') }} c
LEFT JOIN {{ ref('int_orders_with_payments') }} op ON c.customer_id = op.customer_id
GROUP BY c.customer_id, c.email, full_name, c.region
```
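Where does that materialization config live? Typically in a config block at the top of the model file (it can also go in dbt_project.yml). A sketch; the incremental variant is illustrative, not part of the walkthrough:

```sql
-- At the top of fct_customer_order_summary.sql:
{{ config(materialized='table') }}

-- Switching to incremental later is a config change, not a rewrite:
-- {{ config(materialized='incremental', unique_key='customer_id') }}
```

The SELECT underneath stays identical either way; dbt generates the DDL around it.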
CREATE TEMP TABLE becomes a file, raw.x becomes source(), tmp_x becomes ref(). Those three substitutions unlock everything else.
The proc points straight at raw.customers. If that schema is renamed, the proc breaks silently. source() registers these tables as official entry points: dbt can test them, check freshness, and alert you before a bad data day becomes a bad dashboard day.

Each ref() is a declared dependency. dbt reads them all and builds the execution order automatically. You don't write orchestration code; the ref() calls are the orchestration.

stg_customers, stg_orders, and stg_payments each reference only source(); they have no ref() edges between them. dbt reads the graph, sees no dependencies, and dispatches all three at once. The ref() declarations from tab 2 are the orchestration; you didn't write any extra config to make this happen.
Step 4 failed. The temp tables from steps 1โ3 are gone โ they died with the session. Re-run the entire procedure from the top to reproduce the failure. Every debug cycle = full re-run.
Step 4 failed. stg_orders and stg_payments are still in the warehouse. Run dbt run -s int_orders_with_payments to re-execute only that model. Upstream work stays intact.
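dbt's node selection syntax makes that targeted re-run precise. A sketch using the model names from this walkthrough (`-s` is shorthand for `--select`):

```shell
# Re-run only the failed model
dbt run -s int_orders_with_payments

# Re-run it plus everything downstream of it (the + suffix selects descendants)
dbt run -s int_orders_with_payments+
```

Upstream staging models are untouched in both cases; dbt reads them from the warehouse as-is.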
Fan-out in the join? Bad filter in step 2? You add temp debug tables mid-proc, re-run everything, and manually query each one. Multiple cycles to isolate.
Query stg_orders โ how many rows? Query int_orders_with_payments โ how many? The difference shows exactly where the fan-out happened. No re-run needed.
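That comparison is just two ad-hoc queries against tables still sitting in the warehouse. A sketch, assuming dbt built the models into an `analytics` schema (your target schema may differ):

```sql
-- Row counts on either side of the join
SELECT COUNT(*) FROM analytics.stg_orders;
SELECT COUNT(*) FROM analytics.int_orders_with_payments;

-- If total_paid looks inflated, check for duplicated payment rows upstream
SELECT order_id, payment_id, COUNT(*) AS n
FROM analytics.stg_payments
GROUP BY order_id, payment_id
HAVING COUNT(*) > 1;
```

Each query inspects persisted intermediate state; nothing has to be re-run to gather evidence.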
The proc either silently returns NULLs or throws an error. You find out when a BI dashboard breaks. Trace back through the proc to find which temp table was affected.
stg_payments is the only model that touches raw.payments. Fix it there. Every downstream model gets the fix automatically on the next run. One change point, not five.
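To make the single change point concrete: suppose, hypothetically, raw.payments renames its amount column to amount_usd. The fix is two lines in stg_payments, and the payment_amount alias that every downstream model depends on never changes:

```sql
-- models/staging/stg_payments.sql (hypothetical rename of amount -> amount_usd)
SELECT
    payment_id,
    order_id,
    payment_method,
    amount_usd AS payment_amount,   -- was: amount AS payment_amount
    created_at AS payment_date
FROM {{ source('raw', 'payments') }}
WHERE payment_id IS NOT NULL
  AND amount_usd > 0
```

int_orders_with_payments and the final fact table still select payment_amount and need no edits.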
The stg_ layer acts as an isolation boundary: raw schema changes only affect one file, the staging model for that source.

Connect to the DB. No intermediate state anywhere. Re-run the full proc to reproduce. Add RAISE NOTICE statements. Re-run again. 45 minutes of archaeology before you even start fixing.
dbt run logs show the exact model that failed and the exact error. Intermediate models are still in the warehouse. Run only the failed model, fix, run downstream only.
Grep the codebase and hope nothing uses tmp_orders in another proc you don't know about. No system-of-record for dependencies.
The lineage graph shows every downstream model that references stg_orders. Click on it in the dbt DAG and see the full impact of any change instantly.
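The same impact question can be answered from the command line; `dbt ls` accepts the same graph operators as `dbt run`:

```shell
# List every model downstream of stg_orders (the + suffix selects descendants)
dbt ls -s stg_orders+
```

Either way, the dependency graph is derived from the ref() calls themselves, so it can never drift out of date the way a wiki page or a grep would.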