You're deep into managing your data warehouse when suddenly you realize your Snowflake instance has accumulated digital cobwebs: a collection of old, stale, unused tables and views. "No problem," I thought. "I'll just write some SQL to clear these out." I rolled up my sleeves and fired up dbt, eager to flex my SQL muscles and show those forgotten tables who's in charge. Cleaning up old Snowflake tables should be simple, right? After all, how hard could it be to automate the process and keep the warehouse tidy? Little did I know, I was in for a surprise...
The DIY Route: A Journey of Ambition
My goal was to get rid of tables and views that hadn't been updated in over a week. After some tinkering, I built a query that listed all the tables and views in a specific schema and dynamically generated DROP statements for any object older than seven days.
I envisioned a world where my warehouse would be spotless...
Here's the query I wrote (just in case you're wondering):
WITH object_info AS (
    SELECT
        table_name AS object_name,
        table_type,
        created,
        last_altered
    FROM {{ target_database }}.INFORMATION_SCHEMA.TABLES
    WHERE table_schema = '{{ target_schema }}'
),

old_objects AS (
    SELECT
        object_name,
        table_type
    FROM object_info
    WHERE COALESCE(last_altered, created) < CURRENT_TIMESTAMP() - INTERVAL '7 days'
)

SELECT
    CASE
        WHEN table_type = 'VIEW' THEN 'DROP VIEW ' || '{{ target_schema }}.' || object_name
        WHEN table_type = 'BASE TABLE' THEN 'DROP TABLE ' || '{{ target_schema }}.' || object_name
    END AS drop_statement
FROM old_objects
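Each row that comes back is a ready-to-run statement, along the lines of the following (the table name here is invented purely for illustration):

DROP TABLE MY_SCHEMA.STG_ORDERS_BACKUP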
Frankly, it was quite satisfying to see those DROP statements roll out, ready to clean up the clutter...
To execute DROP statements (or any other DDL/DML) in production from within a dbt project, we'll need to leverage:
- dbt macros, and
- dbt's run-operation functionality.
Converting the SQL query to a dbt macro was quite straightforward:
{% macro drop_old_objects(target_database, target_schema) %}

    -- Generate the SQL to find old tables and views
    {% set drop_statements_query %}
        WITH object_info AS (
            SELECT
                table_name AS object_name,
                table_type,
                created,
                last_altered
            FROM {{ target_database }}.INFORMATION_SCHEMA.TABLES
            WHERE table_schema = '{{ target_schema }}'
        ),

        old_objects AS (
            SELECT
                object_name,
                table_type
            FROM object_info
            WHERE COALESCE(last_altered, created) < CURRENT_TIMESTAMP() - INTERVAL '7 days'
        )

        SELECT
            CASE
                WHEN table_type = 'VIEW' THEN 'DROP VIEW ' || '{{ target_schema }}.' || object_name
                WHEN table_type = 'BASE TABLE' THEN 'DROP TABLE ' || '{{ target_schema }}.' || object_name
            END AS drop_statement
        FROM old_objects
    {% endset %}

    -- Execute the query to fetch the drop statements
    {% set results = run_query(drop_statements_query) %}
    {% set drop_statements = results.columns[0].values() %}

    -- Loop through and execute each drop statement
    {% for statement in drop_statements %}
        {{ log('Executing: ' ~ statement, info=True) }}
        {% do run_query(statement) %}
    {% endfor %}

    {{ log('All old objects have been dropped.', info=True) }}

{% endmacro %}
Once the macro is defined, we can execute it in production using the dbt run-operation command. This allows us to run operations that are not tied to specific models.
dbt run-operation drop_old_objects --args '{"target_database": "MY_DATABASE", "target_schema": "MY_SCHEMA"}'
If we need this to work in multiple environments (like dev, staging, or prod), we can enhance the macro to dynamically select the database/schema based on the environment:
{% macro drop_old_objects() %}
    {% set target_database = 'prod_db' if target.name == 'prod' else 'dev_db' %}
    {% set target_schema = 'reporting' %}

    -- Rest of the macro logic remains the same
    ...
{% endmacro %}
and then just run:
dbt run-operation drop_old_objects
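If you'd rather not hardcode database names at all, dbt's built-in target context variable already knows which database and schema the active connection points to. Here's a minimal sketch of that idea; the macro name and the dry_run argument are my own additions, and it assumes your profiles.yml sets the right database and schema per target:

{% macro drop_old_objects_for_target(dry_run=True) %}
    -- Build DROP statements for stale tables/views in the target's own schema
    {% set drop_statements_query %}
        SELECT
            CASE
                WHEN table_type = 'VIEW' THEN 'DROP VIEW '
                WHEN table_type = 'BASE TABLE' THEN 'DROP TABLE '
            END || '{{ target.database }}.{{ target.schema }}.' || table_name AS drop_statement
        FROM {{ target.database }}.INFORMATION_SCHEMA.TABLES
        WHERE table_schema = '{{ target.schema | upper }}'
          AND table_type IN ('BASE TABLE', 'VIEW')
          AND COALESCE(last_altered, created) < CURRENT_TIMESTAMP() - INTERVAL '7 days'
    {% endset %}

    -- Print every statement; only execute when dry_run is switched off
    {% for statement in run_query(drop_statements_query).columns[0].values() %}
        {{ log(('Would execute: ' if dry_run else 'Executing: ') ~ statement, info=True) }}
        {% if not dry_run %}
            {% do run_query(statement) %}
        {% endif %}
    {% endfor %}
{% endmacro %}

Running dbt run-operation drop_old_objects_for_target --target prod then cleans whichever environment the target points at, and it defaults to a safe dry run.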
The Plot Twist: I Found a Macro!
As I was about to deploy my handcrafted solution (probably while sipping my fourth coffee of the day), I stumbled across this post on Discourse, and it immediately caught my attention. The title? "Clean Your Warehouse of Old and Deprecated Models (Snowflake)." It seemed like a cruel twist of fate: someone had already figured out the exact solution I needed, and they'd done it more efficiently.
The macro I found was cleaner, smarter, and, most importantly, already battle-tested! It focused not just on arbitrarily old tables and views, but specifically targeted tables and views that no longer had corresponding dbt models. I mean, talk about precision!
{% macro public_drop_old_relations(dryrun=False) %}

    {% if execute %}
        {% set current_models = [] %}
        {% for node in graph.nodes.values()
            | selectattr("resource_type", "in", ["model", "seed", "snapshot"]) %}
            {% do current_models.append(node.name) %}
        {% endfor %}
    {% endif %}

    {% set cleanup_query %}
        with models_to_drop as (
            select
                case
                    when table_type = 'BASE TABLE' then 'TABLE'
                    when table_type = 'VIEW' then 'VIEW'
                end as relation_type,
                concat_ws('.', table_catalog, table_schema, table_name) as relation_name
            from ANALYTICS.information_schema.tables
            where table_schema ilike 'PUBLIC%'
            and table_name not in
                ({%- for model in current_models -%}
                    '{{ model.upper() }}'
                    {%- if not loop.last -%}, {% endif %}
                {%- endfor -%})
        )
        select
            'drop ' || relation_type || ' ' || relation_name || ';' as drop_commands
        from models_to_drop
        -- intentionally exclude unhandled table_types, including 'external table'
        where drop_commands is not null
    {% endset %}

    {% do log(cleanup_query, info=True) %}
    {% set drop_commands = run_query(cleanup_query).columns[0].values() %}

    {% if drop_commands %}
        {% if dryrun | as_bool == False %}
            {% do log('Executing DROP commands...', True) %}
        {% else %}
            {% do log('Printing DROP commands...', True) %}
        {% endif %}
        {% for drop_command in drop_commands %}
            {% do log(drop_command, True) %}
            {% if dryrun | as_bool == False %}
                {% do run_query(drop_command) %}
            {% endif %}
        {% endfor %}
    {% else %}
        {% do log('No relations to clean.', True) %}
    {% endif %}

{% endmacro %}
This macro did everything I wanted, and more! It looped through all the models currently in my dbt project, compared them against the tables and views in Snowflake's PUBLIC schema, and automatically dropped any that no longer had a corresponding dbt model. Genius, right?
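One caveat worth flagging: if raw source tables live in a schema this macro scans, they have no matching dbt model and would land on the drop list. A small tweak of my own (not part of the original Discourse macro) is to also collect source names from dbt's graph context, so anything declared as a source stays safe:

{% if execute %}
    {% set current_models = [] %}
    {# models, seeds and snapshots, exactly as in the original macro #}
    {% for node in graph.nodes.values()
        | selectattr("resource_type", "in", ["model", "seed", "snapshot"]) %}
        {% do current_models.append(node.name) %}
    {% endfor %}
    {# also exclude anything declared as a source in the project #}
    {% for source in graph.sources.values() %}
        {% do current_models.append(source.name) %}
    {% endfor %}
{% endif %}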
The Solution Tested: A Sigh of Relief
First, I ran it in dry-run mode to see which tables would be dropped without actually executing the DROP statements:
dbt run-operation public_drop_old_relations --args '{"dryrun": True}'
Lo and behold, it spat out a list of tables and views that were ripe for deletion. After double-checking the list (just to make sure it wasn't about to drop anything critical, like the performance dashboard pipeline I was working on last month, or my sanity), I pulled the trigger:
dbt run-operation public_drop_old_relations
Boom. Clean as a whistle!
No more obsolete tables hanging around, no more manual cleanup, and no more second-guessing whether old models had been properly deleted. The macro took care of it all!
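And since the whole point was to stop doing this by hand, the final touch is to schedule the cleanup instead of remembering to run it. A rough sketch using plain cron, assuming a host with the dbt CLI installed and a configured profiles.yml (the paths are placeholders):

# Every Monday at 06:00, drop relations that no longer have a dbt model
0 6 * * 1 cd /path/to/dbt_project && dbt run-operation public_drop_old_relations >> /var/log/dbt_cleanup.log 2>&1

A dbt Cloud job with the same run-operation step works just as well if you'd rather not manage cron yourself.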
Conclusion: Work Smarter, Not Harder
This experience was a humbling reminder that sometimes, even the best-laid plans are just the beginning. Sure, it's fun to build things from scratch. And yes, there's a certain pride that comes with writing your own solution. But there's also a real power in recognizing when someone else has already paved the way with a more efficient method.
In the end, I'm happy I stumbled across that macro, and now my dbt projects are cleaner than ever. It not only solved my problem but also opened my eyes to the wealth of dbt resources out there. Sometimes, the smartest move isn't to reinvent the wheel; it's to find the right wheel and let it roll.
So next time you think about tackling a new problem from scratch, take a second to look around. The solution might already be out there, waiting for you to discover it!