Dbt: DBT should have a Dry-Run mode

Created on 13 Oct 2018 · 12Comments · Source: fishtown-analytics/dbt

DBT should have a dry-run mode

Feature description

DBT should have a dry-run mode so that users can see all of the "commands" DBT will run for a given command without having DBT actually execute those commands against a database.

Who will this benefit?

This can be useful for both debugging as well as making sure that a given command is configured properly. This is especially useful when you're configuring a production setup and want to make sure you're not about to drop schemas or tables you didn't intend to.

It can also be useful for testing macros and other DBT configurations that are post-compilation but don't show up in the compiled SQL

Source

mikekaminsky

👍27

Most helpful comment

+1 as well to adding a dry run feature such as a dbt compile --all command

christopher-tin on 1 Oct 2019

👍2

All 12 comments

So, I took a look at dbt/adapters/default/impl.py to see how hard this might be and I don't think I have enough context on why some things are implemented the way they are to take a stab at this myself without some guidance.

I can see that the add_query method is where the statements get executed (cursor.execute(sql, bindings) but there's a bunch of other stuff happening in here I don't understand (I suspect a lot of it is related to handling multiple connections).

In case it's helpful to see where my head is going, the way I've implemented this in the past is to have two functions for executing SQL. One for commands (no results returned) and one for queries (returning data) -- this way your read and write operations are split.

Then, for the dry-run mode, you can put some logic in the command-handling function that just logs the command instead of passing it to cur.execute(). Something like:

def run_command(sql):
    print(sql)
    if not DRY_RUN:
        self.cur.execute(sql)

def get_data(sql):
    print(sql)
    self.cur.execute(sql)
    result = cur.fetchall()
    return result

mikekaminsky on 14 Oct 2018

yes, I vote to have the dry-run feature, which will be super helpful for debug and performance tuning

hui-zheng on 26 Apr 2019

👍2

@cmcarthur What's the reasoning for this to be removed from the milestone?

thalesmello on 28 May 2019

Hi @thalesmello - this is a pretty involved change to dbt! Connor and I tried to prioritize it for the Wilt Chamberlain release (0.14.0), but we ended up deciding to cull it in favor of Archive-related functionality. This feature is definitely going to be a big improvement to dbt - I'm looking forward to re-prioritizing it!

drewbanin on 29 May 2019

Thanks for the explanation @drewbanin 👍

thalesmello on 29 May 2019

👍1

+1 as well to adding a dry run feature such as a dbt compile --all command

christopher-tin on 1 Oct 2019

👍2

Any movement on this?

darrenhaken on 12 Jun 2020

Hello everyone!
Any progress on this? I think being able to see the actual sql that will be executed is quite the important feature.

giovanni-girelli-sdg on 11 Aug 2020

@giovanni-girelli-sdg To see the actual SQL that dbt will execute you can use dbt compile https://docs.getdbt.com/reference/commands/compile/.

alepuccetti on 11 Aug 2020

👍1

Hi @alepuccetti! thanks for the reply, but there you can only see the SELECT statement. What I'd like to see is the actual SQL that will run, such as a MERGE, INSERT or CREATE statement.

Edit at least it's what I can find. Is there a way to see the full materialization code?

Edit2: I had not seen the target/run folder, that's enough for debugging in DEV. I still think it would be great to be able to have a dry run mode where no sql is actually executed, but I can see how that would be much harder to do. Thanks for the great work!

giovanni-girelli-sdg on 11 Aug 2020

👍1

@mikekaminsky you opened this on my birthday!

bhtucker on 2 Sep 2020

🎉1

+1 to this, especially if it allowed the calculation of the cost of the run, and allowed prevention of the run if the cost was too high unless a --force argument was used.