Add the ability to run only new or modified models with a single terminal command. For example: dbt run --models path --type new, modified
This would especially benefit analysts who are building entirely new data models from scratch and deploying them to dev for the first time. Depending on the project, this can involve creating new base models (or simply adding a new measure) plus a variety of dependent mart models, and in complex projects those models can live in many different folders. Rather than tagging every model they touched, relying on color coding in their text editor of choice (e.g. modified files are highlighted "green" in Atom), or simply running everything, analysts could use a "new + modified only" option and save some admin time.
Hey @sagarvelagala - cool idea! Can you think of a good mechanism for dbt to determine whether a model is new or changed? One good option is to leverage git. Check out this post, which shows how to do something similar from outside of dbt: https://discourse.getdbt.com/t/tips-and-tricks-about-working-with-dbt/287/2
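A minimal sketch of the git-based approach, assuming the dbt project sits at the root of the git repository, models live under `models/`, and each model's name is its `.sql` file stem. It collects files that differ from a base branch (here `main`, an assumption) plus brand-new untracked files, then maps them to model names that could be passed to `dbt run --models`. The function names are hypothetical, not part of dbt.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base: str = "main") -> list[str]:
    """Return paths that differ from `base`, plus untracked (brand-new) files.

    Assumes it is run from the repository root, which is also the dbt project root.
    """
    # Tracked files changed relative to the base branch, committed or not.
    # --diff-filter=d excludes deleted files, which no longer exist to run.
    diff = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", base],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    # New files that have not been added to git yet.
    untracked = subprocess.run(
        ["git", "ls-files", "--others", "--exclude-standard"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return diff + untracked


def models_to_run(paths: list[str], models_dir: str = "models") -> list[str]:
    """Map changed .sql files under `models_dir` to model names (file stems)."""
    names = set()
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix == ".sql" and path.parts and path.parts[0] == models_dir:
            names.add(path.stem)
    return sorted(names)


if __name__ == "__main__":
    selected = models_to_run(changed_files())
    if selected:
        print("dbt run --models " + " ".join(selected))
    else:
        print("No new or modified models.")
```

Appending dbt's `+` graph operator to each name (e.g. `stg_orders+`) would also pick up the dependent mart models mentioned in the original request.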
@drewbanin I don't have a very technical answer, if that's what you're looking for =). I was thinking dbt could compare the compiled query files from a user's most recent run with the newly compiled ones. From there, I imagine dbt could leverage its "tag" feature to create a layer of "smart" tags that it generates automatically every time a run command happens. A smart tagging feature would open up a lot of cool possibilities that are probably out of reach today for an analyst who just knows SQL (aka me). Examples could be "new," "modified," or even "previously failed" tags for models involved in a run failure (or test failure!) in the previous run.
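One way to sketch the "compare compiled files" idea, assuming compiled SQL is written to a directory (dbt writes it under `target/compiled/`) and that a JSON snapshot of content hashes from the previous run is kept alongside it. The snapshot file and all function names are hypothetical. Each model is tagged "new", "modified", or "unchanged", which is roughly the "smart tag" being described; comparing compiled SQL rather than source files also catches changes that arrive through macros.

```python
import hashlib
import json
from pathlib import Path


def hash_compiled(compiled_dir: Path) -> dict[str, str]:
    """Map each compiled model file (relative path) to a SHA-256 of its SQL."""
    return {
        str(p.relative_to(compiled_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(compiled_dir.rglob("*.sql"))
    }


def smart_tags(previous: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Tag every current model as 'new', 'modified', or 'unchanged'."""
    tags = {}
    for model, digest in current.items():
        if model not in previous:
            tags[model] = "new"
        elif previous[model] != digest:
            tags[model] = "modified"
        else:
            tags[model] = "unchanged"
    return tags


def load_snapshot(path: Path) -> dict[str, str]:
    """Read the hashes saved after the previous run (empty if there is none)."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_snapshot(path: Path, hashes: dict[str, str]) -> None:
    """Persist the current hashes so the next run can compare against them."""
    path.write_text(json.dumps(hashes, indent=2, sort_keys=True))
```

A caller would load the previous snapshot, hash the freshly compiled directory, run only the models not tagged "unchanged", and save the new snapshot only after that run succeeds, so a failed run is retried next time (a crude form of the "previously failed" tag).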
I really like that idea! I hadn't considered "smart tags" like this before.
We have some new code shipping in 0.15.0 that will only re-compile "changed" models. Maybe we can leverage that to do something like what you're describing? Check out the PR here if you're interested: https://github.com/fishtown-analytics/dbt/pull/1646
I think an easy way to implement this would be to store the last modified date of each model. Then, when this command is run, compare the last modified date of each current file with the one recorded at the previous run, and only run the models whose dates differ.
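A minimal sketch of the last-modified-date idea, assuming model files live under a `models/` directory and that the caller persists the modification times recorded at the previous run; all names here are hypothetical. One caveat: modification times change on any save or git checkout even when the content is identical, so this can select more models than a content hash would.

```python
from pathlib import Path


def model_mtimes(models_dir: Path) -> dict[str, float]:
    """Record the last-modified time of every .sql model file."""
    return {
        str(p.relative_to(models_dir)): p.stat().st_mtime
        for p in models_dir.rglob("*.sql")
    }


def changed_since(previous: dict[str, float], current: dict[str, float]) -> list[str]:
    """Return models that are new or whose mtime differs from the last run."""
    return sorted(
        model
        for model, mtime in current.items()
        # '!=' rather than '>' so files restored with an older timestamp still count.
        if model not in previous or mtime != previous[model]
    )
```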
We'd be very interested in something like this in dbt Cloud. We have a test-on-pull set up from GitHub, but our preprod test runs are getting a bit long and expensive; we would rather dbt run only the changes and then test everything.
The bash command linked above doesn't work in dbt Cloud (it doesn't error in develop, it just runs forever). Have any other ideas for achieving this come up since the last comments on this issue?
We've been thinking a lot about this! Check out https://github.com/fishtown-analytics/dbt/issues/2465
I'm going to call this resolved! (https://github.com/fishtown-analytics/dbt/issues/2641)
Please re-open, of course, if there's a nuance here that's missing in our implementation, which shipped as a beta feature in dbt v0.18.0 (docs).