Dbt: Feature: @model to build additional dependencies

Created on 27 Nov 2018 · 4 Comments · Source: fishtown-analytics/dbt

Feature

Feature description

Using the --models flag together with + builds only the downstream/upstream dependencies of the selected model, not the individual dependencies of those models. It would be useful to have a ++modelName++ syntax that builds that path through the tree, including the dependencies of the dependent/parent models.
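To make the gap concrete, here is a minimal sketch of the problem on a toy DAG. It is not dbt's selector code: the model names, the `PARENTS` mapping, and the helper functions are all hypothetical, and `plus_both` only approximates what `+orders+` selects.

```python
# Toy DAG: model -> list of direct parents. All names are hypothetical.
PARENTS = {
    "stg_orders": [],
    "stg_customers": [],
    "orders": ["stg_orders"],
    "customer_orders": ["orders", "stg_customers"],
}


def ancestors(parents, node):
    """Return every model that `node` depends on, directly or transitively."""
    found = set()
    frontier = list(parents[node])
    while frontier:
        current = frontier.pop()
        if current not in found:
            found.add(current)
            frontier.extend(parents[current])
    return found


def descendants(parents, node):
    """Return every model that depends on `node`, directly or transitively."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for child, deps in parents.items():
            if current in deps and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def plus_both(parents, node):
    """Approximate `+node+`: the model, its ancestors, and its descendants."""
    return {node} | ancestors(parents, node) | descendants(parents, node)


selected = plus_both(PARENTS, "orders")

# Parents that a selected model needs but that the selection does not build.
missing = {
    dep
    for model in selected
    for dep in PARENTS[model]
    if dep not in selected
}

print(sorted(selected))
print(sorted(missing))
```

In this sketch `customer_orders` is selected, but its other parent `stg_customers` is not, so in a clean environment the build of `customer_orders` would have nothing to read from.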

Who will this benefit?

Anyone using selective builds in clean CI environments.

Edit: Updated Feature Description (See below)

Add an @ modifier (applicable only at the beginning of a selector) which will select:

  • all of the descendants of the specified resource
  • all of the ancestors of _those_ resources

Examples:

# Run everything downstream of a source, plus transitive parents of those downstream models
dbt run --model @source:snowplow

# Run a model, plus everything downstream, plus transitive parents of those downstream models
dbt run --model @some_model
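
As a rough illustration of the selection rule described above (descendants first, then the ancestors of those descendants), here is a sketch on a toy graph. It is not dbt's implementation. The graph, the node names, and `select_at` are hypothetical, and including the ancestors of the selected node itself is an assumption of this sketch.

```python
# Toy DAG: node -> list of direct parents. All names are hypothetical.
PARENTS = {
    "source:snowplow": [],
    "source:stripe": [],
    "events": ["source:snowplow"],
    "payments": ["source:stripe"],
    "refunds": ["source:stripe"],
    "sessions": ["events"],
    "revenue_per_session": ["sessions", "payments"],
}


def ancestors(parents, node):
    """Return every node that `node` depends on, directly or transitively."""
    found = set()
    frontier = list(parents[node])
    while frontier:
        current = frontier.pop()
        if current not in found:
            found.add(current)
            frontier.extend(parents[current])
    return found


def descendants(parents, node):
    """Return every node that depends on `node`, directly or transitively."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for child, deps in parents.items():
            if current in deps and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def select_at(parents, node):
    """Sketch of `@node`: the node, its descendants, and all of their ancestors."""
    # Assumption: the selected node is part of the set whose ancestors we collect.
    seed = {node} | descendants(parents, node)
    selected = set(seed)
    for member in seed:
        selected |= ancestors(parents, member)
    return selected


print(sorted(select_at(PARENTS, "source:snowplow")))
print(sorted(select_at(PARENTS, "refunds")))
```

Here `@source:snowplow` pulls in `payments` and `source:stripe` because `revenue_per_session` needs them, while `refunds` stays out because nothing downstream of Snowplow reads from it.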
Labels: enhancement, help wanted

All 4 comments

Thanks for the suggestion @swettk! I'm super into the idea, but open to alternatives for the syntax. Ideally, we'd pick something 1) sensible, 2) easy to parse, and 3) conflict-free with most terminals.

This is currently prioritized for the Stephen Girard release, but our thinking needs to be developed a little further before we can tackle it.

  1. Rather than ++, let's use @. I think it's going to be less confusing and easier to implement than repurposing the +.
  2. What does the @ sign mean at the beginning/end of the selector?

    • At the end: Run the parents of the children of the selected model

    • At the beginning: I don't know that this is super useful, nor do I know exactly what it would indicate? I guess the opposite of the above is: "Run the children of the parents of the selected model". That might be consistent, but I am hard pressed to think of a good use case for that.

Either way, the @ selector has a weird effect of kind of doing the _opposite_ of what you would think. If you do

dbt run --model my_model@

dbt will run:

  1. my_model
  2. all of the children of my_model
  3. all of the parents of the children of my_model

Almost by definition, (3) will include the _parents_ of my_model, right? That's a little counter-intuitive, and it's not really what I would expect here!
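
The "almost" can be checked on a toy graph. The sketch below is only an illustration: the graph and the helper names are hypothetical, and "parents of the children" is read transitively, as all ancestors of all descendants.

```python
# Toy DAG: model -> list of direct parents. All names are hypothetical.
PARENTS = {
    "raw_a": [],
    "raw_b": [],
    "my_model": ["raw_a"],
    "child": ["my_model", "raw_b"],
}


def ancestors(parents, node):
    """Return every model that `node` depends on, directly or transitively."""
    found = set()
    frontier = list(parents[node])
    while frontier:
        current = frontier.pop()
        if current not in found:
            found.add(current)
            frontier.extend(parents[current])
    return found


def descendants(parents, node):
    """Return every model that depends on `node`, directly or transitively."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for child, deps in parents.items():
            if current in deps and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def parents_of_children(parents, node):
    """Set (3): every ancestor of every descendant of `node`."""
    result = set()
    for child in descendants(parents, node):
        result |= ancestors(parents, child)
    return result


# my_model has a child, so set (3) reaches back through my_model to raw_a.
with_child = parents_of_children(PARENTS, "my_model")
print(sorted(with_child))

# child has no descendants, so under this literal reading set (3) is empty
# and child's own parents are not picked up by (3) alone.
leaf = parents_of_children(PARENTS, "child")
print(sorted(leaf))
```

So as soon as my_model has at least one child, (3) includes my_model's own parents. For a leaf model, whether its parents get selected depends on whether the selected model itself counts among the nodes whose ancestors are collected, which this sketch leaves open.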

This almost makes me think that this functionality shouldn't be implemented with a selector modifier like @, but instead with some other flag. Like --transitive-closure or something :). I think the use case is predominantly for CI envs (as noted above) and I don't want to complicate the human-centric parts of model selection for something that will mainly be used by machines.

Keen to discuss, but let's answer these questions before doing any work on this issue.

@drewbanin can we drop the before/after stuff, and just define dbt run --model @my_model as: "Run my_model and all its descendants, and every other model that is an input to those descendants"? In that world my_model@ doesn't mean anything, and is invalid.

Yeah, I'm into that. I think this is also going to be really helpful for sources, right?

dbt run --model @source:snowplow

Run every model that depends on Snowplow source data, plus all of their inputs? Should be pretty useful!
