Our CI runs dbt run followed by dbt test in a completely clean snowflake warehouse. When using the @ syntax for both, run passes fine, but tests will fail because there are relationship tests on models that aren't built.
In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.
If you view the graph on https://gitlab-data.gitlab.io/analytics/dbt/snowflake/#!/model/model.gitlab_dw?g_v=1&g_i=@zuora you'll see that the sfdc_executive_business_review model is not built, but on a dbt test this relationship test is run https://gitlab.com/gitlab-data/analytics/blob/master/transform%2Fsnowflake-dbt%2Fmodels%2Fsfdc%2Fbase%2Fschema.yml#L66
I would either expect that test to be skipped or the relevant model to be built.
These are the logs for an example failed job with @zuora
Which database are you using dbt with?
The output of dbt --version:
Running with dbt=0.14.2
The operating system you're using:
Mac OS X
The output of python --version:
3.7.4
Add any other context about the problem here.
The repro for this case will be very complicated, but I think the issue is when tests get added in to the final graph.
I'm not sure if we should be excluding the tests whose ancestors aren't in the graph, or doing another pass and adding all the test ancestors in, but one of those choices is the appropriate solution here.
I think a simple reproduction case should be:
-- models/model1.sql
select 1 as id
-- models/model2.sql
select 1 as fk
models/schema.yml
models:
- name: model1
columns:
- name: id
relationships:
to: ref('model2')
field: fk
Running:
# Runs only model1
dbt run --models @model1
# Runs no tests
dbt test --models @model1
# Runs the test
dbt test --models model1 model2
I'm not sure if we should be excluding the tests whose ancestors aren't in the graph, or doing another pass and adding all the test ancestors in, but one of those choices is the appropriate solution here.
It's compelling to think that tests should only be run if _all_ of their ancestors are selected in the run, but that will make custom data tests pretty wonky! This is probably the right answer for @tayloramurphy's use case, but probably the wrong answer in the more general case.
I do not think that in this case, the selected nodes in dbt run --models @zuroa should be made aware of the tests that exist on these models -- that would _definitely_ be unexpected behavior! I do want to note that that would be the natural behavior of my suggestion in #2203 :)
So, I think the only other path forward here is to make relationship tests smarter. I think it would be pretty reasonable to special-case schema tests that select from more than one model. We can add functionality to support tests returning with a warning that can be escalated to an error with --warn-error, and we could leverage the relation cache to determine if the to relation exists in the database or not.
Curious what y'all think?
It's compelling to think that tests should only be run if all of their ancestors are selected in the run, but that will make custom data tests pretty wonky!
What will be wonky about it? Custom data tests should use ref and source just like everything else, right?
True, but something about this behavior for custom data tests feels wrong to me. Maybe custom data tests are less of an issue because you can conceivably select them by name/path? Let me think on that some more... it's not quite as jarring as I initially felt it was, but I don't feel _great_ about it still.
I'm here from the future to say that the @ operator seems to work for tests now, insofar as the reproduction case that @drewbanin included above does the following in dbt v0.18.1:
# Runs both model1 and model2
dbt run --models @model1
# Runs the test
dbt test --models @model1
I came here because I've been thinking a lot about:
if we should be excluding the tests whose ancestors aren't in the graph
Particularly with reference to state:modified + testing caveats, where one "leg" of a test may be modified but the other one may not be, and we'll only have one in our selection criteria for both run + test. "All parents must be present" is one of a few potential approaches I'm mulling over.
In any case, I'm going to close this issue :)
I'm here from the future to say that the
@operator seems to work for tests now, insofar as the reproduction case that @drewbanin included above does the following in dbt v0.18.1:# Runs both model1 and model2 dbt run --models @model1 # Runs the test dbt test --models @model1I came here because I've been thinking a lot about:
if we should be excluding the tests whose ancestors aren't in the graph
Particularly with reference to
state:modified+ testing caveats, where one "leg" of a test may be modified but the other one may not be, and we'll only have one in our selection criteria for bothrun+test. "All parents must be present" is one of a few potential approaches I'm mulling over.In any case, I'm going to close this issue :)
@jtcohen6 do you know if this behavior is fixed in 0.18.1 for the + selector as well?
@ags2121 That behavior is not fixed. I saw your comment over on #2132, I agree that we should open a new issue to discuss.