When the current run isn't aware of a source that exists in deferring run and when we specify a model filter, the dbt run command fails with a KeyError: <unaware_source_id>. I think dbt run should successfully complete or if its a configuration error it should give a more descriptive error statement.
dbt -d run --models config.materialized:table,state:modified+ --defer --state ./deferring_target which fails with an error. dbt -d run --models state:modified+ --defer --state ./deferring_target passes without any errors. To recreate ./deferring_target:
dbt run and copy the target directory. The run should complete without any errors or a more descriptive error when we specify model filters like config.materialized:table.
Which database are you using dbt with?
The output of dbt --version:
installed version: 0.18.1
latest version: 0.18.1
Up to date!
Plugins:
- bigquery: 0.18.1
- snowflake: 0.18.1
- redshift: 0.18.1
- postgres: 0.18.1
The operating system you're using:
MacOs Catalina 10.15.7
The output of python --version:
Python 3.8.2
~/git/jaffle_shop(bug/defer-with-filters) » dbt -d run --models config.materialized:table,state:modified+ --defer --state ./deferring_target 2 ↵ naveen@Naveens-MacBook-Pro
2020-11-09 22:41:10.593567 (MainThread): Running with dbt=0.18.1
2020-11-09 22:41:12.097193 (MainThread): running dbt with arguments Namespace(cls=<class 'dbt.task.run.RunTask'>, debug=True, defer=True, exclude=None, fail_fast=False, full_refresh=False, log_cache_events=False, log_format='default', models=['config.materialized:table,state:modified+'], partial_parse=None, profile=None, profiles_dir='/Users/naveen/.dbt', project_dir=None, record_timing_info=None, rpc_method='run', selector_name=None, single_threaded=False, state=PosixPath('deferring_target'), strict=False, target=None, test_new_parser=False, threads=None, use_cache=True, use_colors=None, vars='{}', version_check=True, warn_error=False, which='run', write_json=True)
2020-11-09 22:41:12.098467 (MainThread): Tracking: tracking
2020-11-09 22:41:12.112634 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0d4a60>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0daf10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0da400>]}
2020-11-09 22:41:12.144638 (MainThread): Partial parsing not enabled
2020-11-09 22:41:12.146188 (MainThread): Parsing macros/catalog.sql
2020-11-09 22:41:12.151603 (MainThread): Parsing macros/relations.sql
2020-11-09 22:41:12.154273 (MainThread): Parsing macros/adapters.sql
2020-11-09 22:41:12.183946 (MainThread): Parsing macros/materializations/snapshot_merge.sql
2020-11-09 22:41:12.188553 (MainThread): Parsing macros/core.sql
2020-11-09 22:41:12.193896 (MainThread): Parsing macros/materializations/helpers.sql
2020-11-09 22:41:12.205495 (MainThread): Parsing macros/materializations/snapshot/snapshot_merge.sql
2020-11-09 22:41:12.208681 (MainThread): Parsing macros/materializations/snapshot/strategies.sql
2020-11-09 22:41:12.228941 (MainThread): Parsing macros/materializations/snapshot/snapshot.sql
2020-11-09 22:41:12.271125 (MainThread): Parsing macros/materializations/seed/seed.sql
2020-11-09 22:41:12.299178 (MainThread): Parsing macros/materializations/incremental/helpers.sql
2020-11-09 22:41:12.301574 (MainThread): Parsing macros/materializations/incremental/incremental.sql
2020-11-09 22:41:12.318498 (MainThread): Parsing macros/materializations/common/merge.sql
2020-11-09 22:41:12.351718 (MainThread): Parsing macros/materializations/table/table.sql
2020-11-09 22:41:12.360024 (MainThread): Parsing macros/materializations/view/view.sql
2020-11-09 22:41:12.368574 (MainThread): Parsing macros/materializations/view/create_or_replace_view.sql
2020-11-09 22:41:12.375325 (MainThread): Parsing macros/etc/get_custom_alias.sql
2020-11-09 22:41:12.376571 (MainThread): Parsing macros/etc/query.sql
2020-11-09 22:41:12.378245 (MainThread): Parsing macros/etc/is_incremental.sql
2020-11-09 22:41:12.381212 (MainThread): Parsing macros/etc/datetime.sql
2020-11-09 22:41:12.393173 (MainThread): Parsing macros/etc/get_custom_schema.sql
2020-11-09 22:41:12.395858 (MainThread): Parsing macros/etc/get_custom_database.sql
2020-11-09 22:41:12.398614 (MainThread): Parsing macros/adapters/common.sql
2020-11-09 22:41:12.478857 (MainThread): Parsing macros/schema_tests/relationships.sql
2020-11-09 22:41:12.481771 (MainThread): Parsing macros/schema_tests/not_null.sql
2020-11-09 22:41:12.483674 (MainThread): Parsing macros/schema_tests/unique.sql
2020-11-09 22:41:12.487213 (MainThread): Parsing macros/schema_tests/accepted_values.sql
2020-11-09 22:41:12.498646 (MainThread): * Deprecation Warning: dbt v0.17.0 introduces a new config format for the
dbt_project.yml file. Support for the existing version 1 format will be removed
in a future release of dbt. The following packages are currently configured with
config version 1:
- jaffle_shop
For upgrading instructions, consult the documentation:
https://docs.getdbt.com/docs/guides/migration-guide/upgrading-to-0-17-0
2020-11-09 22:41:12.498840 (MainThread): Sending event: {'category': 'dbt', 'action': 'deprecation', 'label': '909baabc-7612-4869-8c50-a18d9cfa843b', 'property_': 'warn', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f108b80>]}
2020-11-09 22:41:12.499374 (MainThread): Partial parsing not enabled
2020-11-09 22:41:12.530387 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_customers".
2020-11-09 22:41:12.549148 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_payments".
2020-11-09 22:41:12.559049 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_orders".
2020-11-09 22:41:12.570599 (MainThread): Acquiring new postgres connection "model.jaffle_shop.dim_customers".
2020-11-09 22:41:12.584907 (MainThread): Acquiring new postgres connection "model.jaffle_shop.fct_orders".
2020-11-09 22:41:12.594309 (MainThread): Acquiring new postgres connection "model.jaffle_shop.order_payments".
2020-11-09 22:41:12.604708 (MainThread): Acquiring new postgres connection "model.jaffle_shop.customer_payments".
2020-11-09 22:41:12.613672 (MainThread): Acquiring new postgres connection "model.jaffle_shop.customer_orders".
2020-11-09 22:41:13.141491 (MainThread): Found 8 models, 20 tests, 0 snapshots, 0 analyses, 138 macros, 0 operations, 3 seed files, 0 sources
2020-11-09 22:41:13.144965 (MainThread):
2020-11-09 22:41:13.145323 (MainThread): Merged 6 items from state (sample: ['seed.jaffle_shop.raw_orders', 'model.jaffle_shop.stg_payments', 'seed.jaffle_shop.raw_payments', 'model.jaffle_shop.stg_orders', 'seed.jaffle_shop.raw_customers'])
2020-11-09 22:41:13.145489 (MainThread): Connection 'model.jaffle_shop.customer_orders' was properly closed.
2020-11-09 22:41:13.145797 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20f100>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20ffd0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20ff70>]}
2020-11-09 22:41:13.146030 (MainThread): Flushing usage events
2020-11-09 22:41:13.338030 (MainThread): Encountered an error:
2020-11-09 22:41:13.338333 (MainThread): 'source.jaffle_shop.public.source1'
2020-11-09 22:41:13.343158 (MainThread): Traceback (most recent call last):
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 124, in main
results, succeeded = handle_and_check(args)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 202, in handle_and_check
task, res = run_from_args(parsed)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 255, in run_from_args
results = task.run()
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 419, in run
result = self.execute_with_hooks(selected_uids)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 379, in execute_with_hooks
self.before_run(adapter, selected_uids)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/run.py", line 392, in before_run
self.defer_to_manifest(selected_uids)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/run.py", line 389, in defer_to_manifest
self.write_manifest()
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 62, in write_manifest
self.manifest.write(path)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 762, in write
self.writable_manifest().write(path)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 741, in writable_manifest
forward_edges, backward_edges = build_edges(edge_members)
File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 231, in build_edges
forward_edges[unique_id].append(node.unique_id)
KeyError: 'source.jaffle_shop.public.source1'
Oh boy. Thanks for the repro instructions @nave91, I've been able to replicate this error locally in an even simpler project.
I've confirmed that the error results from dbt run _only_ when:
--defer or DBT_DEFER_TO_STATE=true)The error crops up here, where dbt tries to infer the backward edges of the deferred model and then trips over its missing source key:
https://github.com/fishtown-analytics/dbt/blob/0ce2f41db47e7e538f727f0ef57f89fffdab8e25/core/dbt/contracts/graph/manifest.py#L231-L237
What are our options here? I think it's either one of the following:
sources. (It has to be there, otherwise the state manifest couldn't have compiled.)This error is rare in the "Slim CI" use case because it _only_ crops up if you _exclude_ modified models from selection criteria. It's much more common to run all state:modified, which would successfully capture models whose source() dependencies have changed.
Update: this error also crops up for _ephemeral models_ that have been deleted from the project, but which are understood to have edges within the comparison manifest. I think our best bet here is to handle the missing key as gracefully as possible, and move on.
Captain's log, 12/14/20, 9:48 pm: Per #2954, KeyError also appears when a schema test which uses its arguments to construct its unique_id (e.g. relationships) changes one of its arguments. When it comes time to run the new/modified version of the test, the _old_ test rears its head, presumably from the child_map of the deferred manifest:
Encountered an error:
'test.testy.relationships_model_a_old__args__ref_model_b_'