Dbt: KeyError while using deferred flag and model filters

Created on 9 Nov 2020  Â·  3Comments  Â·  Source: fishtown-analytics/dbt

Describe the bug

When the current run isn't aware of a source that exists in deferring run and when we specify a model filter, the dbt run command fails with a KeyError: <unaware_source_id>. I think dbt run should successfully complete or if its a configuration error it should give a more descriptive error statement.

Steps To Reproduce

  1. Checkout branch: https://github.com/nave91/jaffle_shop/tree/bug/defer-with-filters
  2. Run dbt -d run --models config.materialized:table,state:modified+ --defer --state ./deferring_target which fails with an error.
  3. Run dbt -d run --models state:modified+ --defer --state ./deferring_target passes without any errors.

To recreate ./deferring_target:

  1. Add any source to project with a model depending on it.
  2. Run dbt run and copy the target directory.

Expected behavior

The run should complete without any errors or a more descriptive error when we specify model filters like config.materialized:table.

Screenshots and log output

System information

Which database are you using dbt with?

  • [x] postgres
  • [ ] redshift
  • [ ] bigquery
  • [ ] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.18.1

Up to date!

Plugins:
  - bigquery: 0.18.1
  - snowflake: 0.18.1
  - redshift: 0.18.1
  - postgres: 0.18.1

The operating system you're using:
MacOs Catalina 10.15.7
The output of python --version:
Python 3.8.2

Additional context

~/git/jaffle_shop(bug/defer-with-filters) » dbt -d run --models config.materialized:table,state:modified+ --defer --state ./deferring_target                                 2 ↵ naveen@Naveens-MacBook-Pro
2020-11-09 22:41:10.593567 (MainThread): Running with dbt=0.18.1
2020-11-09 22:41:12.097193 (MainThread): running dbt with arguments Namespace(cls=<class 'dbt.task.run.RunTask'>, debug=True, defer=True, exclude=None, fail_fast=False, full_refresh=False, log_cache_events=False, log_format='default', models=['config.materialized:table,state:modified+'], partial_parse=None, profile=None, profiles_dir='/Users/naveen/.dbt', project_dir=None, record_timing_info=None, rpc_method='run', selector_name=None, single_threaded=False, state=PosixPath('deferring_target'), strict=False, target=None, test_new_parser=False, threads=None, use_cache=True, use_colors=None, vars='{}', version_check=True, warn_error=False, which='run', write_json=True)
2020-11-09 22:41:12.098467 (MainThread): Tracking: tracking
2020-11-09 22:41:12.112634 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0d4a60>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0daf10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f0da400>]}
2020-11-09 22:41:12.144638 (MainThread): Partial parsing not enabled
2020-11-09 22:41:12.146188 (MainThread): Parsing macros/catalog.sql
2020-11-09 22:41:12.151603 (MainThread): Parsing macros/relations.sql
2020-11-09 22:41:12.154273 (MainThread): Parsing macros/adapters.sql
2020-11-09 22:41:12.183946 (MainThread): Parsing macros/materializations/snapshot_merge.sql
2020-11-09 22:41:12.188553 (MainThread): Parsing macros/core.sql
2020-11-09 22:41:12.193896 (MainThread): Parsing macros/materializations/helpers.sql
2020-11-09 22:41:12.205495 (MainThread): Parsing macros/materializations/snapshot/snapshot_merge.sql
2020-11-09 22:41:12.208681 (MainThread): Parsing macros/materializations/snapshot/strategies.sql
2020-11-09 22:41:12.228941 (MainThread): Parsing macros/materializations/snapshot/snapshot.sql
2020-11-09 22:41:12.271125 (MainThread): Parsing macros/materializations/seed/seed.sql
2020-11-09 22:41:12.299178 (MainThread): Parsing macros/materializations/incremental/helpers.sql
2020-11-09 22:41:12.301574 (MainThread): Parsing macros/materializations/incremental/incremental.sql
2020-11-09 22:41:12.318498 (MainThread): Parsing macros/materializations/common/merge.sql
2020-11-09 22:41:12.351718 (MainThread): Parsing macros/materializations/table/table.sql
2020-11-09 22:41:12.360024 (MainThread): Parsing macros/materializations/view/view.sql
2020-11-09 22:41:12.368574 (MainThread): Parsing macros/materializations/view/create_or_replace_view.sql
2020-11-09 22:41:12.375325 (MainThread): Parsing macros/etc/get_custom_alias.sql
2020-11-09 22:41:12.376571 (MainThread): Parsing macros/etc/query.sql
2020-11-09 22:41:12.378245 (MainThread): Parsing macros/etc/is_incremental.sql
2020-11-09 22:41:12.381212 (MainThread): Parsing macros/etc/datetime.sql
2020-11-09 22:41:12.393173 (MainThread): Parsing macros/etc/get_custom_schema.sql
2020-11-09 22:41:12.395858 (MainThread): Parsing macros/etc/get_custom_database.sql
2020-11-09 22:41:12.398614 (MainThread): Parsing macros/adapters/common.sql
2020-11-09 22:41:12.478857 (MainThread): Parsing macros/schema_tests/relationships.sql
2020-11-09 22:41:12.481771 (MainThread): Parsing macros/schema_tests/not_null.sql
2020-11-09 22:41:12.483674 (MainThread): Parsing macros/schema_tests/unique.sql
2020-11-09 22:41:12.487213 (MainThread): Parsing macros/schema_tests/accepted_values.sql
2020-11-09 22:41:12.498646 (MainThread): * Deprecation Warning: dbt v0.17.0 introduces a new config format for the
dbt_project.yml file. Support for the existing version 1 format will be removed
in a future release of dbt. The following packages are currently configured with
config version 1:
 - jaffle_shop

For upgrading instructions, consult the documentation:
  https://docs.getdbt.com/docs/guides/migration-guide/upgrading-to-0-17-0

2020-11-09 22:41:12.498840 (MainThread): Sending event: {'category': 'dbt', 'action': 'deprecation', 'label': '909baabc-7612-4869-8c50-a18d9cfa843b', 'property_': 'warn', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f108b80>]}
2020-11-09 22:41:12.499374 (MainThread): Partial parsing not enabled
2020-11-09 22:41:12.530387 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_customers".
2020-11-09 22:41:12.549148 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_payments".
2020-11-09 22:41:12.559049 (MainThread): Acquiring new postgres connection "model.jaffle_shop.stg_orders".
2020-11-09 22:41:12.570599 (MainThread): Acquiring new postgres connection "model.jaffle_shop.dim_customers".
2020-11-09 22:41:12.584907 (MainThread): Acquiring new postgres connection "model.jaffle_shop.fct_orders".
2020-11-09 22:41:12.594309 (MainThread): Acquiring new postgres connection "model.jaffle_shop.order_payments".
2020-11-09 22:41:12.604708 (MainThread): Acquiring new postgres connection "model.jaffle_shop.customer_payments".
2020-11-09 22:41:12.613672 (MainThread): Acquiring new postgres connection "model.jaffle_shop.customer_orders".
2020-11-09 22:41:13.141491 (MainThread): Found 8 models, 20 tests, 0 snapshots, 0 analyses, 138 macros, 0 operations, 3 seed files, 0 sources
2020-11-09 22:41:13.144965 (MainThread):
2020-11-09 22:41:13.145323 (MainThread): Merged 6 items from state (sample: ['seed.jaffle_shop.raw_orders', 'model.jaffle_shop.stg_payments', 'seed.jaffle_shop.raw_payments', 'model.jaffle_shop.stg_orders', 'seed.jaffle_shop.raw_customers'])
2020-11-09 22:41:13.145489 (MainThread): Connection 'model.jaffle_shop.customer_orders' was properly closed.
2020-11-09 22:41:13.145797 (MainThread): Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20f100>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20ffd0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10f20ff70>]}
2020-11-09 22:41:13.146030 (MainThread): Flushing usage events
2020-11-09 22:41:13.338030 (MainThread): Encountered an error:
2020-11-09 22:41:13.338333 (MainThread): 'source.jaffle_shop.public.source1'
2020-11-09 22:41:13.343158 (MainThread): Traceback (most recent call last):
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 124, in main
    results, succeeded = handle_and_check(args)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 202, in handle_and_check
    task, res = run_from_args(parsed)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/main.py", line 255, in run_from_args
    results = task.run()
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 419, in run
    result = self.execute_with_hooks(selected_uids)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 379, in execute_with_hooks
    self.before_run(adapter, selected_uids)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/run.py", line 392, in before_run
    self.defer_to_manifest(selected_uids)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/run.py", line 389, in defer_to_manifest
    self.write_manifest()
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/task/runnable.py", line 62, in write_manifest
    self.manifest.write(path)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 762, in write
    self.writable_manifest().write(path)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 741, in writable_manifest
    forward_edges, backward_edges = build_edges(edge_members)
  File "/Users/naveen/.pyenv/versions/3.8.2/envs/dbt-core/lib/python3.8/site-packages/dbt/contracts/graph/manifest.py", line 231, in build_edges
    forward_edges[unique_id].append(node.unique_id)
KeyError: 'source.jaffle_shop.public.source1'
bug state

All 3 comments

Oh boy. Thanks for the repro instructions @nave91, I've been able to replicate this error locally in an even simpler project.

I've confirmed that the error results from dbt run _only_ when:

  • defer is turned on (--defer or DBT_DEFER_TO_STATE=true)
  • the selection criteria _does not include_ at least one non-ephemeral model that directly depended on a now-missing source

The error crops up here, where dbt tries to infer the backward edges of the deferred model and then trips over its missing source key:
https://github.com/fishtown-analytics/dbt/blob/0ce2f41db47e7e538f727f0ef57f89fffdab8e25/core/dbt/contracts/graph/manifest.py#L231-L237

What are our options here? I think it's either one of the following:

  • "Defer sources." Instead of looking in the current project for a source matching that key, look in the state manifest's sources. (It has to be there, otherwise the state manifest couldn't have compiled.)
  • Catch this exception and... do nothing with it. At the end of the day, there is no functional difference between a run where an upstream source, beyond the veil of deferral, does exist in the current project, vs. one where it does not.

This error is rare in the "Slim CI" use case because it _only_ crops up if you _exclude_ modified models from selection criteria. It's much more common to run all state:modified, which would successfully capture models whose source() dependencies have changed.

Update: this error also crops up for _ephemeral models_ that have been deleted from the project, but which are understood to have edges within the comparison manifest. I think our best bet here is to handle the missing key as gracefully as possible, and move on.

Captain's log, 12/14/20, 9:48 pm: Per #2954, KeyError also appears when a schema test which uses its arguments to construct its unique_id (e.g. relationships) changes one of its arguments. When it comes time to run the new/modified version of the test, the _old_ test rears its head, presumably from the child_map of the deferred manifest:

Encountered an error:
'test.testy.relationships_model_a_old__args__ref_model_b_'
Was this page helpful?
0 / 5 - 0 ratings