Dbt: Error: Dictionary changed size during iteration

Created on 11 Sep 2019 · 7 Comments · Source: fishtown-analytics/dbt

Describe the bug

I've encountered this error twice so far in two days, coincidentally(?) since I upgraded to 0.14.1, and saw it on dbt Cloud. The error is:
Completed with 1 error and 0 warnings: dictionary changed size during iteration
In both cases this happened when doing dbt test --models source:*
(Reference: runs #1683050 and #1678913 in dbt Cloud)

Screenshots and log output

Backtrace:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/dbt/node_runners.py", line 196, in safe_run
result = self.compile_and_execute(manifest, ctx)
File "/usr/local/lib/python3.6/dist-packages/dbt/node_runners.py", line 134, in compile_and_execute
ctx.node = self.compile(manifest)
File "/usr/local/lib/python3.6/dist-packages/dbt/node_runners.py", line 300, in compile
return compile_node(self.adapter, self.config, self.node, manifest, {})
File "/usr/local/lib/python3.6/dist-packages/dbt/compilation.py", line 221, in compile_node
node = compiler.compile_node(node, manifest, extra_context)
File "/usr/local/lib/python3.6/dist-packages/dbt/compilation.py", line 118, in compile_node
compiled_node, self.config, manifest)
File "/usr/local/lib/python3.6/dist-packages/dbt/context/runtime.py", line 150, in generate
model, runtime_config, manifest, None, Provider())
File "/usr/local/lib/python3.6/dist-packages/dbt/context/common.py", line 496, in generate
return generate_model(model, config, manifest, source_config, provider)
File "/usr/local/lib/python3.6/dist-packages/dbt/context/common.py", line 471, in generate_model
source_config, provider)
File "/usr/local/lib/python3.6/dist-packages/dbt/context/common.py", line 408, in generate_base
"graph": manifest.to_flat_graph(),
File "/usr/local/lib/python3.6/dist-packages/dbt/contracts/graph/manifest.py", line 235, in to_flat_graph
k: v.serialize() for k, v in self.nodes.items()
File "/usr/local/lib/python3.6/dist-packages/dbt/contracts/graph/manifest.py", line 235, in
k: v.serialize() for k, v in self.nodes.items()
File "/usr/local/lib/python3.6/dist-packages/dbt/api/object.py", line 63, in serialize
return copy.deepcopy(self._contents)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 239, in _deepcopy_dict
for key, value in x.items():
RuntimeError: dictionary changed size during iteration

System information

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [ ] bigquery
  • [x] snowflake
  • [ ] other (specify: ____________)

The output of dbt --version:

installed version: 0.14.1
   latest version: 0.14.1

The operating system you're using:
The error occurred on dbt Cloud

The output of python --version:

bug

All 7 comments

Thanks @MartinGuindon!

@beckjake have you seen something like this before? Looks like there's some concurrent mutation of the manifest that happens during a dbt test (and probably other types of invocations too)

Ugh, to_flat_graph again. I'm pretty confident that if we changed the node runners to explicitly initialize the flat graph instead of trying to lazy-load it this would go away, because we'd never be doing a deepcopy while a node is updating information about itself - I assume that's the problem.
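The failure mode described in the comment above can be reproduced deterministically without threads. This is only a sketch: `MutatingValue` and `reproduce` are hypothetical names, not dbt code, and the `__deepcopy__` hook merely stands in for another thread adding a key to the dictionary while `copy.deepcopy` is still iterating over it.

```python
import copy


class MutatingValue:
    """Hypothetical stand-in for a node that gets updated mid-copy."""

    def __init__(self, owner):
        self.owner = owner

    def __deepcopy__(self, memo):
        # Simulate a concurrent writer: grow the dict being copied
        # while deepcopy is still iterating over its items.
        self.owner["added_during_copy"] = 1
        return self


def reproduce():
    contents = {}
    contents["a"] = MutatingValue(contents)
    contents["b"] = 2
    try:
        copy.deepcopy(contents)
    except RuntimeError as exc:
        return str(exc)
    return None


print(reproduce())
```

In a real run the mutation would come from a node runner on another thread, so the error only shows up when the timing happens to line up, which fits the intermittent failures reported here.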

ah - so we could fix that by removing

"graph": manifest.to_flat_graph(),

from the context? We should do that as soon as we're able to provide a more sane alternative.

I recall that we memoized the to_flat_graph() response in 0.14.1 -- is it surprising that @MartinGuindon is still seeing this on 0.14.1? Does this mean that two threads are calling into to_flat_graph() at the same time before the flat_graph is cached?

RE: removing it, I'd love to but it's something people actually use.

> Does this mean that two threads are calling into to_flat_graph() at the same time before the flat_graph is cached?

Yeah, so we should just build it in before_run or somewhere similar. Anywhere that happens before we start run would be fine, if you have more than zero nodes selected you'll need to call it anyway.
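A minimal sketch of that idea follows, assuming a simplified manifest. `SketchManifest`, `build_flat_graph`, and `run_nodes` are hypothetical names for illustration and are not dbt's actual classes or the change made in the eventual fix. The point is only that the deep copy happens once, on the main thread, before any worker can mutate a node.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class SketchManifest:
    """Hypothetical, simplified manifest; not dbt's real Manifest class."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._flat_graph = None
        self.build_count = 0

    def build_flat_graph(self):
        # Called once, single-threaded, before any node runner starts.
        self.build_count += 1
        self._flat_graph = {
            "nodes": {k: copy.deepcopy(v) for k, v in self.nodes.items()}
        }

    def to_flat_graph(self):
        # Workers only read the cached copy; they never trigger a deepcopy.
        if self._flat_graph is None:
            raise RuntimeError("call build_flat_graph() before starting workers")
        return self._flat_graph


def run_nodes(manifest, threads=4):
    manifest.build_flat_graph()  # the "before_run" step

    def work(node_id):
        graph = manifest.to_flat_graph()
        # A runner updating its own node no longer races with a copy.
        manifest.nodes[node_id]["compiled"] = True
        return id(graph)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(work, list(manifest.nodes)))


manifest = SketchManifest({"model.a": {}, "model.b": {}, "model.c": {}})
graph_ids = run_nodes(manifest)
```

One trade-off of this design is that the flat graph is a snapshot taken before the run, so updates that runners make to their nodes afterwards are not reflected in it.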

@drewbanin To my knowledge, I never encountered that issue _before_ upgrading to 0.14.1. I've also had quite a few occurrences in the last 2 days.

Runs #1698615, #1696637, #1696312, #1693258, #1692100, #1691359, #1687687, #1687106, #1686503... and maybe more, I stopped looking after page 2.

Thanks for that additional info @MartinGuindon - I've tentatively prioritized this for our Louisa May Alcott (0.15.0) release, but we might try to slip this in earlier too. Will follow up here with more info as we have it!

This was fixed in https://github.com/fishtown-analytics/dbt/pull/1750

We're going to cut an 0.14.3 release which includes this fix next week!
