When there are views in the DAGs e.g. (view_root -> view_middle -> view_leaf)
A dbt run causes problems because:
view_root completes, it'll drop all its dependent views, i.e. view_middle and view_leaf in this case.view_middle and view_leaf will not be re-created until they're run.view_root might run very early in a run while view_middle and view_leaf run hours later and causes these views to not be available.view_middle and view_leaf are broken while dbt is running.Created a sample dbt project to reproduce: https://github.com/nickwu241/dbt_reproduce_drop_view_cascade_bug
It will
view_root -> view_middle -> view_leafSELECT * FROM pg_sleep(10)view_middle and view_leaf are unavailable after view_root runs.10 to be 30 if you need more time to see the databaseI expect dbt to not drop the dependent views until the whole run has finished.
I expect DROP VIEW view_root_backup CASCADE; won't happen until all dependent views are re-created.
dbt run --profiles-dir .
Running with dbt=0.15.2
Found 3 models, 0 tests, 0 snapshots, 0 analyses, 125 macros, 1 operation, 0 seed files, 1 source
09:57:30 |
09:57:30 | Running 1 on-run-start hook
09:57:30 | 1 of 1 START hook: dbt_reproduce_drop_view_cascade_bug.on-run-star... [RUN]
09:57:30 | 1 of 1 OK hook: dbt_reproduce_drop_view_cascade_bug.on-run-start.0... [CREATE TABLE in 0.00s]
09:57:30 |
09:57:30 | Concurrency: 1 threads (target='dev')
09:57:30 |
09:57:30 | 1 of 3 START view model public.view_root............................. [RUN]
09:57:41 | 1 of 3 OK created view model public.view_root........................ [CREATE VIEW in 10.14s] <<<<<<<<<<< view_middle and view_leaf will be dropped here
09:57:41 | 2 of 3 START view model public.view_middle........................... [RUN]
09:57:51 | 2 of 3 OK created view model public.view_middle...................... [CREATE VIEW in 10.08s]
09:57:51 | 3 of 3 START view model public.view_leaf............................. [RUN]
09:58:02 | 3 of 3 OK created view model public.view_leaf........................ [CREATE VIEW in 10.06s]
09:58:02 |
09:58:02 | Finished running 3 view models, 1 hook in 31.74s.
Completed successfully
Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
Which database are you using dbt with?
The output of dbt --version:
installed version: 0.15.2
latest version: 0.15.2
Up to date!
The operating system you're using:
macOS Mojave 10.14.6
The output of python --version:
Python 3.7.3
Let me know if you require more information or have any questions! You can reach me on the DBT slack as Nick Wu as well.
Thank you!
hey @nickwu241 - this is a good question! There isn't any mechanism in Postgres to define a view as "late bound" or "unbound" or anything like that. dbt unfortunately _must_ drop cascade the root tables, resulting in downstream views being dropped during the run. Here are some examples of how this is addressed on modern data warehouses:
My recommendation would be to either:
analytics_new), then swap the analytics schema for the analytics_new schema at the end of the run (this is called a blue/green deployment)Closing this as I don't anticipate making any changes to dbt to support this use-case on Postgres, but I do want to validate that the issue you're seeing is real, it's just not one that Postgres makes particularly easy for dbt to support today.
@drewbanin Thank you for the detailed response and also with the recommendations, I appreciate it very much!
We are actually currently using Redshift for analytics but I produced the issue in Postgres because it's easier to reproduce with docker.
So for the short term I'll use bind: False for those views. And we'll definitely also consider the 2 other recommendations of
Thanks again!
Most helpful comment
hey @nickwu241 - this is a good question! There isn't any mechanism in Postgres to define a view as "late bound" or "unbound" or anything like that. dbt unfortunately _must_
drop cascadethe root tables, resulting in downstream views being dropped during the run. Here are some examples of how this is addressed on modern data warehouses:My recommendation would be to either:
analytics_new), then swap theanalyticsschema for theanalytics_newschema at the end of the run (this is called a blue/green deployment)Closing this as I don't anticipate making any changes to dbt to support this use-case on Postgres, but I do want to validate that the issue you're seeing is real, it's just not one that Postgres makes particularly easy for dbt to support today.