The proper way to atomically replace a view/table in Postgres/Redshift is:
Currently, dbt drops the table inside of the transaction, which can lead to errors like:
ERROR: table 12345 dropped by concurrent transaction
or
could not complete because of conflict with concurrent transaction
Via @teej
See also: this SO post
To fix this, dbt's table and view materializations need to be updated to drop the old model _outside_ of the transaction.
This PR will ultimately require < 5 lines of code change, but since it involves the table and view materializations, we should be very cautious here. Is there a good way to write an integration test for this?
You might be able to simulate this behavior by:
However I think the model would have to take long enough to build so as to ensure that the test doesn't sporadically pass.
You can force this condition to trigger by adding a sleep in the model's SQL. You can sleep in Redshift with:
CREATE OR REPLACE FUNCTION f_sleep (x float)
RETURNS bool IMMUTABLE
AS
$$
from time import sleep
sleep(x)
return True
$$ LANGUAGE plpythonu;
oooh nice idea @teej
We only started seeing this issue when updating from 0.8.0 to 0.9.1.
Is there any workaround? And, do we know which version introduced this issue?
@igrayson we overhauled how model materializations work in 0.9.0, but it looks like the core SQL logic was unchanged.
Ultimately, this is going to be a small code change, but it's a little daunting to make because Redshift's transaction management is pretty finicky.
Happy to point someone in the right direction if they're interested in tackling this, and if not, I think we'll be able to hit it pretty soon. @igrayson @teej @abelsonlive are you folks able to help test an alpha version of this fix when one is ready? That would go a long way towards making me feel good about any prospective fix.
@drewbanin can definitely help test anything you put out... have been a bit swamped but might still take a stab at this if my schedule clears up
thanks @abelsonlive -- we're focused on getting 0.9.2 out, we'll tackle this one for 0.9.3
Sounds good - as discussed on the Slack channel, this could potentially also fix an issue I've been having where DBT is forced to wait indefinitely due to Redshift failing to execute some statements (e.g. DROP VIEW) within a transaction block. When I check the stv_recents Redshift table, the query is left in a 'running' state and doesn't complete.
This change is going to apply to Redshift / Postgres / Snowflake
I could reproduce that bug when specifying a pre-hook at the model level like so: docs

Running with dbt=0.14.3
@louisguitton that's surprising -- can you tell me which database you're using? We have _definitely_ seen this issue when grants are applied to a _schema_, but i've never seen it happen when grants are applied to a specific model.
Eg, this can fail:
models:
project-name:
post-hook: "grant select on all tables in schema {{ this.schema }} to db_reader"
or
models:
project-name:
post-hook: "grant usage on schema {{ this.schema }} to db_reader"
What's the exact error message that you're seeing from the database?
Thanks for following up @drewbanin .
I'm using Redshift, and a pre_hook that creates 1 UDF.
I'm running dbt run --models model_1 model_2 with 8 threads and I get
Completed with 1 error and 0 warnings:
Database Error in model stg_localytics__users_match_viewed (models/staging/localytics/stg_localytics__users_match_viewed.sql)
could not complete because of conflict with concurrent transaction
compiled SQL at target/compiled/of_models/staging/localytics/stg_localytics__users_match_viewed.sql
Due to the error message, I assume this is to be expected: the UDF is created twice (once for model_1 and once for model_2) on separate threads so they end up being conflicting concurrent transactions.
If I run dbt run --models model_1 && dbt run --models model_2 it works fine.
I guess then I'm approaching creating the UDFs wrong.
Ah! ok - are you able to use something like an operation to create these UDFs? Ideally Redshift would let you create UDFs transactionally, but I suppose there isn't a ton of merit to creating the same UDF once for each model anyway