Sqlalchemy: SQLAlchemy does not expire models on commit/does not update models with external changes

Created on 28 May 2019  路  4Comments  路  Source: sqlalchemy/sqlalchemy

The gist of the issue is explained in detail in this stackoverflow question. In short, SQLAlchemy does not update the model instance after external transaction has changed the value in the database even with expire_on_commit set to True and SELECT ... FOR UPDATE to retrieve the current value is issued within transaction.

I can provide any additional information if needed,

question

All 4 comments

My question is, why does this happen at all? There are several factors that seem to contradict the current behavior:

I haven't run your script but on visual inspection I would assume it's because your loop is emitting an UPDATE for an exact value of "field" after the external transaction has completed. Note that your UPDATE is not emitting "UPDATE table SET value = value +1", it's emitting "UPDATE table set value = <exact value>"

the order then is:

script transasction begin()
SELECT for UPDATE
t.field access
UPDATE t.field = <new value>
commit + expire
print t.field, this does t.field SELECT again (turning on echo=True will reveal this) , unexpires the value outside of your transaction
run external program
external program emits UPDATE on table
script transaction begin
SELECT for UPDATE, blocks
external program COMMMIT
SELECT for UPDATE proceeds
loads row
row already exists in the identity map, has already loaded value 27 in [1], value is not modified

Note that your script relies upon autocommit=True in order to have the "confusing" behavior; if you had transaction blocks in all cases, normal "repeatable read" transaction isolation would make the steps more clear. This is why autocommit=True is strongly advised against in the documentation and is in fact deprecated in 2.0 for complete removal.

I hope this helps to clarify!

Hello, @zzzeek! Thank you very much for your reply. I still don't quite understand why the script does not work and why you mention autocommit as a primary source for confusion. Let me lay out in details what I see when the script runs and then, hopefully, you will be able to point out the flaw in my logic.

1) First, let's consider the case when the external transaction commits while the script sleeps (for this I've increased sleep time to 5 seconds to be able to complete the actions in database shell). Let's say that the last value that the script had before going to sleep was 25 and the external transaction committed 100 while the script was sleeping. This means that when the script wakes up, the value in the database is guaranteed to be 100. In fact, that's what I see in the rows fetched from the database when I turn sqlalchemy debug logging on. However, first print after SELECT FOR UPDATE still shows that the model field has the value 25:

INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s FOR UPDATE
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 24)
t.field value is 24
t.field value before flush is 25
INFO:sqlalchemy.engine.base.Engine:UPDATE test_models SET field=%(field)s WHERE test_models.id = %(test_models_id)s
INFO:sqlalchemy.engine.base.Engine:{'field': 25, 'test_models_id': 1}
t.field value after flush is 25
INFO:sqlalchemy.engine.base.Engine:COMMIT
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 25)
t.field value after transaction is 25
Sleeping for 5 seconds.

Here I execute begin; update test_models set field = 100 where id=1; commit; in database shell. Next iteration of the loop looks as follows

INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s FOR UPDATE
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 100)
t.field value is 25
t.field value before flush is 26
INFO:sqlalchemy.engine.base.Engine:UPDATE test_models SET field=%(field)s WHERE test_models.id = %(test_models_id)s
INFO:sqlalchemy.engine.base.Engine:{'field': 26, 'test_models_id': 1}
t.field value after flush is 26
INFO:sqlalchemy.engine.base.Engine:COMMIT
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 26)
t.field value after transaction is 26
Sleeping for 5 seconds.

Note the line DEBUG:sqlalchemy.engine.base.Engine:Row (1, 100) . It seems that sqlalchemy has retrieved 100 from the database, but still gets 25 from model fields.

2) Second, if the external transaction has updated the row but did not commit when the script has waken up, then SELECT FOR UPDATE will block. Postgres default isolation level, read committed, will block the SELECT FOR UPDATE and return updated value (100) when the external transaction commits. This is indeed the case and I can even see that SELECT FOR UPDATE retrieves 100 from the database, but the model's field still has 25:

INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s FOR UPDATE
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 23)
t.field value is 23
t.field value before flush is 24
INFO:sqlalchemy.engine.base.Engine:UPDATE test_models SET field=%(field)s WHERE test_models.id = %(test_models_id)s
INFO:sqlalchemy.engine.base.Engine:{'field': 24, 'test_models_id': 1}
t.field value after flush is 24
INFO:sqlalchemy.engine.base.Engine:COMMIT
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 24)
t.field value after transaction is 24
Sleeping for 5 seconds.

Here I execute begin; update test_models set field = 100 where id=1;. When the script wakes up, it blocks with the following output:

INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s FOR UPDATE
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}

If here I commit the external transaction, the script will output the rest:

DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 100)
t.field value is 24
t.field value before flush is 25
INFO:sqlalchemy.engine.base.Engine:UPDATE test_models SET field=%(field)s WHERE test_models.id = %(test_models_id)s
INFO:sqlalchemy.engine.base.Engine:{'field': 25, 'test_models_id': 1}
t.field value after flush is 25
INFO:sqlalchemy.engine.base.Engine:COMMIT
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}
DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field')
DEBUG:sqlalchemy.engine.base.Engine:Row (1, 25)
t.field value after transaction is 25
Sleeping for 5 seconds.

Note again the line DEBUG:sqlalchemy.engine.base.Engine:Row (1, 100) that again implies that sqlalchemy retrieved correct value from the database but refused to refresh the instance.

The interesting part is that if I remove the line

        print(f"t.field value after transaction is {t.field}")

from the script, so that there is no access to the model outside of transaction, then everything works as expected in both of the cases above. This leads me to the conclusion that after expired model has been refreshed while processing attribute access in print, SELECT FOR UPDATE has no effect on the model instance as, presumably, sqlalchemy thinks that the model is in its most recent state and does not need refreshing.

As far as I can understand, in your message you were trying to convey the same point: that is, model holds the most recent value and does not need refreshing. However, I find this behavior highly confusing and error-prone. Could you please clarify why sqlalchemy behaves in such a way and what role does autocommit play here?

Note the line DEBUG:sqlalchemy.engine.base.Engine:Row (1, 100) . It seems that sqlalchemy has retrieved 100 from the database, but still gets 25 from model fields.

that is correct. the value of 100 is ignored by default, we already have this row.

From https://docs.sqlalchemy.org/en/13/orm/session_basics.html#querying:

When Query returns results, each object instantiated is stored within the identity map. When a row matches an object which is already present, the same object is returned. In the latter case, whether or not the row is populated onto an existing object depends upon whether the attributes of the instance have been expired or not. A default-configured Session automatically expires all instances along transaction boundaries, so that with a normally isolated transaction, there shouldn鈥檛 be any issue of instances representing data which is stale with regards to the current transaction.

if you want to have the value overwritten on a per-query basis without previously expiring the objects, use the populate_existing method.

1. Second, if the external transaction has updated the row but did not commit when the script has waken up, then `SELECT FOR UPDATE` will block. [Postgres default isolation level](https://www.postgresql.org/docs/9.5/transaction-iso.html), `read committed`, will block the `SELECT FOR UPDATE` and return updated value (`100`) when the external transaction commits. This is indeed the case and I can even see that `SELECT FOR UPDATE` retrieves `100` from the database, but the model's field still has `25`:

same thing

INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SELECT test_models.id AS test_models_id, test_models.field AS test_models_field 
FROM test_models 
WHERE test_models.id = %(param_1)s FOR UPDATE
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}

If here I commit the external transaction, the script will output the rest:

`` DEBUG:sqlalchemy.engine.base.Engine:Col ('test_models_id', 'test_models_field') DEBUG:sqlalchemy.engine.base.Engine:Row (1, 100) t.field value is 24 t.field value before flush is 25 Note again the lineDEBUG:sqlalchemy.engine.base.Engine:Row (1, 100)` that again implies that sqlalchemy retrieved correct value from the database but refused to refresh the instance.

same thing

The interesting part is that if I remove the line

        print(f"t.field value after transaction is {t.field}")

from the script, so that there is no access to the model outside of transaction, then everything works as expected in both of the cases above.

yes, because in this case you are using the Session outside of a transaction (there is still a transaction, but it only lasts long enough for the SELECT to occur). This is the "autocommit=True" pattern that is confusing and deprecated.

This leads me to the conclusion that after expired model has been refreshed while processing attribute access in print, SELECT FOR UPDATE has no effect on the model instance as, presumably, sqlalchemy thinks that the model is in its most recent state and does not need refreshing.

"SELECT FOR UPDATE" does not affect ORM models except indirectly. it only means that the "FOR UPDATE" string will be added to the actual SELECT statement. in this case, it prevents your SELECT from proceeding until the row is available and will also ensure that once in the block, other UPDATE statements will be blocked from modifying the row.

As far as I can understand, in your message you were trying to convey the same point: that is, model holds the most recent value and does not need refreshing. However, I find this behavior highly confusing and error-prone. Could you please clarify why sqlalchemy behaves in such a way and what role does autocommit play here?

This is discussed at the FAQ entry here:

https://docs.sqlalchemy.org/en/13/faq/sessions.html#i-m-re-loading-data-with-my-session-but-it-isn-t-seeing-changes-that-i-committed-elsewhere

if you turn autocommit=False, then all operations on the Session are implicitly within a transaction block that does not complete until you call session.commit() or session.rollback(). Try writing your program in this way and see if you can make it do something non-intuitive that is beyond what REPEATABLE READ isolation normally does. the line of code which you have invoking "outside" of a transaction would now be invoking inside of a "session / commit/rollback" block. At the end of this block, the number "25" would be erased and your SELECT FOR UPDATE would use the value of 100.

For a further deep dive you can check out my talk at https://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth .

I hope my answers have been helpful if not disappointing? reopen if you have more issues thanks!

Was this page helpful?
0 / 5 - 0 ratings