Sqlalchemy: document the use of populate_existing() with with_for_update()

Created on 9 Sep 2020  路  2Comments  路  Source: sqlalchemy/sqlalchemy

Describe the bug

Calling this a bug may be a stretch but I'm not sure where else to file it.. this is somewhere between bug and feature request.

Imagine a function which takes a primary key of a database row as its only argument. This function is responsible for incrementing the value of a particular field of the row identified by that key.

If the function is called in two processes simultaneously, it should increment the field twice. To prevent these simultaneous executions of this method from interfering with one another, it is useful to use 'FOR UPDATE' queries to lock the row.

You end up with a function looking something like this:

session = my_connect()

def my_increment(row_id):
    instance = session.query(MyModel).filter(MyModel.id == row_id).with_for_update().one()
    instance.increment_this_field += 1
    session.commit()

Upon a naive reading of this function, it seems like nothing could go wrong.

Now, imagine we have a script which calls this method like so:

row_id = 888
instance = session.query(MyModel).get(row_id)
my_increment(instance.id)

And imagine two of those scripts run nearly simultaneously. Upon doing so, you'll find that the field is not incremented by 2, but by 1.

The reason is, when the second execution calls one(), it does not return the object returned by the FOR UPDATE query, instead it returns the object from the session's "identity map". This is despite the fact that SQLAlchemy actually fetches the fresh row from the cursor. That precious row is discarded once SQLAlchemy sees the object exists in the identity map.

I found this to be very counterintuitive; the row retrieved by with_for_update().one() should be guaranteed to be the most up-to-date, locked row, but there's no way to guarantee you will get that row instead of a stale version in the identity map.

All the workarounds lead to the object getting fetched twice unnecessarily. For example my_increment could look like this:

    ...
    instance = session.query(MyModel).filter(MyModel.id == row_id).with_for_update().one()
    session.refresh(instance)
    ...

Or my_increment could require an instance parameter:

```
def my_increment(instance):
session.refresh(instance, with_for_update=True)
instance.increment_this_field += 1
session.commit()
````

But again this would mean fetching the instance twice, once to pass it in, and a second time for the locking refresh

I discovered this issue in our production credit card processing code, where it caused a customer to be billed twice, and it took many hours to figure out how this happened.

So again it's not really a "bug" in the traditional sense, but I found it to be very counter-intuitive and misleading, and it makes using with_for_update() require double-fetching objects when the contents of the identity map are unknown, something the manual never warns users about.

documentation duplicate question

All 2 comments

this has been brought up before, kind of a documenation issue for the moment. the other issue is #4774.

short answer, use populate_existing() when you use with_for_update():

https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=populate_existing#sqlalchemy.orm.query.Query.populate_existing

also i know it's a compelling argument that the populate_existing switch should be inherent in the with_for_update() operation, but I'm not ready to do that part yet. there is actually a compelling argument that populate_existing should be the default in all cases, if it could be changed to not overwrite attributes that have end-user changes on them. once we move to the 1.4/2.0 querying model fully, a lot of these questions will have a clearer space in which they can be answered.

Was this page helpful?
0 / 5 - 0 ratings