Django-rest-framework: Use `update_fields` on PATCH requests.

Created on 7 Mar 2015 · 17Comments · Source: encode/django-rest-framework

See #2622.

Needs design decision

Source

tomchristie

👍1

Most helpful comment

I know this is closed, but as we recently got bitten by this I thought I'd just pitch in saying that it's absolutely non-obvious that "partial_update()" in DRF will actually read/write the entire DB row. We were operating under the assumption the partial_update indeed only updates the partial fields to the DB.

bwesen on 24 Feb 2020

👍9

All 17 comments

Yes please :+1:
There's no reason why the serializer should save all the fields by default.

thedrow on 9 Mar 2015

Be interesting to see what code exactly would need to change to support this.
That'd help us make the assessment of if it's a worthwhile change.

tomchristie on 23 Jun 2015

I just found this thread through a random Google search, wondering if DRF supports such a thing. I'll contribute some thoughts I had on this.

update_fields based on the fields specified via a PATCH request will work, except for cases where .save is overridden, or a pre_save signal is used, etc., which could cause some field not among the list of fields specified for PATCH to not be saved to a database, when they should. I think it would be possible to add update_fields support like this, but also add support for specifying something like a list of fields which should always be updated for PATCH, or maybe a method for returning the list of fields to use for update_fields.

So in my imagined codebase, it would use update_fields by default, but mention in the documentation "override get_update_fields(...) if you want to change which fields will be saved."

Just some thoughts I had on this.

w0rp on 20 Jan 2016

👍1

Thanks @w0rp.

except for cases where .save is overridden, or a pre_save signal is used, etc.

Right. Those are good enough reasons for use to _not_ do this in core. If anyone wants this then it ought to be as an optional third party package, that eg provides custom serializer classes.

tomchristie on 20 Jan 2016

As i wrote on #2622 i believe that decision is right as i did not consider the full complexity of messing with the save functionality when i made the original PR. Using a case specific serializer is clearly the best approach here.

I would however suggest that this is documented more explicit. For certain scenarios the likelihood of hitting a race condition here (if a custom serializer is not used) is quite high.

Writing something on the serialializers api-guide or alike would be recommendable.

askholme on 21 Jan 2016

👍2

Welcome to consider submissions. Otherwise probably not high enough on our todo list right now to treat as an active issue.

tomchristie on 21 Jan 2016

@tomchristie I did not yet have a chance to try this myself but would it be possible for DRF to use transaction.atomic() and .select_for_update() to serialize PATCH requests touching the same record? I'm not sure how viable that is given that most of the implementation is method-agnostic.

patrys on 30 Aug 2017

@patrys - Possibly, yes. In discussion at https://github.com/encode/django-rest-framework/issues/4675.

tomchristie on 18 Sep 2017

select_for_update seems like a better tack here than update_fields. If nothing else it's a nice simple approach. I think we'd want to rely on the user deciding if they want to set ATOMIC_REQUESTS themselves, rather than introducing an explicit transaction ourself. Possibly make sure to document that both in the docs, and as a comment in the get_queryset method on the generic view.

tomchristie on 18 Sep 2017

I recently open PR for adding this as well (#6391). @rpkilby was kind enough to point me to this discussion here.

I believe there are two main concerns raised here:

update_fields won't work with m2m fields (Django raises an error if you attempt this). I don't believe this is actually an issue, as currently all the m2m fields are handled separately in the update method. And thus they are being saved separately already.

        for attr, value in validated_data.items():
            if attr in info.relations and info.relations[attr].to_many:
                field = getattr(instance, attr)
                field.set(value)
            else:
                setattr(instance, attr, value)
        instance.save()

Users having overridden save methods or custom pre_save signals, where other fields are updated . This is the unfortunate one :(

The imagined scenario would be a save method like:

    def save(self, *args, **kwargs):
        self.some_field= "Foo"
        super().save(*args, **kwargs)

If this method is called with update_fields then indeed some_field will NOT be updated, if it is not included update_fields. This seems like an extremely easy thing to miss, and it is made even worse by Django documentation, which advocates the usage of the magic variables in these cases:

Django will, from time to time, extend the capabilities of built-in model methods, adding new arguments. If you use args, *kwargs in your method definitions, you are guaranteed that your code will automatically support those arguments when they are added.

Thus obfuscating the fact that update_fields exists as an argument and that it can have an impact. So in the end, sadly, I do have to agree that adding this proposed changed would lead to a lot of headaches to a lot of users of this great library.

However I do believe that having a race condition like this can also lead to issues. It seems like the kind of problem a lot of people might have run into, but can be easily brushed off as a fluke.

It might be worth to (a) make this an option to turn on or (b) add a warning about this in the documentation and/or point a third-party extension that does this. E.g. I've seen drf-extensions's mentioned in these discussions, although I haven't tried it out to see how it works.

Would you be open to adding any of these suggestions? I'd be happy to contribute if so.

alexkiro on 5 Jan 2019

Hi @alexkiro - thanks for the additional writeup. I'm sure this will be useful in the future to anyone tackling the issue.

In regards to overridden save methods, this might be relevant: https://github.com/encode/django-rest-framework/issues/6471#issuecomment-466925465

Any answer here will be complicated, because there are a lot of caveats to how objects are created in Django's ORM.

rpkilby on 2 Mar 2019

❤1 👍1

Hi there,

Thank you all for the interesting discussion.
I would like to add a comment regarding the concerns mentioned by @alexkiro

I think the main problem comes from bad habits in overriding the save() method.
Indeed, the django.db.models.base.Model.save method signature is the following:

    def save(self, force_insert=False, force_update=False, using=None,
             update_fields=None):

Overriding with def save(self, *args, **kwargs) is fast, easy, and... not very rigorous (I have to admit I almost always do it :) ).
Optional parameters are meant to be used for overloading, not overriding.
Overriding with a different signature is not a good practice.

As a suggestion, I attach a python file containing an inherited serializer that overrides the update() method and uses a rigorous call to instance.save()
mymodelserializer.py.zip

Of course, this does not solve the problem mentioned in the previous posts when save() is overridden or when a pre_save signal is connected.
However, I think this is the developer responsibility, DRF default behaviour should be rigorous and comply with django ORM's (including its caveats).

Overriding save() using the correct signature makes visible that one must handle update_fields:

  def save(self, force_insert=False, force_update=False, using=None,
             update_fields=None):
    self.other_field = "test"
    update_fields.append("other_field")
    super().save(force_insert, force_update, using, update_fields)

I might be wrong or miss something. What do you think?

aboutofpluto on 31 Oct 2019

Hi @aboutofpluto. I think there are two items to consider:

The downstream effects of this change.
If update_fields is a sensible default.

As to the former, this is well explained in https://github.com/encode/django-rest-framework/issues/2648#issuecomment-451697353. The reality is that it's very common to see save() methods and pre_save handlers that don't account for update_fields, and by introducing this change in DRF, it's going to generate a lot of confusing headaches.

Regardless, even in an ideal world where we didn't have to consider the above, I'm not convinced that it makes sense to use update_fields by default. It's a tool to be used by the developer, and it's perfectly valid to not take advantage of it. Even though its use is sensible in a lot of cases, it can negatively impact others.

Given the above, I don't think it makes sense to change the serializer's default behavior. That said, it might be worth considering an argument to save()/update() that enables this (but is disabled by default). I'm not sure what exactly this would look like however.

rpkilby on 1 Nov 2019

Hi @rpkilby,
Thank you for your answer. I agree.

Just to explain why I am interesting in update_fields, I use post_save hooks to update fields depending on related models and fields. These hooks trigger an update only if some specific fields are updated. Having update_fields saves a lot of resources (mainly CPU and database queries).

Anyway, here is a version of the serializer using update_fields as an option with default to False. It gives three possibilities (I have not commented nor docstringed yet):

False: Default instance.save()
True: Use validated_data to build update_fields
sequence of strings: Use the parameter as update_fields to instance.save()

mymodelserializer.py.zip

aboutofpluto on 1 Nov 2019

bwesen on 24 Feb 2020

👍9

Has anyone come up with a proper reusable (pluggable) solution? I don't care about custom save's (if they are breaking the contract it's their fault, after all), but I do care both about race conditions and sub-par performance when DRF pulls/pushes the entire set of fields when only one is patched.