Django-rest-framework: Serializers many to many field not reflecting edits made in PUT/PATCH if prefetch_related used

Created on 22 Jan 2015  Â·  36Comments  Â·  Source: encode/django-rest-framework

If you use a viewset with a queryset using prefetch_related, any update to those prefetched many to many fields is not reflected in the response to the client, although the changes have been saved to the database.

See below code for very simple example:

from django.conf.urls import patterns, include, url
from django.contrib import admin
from django.contrib.auth.models import User
from rest_framework import serializers, viewsets, routers


class UserSerializer(serializers.ModelSerializer):
    class Meta:
        model = User
        fields = ('id', 'username', 'email', 'groups')


class UserViewSet(viewsets.ModelViewSet):
    queryset = User.objects.all().prefetch_related('groups')
    serializer_class = UserSerializer


router = routers.DefaultRouter()
router.register(r'users', UserViewSet)

urlpatterns = patterns('',
    url(r'^', include(router.urls)),
    url(r'^api-auth/', include('rest_framework.urls',
                               namespace='rest_framework')),
    url(r'^admin/', include(admin.site.urls)),
)

With the above code, assume there is an existing user with 2 groups assigned:

{
    "id": 1, 
    "username": "chris", 
    "email": "[email protected]", 
    "groups": [
        1,
        2
    ]
}

Submitting a PUT to /users/1/ changing groups to [1], the response will still show groups as [1, 2], but if you then did a GET request for /users/1/ you would see the reflected changes.

You can see this behaviour using the browseable api also.

Removing prefetch_related from the queryset restores normal behaviour and edits are again reflected in the response.

This example is very trivial, but if people are using prefetch_related to try and optimise their database queries they may encounter this issue.

Bug

Most helpful comment

Documenting this would solve a lot of potential head scratching for others. The users can avoid this problem by just re-fetching the instance from the database. In the update method in the Serializer as long as they have something like below they will be fine.

def update(self, instance, validated_attrs):
    ...
    instance.save()
    instance = self.Meta.model.objects.get(id=instance.id)
    return instance

All 36 comments

Interesting - nice work tracking that down.
Sounds like it might be pretty gnarly to deal with.

There's a closed Django ticket that looks relevant.

This is not a bug, I'm afraid. The results of DB queries that have already been run are never updated automatically by the Django ORM, and that is by design. ...

Changing this would really require an identity mapper, and a very fundamental change to the way the Django ORM works.

Nice find @carltongibson - I guess we _may_ still want to consider if there's anything more graceful we can do here. This _may_ just fall into documentation, as we're unlikely to want to do try to automagically determine when we hit this case, but I'm open to other proposals that can be demonstrated as feasible and not overly complex.

At first guess I'd suspect it'd hard to do better than "implement get_queryset and check request.method before using prefetch_related"

Good call, yup. Of course this is only an issue on writable nested serializers, so something of an edge case.

Documenting this would solve a lot of potential head scratching for others. The users can avoid this problem by just re-fetching the instance from the database. In the update method in the Serializer as long as they have something like below they will be fine.

def update(self, instance, validated_attrs):
    ...
    instance.save()
    instance = self.Meta.model.objects.get(id=instance.id)
    return instance

Indeed - tho unclear if the extra query there would defeat the point of the prefetch related or not.

True, for my case I was only using prefetch_related to speed up GET on large collections, with some nested serialization, an update on a single resource can do without it, at least in my case.

my solution was to to override BaseSerializer (We are still on 2.x.x branch):

def save(self, **kwargs):
        """
        Due to DRF bug with M2M fields we refresh object state from database
        directly if object is models.Model type and it contains m2m fields

        See: https://github.com/tomchristie/django-rest-framework/issues/1556
        """
        self.object = super(BaseSerializer, self).save(**kwargs)

        if isinstance(self, serializers.ModelSerializer):
            model = self.Meta.model
            if model._meta.model._meta.local_many_to_many and self.object.pk:
                self.object = model.objects.get(pk=self.object.pk)
        return self.object

So it only hits database if model had m2m keys. Ideally it should be more complex check, but I was fine with this one due to aggresive caching.

See also #2704, #2727 (and the closed #3028.)
Haven't assessed if both of those should be closed as duplicates of this or not.
Thoughts/review anyone?

It looks like they're all duplicates, except #2704 which seems a slightly different issue. Your suggestion in #3028 of overriding get_queryset to do prefetch only on GET requests, would have worked for my case.

On reflection I _think_ that #2704 and #2727 are duplicates of each other (queryset reevaluation on list field / many=True fields) , but are not the same as this issue. (Output representation does not properly support updates if prefetch_replace has been used.)

This bug just bit us soooo hard! We've had two engineers on it for a week thinking it was a bug in the application not the endpoint.

Example solution:

class InspectionItemViewSet(viewsets.ModelViewSet):
    queryset = InspectionItem.objects.all()

    def get_queryset(self):
        qs = super(InspectionItemViewSet, self).get_queryset()
        if self.action == 'retrieve' or self.action == 'list':
            qs = qs.prefetch_related('tags', 'failures')
            if self.get_depth() >= 1:
                qs = qs.prefetch_related('contractor', 'type')
        return qs

@aidanlister :-/ Prioritizing this as a result. Right now I suppose that the sensible thing to do is just to try to highlight select_related and prefetch_related in the documentation with a suitable warning.

Of course we'll still end up having folks getting hit with this (can't expect that everyone will always have read every last bit of the documentation always, all the time), but at least if we have it documented then it's more likely to be resolved more quickly.

Yeah, given it's not a DRF issue I think a warning in the documentation is probably the best you can do. Keep up the good work Tom!

@tomchristie This is the issue we discussed on Tuesday.

@aaugustin @tomchristie — was there any breakthrough?

My take on this is -- if a request handler makes changes to an object, it should refetch if from the database before serializing it in the response. Otherwise you're always going to fail on some edge cases like changes performed by database triggers.

OK. That ties in with the conclusion we reached (I think)

Thanks!

@pySilver's solution worked for me in DRF 3.1, just had to switch to setting self.instance rather than self.object (per 2.x.x)

class MyModelSerializer(ModelSerializer):
    def save(self, **kwargs):
        """
        Due to DRF bug with M2M fields we refresh object state from database
        directly if model contains m2m fields

        Avoids ModelSerializers returning old results for m2m relations
        when the queryset had calls to prefetch_related

        See: https://github.com/tomchristie/django-rest-framework/issues/1556
             https://github.com/tomchristie/django-rest-framework/issues/2442
        """
        self.instance = super(MyModelSerializer, self).save(**kwargs)

        model = self.Meta.model
        if model._meta.model._meta.local_many_to_many and self.instance.pk:
            self.instance = model.objects.get(pk=self.instance.pk)

        return self.instance

I am facing similar issue with djangorestframework==3.3.1
http://stackoverflow.com/questions/36397357/how-to-update-values-in-instance-when-have-a-custom-update-to-update-many-to

I have three models Player, Team, and Membership, Where Player and Team have many-to-many relationship using Membership as intermediary model.

class Player(models.Model):
    name = models.CharField(max_length=254)
    rating = models.FloatField(null=True)
    install_ts = models.DateTimeField(auto_now_add=True, blank=True)
    update_ts = models.DateTimeField(auto_now_add=True, blank=True)


class Team(models.Model):
    name = models.CharField(max_length=254)
    rating = models.FloatField(null=True)
    players = models.ManyToManyField(
            Player,
            through='Membership',
            through_fields=('team', 'player'))
    is_active = models.BooleanField(default=True)
    install_ts = models.DateTimeField(auto_now_add=True, blank=True)
    update_ts = models.DateTimeField(auto_now_add=True, blank=True)


class Membership(models.Model):
    team = models.ForeignKey('Team')
    player = models.ForeignKey('Player')
    #date_of_joining = models.DateTimeField()
    install_ts = models.DateTimeField(auto_now_add=True, blank=True)
    update_ts = models.DateTimeField(auto_now_add=True, blank=True)

Now I was required to update this membership using django rest framework. I tried update those using Writable nested serializers by writing a custom .update() of team serializer.

@transaction.atomic
def update(self, instance, validated_data):
    '''
    Cutomize the update function for the serializer to update the
    related_field values.
    '''

    if 'memberships' in validated_data:
        instance = self._update_membership(instance, validated_data)

        # remove memberships key from validated_data to use update method of
        # base serializer class to update model fields
        validated_data.pop('memberships', None)

    return super(TeamSerializer, self).update(instance, validated_data)


def _update_membership(self, instance, validated_data):
    '''
    Update membership data for a team.
    '''
    memberships = self.initial_data.get('memberships')
    if isinstance(membership, list) and len(memberships) >= 1:
        # make a set of incoming membership
        incoming_player_ids = set()

        try:
            for member in memberships:
                incoming_player_ids.add(member['id'])
        except:
            raise serializers.ValidationError(
                'id is required field in memberships objects.'
            )

        Membership.objects.filter(
            team_id=instance.id
        ).delete()

        # add merchant member mappings
        Membership.objects.bulk_create(
            [
                Membership(
                    team_id=instance.id,
                    player_id=player
                )
                for player in incoming_player_ids
            ]
        )
        return instance
    else:
        raise serializers.ValidationError(
                'memberships is not a list of objects'
            )

Now this works well for updating values in database for membership table. Only problem I am facing is that I am not able to update prefetched instance in memory, which on PATCH request to this API updates values in database but API response is showing outdated data.

I have encountered the save problem.
What' the status?

Bumping the milestone on this to re-assess. Not at all clear what we can do, but needs looking at.

So clearly we need to refetch in this case, but... can we determine if a prefetch_related has been applied to the queryset?

?

Mists of time... I recall referencing a closed Django issue on this one... It said to the effect, "User needs to be aware, and prepared to refresh themselves if using prefetch_related"

(Would find issue but Current status is: Ripped front off of iPad whilst fixing screen, so I can't really 😬)

@carltongibson - Yup, but we're doing this in the generic views, so the user isn't able to handle it themselves. We either need to see "don't use prefetch_related in XYZ situation" which is complicated, and users will anyways miss. Or we automatically handle the re-fetch on their behalf.

Ripped front off of iPad whilst fixing screen

"fixing" :-/

"Replacing" then :-)

Something like

we_need_to_do_a_refresh = hasattr(obj, '_prefetched_objects_cache')

Is that the thought?

Yup, something along those lines.

How can I achieve this manually in the meantime? Is the right place to handle this in the serializer or in the views?

Simplest solution is _don't_ use prefetch related on PUT/PATCH - you can handle that in get_queryset.

Alternatively write the view manually, and have a second call to get_object which is used for populating the response data.

@karesh The discussion group is the best place to take this discussion and other usage questions. Thanks!

Thanks.

Was this page helpful?
0 / 5 - 0 ratings