Django-rest-framework: Guard against calling `serializer.data` before `serializer.save()`

Created on 12 Aug 2015 · 13Comments · Source: encode/django-rest-framework

It appears that accessing serializer.data has side effects. Merely accessing the property inside perform_create() will cause the response data to change. I created a simple Django project with a test case that illustrates the bug: django-rest-bug.

The expected behavior is for response.data to contain the serialized version of the created object:

class AlbumViewSet(mixins.CreateModelMixin, viewsets.GenericViewSet):

    # ... snip ...

    def perform_create(self, serializer):
        serializer.save()

# >>> post_data = {'title': 'The Wall', 'artist': 'Pink Floyd'}
# >>> response = self.client.post(reverse('album-list'), post_data)
# >>> print response.data
# {'id': 1, 'artist': u'Pink Floyd', 'title': u'The Wall'}

But when serializer.data is present it will cause response.data to be equal to the initially posted data (ex: the id is missing from response):

class AlbumViewSet(mixins.CreateModelMixin, viewsets.GenericViewSet):

    # ... snip ...

    def perform_create(self, serializer):
        serializer.data  # This line being present changes the value of `response.data`
        serializer.save()

# >>> post_data = {'title': 'The Wall', 'artist': 'Pink Floyd'}
# >>> response = self.client.post(reverse('album-list'), post_data)
# >>> print response.data
# {'artist': u'Pink Floyd', 'title': u'The Wall'}

Even if serializer.data is not "safe", I think it's incorrect for property access to result in side effects. I _think_ the culprit is somewhere in BaseSerializer.data but am not certain.

Cleanup

Source

kylefox

All 13 comments

I'd be interested in hearing why you want to access the serializer.data within the view.

xordoquy on 12 Aug 2015

👀1 😕1

The flow you've demonstrated there isn't intended usage, and it's possible that its something we should guard against. (Ie raise an exception if .save() is called _after_ accessing .data)

It's valid to access either .initial_data or .validated_data prior to .save(), but it shouldn't be valid to access .data and then subsequently call .save().

What's happening internally is that serializer.data is serializing the _unsaved_ instance, because you've called it prior to save.

When we properly restrict the flow then effectively our properties don't have side effects, but this is a case we've obviously left unguarded.

tomchristie on 12 Aug 2015

Retitled this issue to reflect the change we should make.

tomchristie on 12 Aug 2015

@xordoquy I was attempting to use it for some validation/permission logic before creating the object. Let's say a User is only allowed to create Messages attached to a Project they have access to. In that case MessageViewSet. perform_create would look like this:

class MessageViewSet(mixins.CreateModelMixin, viewsets.GenericViewSet):

    def perform_create(self, serializer):
        project = serializer.data['project']
        if project in self.request.user.projects.all():
            serializer.save(project=project)
        else:
            raise exceptions.PermissionDenied

It feels wrong to put this into the view, but I wasn't sure where else to put it. I guess this might be a case for creating a validation method in my serializer which uses .context to access request.user and validate there?

@tomchristie Good to know, thanks. I actually found, through experimentation, that .validated_data worked properly in this scenario. Glad to know this is the correct way to access it :) Though as I mentioned above I think now I understand the correct way to implement what I'm trying to do (.context). I suppose it's not _as critical_ to prevent side effects when this flow/scenario isn't intended usage :sweat_smile:

kylefox on 12 Aug 2015

Not critical no, but preferable, so let's keep this open for now.

tomchristie on 12 Aug 2015

I guess this might be a case for creating a validation method in my serializer which uses .context to access request.user and validate there?

Yup. Validation should really be kept inside of the serializer. Or at least it should happen before perform_create is called.

I usually recommend people limit the [serializer_field].queryset property during the initialization of the serializer so the user gets an error if they try to set the relation to something invalid.

kevin-brown on 12 Aug 2015

👍1

@kevin-brown Thanks for pointing me in that direction — this is what I wound up with in my serializer:

class MessageSerializer(serializers.ModelSerializer):

    class Meta:
        model = Message
        fields = ('id', 'project', 'body')

    def __init__(self, *args, **kwargs):
        super(MessageSerializer, self).__init__(*args, **kwargs)
        self.user = self.context['request'].user
        self.fields['project'].queryset = self.user.projects.all()

    def validate_project(self, project):
        if project not in self.user.projects.all():
            raise exceptions.PermissionDenied
        return project

Because I'm already limiting the queryset in __init__, maybe I can remove validate_project() entirely...

kylefox on 13 Aug 2015

👍1

If the side effect of calling .data is just the caching of the representation on _data, wouldn't it make sense that .save() just invalidates the cache, instead of refusing to perform the save?

dyeray on 22 May 2017

@dyeray one shouldn't access serializer.data before saving. That's a sign of a possible bigger issue although one may not be aware of it already. It is just not meant for that.

xordoquy on 22 May 2017

I don't understand why in tutorial http://www.django-rest-framework.org/tutorial/3-class-based-views/ the example shows exactly the opposite.
```
serializer = SnippetSerializer(data=request.data)
if serializer.is_valid():
serializer.save()

```If I try this example, I get the exception "You cannot call '.save()' after accessing serializer.data. So what's wrong ?