Django-rest-framework: `initial_data` not as expected with many=True

Created on 21 Aug 2017  Â·  17Comments  Â·  Source: encode/django-rest-framework

Checklist

  • [X] I have verified that that issue exists against the master branch of Django REST framework.
  • [X] I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • [X] This is not a usage question. (Those should be directed to the discussion group instead.)
  • [X] This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • [X] I have reduced the issue to the simplest possible case.
  • [ ] I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)

Steps to reproduce

Create a test serializer that uses initial_data:

class TestSerializer(serializers.Serializer):
    number = serializers.IntegerField()
    def validate(self, attrs):
        print self.initial_data
        return attrs

Use it:

>>> serializer = TestSerializer(data={'number': 1})
>>> serializer.is_valid()
{'number': 1}

initial_data is a dict as expected

Use it with many = True:

>>> list_serializer = TestSerializer(data=[{'number': 1}, {'number': 1}], many=True)
>>> list_serializer.is_valid()
[{'number': 1}, {'number': 2}]
[{'number': 1}, {'number': 2}]

initial_data is a list, even though as far as the TestSerializer class is concerned, it operates on individual objects.

Expected behavior

The ListSerializer should update the initial_data of it's child serializer as it is iterating. The expected output would be:

>>> list_serializer = TestSerializer(data=[{'number': 1}, {'number': 1}], many=True)
>>> list_serializer.is_valid()
{'number': 1}
{'number': 2}

Actual behavior

Both times the TestSerializer class is used it gets initial_data as a list. Therefore the class needs to know that it was called from a ListSerializer context to operate. This is surprising.

Bug help wanted

Most helpful comment

I have a need for this feature (correct value as self.initial_data with Many=True) in a different use-case: not in the validators as previously discussed here, but in the update() method:
I would like to recursively call children serializers there.

For that, I need to pass part of the initial_data to the child serializer, and with many=True I get the full list instead of the current item, with no way to know which item I am on.

I cannot simply pass the validated_data to the child serializer as at that point validated_data is in internal format ("complex data", "complex type"), and the child serializer data expects external data ("native Python datatype"): both representations may be incompatible (for example when a child uses a non-default source Field constructor parameter).

I currently have to do a big hack: try to quickfix the fields names difference between external and internal formats (for non-default source fields) before calling the child serializer...

I also stumbled upon the bug on ListSerializer.get_initial():

return self.to_representation(self.initial_data)

I have no idea why it calls self.to_representation with initial_data: to_representation expects internal format/complex data as input, not external format/native python datatype. It then returns as external format, what we initially expects: couldn't we just simply return self.initial_data?

@carltongibson Should I open a new issue? Or reopen this issue? Or is this comment enough?

All 17 comments

Even worst when calling self.get_initial() instead of self.initial_data.
Note for later, documentation mentions the use of initial_data

@daggaz Any chance you could bundle your examples here into a failing test case in a PR? (That would be amazing! 👍)

@daggaz I'm looking at this now. It's a curious one.

Can I ask, why are you trying to access initial_data in validate()?

In ListSerializer.to_internal_value we already iterate the incoming data and pass the current _item_ to the child serialiser for validation etc. Your attrs parameter will be that data item (as it came through to_internal_value). In theory you should be accessing that...

This seems a limitation of .initial_data in this context. (As per the docs it really does get unmodified data...)

We should document this.

As @xordoquy notes, from the child, we do get an error calling get_initial():

self.get_initial()
*** AttributeError: 'list' object has no attribute 'get'

... because we're expecting a dict here.

We should patch this up.

I'm going to de-milestone this for now. I'm not sure on an exact fix at this moment and it doesn't strike me as urgent.

I ran into this problem when attempting to validate that data posted to the API didn't contain any extra keys, something like:

def validate(self, attrs):
    unknown =  set(self.initial_data) - set(self.fields)
    if unknown:
        raise ValidationError("Unknown field(s): {}".format(", ".join(unknown)))
    return attrs

I would probably argue that the ListSerializer instance (the one created in-place of the actual serializer class you instantiate) should get the unmodified list, and the ListSerializer's child serializer should get the unmodified list items.

It seems wrong that code in a serializer that works fine with many=False stops working with many=True.

As I said originally, as far as the a serializer class is concerned, it operates on individual objects, not lists.

@daggaz Thanks for the follow up.

Taking the second point first, yes, in theory, I agree with you. I looked at it. It's just not something I can justify allocating time to as a priority.

There is not one child serialiser per data item. There's one child serialiser which we re-use for each item. So we could keep track of the index while enumerating and update initial_data accordingly, and write all the tests for that. But... well... _meh_...

Taking your particular case: let clients send whatever fields they like: the serialiser will only accept/use the values it's expecting. As with Django Forms, that's one of it's core features. attrs will never contain unknown fields — that's one part of the service to_internal_value performs. So, be liberal in what you accept, why check? (I see you want this: I'm just expressing how I see it here.)

The quickest workaround seems to be to be prepared for the dual usage and validate the whole list either in parent serialiser (validate_<list_field>) or the first time you hit the child. (You might want a flag if you did that...) — I grant neither of these are perfect, but I don't think what you're trying to do counts as intended usage.

Would happily review a PR resolving this _properly™_.

I understand the priority issue. Fair enough.

With regards to the use case point, take for example a partial update, where a user of your API has misspelled the field name. It seems like a validation error would be appropriate, rather than a silently broken (in terms of the users in intention) update.

OK. Fair enough.

As I say, I'd be very happy to look at a PR. It would need to have a small footprint — this is core code vs something I'm not sure we officially™ support — but it doesn't hurt to look.

I'm going to close this off. It's not something I want to treat as an active priority. For me, it's an oddity/edge-case which should be treated as a known limitation.

As I said, I'm very happy to review a PR with a small footprint here.

For anyone wishing to pick this up in the future, please check out #5407 which contains a test case.

There's also #5537, which takes a look at this. I closed that on the basis that:

  • I didn't think the change justified the extra complexity. (i.e. that "small footprint" above)
  • And, it would be easy to make into a third-party package.

We can review that if there's serious demand for this. (I'm just not currently convinced there is.)

I have a need for this feature (correct value as self.initial_data with Many=True) in a different use-case: not in the validators as previously discussed here, but in the update() method:
I would like to recursively call children serializers there.

For that, I need to pass part of the initial_data to the child serializer, and with many=True I get the full list instead of the current item, with no way to know which item I am on.

I cannot simply pass the validated_data to the child serializer as at that point validated_data is in internal format ("complex data", "complex type"), and the child serializer data expects external data ("native Python datatype"): both representations may be incompatible (for example when a child uses a non-default source Field constructor parameter).

I currently have to do a big hack: try to quickfix the fields names difference between external and internal formats (for non-default source fields) before calling the child serializer...

I also stumbled upon the bug on ListSerializer.get_initial():

return self.to_representation(self.initial_data)

I have no idea why it calls self.to_representation with initial_data: to_representation expects internal format/complex data as input, not external format/native python datatype. It then returns as external format, what we initially expects: couldn't we just simply return self.initial_data?

@carltongibson Should I open a new issue? Or reopen this issue? Or is this comment enough?

Hi @thomas-riccardi. I don't think anything has changed here. It's not at all clear that the extra complexity merits the use-case for the core framework. You are welcome to create you own ListSerializer class to provide this behaviour. If it looks good you can always open a PR to see if it'd be worth including, but likely we'd advise putting it in a third-party package.

@carltongibson thanks for the reply. I understand your reservations about including all this in core. I just wanted to document a different use-case, and also talk about the side-issue of get_initial(), which could maybe be fixed in the core framework independently of the main issue.

@thomas-riccardi Happy to review a PR with a concrete suggestion... 🙂

For the get_initial() issue: what is the goal of calling to_representation? The quickfix for AttributeError: 'list' object has no attribute 'get' may be to just return self.initial_data, but it would not be much useful indeed (just less confusing/incoherent).

I have a solution for this, but how do I contribute. I am completely new to open source contributions for projects like django. Any guidances would be helpful

Hi @ajaidanial. If you're completely new to open source, then I'd recommend reading GitHub's introductory guide to their platform. That will give you the basics on how to fork a project and open a Pull Request, which is how you submit contributions.

As to the contents of the pull request, you'll need to submit the fix as well as a test case that demonstrates the failure. You can look at both #5407 and #5537 and adapt their test cases for you own PR.

In this specific case, keep in mind Carlton's https://github.com/encode/django-rest-framework/issues/5345#issuecomment-344522817 above - the fix shouldn't be overly complicated.

Was this page helpful?
0 / 5 - 0 ratings