master branch of Django REST framework.Create a test serializer that uses initial_data:
class TestSerializer(serializers.Serializer):
number = serializers.IntegerField()
def validate(self, attrs):
print self.initial_data
return attrs
Use it:
>>> serializer = TestSerializer(data={'number': 1})
>>> serializer.is_valid()
{'number': 1}
initial_data is a dict as expected
Use it with many = True:
>>> list_serializer = TestSerializer(data=[{'number': 1}, {'number': 1}], many=True)
>>> list_serializer.is_valid()
[{'number': 1}, {'number': 2}]
[{'number': 1}, {'number': 2}]
initial_data is a list, even though as far as the TestSerializer class is concerned, it operates on individual objects.
The ListSerializer should update the initial_data of it's child serializer as it is iterating. The expected output would be:
>>> list_serializer = TestSerializer(data=[{'number': 1}, {'number': 1}], many=True)
>>> list_serializer.is_valid()
{'number': 1}
{'number': 2}
Both times the TestSerializer class is used it gets initial_data as a list. Therefore the class needs to know that it was called from a ListSerializer context to operate. This is surprising.
Even worst when calling self.get_initial() instead of self.initial_data.
Note for later, documentation mentions the use of initial_data
@daggaz Any chance you could bundle your examples here into a failing test case in a PR? (That would be amazing! 👍)
@daggaz I'm looking at this now. It's a curious one.
Can I ask, why are you trying to access initial_data in validate()?
In ListSerializer.to_internal_value we already iterate the incoming data and pass the current _item_ to the child serialiser for validation etc. Your attrs parameter will be that data item (as it came through to_internal_value). In theory you should be accessing that...
This seems a limitation of .initial_data in this context. (As per the docs it really does get unmodified data...)
We should document this.
As @xordoquy notes, from the child, we do get an error calling get_initial():
self.get_initial()
*** AttributeError: 'list' object has no attribute 'get'
... because we're expecting a dict here.
We should patch this up.
I'm going to de-milestone this for now. I'm not sure on an exact fix at this moment and it doesn't strike me as urgent.
I ran into this problem when attempting to validate that data posted to the API didn't contain any extra keys, something like:
def validate(self, attrs):
unknown = set(self.initial_data) - set(self.fields)
if unknown:
raise ValidationError("Unknown field(s): {}".format(", ".join(unknown)))
return attrs
I would probably argue that the ListSerializer instance (the one created in-place of the actual serializer class you instantiate) should get the unmodified list, and the ListSerializer's child serializer should get the unmodified list items.
It seems wrong that code in a serializer that works fine with many=False stops working with many=True.
As I said originally, as far as the a serializer class is concerned, it operates on individual objects, not lists.
@daggaz Thanks for the follow up.
Taking the second point first, yes, in theory, I agree with you. I looked at it. It's just not something I can justify allocating time to as a priority.
There is not one child serialiser per data item. There's one child serialiser which we re-use for each item. So we could keep track of the index while enumerating and update initial_data accordingly, and write all the tests for that. But... well... _meh_...
Taking your particular case: let clients send whatever fields they like: the serialiser will only accept/use the values it's expecting. As with Django Forms, that's one of it's core features. attrs will never contain unknown fields — that's one part of the service to_internal_value performs. So, be liberal in what you accept, why check? (I see you want this: I'm just expressing how I see it here.)
The quickest workaround seems to be to be prepared for the dual usage and validate the whole list either in parent serialiser (validate_<list_field>) or the first time you hit the child. (You might want a flag if you did that...) — I grant neither of these are perfect, but I don't think what you're trying to do counts as intended usage.
Would happily review a PR resolving this _properly™_.
I understand the priority issue. Fair enough.
With regards to the use case point, take for example a partial update, where a user of your API has misspelled the field name. It seems like a validation error would be appropriate, rather than a silently broken (in terms of the users in intention) update.
OK. Fair enough.
As I say, I'd be very happy to look at a PR. It would need to have a small footprint — this is core code vs something I'm not sure we officially™ support — but it doesn't hurt to look.
I'm going to close this off. It's not something I want to treat as an active priority. For me, it's an oddity/edge-case which should be treated as a known limitation.
As I said, I'm very happy to review a PR with a small footprint here.
For anyone wishing to pick this up in the future, please check out #5407 which contains a test case.
There's also #5537, which takes a look at this. I closed that on the basis that:
We can review that if there's serious demand for this. (I'm just not currently convinced there is.)
I have a need for this feature (correct value as self.initial_data with Many=True) in a different use-case: not in the validators as previously discussed here, but in the update() method:
I would like to recursively call children serializers there.
For that, I need to pass part of the initial_data to the child serializer, and with many=True I get the full list instead of the current item, with no way to know which item I am on.
I cannot simply pass the validated_data to the child serializer as at that point validated_data is in internal format ("complex data", "complex type"), and the child serializer data expects external data ("native Python datatype"): both representations may be incompatible (for example when a child uses a non-default source Field constructor parameter).
I currently have to do a big hack: try to quickfix the fields names difference between external and internal formats (for non-default source fields) before calling the child serializer...
I also stumbled upon the bug on ListSerializer.get_initial():
return self.to_representation(self.initial_data)
I have no idea why it calls self.to_representation with initial_data: to_representation expects internal format/complex data as input, not external format/native python datatype. It then returns as external format, what we initially expects: couldn't we just simply return self.initial_data?
@carltongibson Should I open a new issue? Or reopen this issue? Or is this comment enough?
Hi @thomas-riccardi. I don't think anything has changed here. It's not at all clear that the extra complexity merits the use-case for the core framework. You are welcome to create you own ListSerializer class to provide this behaviour. If it looks good you can always open a PR to see if it'd be worth including, but likely we'd advise putting it in a third-party package.
@carltongibson thanks for the reply. I understand your reservations about including all this in core. I just wanted to document a different use-case, and also talk about the side-issue of get_initial(), which could maybe be fixed in the core framework independently of the main issue.
@thomas-riccardi Happy to review a PR with a concrete suggestion... 🙂
For the get_initial() issue: what is the goal of calling to_representation? The quickfix for AttributeError: 'list' object has no attribute 'get' may be to just return self.initial_data, but it would not be much useful indeed (just less confusing/incoherent).
I have a solution for this, but how do I contribute. I am completely new to open source contributions for projects like django. Any guidances would be helpful
Hi @ajaidanial. If you're completely new to open source, then I'd recommend reading GitHub's introductory guide to their platform. That will give you the basics on how to fork a project and open a Pull Request, which is how you submit contributions.
As to the contents of the pull request, you'll need to submit the fix as well as a test case that demonstrates the failure. You can look at both #5407 and #5537 and adapt their test cases for you own PR.
In this specific case, keep in mind Carlton's https://github.com/encode/django-rest-framework/issues/5345#issuecomment-344522817 above - the fix shouldn't be overly complicated.
Most helpful comment
I have a need for this feature (correct value as
self.initial_datawithMany=True) in a different use-case: not in the validators as previously discussed here, but in theupdate()method:I would like to recursively call children serializers there.
For that, I need to pass part of the
initial_datato the child serializer, and withmany=TrueI get the full list instead of the current item, with no way to know which item I am on.I cannot simply pass the
validated_datato the child serializer as at that pointvalidated_datais in internal format ("complex data", "complex type"), and the child serializerdataexpects external data ("native Python datatype"): both representations may be incompatible (for example when a child uses a non-defaultsourceField constructor parameter).I currently have to do a big hack: try to quickfix the fields names difference between external and internal formats (for non-default
sourcefields) before calling the child serializer...I also stumbled upon the bug on
ListSerializer.get_initial():I have no idea why it calls
self.to_representationwithinitial_data:to_representationexpects internal format/complex data as input, not external format/native python datatype. It then returns as external format, what we initially expects: couldn't we just simply returnself.initial_data?@carltongibson Should I open a new issue? Or reopen this issue? Or is this comment enough?