I'm using marshmallow with flask, which has a jsonify() function that sorts keys, and i'm using that to return the validation errors to the client. In some situations, the dicts with validation errors returned by marshmallow combine int and str keys, which results in a TypeError while attempting to serialize them.
Here's a somewhat minimized test case:
import json
import marshmallow as ma
class RoleSchema(ma.Schema):
name = ma.fields.Str()
class UserSchema(ma.Schema):
roles = ma.fields.Nested(RoleSchema, many=True)
user, errors = UserSchema().load({'roles':['name']})
print(errors)
print(json.dumps(errors))
print(json.dumps(errors, sort_keys=True))
Output:
{'roles': {0: {}, '_schema': ['Invalid input type.']}}
{"roles": {"0": {}, "_schema": ["Invalid input type."]}}
Traceback (most recent call last):
File "asd.py", line 13, in <module>
print(json.dumps(errors, sort_keys=True))
File "/usr/lib64/python3.5/json/__init__.py", line 237, in dumps
**kw).encode(obj)
File "/usr/lib64/python3.5/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib64/python3.5/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
TypeError: unorderable types: str() < int()
I think the relevant function is ErrorStore.get_errors(index=0)
https://github.com/marshmallow-code/marshmallow/blob/2.9.0/marshmallow/marshalling.py#L46
Stringifying the indexes in that function fixes it for me. Not sure if this is the best way to fix it though.
--- a/marshalling.py
+++ b/marshalling.py
@@ -45,6 +45,7 @@
def get_errors(self, index=None):
if index is not None:
+ index = str(index)
errors = self.errors.get(index, {})
self.errors[index] = errors
else:
It looks like @maximkulkin introduced the list indexing behavior in #382. The PR originally used string indexes, but used integers at the request of @sloria.
Not being able to sort the error dict keys is unfortunate, but currently the error dictionary is allowed to have integer keys, so it is not guaranteed to be orderable.
Stringifying the indexes in that function fixes it for me. Not sure if this is the best way to fix it though.
That is what I would recommend. You could try to coax a different error out of your schema by altering the validation, but ensuring the error keys are orderable seems like a better long term solution.
Thanks for reporting your issue. If there is anything else I can help with let me know.
Actually, I'm curious if JSON libraries should automatically convert dictionary keys to strings. It seems like a right thing to do since strings is only form allowed by JSON specification. @dequis Maybe you can send request for this feature to JSON library that you're using.
@maximkulkin it's the python built in json module. Kinda hard to change.
A compromise would be for marshmallow to give list errors their own dictionary, so the integer keys are isolated from string keys.
json.dumps({'roles': {'_items': {0: {}}, '_schema': ['Invalid input type.']}}, sort_keys=True)
# '{"roles": {"_items": {"0": {}}, "_schema": ["Invalid input type."]}}'
It does seem like a Python bug, because json.dumps() is going to cast all the keys to strings anyway. Python could have done this before trying to compare the keys.
I would rather fix this by post processing errors output recursively and changing all non-string dictionary keys into strings. OR you can just skip sorting errors output.
It would be nice to maintain compatibility with built-in python modules, especially when they are directly related to serialization.
This is a python 3 specific problem that has an open bug report.
https://bugs.python.org/issue25457
Python 3 (justifiably) said that incomparable types should be incomparable rather than silently behaving in non-intuitive ways, hiding errors.
A decision was made in python 3 to stop supporting dicts with incomparable types. They aren't explicitly disallowed, but they aren't supported.
Since mixing list indexes with dictionary keys is a weird thing to do in the first place, I propose wrapping the list indexes in a dict called _items. I'm not sure if this change would need to go into v3. #382 seems to indicate that it is OK to change the error structure without a major version bump.
I would opt for changing indexes to strings and not go after various creative ways to skrew everything up. Actually Im not sure what was the rationale for changing reported indexes to integers (per @sloria)
UserSchema().load({'roles':['name']})
{'roles': {0: {}, '_schema': ['Invalid input type.']}}
I'm not sure about the pertinence of this output structure anyway.
From a naive perspective, I would say that either what is passed for roles is not an array, then I expect
{'roles': ['Invalid input type.']}
or an array is passed with wrong types in it and I should get
{'roles': {0: ['Invalid input type.']}}
Should Nested be modified to produce error structures like this?
The output of List(Nested(Schema)) for this use case is interesting:
{'roles': {0: {'_schema': ['Invalid input type.']}}}
That is probably how Nested(many=True) should be formatted. Roles shouldn't have a schema error, because it is correctly supplied as a list. This isn't the first time we have ran into inconsistencies with many=True where List(Nested) did the right thing. https://github.com/marshmallow-code/marshmallow/issues/315#issuecomment-228788849
This isn't the first time we have ran into a inconsistencies with many=True where List(Nested) did the right thing.
Nested(many=True) is somewhat surprising to a new user. Since List(Nested()) is available, I wonder why Nested(many=True) even exists in the first place.
I suppose removing Nested(many=True) would be a bit of a revolution, but does it bring anything List(Nested()) doesn't?
It looks like Nested(many=True) is a very old feature. I am generally against having two ways to do the same thing, but maybe there is a valid use case for providing a variable to the many argument. It would be nice if many=True was just a thin wrapper around List(Nested(Schema)) under the hood. I bet that would be a net code reduction. :fire:
IIUC, Nested(many=True) is instantiated at Schema definition time. At this point, you could use a condition to choose between Nested and List(Nested). It would look a bit ugly, but that's quite a corner case anyway. Or am I missing some use case?
This issue can probably be removed from the v3 milestone.