Say I have following schema definition
from marshmallow import Schema, fields
class ObjectSchema(Schema):
id = fields.Integer(required=True)
name = fields.String(required=True)
class CollectionSchema(Schema):
data = fields.Nested(ObjectSchema, many=True, required=True)
Here's how this schema works for several invalid inputs
In [6]: CollectionSchema().load({})
Out[6]: UnmarshalResult(data={}, errors={'data': {0: {'name': ['Missing data for required field.'], 'id': ['Missing data for required field.']}}})
In [7]: CollectionSchema().load({'data': []})
Out[7]: UnmarshalResult(data={'data': []}, errors={})
In [8]: CollectionSchema().load({'data': [{}]})
Out[8]: UnmarshalResult(data={'data': [{}]}, errors={'data': {0: {'name': ['Missing data for required field.'], 'id': ['Missing data for required field.']}}})
IMO Out[8] is the only correct validation error from the examples - it correctly describes the problem - all required fields are missing in nested item. For other examples I actually expect something like:
In [6]: CollectionSchema().load({})
Out[6]: UnmarshalResult(data={}, errors={'data': ['Missing data for required field.']})
shows that 'data' field is required and missing. I'm missing this one badly.
2.
In [7]: CollectionSchema().load({'data': []})
Out[7]: UnmarshalResult(data={'data': []}, errors={'data': ['Field may not be empty list.']})
shows that field contains empty list - for my usecase it's the same as missing field. Well, this one is arguable, maybe Nested field requires another parameter to enforce validation errors in case of empty nested value.
So currently marshmallow cannot distinguish between these three cases which complicates automated error handling in JSON api I'm working on now - as I said errors do not describe the problem properly. What do you think - is it an issue or I'm missing something and there's better approach to this problem?
Thanks for the thorough report, @chekunkov .
For case 2: required validation applies only to missing inputs; empty lists, empty strings, zero, etc. are still considered input. If you want to disallow empty lists, you could do:
from marshmallow import Schema, fields, validate
class ObjectSchema(Schema):
id = fields.Integer(required=True)
name = fields.String(required=True)
class CollectionSchema(Schema):
data = fields.Nested(ObjectSchema,
validate=validate.Length(min=1,
error='Field may not be an empty list'),
many=True,
required=True)
print(CollectionSchema().load({'data': []}).errors)
# {'data': ['Field may not be an empty list']}
Case 1 is less clearly cut. #235 was implemented such that nested required fields would be validated if the outer field was missing. @max-orhai, can you comment on the rationale for this decision?
For the time being, you could work around the issue with a custom Nested field:
from marshmallow import Schema, fields, validate, missing
class Nested(fields.Nested):
def _validate_missing(self, value):
if value is missing and getattr(self, 'required', False):
self.fail('required')
return super()._validate_missing(value)
class ObjectSchema(Schema):
id = fields.Integer(required=True)
name = fields.String(required=True)
class CollectionSchema(Schema):
data = Nested(ObjectSchema,
validate=validate.Length(min=1,
error='Field may not be an empty list'),
many=True,
required=True)
print(CollectionSchema().load({}).errors)
# {'data': ['Missing data for required field.']}
The rationale for _validate_missing is as follows:
Consider a web API using a nested schema in which both inner and outer fields are required, so valid JSON data looks like (e.g.) {outerRequired: {innerRequired: true}}.
Now suppose an API client submits a request containing valid data for
the outer schema, but missing the key for the inner schema, like
{outerRequired: {extra: true}}. Clearly, the client should be informed
that the _inner_ schema was missing a required field. That was how
Marshmallow worked before #235.
But what if the client submits a request, say {extra: true}, that
doesn't have a key for the _outer_ field? Before #235, the client would
have to submit an additional request just to discover the inner
required field. After #235, the client can be informed in a single
response of _everything_ missing from the request data, now matter how
many levels deep.
In short, _validate_missing is an aid to discoverability. I hope that
clears things up for you.
On Sun, Nov 8, 2015, at 05:18 PM, Steven Loria wrote:
Thanks for the thorough report, @chekunkov[1] .
For case 2: required validation applies only to missing inputs; empty
lists, empty strings, zero, etc. are still considered input. If you
want to disallow empty lists, you could do:from marshmallow import Schema, fields, validate
class ObjectSchema(Schema):
id = fields.Integer(required=True) name =
fields.String(required=True)class CollectionSchema(Schema):
data = fields.Nested(ObjectSchema,
validate=validate.Length(min=1,
error='Field may not be an empty list'),
many=True, required=True)print(CollectionSchema().load({'data': []}).errors)
{'data': ['Field may not be an empty list']}
Case 1 is less clearly cut. #235[2] was implemented such that nested
required fields would be validated if the outer field was missing. @max-
orhai[3], can you comment on the rationale for this decision?For the time being, you could work around the issue with a custom
Nested field:from marshmallow import Schema, fields, validate, missing
class Nested(fields.Nested):
def _validate_missing(self, value): if value is missing and
getattr(self, 'required', False): self.fail('required')
return super()._validate_missing(value)class ObjectSchema(Schema):
id = fields.Integer(required=True) name =
fields.String(required=True)class CollectionSchema(Schema):
data = Nested(ObjectSchema,
validate=validate.Length(min=1,
error='Field may not be an empty list'), many=True,
required=True)print(CollectionSchema().load({}).errors)
{'data': ['Missing data for required field.']}
— Reply to this email directly or view it on GitHub[4].
Links:
thanks for perfect examples and workarounds @sloria !
But what if the client submits a request, say {extra: true}, that
doesn't have a key for the _outer_ field?
@max-orhai honestly I don't think it's a good idea to overcomplicate "A lightweight library for converting complex objects to and from simple Python datatypes" to handle edge cases like this. maybe it works for your example, when nested field contains required single nested object, but above I provided an example of the case where this advanced deep validation leads to a confusion. by saving some time for client with some particular schema this validation gives ambiguous error responses for others.
what if nested fields is a list, what if it can be empty {outer: []}? in this case the only valid response for submitted request {extra: True} would be
{'outer': ['Missing data for required field.'],
'extra': ['Rogue field]}
not
{'outer': {0: {'innerRequiredField': ['Missing data for required field.']}},
'extra': ['Rogue field]}
or what if outer should be of length 10? following your explanation current validation logic should returns something like
{'outer':
{0: {'innerRequiredField': ['Missing data for required field.']}},
{1: {'innerRequiredField': ['Missing data for required field.']}},
{2: {'innerRequiredField': ['Missing data for required field.']}},
# ...
{9: {'innerRequiredField': ['Missing data for required field.']}},
}
but instead it returns {'outer': ['Length must be between 10 and 10.']} which doesn't look consistent with that error example you provided - which once again leads to a confusion.
@chekunkov Thanks for sharing your concerns. This feature is helpful to my project's clients, which is why I submitted #235. I don't think it is overcomplicated, but it should be easy to disable: I'd be happy to help review a pull request that made _validate_missing an optional parameter to the Nested field constructor, if you submit one.
I agree that the interaction between missing data for Nested, many=True fields and the Length validator could be enhanced, but I don't see any ambiguity beyond what's already present without _validate_missing, assuming that _a list of required data_ should never be empty. Suppose we have the following schema:
class Inner(Schema):
num = fields.Number
url = fields.URL
class Outer(Schema):
inners = fields.Nested(Inner, many=True, required=True, validate=Length(min=2, max=2))
Now, when deserializing {} through an Outer, we get back the error {'inners': ['Length must be between 2 and 2.']} -- which is valid, although not quite as helpful as we might like, since Length doesn't know anything about the Inner schema. If it did, we could have the more helpful
{'inners': [
{'num': ['Missing data for required field.', 'Must be a number'],
'url': ['Missing data for required field.', 'Must be a URL']},
{'num': ['Missing data for required field.', 'Must be a number'],
'url': ['Missing data for required field.', 'Must be a URL']}
]}
Note that the list of errors for the 'inners' field contains exactly two dicts, each in the format described by the Inner schema, just like valid data for that field would. If inner was constructed without the Length validator, _validate_missing places a single dict in that list, again under the assumption that a list of _required_ dicts should not be empty. If an empty list _should_ be acceptable data there, say to let a client explicitly declare the absence of something, then it would indeed be useful to disable _validate_missing. If that is actually a common case, then maybe this is a feature that should have to be explicitly enabled.
For what it's worth, I think that much of the confusion around this issue originates in the many=True convention for Nested fields. I'd prefer to be able to write fields.List(fields.Nested(inner_schema)) instead: in this case, it would make it clearer that the outer list itself is required, apart from any validation of its length or contents.
@max-orhai Thanks for the detailed explanation.
I don't think it is overcomplicated, but it should be easy to disable: I'd be happy to help review a pull request that made _validate_missing an optional parameter to the Nested field constructor, if you submit one.
Sounds fair. Unfortunately I've chosen another library for my project, so I'm not sure I'll be able to submit a PR in the nearest future - it requires a bit deeper diving into the sources and I don't have much time to do that now. But maybe I'll try to use Marshmallow again for the next project and chances are I'll have a chance to spend more time contributing to it.
I think the intention behind the deeply nested error messages is well founded. It does not provide so much additional information that it hinders usability.
I would suggest using a custom validator that creates less verbose errors instead of changing the required errors.
The validation for required is handled separately because it must precede execution of the validators callables. If a required value is not passed, validation should fail fast (as there is no value to pass to the rest of the validators).
After reading https://github.com/marshmallow-code/marshmallow/issues/334#issuecomment-205313000 I realized that the required validator provides a shortcut behavior that a custom validator cannot. It seems like the custom error string returned by a required callable in #334 would override the nested error dictionary, which would resolve this issue.
@max-orhai @deckar01 This functionality is all ways broken.
Consider the case when you have deeply nested JSON structure:
{'foo': {'bar': {'baz': {'bam': {'quux': true}}}}}
User sends request
{'foo': {'bap': {'baz': {'bam': {'quux': true}}}}}
Imagine his confusion when he gets error like that ?!
{'foo': {'bar': {'baz': {'bam': {'quux': ['Missing required field']}}}}}
Then user starts complaining about API being insane and all ways broken, because the value for 'quux' is just there.
What he really should get is
{'foo': {'bar': ['Missing required field']}}
As with required fields, you could have a bunch of validators that will trigger only if some fields are present (e.g. validating string with regex). Even if user will know after sending request once which fields are required, he will still need to do multiple iterations to figure out what type should particular field be and what are other constraints for it exist.
More of this in #495
It might be a good idea to fall back to the original behavior by default and require an explicit parameter to enable more verbose error messages. I think it is important for error messages to be clear and concise by default.
I would suggest to disable it completely because:
1) It looks questionable
2) It complicates code
3) It does not work with Nested schemas inside other containers (e.g. fields.List)
Quick comment from my side, since this issue currently has the status "Feedback welcome": I agree with @chekunkov and @maximkulkin that the current functionality is very surprising/weird. I was expecting mm.fields.Nested(SomeSchema, many=True, required=True) to give me strict feedback when that field was missing completely. Instead I get error messages about missing data within the list, even though nothing was provided at all for that field.
I would strongly prefer if the change from #235 was reverted back to the old, simple behavior (or at least that should be the default behavior). Marshmallow shouldn't try to be too clever in what it does. If people want this advanced functionality, it should be on an opt-in basis.
In general, I think a good approach for the API-design of a validation-library should be to favor strictness and simplicity over convenience. Don't do any nested validation of data that doesn't even exist.
I'm running into a few issues with Marshmallow, as I don't see any way to make a field required, even if it's length can be zero.
I have a bulk request, and a valid bulk request is a request with no operations. However, there is no way I can express this in Marshmallow.
As a workaround I've created a new Nested class:
class NestedAllowEmpty(fields.Nested):
def _validate_missing(self, value):
if value is missing and getattr(self, 'required', False):
self.fail('required')
if len(value) > 0:
return super()._validate_missing(value)
Please reconsider the default behavior.
I was just caught by this while working on https://github.com/marshmallow-code/marshmallow/issues/378 and I also think current behaviour is surprising and this change should be reverted.
I disagree with the rationale presented in https://github.com/marshmallow-code/marshmallow/issues/319#issuecomment-166770421. There are ways to document an API, for instance using apispec to generate an OpenAPI doc. I can't really picture my clients using this trial and error approach to learn what data is required. (OK, in fact, I'm pretty sure that's what they do, we all do... But this is wrong.) The error message is not meant to document the API. We don't expect the clients to send garbage just to get an error message specifying what the server expects. Besides, for those using the trial and error approach, reverting this change will just make it a few more steps.
I'd like to change this in v3.0. Any objection?
To be explicit, I'm talking about case 1 in OP. Regarding case 2, I agree with @sloria's answer, this is the right behaviour.
I just proposed a PR that removes the feature: #754.
I agree with @lafrech 's analysis above. Thanks for the PR. Let's make it so.
Most helpful comment
I was just caught by this while working on https://github.com/marshmallow-code/marshmallow/issues/378 and I also think current behaviour is surprising and this change should be reverted.
I disagree with the rationale presented in https://github.com/marshmallow-code/marshmallow/issues/319#issuecomment-166770421. There are ways to document an API, for instance using apispec to generate an OpenAPI doc. I can't really picture my clients using this trial and error approach to learn what data is required. (OK, in fact, I'm pretty sure that's what they do, we all do... But this is wrong.) The error message is not meant to document the API. We don't expect the clients to send garbage just to get an error message specifying what the server expects. Besides, for those using the trial and error approach, reverting this change will just make it a few more steps.
I'd like to change this in v3.0. Any objection?
To be explicit, I'm talking about case 1 in OP. Regarding case 2, I agree with @sloria's answer, this is the right behaviour.