Marshmallow: "unknown" option should apply to nested fields so that "exclude" works

Created on 14 Jan 2020  Â·  14Comments  Â·  Source: marshmallow-code/marshmallow

The fact that exclude takes dotted notation for nested fields, but unknown does not apply to nested fields, means that exclude doesn't work for nested fields in some common use cases unless the nested schema(s) have the desired Meta.unknown setting, and then they are not runtime-flexible, which is the great benefit of Schema and load taking exclude. Somewhat new to marshmallow so maybe there's a way to accomplish this (setting exclude on nested fields at load time) in a different way?

Happy to contribute a PR if you guys think this is a good idea.

docs

All 14 comments

Can you provide a reproducible example of this issue? Are you getting errors about unknown nested fields when you are expecting them to be excluded?

Schema and load taking exclude

The Schema constructor takes an exclude argument, but the load() method doesn't.

https://marshmallow.readthedocs.io/en/stable/api_reference.html#marshmallow.Schema.load


I tried building a repro case from the description. Let me know if I missed anything.

from marshmallow import Schema, fields, EXCLUDE

class Foo(Schema):
    data = fields.Str()

class Test(Schema):
    foo = fields.Nested(Foo)

schema = Test(exclude=['foo.data'], unknown=EXCLUDE)
schema.load({'foo': {'data': 'bar'}})
# marshmallow.exceptions.ValidationError: {'foo': {'data': ['Unknown field.']}}

In this example, unknown=EXCLUDE does not get inherited by Foo, so it will continue to raise when it encounters unknown keys. exclude=['foo.data'] propagates down to Foo and removes data from its internal list of fields. When Foo tries to load data it is treated as an unknown key and raises an error.

I initially though this was expected behavior, but the docs for exclude say it contains "fields to exclude in the serialized result". This seems to indicate that the excluded field shouldn't be treated as unknown, but just be excluded from the serialized object. It seems like exclude should take precedence over unknown.

https://marshmallow.readthedocs.io/en/stable/api_reference.html#marshmallow.Schema

A minimal repro case for this can be constructed without Nested or unknown.

from marshmallow import Schema, fields

class Foo(Schema):
    data = fields.Str()

schema = Foo(exclude=['data'])
schema.load({'data': 'bar'})
# marshmallow.exceptions.ValidationError: {'data': ['Unknown field.']}

IIUC, behaviour is correct and conforms to the documentation:

exclude – Blacklist of the declared fields to exclude when instantiating the Schema.

The field is excluded, so treated as unknown.

Indeed, the "fields to exclude in the serialized result" phrasing is not precise enough and could be reworded. The part I quote above is clearer.

The issue seems to be that unknown does not propagate. This was already discussed here and we settled on "don't propagate". Generally, people will use a base schema for that.

I copied from Meta.exclude, but linked to Schema(exclude). The constructor parameter definition is clear and matches the implementation. The Meta.exclude definition should probably be fixed to match that.

Hi Jared

Sorry for the slow reply. Yes, your first test case reproduces the issue
I'm describing. Jerome also describes it correctly below:

The issue seems to be that unknown does not propagate.

Jerome's proposed solution ("Generally, people will use a base schema for
that"), which, I think means setting Meta.unknown on the child schema(?) is
what I ended up using.

This was the solution I ended up on, but my comment was that this kind of
obviates the usefulness of using exclude and unknown at schema
instantiation or load time.

If you've already deliberated on this and this is an intentional design
decision, that's fine. Would be good to make it more explicit in the docs
however. Thanks for looking at this so quickly.

On Wed, Jan 15, 2020 at 7:20 AM Jared Deckard notifications@github.com
wrote:

I copied from Meta.exclude, but linked to Schema(exclude). The schema
definition is clear and matches the implementation. The description for
Meta.exclude should probably be fixed to match that.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/marshmallow-code/marshmallow/issues/1490?email_source=notifications&email_token=AN36JUKMQ2RA6ACZGG7ASCTQ54SVVA5CNFSM4KG2CYZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJAVT2A#issuecomment-574708200,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AN36JUJ2VVLCAPIB5PGU5DLQ54SVVANCNFSM4KG2CYZA
.

--
Pattabi Seshadri
Engineering (Access)

Discussion on proposals to recursively inherit unknown can be found on #980 and #1428. Any suggestions or PRs for updating the docs would be appreciated.

Ok, thanks for those links. It's clear now why you made the decision not
to have unknown propagate. My suggestion for updating the docs would be to
add a sentence in every description of unknown for Schema(), load, and
loads. See the blue text below as an example:

Whether to exclude, include, or raise an error for unknown fields in the
data. (Note: does not propagate to nested fields.) Use EXCLUDE, INCLUDE or
RAISE.

On Wed, Jan 15, 2020 at 10:28 AM Jared Deckard notifications@github.com
wrote:

Discussion on proposals to recursively inherit unknown can be found on

980 https://github.com/marshmallow-code/marshmallow/issues/980 and

1428 https://github.com/marshmallow-code/marshmallow/issues/1428. Any

suggestions or PRs for updating the docs would be appreciated.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/marshmallow-code/marshmallow/issues/1490?email_source=notifications&email_token=AN36JUNLWT6S6GFFSN4IS5TQ55IVXA5CNFSM4KG2CYZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJBJZIA#issuecomment-574790816,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AN36JUK2NTMYUPELALWZUKTQ55IVXANCNFSM4KG2CYZA
.

--
Pattabi Seshadri
Engineering (Access)

Seeing how many times this definition is duplicated, and considering how we just noticed the definitions for exclude were out of sync, we might want to consider defining once, and referencing with links. The restructured text code reference syntax should make this fairly painless.

Hi there! We definitely need to implement this functionality. Here's a live example:

from marshmallow import Schema, fields, EXCLUDE, INCLUDE


class SpamSchema(Schema):
    meat = fields.String()


class CanSchema(Schema):
    spam = fields.Nested(SpamSchema)


def main():
    can = CanSchema(unknown=EXCLUDE)

    # just passing expected data
    data = can.load({
        'spam': {'meat': 'pork'},
    })
    print(data)
    # prints {'spam': {'meat': 'pork'}}

    # passing unknown field at the top level
    data = can.load({
        'spam': {'meat': 'pork'},
        'foo': 'bar'
    })
    print(data)
    # prints {'spam': {'meat': 'pork'}}

    # or with include
    data = can.load(
        {
            'spam': {'meat': 'pork'},
            'foo': 'bar'
        },
        unknown=INCLUDE,
    )
    print(data)
    # prints {'spam': {'meat': 'pork'}, 'foo': 'bar'}

    # now passing unknown in a nested field
    can.load({
        'spam': {
            'meat': 'pork',
            'add-on': 'eggs',
        },
    })
    # fails with marshmallow.exceptions.ValidationError: {'spam': {'add-on': ['Unknown field.']}}

    # even if stating directly, it does not help
    can.load({
        'spam': {
            'meat': 'pork',
            'add-on': 'eggs',
        },
    }, unknown=EXCLUDE)
    # fails with marshmallow.exceptions.ValidationError: {'spam': {'add-on': ['Unknown field.']}}


if __name__ == '__main__':
    main()

Looking at this issue from the perspective of webargs, I see two (related) issues with #1575. I think I have a good idea of how to resolve them, but want to see if people like the results before I try to write it.

  • If unknown propagates automatically in _any_ context (as in #1575), it will be backwards incompatible
  • If unknown behaves inconsistently between Meta, schema instantiation, and load, it will cause confusion

And then, as an ancillary concern, if someone is writing a library on top of marshmallow (👋 hi guys!), they'll want to be able to control this or offer control to their users.

Maybe backwards incompatibility isn't an issue in this case -- if it's considered a bug that's being fixed -- but I think it will be a problem for someone.


So my idea:

  • add a new parameter in all of the same places as unknown, propagate_unknown
  • propagate_unknown can be True or False (by default, for backwards compat) and can be removed (True everywhere) in marshmallow v4
  • if propagate_unknown is True, behave as in #1575 when unknown is passed to load, and if False, behave as today

I would take #1575 as a basis for this work. It looks solid, has nice tests, and credit where it's due -- I just don't quite agree with the behavior.

The downsides are that we have a new attribute to control unknown, which might feel messy, and that we don't get the desired behavior by default. To me, those are worthwhile in the first case and a necessary evil in the second.

You can do it all with just one attribute, but I promise that it's worse.

This is just for fun. Don't take this suggestion seriously. Please. 😄

We _could_ change the constants a bit...

RAISE = 1
INCLUDE = 2
EXCLUDE = 4
PROPAGATE = 8
...
schema.load(data, unknown=EXCLUDE|PROPAGATE)

but then we're just asking for trouble because you have to check for INCLUDE|EXCLUDE, which makes no sense.

Here's another demonstration of this issue, as I just bumped into it as well here:

from dataclasses import dataclass
import desert  # similar to marshmallow-dataclass
import marshmallow
from typing import Optional

@dataclass
class Address:
    city: Optional[str]

@dataclass
class User:
    name: Optional[str]
    address: Address

>>> user1_data = { "name": "johnny", "address": { "city": "Seoul", "line": "21 Noksapyeong-daero 3-gil" } }

# desert.schema() just returns a marshmallow.Schema
>>> schema = desert.schema(User, meta={"unknown": marshmallow.EXCLUDE, "partial": True})

>>> schema.load(user1_data)

yields...

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/.../lib/python3.8/site-packages/marshmallow/schema.py", line 723, in load
    return self._do_load(
  File "/.../lib/python3.8/site-packages/marshmallow/schema.py", line 905, in _do_load
    raise exc

marshmallow.exceptions.ValidationError: {'address': {'line': ['Unknown field.']}}

But of course it works for the first "layer" of nesting, because that's where we specifically made the Schema:

>>> user2_data = {"name": "bobby", "last": "hill", "address": { "city": "Suwon" } }
>>> schema.load(user2_data)

User(name='bobby', address=Address(city='Suwon'))

The workaround that @sirosen suggested in his documentation proposal (#1634) wouldn't really work for this use case as far as I can tell, and so it would be a design constraint for marshmallow extensions if a change is not included in the base Marshmallow itself, as we're not able to make use of schema inheritance when dataclasses are involved as far as I know. I'm hopefully just ignorant of something, though.

Best case scenario, third party extensions would just have to find some way to cope with their Schema wrapper classes like with the one shown above.

any news here?

I like solution suggested by @sirosen in #1490. Patched marshmallow in my project with propagate_unknown

Was this page helpful?
0 / 5 - 0 ratings

Related issues

DenisKuplyakov picture DenisKuplyakov  Â·  4Comments

lupodellasleppa picture lupodellasleppa  Â·  3Comments

lassandroan picture lassandroan  Â·  3Comments

nickretallack picture nickretallack  Â·  4Comments

rastikerdar picture rastikerdar  Â·  3Comments