Marshmallow: RFC: Remove Nested(many=True) to keep only the List(Nested()) form

Created on 19 Apr 2018 · 38Comments · Source: marshmallow-code/marshmallow

See discussion in https://github.com/marshmallow-code/marshmallow/issues/493#issuecomment-380468027.

We have two ways to do the same thing. They sometimes behave differently, which generally means one of them is buggy.

IMHO, and from what I have seen from users around me, it is more natural for a new user to write List(Nested) than Nested(many). Besides, it is consistent with Dict(values=Nested).

I realize this is quite a breaking change, both from user perspective, because Nested(many=True) is probably the most commonly used form, and from developer perspective, because it involves a lot of code changes (but also a lot of code removal, which is nice).

There could be things that can be done with Nested(many=True) but not with List(Nested), because in the latter case, the nested Schema has access to the whole data, while in the former, it is called on each element. Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?

Anyway, since the question was raised in a comment in another issue, I figured I'd give it more visibility.

backwards incompat feedback welcome

Source

lafrech

👍7

Most helpful comment

I might try to dig into a perf dump and see where the time is going. That seems like a lot of time to just be caused by an extra layer of indirection. I wonder if Nested isn't doing something it should be or if List is doing something it doesn't need.

deckar01 on 15 Jul 2019

👍3

All 38 comments

Is there any documentation outlining the differences between these two? I find the distinction confusing as well.

seansfkelley on 14 May 2018

I don't think there is supposed to be a difference.

deckar01 on 14 May 2018

I think deprecating Nested(many=True) is a good idea. Would someone be up for doing an analysis of the current differences between the two APIs?

sloria on 30 May 2018

Do you mean deprecate in 2.X and remove in 3 or deprecate in 3?

Here are a few differences:

Partial data if an item fails validation [#766] (Fixed in #1052)
Validation error issue with Nested(many=True) [#493] (List(Nested) does the right thing.)
List is not propagating the context between a nested field and the parent schema [#922] (Fixed)
only argument inconsistent between Nested(S, many=True) and List(Nested(S)) [#946] (Fixed in #1229)
Different error message when deserializing an invalid input (not a list) (List(Nested) does the right thing.)
partial not propagated to List(Nested) (Fixed in #1230)

lafrech on 30 May 2018

@lafrech I was thinking of deprecating in 3. As you pointed out, Nested(many=True) is the probably the more common usage, and removing it would break many users' code.

sloria on 30 May 2018

👍2

How would a self-referencing schema work with this? In other words, how would we express

friends = Nested('self', many=True, exclude=('friends',))

sloria on 11 Jun 2018

👍3

I would expect friends = List(Nested('self', exclude=('friends',))) to work the same way, but it currently errors. Using a reference to the schema with python 3 works, so I suspect this is just an edge case that hasn't been tested.

deckar01 on 11 Jun 2018

👍2

Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?

I've got production code that relies on the fact that a nested field with many=True and @pre_dump(pass_many=True) on its schema will execute this pre_dump only once.

The benefit is making sure expensive code that is needed to be executed before nested fields can be dumped is not repeated for each element in the fields.List.

I don't really see this as an edge case though, pretty sure doing something to the whole set of related objects before dumping/loading them is a common pattern.

anka-sirota on 13 Aug 2018

Currently a list of nested fields does not propagate the schema context correctly, but nested many does. See #424 and #922.

deckar01 on 22 Aug 2018

Yep, I just updated the list above.

lafrech on 22 Aug 2018

I saw the ref on the issue, but I couldn't figure out where it came from. Thanks.

deckar01 on 22 Aug 2018

@anka-sirota yes it seems like a common use case, but it very well could be done on parent schema or by subclassing List and doing custom pre_dump there. All that is possible without handling a separate many case, which I would I agree with @lafrech, just source of inconsistent in the library.

Looking at bigger picture, that this issue already have been here for couple months now, and there still few bugs before many=True can be complete eliminated from the library, I would suggest just calling it deprecated in marshmallow 3 release.
Such move will:

prevent further delay of marshmallow 3
make it so that there is more interest in List(Nested) and that it works consistently with Nested(many=True)

rooterkyberian on 31 Aug 2018

@lafrech another difference to be added to the list https://github.com/marshmallow-code/marshmallow/issues/946

rooterkyberian on 11 Sep 2018

👍1

@rooterkyberian, I was just searching for this thread to reference your issue.

lafrech on 11 Sep 2018

All differences listed above are either fixed or on their way to be fixed. Once this is done, we should be able to deprecate Nested(many=True).

lafrech on 4 Dec 2018

👍1

There is also a difference between Nested(Schema, many=True) and List(Nested(Schema)) when loading partial data. Below tests are showing the difference in behaviour when using both methods:

from marshmallow import Schema
from marshmallow.fields import List, Nested, Str


class Employee(Schema):
    name = Str(required=True)
    title = Str(required=True)


def test_nested_partial_load_with_list():
    class Company(Schema):
        employees = List(Nested(Employee))

    payload = {
        'employees': [{
            'name': 'Arthur',
        }],
    }
    # Partial loading with list type shouldn't generate any errors.
    result = Company().load(payload, partial=True)
    assert result['employees'][0]['name'] == 'Arthur'


def test_nested_partial_load_with_nested_many():
    class Company(Schema):
        employees = Nested(Employee, many=True)

    payload = {
        'employees': [{
            'name': 'Arthur',
        }],
    }
    # Partial loading with Nested(many=True) type shouldn't generate any errors.
    result = Company().load(payload, partial=True)
    assert result['employees'][0]['name'] == 'Arthur'

Partial loading in nested schemas was introduced in version 3.0.0rc1 in this PR: #849. Problem is that when a field is of type List, it is not getting the same kwargs as Nested so it can't ignore missing data. I think it was even mentioned in referenced discussion.

bartelemi on 29 Dec 2018

👍1

Thans @0xpsilocybe, I think this is the object of @sloria's comment in #1066. I shall update the PR to propagate all needed kwargs to Nested.

lafrech on 29 Dec 2018

What about the Pluck(many=True) use case ?

timc13 on 25 Mar 2019

Removing many would allow a few lines of explicit many handling to be removed from Pluck. The rest of the changes would automatically be inherited from Nested.

deckar01 on 27 Mar 2019

@lafrech Is there anything else to be addressed before we can deprecate Nested(many=True)? I think we'll want to investigate https://github.com/marshmallow-code/marshmallow/issues/779#issuecomment-396354987 .

In any case, I don't think we should block on this for releasing 3.0. We can always deprecate within the 3.x line.

sloria on 15 Jun 2019

👍1

@sloria :+1: on depreciating stuff rather than stacking the deck with more BC breaks. I'd much rather have 8 releases with 8 breaking changes than 8 breaking changes in one release...

I don't know about anyone else, but we've got hundreds - maybe even a thousand schemas to migrate. I'd rather take that in strides!

fuhrysteve on 17 Jun 2019

😕1

@fuhrysteve, about this, please see the conversation in https://github.com/marshmallow-code/marshmallow/issues/928.

lafrech on 17 Jun 2019

👍2

Changed the milestone on this since there are still further considerations to discuss (deprecating Nested(many=True)), but I don't think there's any action items to complete before 3.0 final.

sloria on 7 Jul 2019

Indeed, deprecation can be done in 3.x so it doesn't have to hold 3.0.

I sent https://github.com/marshmallow-code/marshmallow/pull/1290 with a fix for that last List(Nested(...)) glitch we identified above (https://github.com/marshmallow-code/marshmallow/issues/779#issuecomment-396354987).

lafrech on 8 Jul 2019

Before we officially recommend List(Nested(Schema)), it would be good to do some benchmarking and profiling. A quick-n-dirty benchmark shows that List(Nested(Schema)) is ~35% slower than Nested(Schema, many=True): https://gist.github.com/sloria/2e0b141b3d05fe0dafbdeafd294ebae8

sloria on 8 Jul 2019

👍2

deckar01 on 15 Jul 2019

👍3

I ran the benchmarks above using python 3.7.2 against the dev branch. I had to make a copy of Schema to get sane cProfile output, because recursion merges calls into a single total and hides the problem.

Nested(many)

Screen Shot 2019-07-16 at 11 03 23 AM

List(Nested)

Screen Shot 2019-07-16 at 11 03 42 AM

It looks like the slowdown is caused by Nested(many) calling schema.dump(many) once and getting to loop on schema._serialize(), but List(Nested) having to loop on schema.dump().

About 20% of that slowdown is caused by calls to _has_processors() which is cacheable and ErrorStore() which could possibly be removed or refactored as a shared instance. Optimizing these would also make schema.dump() calls slightly faster in general.

The rest of the slow down seems to be memory allocation and conditional evaluation in dump(). The easiest way to fix that is probably to probe the list's inner field type and allow it to hit the optimized schema path once via Nested._serialize(many).

deckar01 on 16 Jul 2019

👍2

1304 makes `dump()` ~18% faster for both paths. #1306 makes `dump()` ~21% faster for `List(Nested)`, which makes the two paths run in effectively the same amount of time (< 1%).

deckar01 on 17 Jul 2019

👍1

Great.

Is there the same kind of gap between Nested(many) and List(Nested) when loading?

lafrech on 17 Jul 2019

Ya, load()has the same ~35% slowdown when comparing List(Nested) to Nested(many). The same strategy should work there.

Edit: I added the fix for load() in #1306 and confirmed that it makes them run in the same time without slowing down Nested(many).

deckar01 on 17 Jul 2019

Does it make sense to remove Integer(many=True) and the many argument on other fields too?

jtrakk on 18 Aug 2019

@jtrakk many isn't an argument to other fields.

sloria on 18 Aug 2019

👍1

I see. FWIW, I've seen people (including myself) pass Integer(many=True) and been confused when it doesn't work. I know attr.ib() uses the metadata= namespace instead of taking **metadata, to avoid that problem, at the cost of a little extra verbosity. I'm not sure how I feel about the tradeoff, though I lean toward the explicit namespace.

jtrakk on 18 Aug 2019

Yeah, it's a known issue. The reasons we use **kwargs instead of e.g. metadata= are mostly historical. The original decision was that storing kwargs 1) was more concise and 2) saved us from having to come up with an appropriate name... "metadata" didn't seem right because there are use cases where the things your storing aren't really metadata. At this point, it's not worth breaking the API.

Not the best reasons, but I think it's not terrible. We've discussed adding a whitelist of metadata keys in the past, but we decided it wasn't worth the added API surface.

sloria on 18 Aug 2019

At this point, it's not worth breaking the API.

I suppose so.

That being said, 3.0.0 would be a good time to do it, and it wouldn't be hard to write an automatic converter using libCST or similar that'd move all the current **metadata calls into metadata=dict(...).

jtrakk on 18 Aug 2019

@jtrakk It's too late for 3.0 (planning to release the final today 😬 ), but I've opened https://github.com/marshmallow-code/marshmallow/issues/1350 for further discussion.

sloria on 18 Aug 2019

With this change and the lambda change proposed here, we can also probably deprecate the only, exclude, and unknown parameters of Nested.

# before
albums = fields.Nested(AlbumSchema, many=True, only=('id', 'title', ))
# after (possible now)
albums = field.List(fields.Nested(AlbumSchema(only=('id', 'title'))))

# before
artist = fields.Nested('ArtistSchema', only=('id', ))
# after (proposed)
artist = fields.Nested(lambda: ArtistSchema(only=('id',)))

sloria on 8 Sep 2019

While working on #1498, @Meallia uncovered that Nested(many) does not support enabling allow_none for list items. Schema.load() has no concept of allow_none, so it will always implicitly be true.

deckar01 on 3 Feb 2020

Was this page helpful?

0 / 5 - 0 ratings