See discussion in https://github.com/marshmallow-code/marshmallow/issues/493#issuecomment-380468027.
We have two ways to do the same thing. They sometimes behave differently, which generally means one of them is buggy.
IMHO, and from what I have seen from users around me, it is more natural for a new user to write List(Nested) than Nested(many). Besides, it is consistent with Dict(values=Nested).
I realize this is quite a breaking change, both from user perspective, because Nested(many=True) is probably the most commonly used form, and from developer perspective, because it involves a lot of code changes (but also a lot of code removal, which is nice).
There could be things that can be done with Nested(many=True) but not with List(Nested), because in the latter case, the nested Schema has access to the whole data, while in the former, it is called on each element. Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?
Anyway, since the question was raised in a comment in another issue, I figured I'd give it more visibility.
Is there any documentation outlining the differences between these two? I find the distinction confusing as well.
I don't think there is supposed to be a difference.
I think deprecating Nested(many=True) is a good idea. Would someone be up for doing an analysis of the current differences between the two APIs?
Do you mean deprecate in 2.X and remove in 3 or deprecate in 3?
Here are a few differences:
Nested(many=True) [#493] (List(Nested) does the right thing.)List is not propagating the context between a nested field and the parent schema [#922] (Fixed)only argument inconsistent between Nested(S, many=True) and List(Nested(S)) [#946] (Fixed in #1229)List(Nested) does the right thing.)partial not propagated to List(Nested) (Fixed in #1230)@lafrech I was thinking of deprecating in 3. As you pointed out, Nested(many=True) is the probably the more common usage, and removing it would break many users' code.
How would a self-referencing schema work with this? In other words, how would we express
friends = Nested('self', many=True, exclude=('friends',))
I would expect friends = List(Nested('self', exclude=('friends',))) to work the same way, but it currently errors. Using a reference to the schema with python 3 works, so I suspect this is just an edge case that hasn't been tested.
Perhaps some pre_load edge cases might suffer from the change. Any real-life example, anyone?
I've got production code that relies on the fact that a nested field with many=True and @pre_dump(pass_many=True) on its schema will execute this pre_dump only once.
The benefit is making sure expensive code that is needed to be executed before nested fields can be dumped is not repeated for each element in the fields.List.
I don't really see this as an edge case though, pretty sure doing something to the whole set of related objects before dumping/loading them is a common pattern.
Currently a list of nested fields does not propagate the schema context correctly, but nested many does. See #424 and #922.
Yep, I just updated the list above.
I saw the ref on the issue, but I couldn't figure out where it came from. Thanks.
@anka-sirota yes it seems like a common use case, but it very well could be done on parent schema or by subclassing List and doing custom pre_dump there. All that is possible without handling a separate many case, which I would I agree with @lafrech, just source of inconsistent in the library.
Looking at bigger picture, that this issue already have been here for couple months now, and there still few bugs before many=True can be complete eliminated from the library, I would suggest just calling it deprecated in marshmallow 3 release.
Such move will:
List(Nested) and that it works consistently with Nested(many=True)@lafrech another difference to be added to the list https://github.com/marshmallow-code/marshmallow/issues/946
@rooterkyberian, I was just searching for this thread to reference your issue.
All differences listed above are either fixed or on their way to be fixed. Once this is done, we should be able to deprecate Nested(many=True).
There is also a difference between Nested(Schema, many=True) and List(Nested(Schema)) when loading partial data. Below tests are showing the difference in behaviour when using both methods:
from marshmallow import Schema
from marshmallow.fields import List, Nested, Str
class Employee(Schema):
name = Str(required=True)
title = Str(required=True)
def test_nested_partial_load_with_list():
class Company(Schema):
employees = List(Nested(Employee))
payload = {
'employees': [{
'name': 'Arthur',
}],
}
# Partial loading with list type shouldn't generate any errors.
result = Company().load(payload, partial=True)
assert result['employees'][0]['name'] == 'Arthur'
def test_nested_partial_load_with_nested_many():
class Company(Schema):
employees = Nested(Employee, many=True)
payload = {
'employees': [{
'name': 'Arthur',
}],
}
# Partial loading with Nested(many=True) type shouldn't generate any errors.
result = Company().load(payload, partial=True)
assert result['employees'][0]['name'] == 'Arthur'
Partial loading in nested schemas was introduced in version 3.0.0rc1 in this PR: #849. Problem is that when a field is of type List, it is not getting the same kwargs as Nested so it can't ignore missing data. I think it was even mentioned in referenced discussion.
Thans @0xpsilocybe, I think this is the object of @sloria's comment in #1066. I shall update the PR to propagate all needed kwargs to Nested.
What about the Pluck(many=True) use case ?
Removing many would allow a few lines of explicit many handling to be removed from Pluck. The rest of the changes would automatically be inherited from Nested.
@lafrech Is there anything else to be addressed before we can deprecate Nested(many=True)? I think we'll want to investigate https://github.com/marshmallow-code/marshmallow/issues/779#issuecomment-396354987 .
In any case, I don't think we should block on this for releasing 3.0. We can always deprecate within the 3.x line.
@sloria :+1: on depreciating stuff rather than stacking the deck with more BC breaks. I'd much rather have 8 releases with 8 breaking changes than 8 breaking changes in one release...
I don't know about anyone else, but we've got hundreds - maybe even a thousand schemas to migrate. I'd rather take that in strides!
@fuhrysteve, about this, please see the conversation in https://github.com/marshmallow-code/marshmallow/issues/928.
Changed the milestone on this since there are still further considerations to discuss (deprecating Nested(many=True)), but I don't think there's any action items to complete before 3.0 final.
Indeed, deprecation can be done in 3.x so it doesn't have to hold 3.0.
I sent https://github.com/marshmallow-code/marshmallow/pull/1290 with a fix for that last List(Nested(...)) glitch we identified above (https://github.com/marshmallow-code/marshmallow/issues/779#issuecomment-396354987).
Before we officially recommend List(Nested(Schema)), it would be good to do some benchmarking and profiling. A quick-n-dirty benchmark shows that List(Nested(Schema)) is ~35% slower than Nested(Schema, many=True): https://gist.github.com/sloria/2e0b141b3d05fe0dafbdeafd294ebae8
I might try to dig into a perf dump and see where the time is going. That seems like a lot of time to just be caused by an extra layer of indirection. I wonder if Nested isn't doing something it should be or if List is doing something it doesn't need.
I ran the benchmarks above using python 3.7.2 against the dev branch. I had to make a copy of Schema to get sane cProfile output, because recursion merges calls into a single total and hides the problem.
Nested(many)


List(Nested)


It looks like the slowdown is caused by Nested(many) calling schema.dump(many) once and getting to loop on schema._serialize(), but List(Nested) having to loop on schema.dump().
About 20% of that slowdown is caused by calls to _has_processors() which is cacheable and ErrorStore() which could possibly be removed or refactored as a shared instance. Optimizing these would also make schema.dump() calls slightly faster in general.
The rest of the slow down seems to be memory allocation and conditional evaluation in dump(). The easiest way to fix that is probably to probe the list's inner field type and allow it to hit the optimized schema path once via Nested._serialize(many).
dump() ~18% faster for both paths. #1306 makes dump() ~21% faster for List(Nested), which makes the two paths run in effectively the same amount of time (< 1%).Great.
Is there the same kind of gap between Nested(many) and List(Nested) when loading?
Ya, load()has the same ~35% slowdown when comparing List(Nested) to Nested(many). The same strategy should work there.
Edit: I added the fix for load() in #1306 and confirmed that it makes them run in the same time without slowing down Nested(many).
Does it make sense to remove Integer(many=True) and the many argument on other fields too?
@jtrakk many isn't an argument to other fields.
I see. FWIW, I've seen people (including myself) pass Integer(many=True) and been confused when it doesn't work. I know attr.ib() uses the metadata= namespace instead of taking **metadata, to avoid that problem, at the cost of a little extra verbosity. I'm not sure how I feel about the tradeoff, though I lean toward the explicit namespace.
Yeah, it's a known issue. The reasons we use **kwargs instead of e.g. metadata= are mostly historical. The original decision was that storing kwargs 1) was more concise and 2) saved us from having to come up with an appropriate name... "metadata" didn't seem right because there are use cases where the things your storing aren't really metadata. At this point, it's not worth breaking the API.
Not the best reasons, but I think it's not terrible. We've discussed adding a whitelist of metadata keys in the past, but we decided it wasn't worth the added API surface.
At this point, it's not worth breaking the API.
I suppose so.
That being said, 3.0.0 would be a good time to do it, and it wouldn't be hard to write an automatic converter using libCST or similar that'd move all the current **metadata calls into metadata=dict(...).
@jtrakk It's too late for 3.0 (planning to release the final today 😬 ), but I've opened https://github.com/marshmallow-code/marshmallow/issues/1350 for further discussion.
With this change and the lambda change proposed here, we can also probably deprecate the only, exclude, and unknown parameters of Nested.
# before
albums = fields.Nested(AlbumSchema, many=True, only=('id', 'title', ))
# after (possible now)
albums = field.List(fields.Nested(AlbumSchema(only=('id', 'title'))))
# before
artist = fields.Nested('ArtistSchema', only=('id', ))
# after (proposed)
artist = fields.Nested(lambda: ArtistSchema(only=('id',)))
While working on #1498, @Meallia uncovered that Nested(many) does not support enabling allow_none for list items. Schema.load() has no concept of allow_none, so it will always implicitly be true.
Most helpful comment
I might try to dig into a perf dump and see where the time is going. That seems like a lot of time to just be caused by an extra layer of indirection. I wonder if
Nestedisn't doing something it should be or ifListis doing something it doesn't need.