Hi. I've been digging around and couldn't find the answer to this.
Say I've got a model like this:
class AlbumSchema(Schema):
    year = fields.Int()

class ArtistSchema(Schema):
    name = fields.Str()
    albums = ...
I want albums to be a dict of AlbumSchema, so that ArtistSchema serializes as
{ 'albums': { 'Hunky Dory': {'year': 1971},
'The Man Who Sold the World': {'year': 1970}},
'name': 'David Bowie'}
Naively, I would expect syntaxes like this to work:
fields.List(Schema)
fields.Dict(Schema)
or maybe
fields.List(fields.Nested(Schema))
fields.Dict(fields.Nested(Schema))
Serializing a list of Schema can be achieved through Nested(Schema, many=True), which I find less intuitive, and I don't know about a dict of Schema.
Is there any way to do it? Or a good reason _not_ to do it?
(Question also asked on SO.)
I want albums to be a dict of AlbumSchema. Is there any way to do it?
Currently you must either provide explicitly named fields or use the Dict field and abandon all notion of an underlying schema.
Or a good reason _not_ to do it?
Taking a collection of like objects, plucking out a key, and using it as the key in a dictionary destroys the order.
Since marshmallow seems to strive for expressivity, I think this use case represents a void in the marshmallow interface. If an API can index homogeneous collections using strings, marshmallow probably should too.
I would call this interface NestedDict and implement it as a thin wrapper around Nested(many=True).
from marshmallow.fields import Nested

class NestedDict(Nested):
    def __init__(self, nested, key, *args, **kwargs):
        # many=True is always applied: the field wraps a collection of like objects
        super(NestedDict, self).__init__(nested, many=True, *args, **kwargs)
        self.key = key

    def _serialize(self, nested_obj, attr, obj):
        # Serialize as a list first, then index the items by the chosen key
        nested_list = super(NestedDict, self)._serialize(nested_obj, attr, obj)
        nested_dict = {item[self.key]: item for item in nested_list}
        return nested_dict

    def _deserialize(self, value, attr, data):
        # Drop the dict keys and deserialize the values as a plain list
        raw_list = list(value.values())
        nested_list = super(NestedDict, self)._deserialize(raw_list, attr, data)
        return nested_list
The usage would look very similar to Nested, except that a field name is provided to index the dictionary, and many=True is implicitly applied.
from marshmallow import fields, Schema

class AlbumSchema(Schema):
    name = fields.Str()
    year = fields.Int()

class ArtistSchema(Schema):
    name = fields.Str()
    albums = fields.NestedDict(AlbumSchema, key='name')

artist_schema = ArtistSchema()
obj, errors = artist_schema.load({
    'name': 'Artist Name',
    'albums': {
        'Album A': {'name': 'Album A', 'year': 1999},
        'Album B': {'name': 'Album B', 'year': 2005}
    }
})
print(obj)
# {'name': 'Artist Name', 'albums': [{'name': 'Album A', 'year': 1999}, {'name': 'Album B', 'year': 2005}]}
data, errors = artist_schema.dump(obj)
print(data)
# {'name': 'Artist Name', 'albums': {'Album A': {'name': 'Album A', 'year': 1999}, 'Album B': {'name': 'Album B', 'year': 2005}}}
👍, would find NestedDict very useful.
Hi @deckar01. Thank you for your feedback.
I think I have been unclear in my question.
My point is not to serialize a list as a dict but to serialize/deserialize a dict of like objects of known schema. In the original object, the data is stored as a dict already.
In other words, how would you write a Schema to serialize/deserialize such an object?
{'name': 'Artist Name', 'albums': {'Album A': {'year': 1999}, 'Album B': {'year': 2005}}}
It should be close to this, but there is a missing piece:
class AlbumSchema(Schema):
    year = fields.Int()

class ArtistSchema(Schema):
    name = fields.Str()
    albums = ...
Currently, we don't know how to serialize this. We had to modify the object to let albums be a list and the album name be a name attribute of each album. The downside of this is that we can't call artist['Album_A'], obviously. We have to either recreate a dict or search the list.
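For illustration, the workaround described above (keep albums as a list, then rebuild a dict whenever lookup by name is needed) could look like the following sketch. It uses plain Python only; the helper names albums_to_dict and albums_to_list are made up for this example and are not part of marshmallow.

```python
def albums_to_dict(albums):
    """Index a list of album dicts by their 'name' entry (hypothetical helper)."""
    return {
        album["name"]: {k: v for k, v in album.items() if k != "name"}
        for album in albums
    }


def albums_to_list(albums_by_name):
    """Inverse of albums_to_dict: move each dict key back into a 'name' entry."""
    return [dict(album, name=name) for name, album in albums_by_name.items()]


albums = [
    {"name": "Hunky Dory", "year": 1971},
    {"name": "The Man Who Sold the World", "year": 1970},
]

by_name = albums_to_dict(albums)
print(by_name["Hunky Dory"])  # {'year': 1971}
```

Every load and dump has to go through this conversion, which is the extra step a schema-aware dict field would remove.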
In fact, I don't really care how the object is serialized, as long as I get it deserialized properly. I just don't see any reason not to represent it like this:
{ 'albums': { 'Hunky Dory': {'year': 1971},
'The Man Who Sold the World': {'year': 1970}},
'name': 'David Bowie'}
I hope this is clearer now.
I suspect this use case has not occurred before, because most users are serializing records from a relational database.
It sounds like you can control the schema since you made it a list, but you still want a dictionary to aid in the lookup process. Most users would probably just dump the records in their database and query by the name if they needed to.
Can you provide more specific details about where your data is coming from, why you are accessing it through a dictionary, and what you are doing with the data when you are done?
Indeed, the use case does not involve a database.
My colleague is writing an application and he wants a way to store user data. (Basically, the application runs numeric simulations, so the user data is made of simulation parameters and sets of results.)
He can do that quick and dirty using pickle, but I suggested that he serialize his objects into text files. And since I use Marshmallow on other projects (for database or API related stuff), I introduced him to Marshmallow.
The only issue was that dict. He currently made it a list as a workaround, losing the lookup feature in the process.
Besides that, it all went smoothly and it helped him get the serialization part out of his business objects, so he's happy with it and he'll most probably be sticking to Marshmallow anyway. (Unless it is the wrong choice because it was not designed for such use cases?)
I was a bit surprised to be blocking on what I thought would be a rather simple use case.
Do you think it would make sense to add such a possibility to Marshmallow?
Do you think it would make sense to add such a possibility to Marshmallow?
It seems like a reasonable feature to me.
The implementation may be more complex than the requirements make it sound though. Marshmallow is built on the assumption that homogeneous collections are lists.
For a Nested field to be able to handle a dictionary when many=True, the marshmallow core would need to use the more generic iterable interface instead of the list interface.
I am going to mull this over and make sure there is not a simpler solution I am overlooking.
fields.Dict(fields.Nested(Schema))
This did not fully sink in when I first read the issue. I like this. This would give new purpose to the otherwise unstructured Dict field.
Proof of concept: https://github.com/marshmallow-code/marshmallow/compare/dev...deckar01:483-structured-dict
It doesn't handle ordered dicts yet, but I thought I would get some feedback before it gets too involved.
@sloria If this looks like a viable option I can flesh out the docs and add support for ordered dicts.
This looks great. Thanks @deckar01.
As a sidenote, for someone discovering Marshmallow, an equivalent syntax for List
fields.List(fields.Nested(Schema))
could be more intuitive than fields.Nested(Schema, many=True), especially if fields.Dict(fields.Nested(Schema)) is implemented.
I don't mean to break everything in Marshmallow's core. And maybe it would be hiding the underlying principles too much.
It's all about
Marshmallow is built on the assumption that homogeneous collections are lists.
Once this is understood, the fields.Nested(Schema, many=True) makes sense. But coming from another serialization lib, it can be surprising.
MongoEngine's ListField, for instance, works like this:
class Page(Document):
    tags = ListField(StringField(max_length=50))
I just checked in docs/source/tests. MongoEngine has both a DictField that acts like Marshmallow's Dict and a MapField that enforces a given field type for its items. However, it looks like MapField(StringField()) is just a shortcut for DictField(field=StringField()). I guess there are historical reasons for this. But I believe it is clearer to just have Dict with an optional (and first positional) field type argument.
+1 for this feature. My use case is a wizard where a user can create a bunch of "phases" for a workflow. They'll be serialized to YAML, and I wanted to use marshmallow for that.
Why not have validation for keys? I'll need to check some properties in the keys of the dict too. Something like fields.Dict(key=fields.Str(), values=fields.Nested(SomeSchema)) would be helpful.
I like fields.Dict(key=fields.Str(), values=fields.Nested(SomeSchema)).
We use Marshmallow in a MongoDB ODM: uMongo.
If I want to serialize this in Mongo
{2011: 12, 2012: 15, 2013: 16, 2014: 18}
using a Dict field does not allow me to enforce a schema. I need to create a dedicated nested structure and put it in a list:
class DatedValue(MyBaseObject):
    year = Int()
    value = Int()

class MyObject(MyBaseObject):
    dated_values = List(DatedValue)
and then I only get a list I can't access by keys (unless I create a dict from the list each time I load the object).
It would definitely be much less cumbersome if I could just write:
class MyObject(MyBaseObject):
    values = Dict(keys=Int(), values=Int())
Other benefits:
Edit: Actually, using a Schema for keys in uMongo would be a bad idea for MongoDB specific reasons, so we'll stick to string keys there, but adding schemas to keys could make sense in Marshmallow anyway.
This is something I'd like to review for 3.0. I think there are valid use cases for validating keys, and I think @lsenta 's and @lafrech 's proposed API is reasonable.
@lafrech Would you like to send a PR implementing your proposed API?
@sloria I'm afraid I won't be able to do this anytime soon, but if I get the time, I'll be happy to give it a go.
Note that it wouldn't be a breaking change, so it could be added in a later 3.x.
Did you get the chance to look at @deckar01's proposal?
No problem, @lafrech . @deckar01 's proposal is on the right track; I think it would also be nice to have validation for keys, as suggested in https://github.com/marshmallow-code/marshmallow/issues/483#issuecomment-285936314
Is there any way to get around this limitation right now? I need this functionality replicated. Can this be achieved with a pre_dump? It appears to load ok somehow, but doesn't know how to dump to the correct schema.
This is a bugger (and a surprise it doesn't work out of the box). Anyone care to revive @deckar01's suggestion?
@deckar01 Would you be up for sending a PR with your proposal?
+1 for the suggestion. Another use case is (de)serializing to a protocol buffer map field, which accepts a string, bool, or int as key and any type as value (enum, message (~ Python class), scalar types).
I'm trying to decide between Cerberus and Marshmallow. Cerberus has one critical feature: it allows one to set keyschema and valueschema. I'm more keen on Marshmallow's ecosystem and prefer its class-based schemas. I'm a bit dead in the water without this, but I'm trying to find a workaround. It strikes me as odd that a foundational data structure, mappings, isn't a consideration. Has anybody found a way to support this?
I will rebase my branch and work on a PR.
I needed something similar to what @christian-storm needs, and I ended up taking parts (or maybe all, can't remember) of what @deckar01 had worked on a few months ago, and hacked on it until it did what I needed it to do: https://gist.github.com/ArthurPBressan/4f6dc8b7826e352884f0561ac79d6898
Maybe it's useful for someone as a starting point, since I removed some functionality that I didn't really need, and didn't implement tests.
@sloria How should deserialization errors be communicated for invalid keys? Maybe prefix the error message to indicate that the message is for the key? Invalid key: {}?
@sloria How should I handle the error message when a key and its value both have errors? Concatenate the error message lists together?
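One alternative to prefixing or concatenating messages is to keep key errors and value errors apart, nested under the offending entry. The sketch below shows that idea in plain Python. It is only an illustration of the error layout: check_mapping, is_str and is_int are hypothetical names, and this is not presented as marshmallow's implementation.

```python
def check_mapping(data, check_key, check_value):
    """Validate every key and value, collecting errors per entry.

    check_key and check_value are hypothetical callables that return a
    list of error messages (empty when the input is valid).
    """
    errors = {}
    for key, value in data.items():
        entry = {}
        key_errors = check_key(key)
        value_errors = check_value(value)
        if key_errors:
            entry["key"] = key_errors
        if value_errors:
            entry["value"] = value_errors
        if entry:
            # Only entries with at least one problem appear in the result
            errors[key] = entry
    return errors


def is_str(x):
    return [] if isinstance(x, str) else ["Not a valid string."]


def is_int(x):
    return [] if isinstance(x, int) else ["Not a valid integer."]


errors = check_mapping({"a": 1, 2: "b", "c": "d"}, is_str, is_int)
print(errors)
```

With this layout, an entry whose key and value are both invalid carries two separate lists, so no message has to say which half it refers to.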
This is released in version 3.0.0b5. Thanks everyone for your feedback!
This is unrelated to this issue. Please open a new one.
@sloria @deckar01
With 3.0.0rc5, I still can't understand how OP's situation is solved.
I understand solution should look like this:
class ArtistSchema(Schema):
    name = fields.Str()
    albums = fields.Dict(keys=fields.Str(), values=AlbumSchema())
But Dict accepts only fields.ABC type for fields (not Schema).
Am I missing something?
Thanks!
class ArtistSchema(Schema):
    name = fields.Str()
    albums = fields.Dict(keys=fields.Str(), values=fields.Nested(AlbumSchema()))

Worked!
When will v3.0 be released?
pip install marshmallow => 2.19.1
I know I can add a --pre to get it, but the framework I use (responder) does not have it
There are still a few things to finish. No ETA. Please refer to https://github.com/marshmallow-code/marshmallow/milestone/10.