I'm using marshmallow 2.0.0rc2 to validate input data on HTTP requests and to dump SQLAlchemy models to JSON on HTTP responses. I've stumbled upon two problems:
First, while loading data from JSON on an HTTP PUT request, I want to populate all missing fields with None so that the data in SQLAlchemy is correctly overwritten. Right now I'm using the following code:
import marshmallow as ma

for name, field in schema.fields.items():
    if field.missing is ma.missing:
        schema.fields[name].missing = None
It works, but I suspect it's buggy, since I'm mutating the marshmallow.Field instances attached to the Schema class. After the Schema instance is disposed of, every field I patched would be stuck with the new missing value instead of the default one.
Second, while dumping data from SQLAlchemy to JSON, all missing fields are serialized as None, so the JSON is filled with {"key": null} entries. This is unwanted behaviour, and I'm cleaning them up in a post_dump hook:
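If the goal is only PUT semantics, one framework-independent option is to do the defaulting on a fresh dict per request instead of on field objects at all. A minimal sketch, assuming the expected field names are available (for example from the keys of `schema.fields`); `fill_missing_with_none` is a hypothetical helper, not a marshmallow API:

```python
def fill_missing_with_none(data, field_names):
    """Return a new dict in which every expected field is present.

    Hypothetical helper: keys absent from `data` are set to None so that
    a PUT overwrites them. The input dict is left untouched and no field
    objects are modified.
    """
    # Start with None for every expected field, then overlay the payload.
    result = dict.fromkeys(field_names)
    result.update(data)
    return result


payload = {'bar': 42}
filled = fill_missing_with_none(payload, ['foo', 'bar'])
print(filled)  # {'foo': None, 'bar': 42}
```

Keys in the payload that are not in `field_names` are passed through unchanged, so unknown-field handling is still left to the schema.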
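@post_dump
def clean_missing(self, data):
    # Collect the keys first; popping while iterating over the same dict
    # raises RuntimeError on Python 3
    for key in [key for key in data if data[key] is None]:
        data.pop(key)
    return data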
Same as before, it works, but it requires creating a BaseSchema class which passes this logic on to all inherited classes.
I've searched the documentation for a while and didn't find a proper way to switch these behaviours, i.e. skip None fields when dumping and populate missing fields with None when loading. Am I missing something, or does marshmallow not provide such functionality?
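Whatever hook ends up calling it, the cleanup itself can be a small pure function. A minimal sketch (`drop_none` is a hypothetical name) that builds a new dict instead of mutating the dumped one, and that also accepts the list produced when dumping with many=True:

```python
def drop_none(data):
    """Return a copy of dumped output with None-valued keys removed.

    Accepts a single dict or, for many=True output, a list of dicts.
    """
    if isinstance(data, list):
        # many=True: clean each serialized object on its own
        return [drop_none(item) for item in data]
    # Build a new dict instead of popping keys while iterating over the
    # same dict, which raises RuntimeError on Python 3.
    return {key: value for key, value in data.items() if value is not None}


print(drop_none({'foo': None, 'bar': 0}))        # {'bar': 0}
print(drop_none([{'foo': None}, {'foo': 'x'}]))  # [{}, {'foo': 'x'}]
```

The test is `is not None` rather than truthiness, so legitimate falsy values such as 0, '' or False survive the cleanup.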
What is wrong with creating a BaseSchema? This is a common usage pattern with marshmallow. You'll often want shared behavior across all your schemas.
You can use the newly-introduced on_bind_field hook to override the missing attribute. So your BaseSchema would look something like:
from marshmallow import Schema, fields, post_dump, missing

class BaseSchema(Schema):
    def on_bind_field(self, field_name, field_obj):
        # Override the default missing attribute so
        # that missing values deserialize to None
        if field_obj.missing is missing:
            field_obj.missing = None
            field_obj.allow_none = True

    @post_dump
    def clean_missing(self, data):
        # Iterate over the original, delete from the copy
        ret = data.copy()
        for key in filter(lambda key: data[key] is None, data):
            del ret[key]
        return ret

class MySchema(BaseSchema):
    foo = fields.Field()
    bar = fields.Field()

s = MySchema()
s.load({'bar': 42}).data  # {'bar': 42, 'foo': None}
s.dump({'foo': None, 'bar': 42}).data  # {'bar': 42}
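Whether mutating `field_obj` inside the hook can leak between schemas depends on the schema binding per-instance copies of its declared fields. The toy model below assumes exactly that; `ToyField`, `ToySchema`, `MISSING` and `NoneOnMissingSchema` are hypothetical stand-ins, not marshmallow's internals. It shows the hook changing the instance's fields while the class-level declarations stay untouched:

```python
import copy


class _Missing(object):
    """Toy sentinel; returning self from __deepcopy__ keeps `is` checks valid."""

    def __deepcopy__(self, memo):
        return self


MISSING = _Missing()


class ToyField(object):
    """Hypothetical stand-in for a field object (not marshmallow's Field)."""

    def __init__(self):
        self.missing = MISSING
        self.allow_none = False


class ToySchema(object):
    declared_fields = {}

    def __init__(self):
        # Each instance binds its own deep copy of the class-level fields,
        # so changes made by the hook never touch declared_fields.
        self.fields = copy.deepcopy(self.declared_fields)
        for name, field_obj in self.fields.items():
            self.on_bind_field(name, field_obj)

    def on_bind_field(self, field_name, field_obj):
        """Hook for subclasses; does nothing by default."""


class NoneOnMissingSchema(ToySchema):
    declared_fields = {'foo': ToyField(), 'bar': ToyField()}

    def on_bind_field(self, field_name, field_obj):
        if field_obj.missing is MISSING:
            field_obj.missing = None
            field_obj.allow_none = True


s = NoneOnMissingSchema()
print(s.fields['foo'].missing)  # None
print(NoneOnMissingSchema.declared_fields['foo'].missing is MISSING)  # True
```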
Hey @sloria, thanks for the answer! I hope you don't mind if I bother you a bit more :)
on_bind_field is a brilliant hint, but regarding the second question, I'm concerned about the processors' execution order. I want clean_missing to always execute before any other processors, so I've tried overriding the _invoke_dump_processors method. I'm not sure if that's the best way to do it, or whether I should just rearrange self.__processors__ in __init__.
from marshmallow import Schema, missing, fields
from marshmallow.decorators import POST_DUMP

class BaseSchema(Schema):
    def __init__(self, set_default=missing, set_missing=missing,
                 clear_dump_missing=False, *args, **kwargs):
        # Set options before super().__init__, which binds the fields
        # and therefore calls on_bind_field
        self.set_default = set_default
        self.set_missing = set_missing
        self.clear_dump_missing = clear_dump_missing
        super(BaseSchema, self).__init__(*args, **kwargs)

    def on_bind_field(self, field_name, field_obj):
        # Only touch fields that don't declare their own default/missing
        if (self.set_default is not missing and
                field_obj.default is missing):
            if self.set_default is None:
                field_obj.allow_none = True
            field_obj.default = self.set_default
        if (self.set_missing is not missing and
                field_obj.missing is missing):
            if self.set_missing is None:
                field_obj.allow_none = True
            field_obj.missing = self.set_missing

    def _invoke_dump_processors(self, tag_name, data,
                                many, original_data=None):
        # Clean before any registered processor runs. Assumption: this
        # method is also called for PRE_DUMP, where data is still the
        # original object, so only clean on POST_DUMP.
        if self.clear_dump_missing and tag_name == POST_DUMP:
            if many:
                data = [self.clear_missing(d) for d in data]
            else:
                data = self.clear_missing(data)
        return super(BaseSchema, self)._invoke_dump_processors(
            tag_name, data, many, original_data
        )

    def clear_missing(self, data):
        if not self.clear_dump_missing:
            return data
        result = data.copy()
        for key in filter(lambda key: data[key] is None, data):
            del result[key]
        return result
I also want to pass the instance behaviour (i.e. set_missing, set_default, clear_dump_missing) on to nested schemas, and I'm not sure where to start:
class SimpleSchema(BaseSchema):
    foo = fields.Field()
    bar = fields.Field()

class NestedSchema(BaseSchema):
    simple_list = fields.List(fields.Nested(SimpleSchema))
    simple_nested = fields.Nested(SimpleSchema)

schema = SimpleSchema(set_missing=None, clear_dump_missing=True)
print(schema.dump({'foo': None, 'bar': 1}))
print(schema.load({'bar': 1}))

schema = NestedSchema(set_missing=None, clear_dump_missing=True)
print(schema.dump({
    'simple_list': [{'foo': None, 'bar': 1}, {'foo': 1, 'bar': None}],
    'simple_nested': {'foo': 1, 'bar': None},
}))
print(schema.load({'simple_nested': {'foo': 1}}))
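For the dump half of the nested question, one option that needs no per-schema flag at all is to clean the final output recursively, once, at the top level. A minimal sketch with a hypothetical `drop_none_deep` helper, applied to the same data as the nested example above:

```python
def drop_none_deep(data):
    """Recursively drop None-valued keys from dicts nested in dicts/lists."""
    if isinstance(data, dict):
        return {key: drop_none_deep(value)
                for key, value in data.items() if value is not None}
    if isinstance(data, list):
        # List items are cleaned, but None items themselves are kept.
        return [drop_none_deep(item) for item in data]
    return data


dumped = {
    'simple_list': [{'foo': None, 'bar': 1}, {'foo': 1, 'bar': None}],
    'simple_nested': {'foo': 1, 'bar': None},
}
print(drop_none_deep(dumped))
# {'simple_list': [{'bar': 1}, {'foo': 1}], 'simple_nested': {'foo': 1}}
```

The trade-off is that it also strips None from any free-form dict values (for example raw or dict-typed fields), and it does nothing for the load side: set_missing still has to reach the nested schemas themselves.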
You could do something like:
from marshmallow import Schema, post_dump

class BaseSchema(Schema):
    @post_dump
    def _post_dump(self, data):
        processed = self._clear_dump_missing(data)
        return self.post_dump(processed)

    def _clear_dump_missing(self, data):
        # ...

    # Subclasses can override this
    def post_dump(self, data):
        return data
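The point of this layout is that only one post-dump processor is registered, on the base class, so the ordering question disappears: the cleanup always runs first and subclasses only customize what happens afterwards. A plain-Python toy of that control flow; `ToyBaseSchema`, `run_post_dump` and `UpperKeysSchema` are hypothetical names, and `run_post_dump` merely stands in for the decorated `_post_dump`:

```python
class ToyBaseSchema(object):
    """Toy model of the template-method idea above (not marshmallow code)."""

    def run_post_dump(self, data):
        # Stands in for the single @post_dump-decorated _post_dump method:
        # the cleanup always runs first, then the overridable hook.
        cleaned = self._clear_dump_missing(data)
        return self.post_dump(cleaned)

    def _clear_dump_missing(self, data):
        return {key: value for key, value in data.items() if value is not None}

    def post_dump(self, data):
        # Default hook: no extra processing.
        return data


class UpperKeysSchema(ToyBaseSchema):
    # A subclass customizes post_dump instead of registering its own
    # processor, so it always receives already-cleaned data.
    def post_dump(self, data):
        return {key.upper(): value for key, value in data.items()}


print(UpperKeysSchema().run_post_dump({'foo': None, 'bar': 1}))  # {'BAR': 1}
```

The guarantee only holds as long as subclasses override `post_dump` rather than adding their own decorated processors, whose order relative to the base one would again be up to the library.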
As for passing additional arguments to nested schemas, you should pass schema instances to Nested:
simple_nested = fields.Nested(SimpleSchema(set_missing=None, clear_dump_missing=True))
Is there a way to pass additional arguments to nested schemas when they're 'self'?
@cwisecarver you can pass many, only, and exclude.
fields.Nested('self', only=('id', ), many=True)
Ah, @sloria, I was hoping to be able to pass arbitrary kwargs to the 'self' schema.
@cwisecarver I've reopened https://github.com/marshmallow-code/marshmallow/issues/302 because I think it might meet your use case. Feel free to comment there.