Marshmallow: Forcing `None` on load and skipping `None` on dump

Created on 23 Sep 2015 · 7 comments · Source: marshmallow-code/marshmallow

I'm using 2.0.0rc2 to validate input data on HTTP requests and to serialize SQLAlchemy models to JSON on HTTP responses, and I've stumbled upon two problems:

First, while loading data from JSON on an HTTP PUT request, I want to populate all missing fields with None so that the existing data in SQLAlchemy is overwritten correctly. Right now I'm using the following code:

# `ma` is the marshmallow module (import marshmallow as ma)
for name, field in schema.fields.items():
    if field.missing is ma.missing:
        field.missing = None

It works, but I suspect it's buggy, since I'm modifying marshmallow.Field instances attached to the Schema class: after the Schema instance is disposed of, all the fields we patched will be stuck with the new missing value instead of the default one.
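
To make the goal of this first point concrete, here is a minimal sketch in plain Python with no marshmallow calls. It assumes the PUT payload is already a dict; `fill_missing`, `apply_put`, and the `SimpleNamespace` stand-in for a SQLAlchemy model are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def fill_missing(payload, field_names, fill_value=None):
    """Return a dict with every expected field, using fill_value for absent keys.

    Keys in payload that are not listed in field_names are left out.
    """
    return {name: payload.get(name, fill_value) for name in field_names}


def apply_put(model, payload, field_names):
    """Overwrite every listed attribute on model, PUT-style."""
    for name, value in fill_missing(payload, field_names).items():
        setattr(model, name, value)
    return model


# Hypothetical stand-in for a SQLAlchemy model instance.
record = SimpleNamespace(foo="old", bar=1)
apply_put(record, {"bar": 42}, ["foo", "bar"])
print(record.foo, record.bar)
```

Because every listed field is written, a key omitted from the request resets the stored value instead of leaving the old one in place.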

Second, while dumping data from SQLAlchemy to JSON, all missing fields are resolved as None, so the JSON is populated with {"key": null} entries. This is unwanted behaviour, and I'm removing them in a post_dump hook:

@post_dump
def clean_missing(self, data):
    # Build a new dict rather than popping keys while iterating over data
    return {key: value for key, value in data.items() if value is not None}

As with the previous problem, it works, but it means creating a BaseSchema class which passes this logic on to all inherited classes.

I've searched the documentation for a while and didn't find a proper way to swap these behaviours, i.e. skip fields on dumping and populate fields with None on loading. Am I missing something, or does marshmallow not provide such functions?

question

All 7 comments

What is wrong with creating a BaseSchema? This is a common usage pattern with marshmallow. You'll often want shared behavior across all your schemas.

You can use the newly-introduced on_bind_field hook to override the missing attribute. So your BaseSchema would look something like:

from marshmallow import Schema, fields, post_dump, missing

class BaseSchema(Schema):

    def on_bind_field(self, field_name, field_obj):
        # Override default missing attribute so
        # that missing values deserialize to None
        if field_obj.missing == missing:
            field_obj.missing = None
            field_obj.allow_none = True

    @post_dump
    def clean_missing(self, data):
        ret = data.copy()
        for key in filter(lambda key: data[key] is None, data):
            del ret[key]
        return ret


class MySchema(BaseSchema):
    foo = fields.Field()
    bar = fields.Field()

s = MySchema()
s.load({'bar': 42}).data  # {'bar': 42, 'foo': None}
s.dump({'foo': None, 'bar': 42}).data  # {'bar': 42}

Hey, @sloria, thanks for the answer! I hope you don't mind if I annoy you for a bit more :)

on_bind_field is a brilliant hint, but regarding the second question, I'm concerned about the execution order of the processors. I want clean_missing to always execute before any other processor, so I've tried overriding the _invoke_dump_processors method. I'm not sure if that's the best way to do it, or whether I should just rearrange self.__processors__ in __init__:

from marshmallow import Schema, missing, fields
from marshmallow.decorators import POST_DUMP

class BaseSchema(Schema):

    def __init__(self, set_default=missing, set_missing=missing,
                 clear_dump_missing=False, *args, **kwargs):
        self.set_default = set_default
        self.set_missing = set_missing
        self.clear_dump_missing = clear_dump_missing
        super(BaseSchema, self).__init__(*args, **kwargs)

    def on_bind_field(self, field_name, field_obj):
        if (self.set_default is not missing and
                field_obj.default is missing):
            if self.set_default is None:
                field_obj.allow_none = True
            field_obj.default = self.set_default
        if (self.set_missing is not missing and
                field_obj.missing is missing):
            if self.set_missing is None:
                field_obj.allow_none = True
            field_obj.missing = self.set_missing

    def _invoke_dump_processors(self, tag_name, data,
                                many, original_data=None):
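        # Strip None values only at the post-dump stage, before its processors run
        if tag_name == POST_DUMP and self.clear_dump_missing: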
            if many:
                data = [self.clear_missing(d) for d in data]
            else:
                data = self.clear_missing(data)
        return super(BaseSchema, self)._invoke_dump_processors(
            tag_name, data, many, original_data
        )

    def clear_missing(self, data):
        if not self.clear_dump_missing:
            return data
        result = data.copy()
        for key in filter(lambda key: data[key] is None, data):
            del result[key]
        return result

I also want to pass the instance behaviour (i.e. set_missing, set_default, clear_dump_missing) on to nested schemas, and I'm not sure where to start.

class SimpleSchema(BaseSchema):
    foo = fields.Field()
    bar = fields.Field()

class NestedSchema(BaseSchema):
    simple_list = fields.List(fields.Nested(SimpleSchema))
    simple_nested = fields.Nested(SimpleSchema)

schema = SimpleSchema(set_missing=None, clear_dump_missing=True)
print(schema.dump({'foo': None, 'bar': 1}))
print(schema.load({'bar': 1}))

schema = NestedSchema(set_missing=None, clear_dump_missing=True)
print(schema.dump({
    'simple_list': [{'foo': None, 'bar': 1}, {'foo': 1, 'bar': None}],
    'simple_nested': {'foo': 1, 'bar': None},
}))
print(schema.load({'simple_nested': {'foo': 1}}))
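
For reference, the result that the nested dump above is aiming for can be described with a small plain-Python helper. This is only a sketch of the intended output, not marshmallow API; `strip_none` is a hypothetical name, and it assumes the dumped data consists of dicts and lists.

```python
def strip_none(value):
    """Recursively drop None-valued keys from dicts, descending into lists.

    None items that sit directly inside a list are kept; only dict keys
    whose value is None are removed.
    """
    if isinstance(value, dict):
        return {k: strip_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_none(item) for item in value]
    return value


payload = {
    "simple_list": [{"foo": None, "bar": 1}, {"foo": 1, "bar": None}],
    "simple_nested": {"foo": 1, "bar": None},
}
cleaned = strip_none(payload)
print(cleaned)
```

The helper returns new containers at every level, so the input payload is left unchanged.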

You could do something like:

class BaseSchema(Schema):

    @post_dump
    def _post_dump(self, data):
        processed = self._clear_dump_missing(data)
        return self.post_dump(processed)

    def _clear_dump_missing(self, data):
        # ...

    # Subclasses can override this
    def post_dump(self, data):
        return data
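
The ordering idea in the snippet above can be sketched without marshmallow at all: the base class owns the single entry point, runs the cleanup first, and only then calls a method that subclasses override. The class names and `run_post_dump` below are hypothetical; in a real schema the entry point would be the method registered with the post_dump decorator.

```python
class BaseDumper:
    def run_post_dump(self, data):
        # The cleanup always runs first, whatever subclasses define.
        cleaned = self._clear_dump_missing(data)
        return self.post_dump(cleaned)

    def _clear_dump_missing(self, data):
        """Return a copy of data without None-valued keys."""
        return {key: value for key, value in data.items() if value is not None}

    def post_dump(self, data):
        """Hook for subclasses; the default leaves data untouched."""
        return data


class EnvelopeDumper(BaseDumper):
    def post_dump(self, data):
        # By the time this runs, None values have already been removed.
        return {"fields": sorted(data), "data": data}


result = EnvelopeDumper().run_post_dump({"foo": None, "bar": 42})
print(result)
```

Since subclasses only ever override `post_dump`, they cannot accidentally run ahead of the cleanup step.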

As for passing additional arguments to nested schemas, you should pass schema instances to Nested:

simple_nested = fields.Nested(SimpleSchema(set_missing=None, clear_dump_missing=True))

Is there a way to pass additional arguments to nested schemas when they're 'self'?

@cwisecarver you can pass many, only, and exclude.

fields.Nested('self', only=('id', ), many=True)

Ah, @sloria, I was hoping to be able to pass random kwargs to the self schema.

@cwisecarver I've reopened https://github.com/marshmallow-code/marshmallow/issues/302 because I think it might meet your use case. Feel free to comment there.
