Marshmallow: Creating additional fields on the fly

Created on 24 Jul 2014 · 12 Comments · Source: marshmallow-code/marshmallow

I'm trying to create a very flexible serializer, so that users can generate additional fields in the future. Let's say that today they only need the defaults I've provided:

class PostSerializer(Serializer):
    id = fields.String()
    title = fields.String(default="Untitled")
    body = fields.String(default=None)
    author = fields.List(fields.String)

The user creates several posts, and they decide they want a field for "category." I provide an interface where they set a new category field. Now perhaps I store this field in a dictionary.

additional_fields = {
    "category" : "list"
}

I then modify the serializer on the fly. The only way that seems to work is via Meta.additional; setattr never seems to work:

s = PostSerializer
PostSerializer.Meta.additional = additional_fields.keys()

Posts which were created without the 'category' field will cause the following AttributeError:

AttributeError: "category" is not a valid field for {'id': '123456', 'title': 'Cool Post', 'body': 'Lorem Ipsum...', 'author': ['John', 'Steve']}

How can I maintain flexibility to add user generated fields, but also protect myself in the future? Is there a way to set a global default for additional fields?

adregan

All 12 comments

I suppose I can check through the list of posts before serializing and add the missing attributes (set to None), but if there is a way to set a global default, that would be awesome.
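
The pre-processing workaround mentioned above could look roughly like this. It is a minimal sketch in plain Python, independent of marshmallow; `fill_missing` and the sample `posts` are hypothetical names, and it assumes posts are stored as dictionaries, as in the error message.

```python
def fill_missing(post, field_names, default=None):
    """Return a copy of `post` in which every name in `field_names` is present."""
    filled = dict(post)  # copy, so the stored post is left untouched
    for name in field_names:
        filled.setdefault(name, default)
    return filled


additional_fields = {"category": "list"}

posts = [
    {"id": "123456", "title": "Cool Post"},                          # created before "category" existed
    {"id": "123457", "title": "Newer Post", "category": ["music"]},  # created afterwards
]

# Normalize every post before handing it to the serializer.
normalized = [fill_missing(post, additional_fields) for post in posts]

print(normalized[0]["category"])  # None
print(normalized[1]["category"])  # ['music']
```

The drawback is that every caller has to remember to normalize first, which is why a default set on the serializer itself would be preferable.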

adregan on 24 Jul 2014

There are a couple of ways to handle your use case. Which one you choose will depend on your desired output.

Let's start with our "model" and a few instances:

class Post(object):
    def __init__(self, title):
        self.title = title

post_no_categories = Post('A post with no categories')
post_with_categories = Post('A post with categories')
post_with_categories.categories = ['music', 'video']

Option 1: Optional fields

If you want the serialized output to always contain the additional fields and it's fine if they are None, you can use optional fields, e.g., set required=False on the fields.

from marshmallow import Serializer, fields, pprint

class PostSerializer(Serializer):
    title = fields.String(default='Untitled')
    categories = fields.List(fields.String, required=False)

PostSerializer(post_no_categories).data
# {"categories": null, "title": "A post with no categories"}
PostSerializer(post_with_categories).data
# {"categories": ["music", "video"], "title": "A post with categories"}

Option 2: Post-processing function

If you want to only include the additional fields if they are defined on an instance, you could do so with a custom data handler. This is perhaps more flexible than option 1. See the docs on Transforming Data.

class PostSerializer2(Serializer):
    title = fields.String(default='Untitled')

additional_fields = {
    'categories': fields.List(fields.String())
}

# Register a custom data handler that will add the extra fields
@PostSerializer2.data_handler
def add_additional_fields(serializer, data, obj):
    for name, field_obj in additional_fields.items():
        if hasattr(obj, name):
            data[name] = field_obj.output(name, obj)
    return data

PostSerializer2(post_no_categories).data
# {"title": "A post with no categories"}
PostSerializer2(post_with_categories).data
# {"title": "A post with categories", "categories": ["music", "video"]}

Hope that helps!

sloria on 29 Jul 2014

Yes! Option 2 is fantastic. However, I'm finding the data_handlers are really persistent.

I'm creating instances of the PostSerializer on the fly and altering the fields via URL arguments (i.e., ?include=category), and I notice that the data_handler sticks around.

So if one query requests /posts?include=category so that the posts include the category, and the next query wants only the subcategory (/posts?include=subcategory), the category still persists.

Is there a way to "zero out" the data_handler between requests?

adregan on 30 Jul 2014

It seems that applying the decorator through a new instance of the class registered the handler on the parent class. I'm now creating a new child class and setting the decorators on the child, and that seems to be working much better. Thank you.
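
The child-class approach can be illustrated in plain Python. This is only a sketch of the mechanism, not marshmallow's implementation: `BaseSerializer`, `serializer_for`, and the `handlers` list are hypothetical stand-ins, and the sketch assumes handlers are stored on the class rather than on the instance.

```python
class BaseSerializer:
    # Hypothetical stand-in for a serializer class; not marshmallow code.
    handlers = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.handlers = []  # every subclass starts with its own empty list

    @classmethod
    def data_handler(cls, func):
        # Registration happens on the class, so it outlives any one instance.
        cls.handlers.append(func)
        return func

    def __init__(self, obj):
        self.obj = obj

    @property
    def data(self):
        data = {"title": self.obj["title"]}
        for handler in self.handlers:
            data = handler(data, self.obj)
        return data


def serializer_for(include):
    """Build a throwaway subclass whose handler adds only the requested keys."""
    class RequestSerializer(BaseSerializer):
        pass

    @RequestSerializer.data_handler
    def add_included(data, obj):
        for name in include:
            if name in obj:
                data[name] = obj[name]
        return data

    return RequestSerializer


post = {"title": "Cool Post", "category": "music", "subcategory": "jazz"}

first = serializer_for(["category"])(post).data
second = serializer_for(["subcategory"])(post).data
print(first)   # {'title': 'Cool Post', 'category': 'music'}
print(second)  # {'title': 'Cool Post', 'subcategory': 'jazz'}
```

Because each request gets a fresh subclass, nothing registered for one request leaks into the next, and the parent class never accumulates handlers.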

adregan on 31 Jul 2014

Glad to hear that worked!

sloria on 3 Aug 2014

And what should we do now that data_handler has been removed in 2.0, @sloria?

Thanks!

skqr on 6 Mar 2017

@skqr You can use a @post_load method to do the same thing as data_handler.

Yet another alternative is to update a schema's fields on __init__.

from marshmallow import Schema, fields

class MySchema(Schema):

    def __init__(self, additional_fields=None, **kwargs):
        super().__init__(**kwargs)
        # Guard against the default of None before merging the extra fields
        self.declared_fields.update(additional_fields or {})

additional_fields = {
    'foo': fields.Int()
}

sch = MySchema(additional_fields=additional_fields)

print(sch.dump({'foo': '123'}).data)  # {'foo': 123}

sloria on 6 Mar 2017

👍11

So tired. I cannot access the original object via post_dump.
My case: I have many models in my Django project with various text data, and I am going to integrate django-modeltranslation to internationalize them. I don't want to add every translated field to the schemas manually; I want to add them on the fly, depending on settings.LANGUAGES, for example.
Some code:

class InterfaceI18N(models.Model):
    label = models.CharField('string id', max_length=150, db_index=True)

    # after django-modeltranslation integration this field will have copies for i18n purposes and this model will be extended by fields: value_de, value_en, value_fr and others
    value = models.TextField('value', default='')

    class Meta:
        verbose_name = 'interface string'
        verbose_name_plural = 'interface strings'

    def __str__(self):
        return self.label


class InterfaceI18NSchema(Schema):
    label = fields.String()
    value = fields.String()

    @post_dump
    def add_i18n_fields(self, *args, **kwargs):
        raise Exception('Can not add all value_XX fields for all languages :(')

What do you think about this case, @sloria? I cannot find a proper way to do it without writing all the i18n fields into the schema manually.

MrYoda on 6 Mar 2017

@MrYoda I would create a custom metaclass that takes a list of i18n field names, creates clones of those fields, and adds post_dump processors.

Although, unless you're writing a translation tool, the whole idea looks weird.

maximkulkin on 6 Mar 2017

@MrYoda Here is an example of how to do that (if I understand the problem correctly):

import marshmallow as m
import marshmallow.fields as mf
from marshmallow.compat import with_metaclass
from collections import namedtuple, OrderedDict

LANGUAGES = ['en', 'de', 'fr']

class I18NMeta(type):
    def __new__(cls, name, bases, attrs):
        new_attrs = OrderedDict(attrs)
        if 'Meta' in attrs:
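            # Use a separate loop variable so the class name passed in as
            # `name` is not overwritten before the final type() call.
            for field_name in getattr(attrs['Meta'], 'i18n_fields', []):
                if field_name not in attrs:
                    continue

                field = attrs[field_name]
                for lang in LANGUAGES:
                    new_attrs['%s_%s' % (field_name, lang)] = field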

        return type(name, bases, new_attrs)

class ModelSchema(with_metaclass(I18NMeta, m.Schema)):
    class Meta:
        ordered = True
        i18n_fields = ['value']

    value = mf.String()

Model = namedtuple('Model', ['value', 'value_en', 'value_de', 'value_fr'])

print(ModelSchema().dump(Model('Hello', 'Hello', 'Hallo', 'Bonjour')))
# => MarshalResult(data=OrderedDict([(u'value', u'Hello'), (u'value_en', u'Hello'), (u'value_de', u'Hallo'), (u'value_fr', u'Bonjour')]), errors={})

Alternatively, you can use a special type to mark localized strings:

import marshmallow as m
import marshmallow.fields as mf
from marshmallow.compat import with_metaclass, iteritems
from collections import namedtuple, OrderedDict

LANGUAGES = ['en', 'de', 'fr']

class LocalizedString(mf.String):
    pass

class I18NMeta(type):
    def __new__(cls, name, bases, attrs):
        new_attrs = OrderedDict(attrs)
        for field_name, field in iteritems(attrs):
            if not isinstance(field, LocalizedString):
                continue

            for lang in LANGUAGES:
                new_attrs['%s_%s' % (field_name, lang)] = field

        return type(name, bases, new_attrs)

class ModelSchema(with_metaclass(I18NMeta, m.Schema)):
    value = LocalizedString()

Model = namedtuple('Model', ['value', 'value_en', 'value_de', 'value_fr'])

print(ModelSchema().dump(Model('Hello', 'Hello', 'Hallo', 'Bonjour')))
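
For readers on Python 3, the same marker-type idea can be sketched without marshmallow.compat. This is an illustration of the metaclass mechanism only: `Field` and `LocalizedField` are hypothetical stand-ins rather than marshmallow classes, and the sketch copies the field for each language instead of reusing one object.

```python
import copy

LANGUAGES = ["en", "de", "fr"]


class Field:
    """Hypothetical stand-in for a schema field; not a marshmallow class."""


class LocalizedField(Field):
    """Marker type: fields of this type get one clone per language."""


class I18NMeta(type):
    def __new__(mcls, name, bases, attrs):
        new_attrs = dict(attrs)
        for field_name, field in attrs.items():
            if not isinstance(field, LocalizedField):
                continue
            for lang in LANGUAGES:
                # copy.copy gives every language its own field object
                new_attrs["%s_%s" % (field_name, lang)] = copy.copy(field)
        return super().__new__(mcls, name, bases, new_attrs)


class ModelSchema(metaclass=I18NMeta):
    label = Field()
    value = LocalizedField()


# Collect the field attributes that ended up on the class.
names = sorted(n for n, v in vars(ModelSchema).items() if isinstance(v, Field))
print(names)  # ['label', 'value', 'value_de', 'value_en', 'value_fr']
```

Only `value` is cloned because only it uses the marker type; `label` is left as declared.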

maximkulkin on 6 Mar 2017

👍1

@maximkulkin Thank you for the very good snippet!

MrYoda on 7 Mar 2017

Another data point for the record, in case someone else runs into this problem.

I had the same AttributeError: description is not a valid field for <some.Model object>. For me the problem was that the object had a description @property and this method tried to get a related item from a database and get the description from there. When no object was found, this gave an AttributeError. I fixed it by returning None in that case.

In other words: the real error might be hiding behind the AttributeError raised by marshmallow.
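
The masking effect can be reproduced in plain Python. The sketch below uses hypothetical `Model` and `SafeModel` classes and shows how any lookup that treats AttributeError as "attribute missing", such as hasattr or getattr with a default, ends up hiding an error raised inside a property.

```python
class Model:
    """Hypothetical model whose property depends on a related object."""

    def __init__(self, related=None):
        self.related = related

    @property
    def description(self):
        # When self.related is None this raises
        # AttributeError: 'NoneType' object has no attribute 'description'
        return self.related.description


class SafeModel(Model):
    @property
    def description(self):
        # The fix described above: return None when nothing is found.
        if self.related is None:
            return None
        return self.related.description


broken = Model()
print(hasattr(broken, "description"))         # False, although the property exists
print(getattr(broken, "description", "n/a"))  # n/a

fixed = SafeModel()
print(hasattr(fixed, "description"))          # True
print(fixed.description)                      # None
```

Accessing `broken.description` directly shows the underlying traceback, which is usually the fastest way to find the real cause.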