Marshmallow: How to make schema for dict fields with variable names?

Created on 30 Dec 2014 · 11 Comments · Source: marshmallow-code/marshmallow

For example, suppose we have the following JSON schema:

...
"metrics": {
  "type": "object",
  "patternProperties": {
    "^[a-zA-Z]+$": {
      "type": "object",
      "required": ["v", "date"],
      "properties": {
        "v": {"type": "number"},
        "date": {"type": "string"}
      },
      "additionalProperties": false
    }
  }
}
...

How do I define a marshmallow schema for it?

I can validate it like this:

import re

from marshmallow import ValidationError

@MetricsSchema.validator
def validate_metrics(schema, input_data):
    # Check every metric, not only the first one
    for metric_key, metric_value in input_data['metrics'].items():
        if not re.match(r'^[a-zA-Z]+$', metric_key):
            raise ValidationError("Metric names must contain only letters.")
        errors = PredefinedMetricSchema().validate(metric_value)
        if errors:
            raise ValidationError(errors)

But how do I make a schema for serializing? I think there should be a clear answer, because this case can also occur in schemas in your new project, smore.

question

All 11 comments

Just to clarify the issue: you want to be able to generate a marshmallow Schema object from a JSON schema. Is this correct?

smore currently has some functionality to generate Swagger objects (which are based off the JSON schema spec) from marshmallow schemas. It shouldn't be too much of a stretch to go in the other direction (JSON schema -> marshmallow).

If this is the requested feature, could you please open an issue on the smore issue tracker?

No, I want to validate and serialize a JSON data object whose fields can have arbitrary names (matching r'^[a-zA-Z]+$'), but where each of these fields has a predictable, predefined structure. For example:

One object can be:

{
    "metrics": {
        "firstMetricName": {
            "v": 100,
            "date": "2015-01-09T04:33:17+00:00"
        },
        "secondMetricName": {
            "v": 110,
            "date": "2015-01-09T04:33:17+00:00"
        }
    }
}

Another object can be:

{
    "metrics": {
        "somethingOne": {
            "v": 100,
            "date": "2015-01-09T04:33:17+00:00"
        },
        "somethingAgain": {
            "v": 110,
            "date": "2015-01-09T04:33:17+00:00"
        }
    }
}

JSON schema allows defining structures like this. How do I deal with them in marshmallow? A field's name in a Schema can't be declared as a pattern or a range of symbols.

I looked through the smore project code and didn't find anything about this case.
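
To make the requirement concrete, here is a minimal sketch in plain Python (no marshmallow) of the rule being described: every key under "metrics" must match the pattern, and every value must have exactly the keys "v" and "date". The function name check_metrics and the error strings are hypothetical, chosen only for illustration. The sketch rejects non-matching names, as the validator in the question does; the JSON schema as written would only do so if "metrics" itself also set "additionalProperties" to false.

```python
import re
from numbers import Number

METRIC_NAME = re.compile(r'^[a-zA-Z]+$')  # same pattern as in the JSON schema


def check_metrics(metrics):
    """Return a dict of error messages keyed by metric name (empty if valid)."""
    errors = {}
    for name, body in metrics.items():
        if not METRIC_NAME.match(name):
            errors[name] = 'name must contain only letters'
        elif not isinstance(body, dict) or set(body) != {'v', 'date'}:
            errors[name] = 'expected exactly the keys "v" and "date"'
        elif isinstance(body['v'], bool) or not isinstance(body['v'], Number):
            # bool is a subclass of int in Python, so exclude it explicitly
            errors[name] = '"v" must be a number'
        elif not isinstance(body['date'], str):
            errors[name] = '"date" must be a string'
    return errors


good = {'firstMetricName': {'v': 100, 'date': '2015-01-09T04:33:17+00:00'}}
bad = {
    'metric_1': {'v': 100, 'date': '2015-01-09T04:33:17+00:00'},
    'other': {'v': 'high', 'date': '2015-01-09T04:33:17+00:00'},
}

print(check_metrics(good))
print(check_metrics(bad))
```

The open question in this thread is how to express the same rule declaratively as a marshmallow field, so that it also covers serialization.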

To be clear, structures like this can be defined in MongoEngine as:

class Metric(db.EmbeddedDocument):
    value = db.FloatField(required=True, db_field='v')
    date = db.DateTimeField(required=True)


class User(db.Document):
    metrics = db.DictField(db.StringField, db.EmbeddedDocumentField(Metric))

@vovanbo Apologies for the delayed response. Here's a stab at a DictField that might meet your use case:

from marshmallow import Schema, fields, validate

class DictField(fields.Field):
    """A dict whose keys and values are each handled by their own field."""

    def __init__(self, key_field, nested_field, *args, **kwargs):
        fields.Field.__init__(self, *args, **kwargs)
        self.key_field = key_field
        self.nested_field = nested_field

    def _deserialize(self, value):
        ret = {}
        for key, val in value.items():
            # Validate/convert the key and the value independently
            k = self.key_field.deserialize(key)
            v = self.nested_field.deserialize(val)
            ret[k] = v
        return ret

    def _serialize(self, value, attr, obj):
        ret = {}
        for key, val in value.items():
            k = self.key_field._serialize(key, attr, obj)
            # serialize() looks up `key` in the dict stored under `attr`
            v = self.nested_field.serialize(key, self.get_value(attr, obj))
            ret[k] = v
        return ret


class MetricSchema(Schema):
    value = fields.Float()
    date = fields.DateTime()

class UserSchema(Schema):
    metrics = DictField(
        fields.Str(validate=validate.Regexp(r'^[a-zA-Z]+$')), 
        fields.Nested(MetricSchema)
    )

metrics = {
    "metrics": {
        "firstMetricName": {
            "value": 100,
            "date": "2015-01-09T04:33:17+00:00"
        },
        "secondMetricName": {
            "value": 110,
            "date": "2015-01-09T04:33:17+00:00"
        }
    }
}

s = UserSchema(strict=True)
s.load(metrics).data
# {'metrics': {'firstMetricName': {'value': 100.0,
#    'date': datetime.datetime(2015, 1, 9, 4, 33, 17, tzinfo=tzutc())},
#   'secondMetricName': {'value': 110.0,
#    'date': datetime.datetime(2015, 1, 9, 4, 33, 17, tzinfo=tzutc())}}}

Thank you, @sloria! Good example of a custom field. What about including it as one of the standard fields in marshmallow 2.0?

I'm going to hold off on adding the Dict field to marshmallow in order to reduce maintenance burden and get 2.0-a out the door as soon as possible. Perhaps we can add it to the docs though. Closing this for now.

Would you reconsider adding a Dict field type as part of the standard library?

+1

I noticed you added a regular Dict(), but are there any plans to support the DictField() as presented above? I have lots of maps of named objects for which this functionality would be useful.

+1 for having built-in support for defining dict fields with prescribed schema for values.

I had the same requirement. I created my own NamedObjectMap field based on the code example above. The main additions are:

  • the _add_to_schema method, which (if I remember correctly) is necessary to propagate context.
  • proper insertion of validation errors into the validation errors dict

Chris


import collections.abc

from marshmallow import ValidationError, fields


class NamedObjectMap(fields.Field):

    default_error_messages = {
        'invalid': 'Not a valid mapping type.'
    }

    def __init__(self, nested_field, *args, **kwargs):
        fields.Field.__init__(self, *args, **kwargs)
        self.name_field = fields.Str()
        self.nested_field = nested_field

    def _add_to_schema(self, field_name, schema):
        # Attach the nested field to this one so it can reach the schema
        super(NamedObjectMap, self)._add_to_schema(field_name, schema)
        self.nested_field.parent = self
        self.nested_field.name = field_name

    def _deserialize(self, value, attr, data):
        # Make sure we have a map
        if not isinstance(value, collections.abc.Mapping):
            self.fail('invalid')

        ret = {}
        errors = {}
        for key, val in value.items():
            k = self.name_field.deserialize(key)
            if val is None and self.nested_field.missing:
                v = self.nested_field.missing
            else:
                try:
                    v = self.nested_field.deserialize(val)
                except ValidationError as e:
                    # Record the error under its key and keep checking the rest
                    errors[key] = e.messages
                    continue
            ret[k] = v

        if errors:
            raise ValidationError(errors)

        return ret

    def _serialize(self, value, attr, obj):
        ret = {}
        for key, val in value.items():
            k = self.name_field._serialize(key, attr, obj)
            v = self.nested_field._serialize(val, key, obj)
            ret[k] = v
        return ret