Marshmallow: RFC: replace load_from and dump_to with a single parameter

Created on 5 Jan 2018 · 10Comments · Source: marshmallow-code/marshmallow

See discussion in https://github.com/marshmallow-code/marshmallow/pull/714.

In the general case, dump/load is symmetric and if you have a model field that is exposed with another name, you use the API name in the schema, and specify the model name as attribute.

         class MySchema(Schema):
             ApiName = fields.String(attribute='model_name')

dump_to and load_from introduce asymmetry, allowing to specify a different key for dump and load:

         class MySchema(Schema):
             model_name = fields.String(load_from='api_load_name', dump_to='api_dump_name')

If you want to reproduce use case 1 using these, you need to specify both load_from and dump_to and make sure they match:

         class MySchema(Schema):
             model_name = fields.String(load_from='api_name', dump_to='api_name')

The flexibility brought by having both load_from and dump_to comes at a price: complexity for the user but also in the code (marshmallow and related libs like webargs, apispec...), potential undefined cases for some weird combinations...

Assuming symmetry is not required, a single parameter should be enough. There is however a limitation with attribute. It can't be used for APIs where the keys are invalid Python variable names:

class MySchema(Schema):
    ApiKey = fields.String(attribute='apikey')
    weird-key = fields.String(attribute='weird_key')  # syntax error

Currently, load_from and dump_to can be used for this as shown above.

The proposal here is to introduce a new parameter to unite them all. Let's call it key for now, short of a better name:

class MySchema(Schema):
    apikey = fields.Int(key='ApiKey')
    weird_key = fields.Int(key='weird-key')

If we do that, then there is no point in attribute anymore (except backward compatibility). So we'd remove attribute, dump_to and load_from.

Any objection to this?

Any suggestion for a better name?

backwards incompat feedback welcome

Source

lafrech

❤3 👍1

Most helpful comment

For anyone interested in different keys for serialization/deserialization - here is hack for marshmallow>=3.0.0b8
https://github.com/vgavro/requests-client/blob/0d88c8f907ae2b8e9f77ae2c7144741032acc0b8/requests_client/schemas.py#L90

vgavro on 11 Oct 2018

👍6 ❤1

All 10 comments

I was reading the discussion on PR #174. Just to add some context and see if I understand well (the doc is not always very clear):

We have a Python Object that by means of an Schema instance we serialize (dump) into a Serialized Representation, that we can take the backwards process and deserialize (load) into a Python Object.

A Schema have Fields that knows how the data can be de/serialized. That Fields can take an attribute from a Python Object and serialize it into a key/name in a map of a Serialized Representation, and back.

Some comments answering the questions:

attribute should be kept to decouple the schemas from your Python models/objects. The attribute parameter represents the name of the attribute of the Python Object where the data should be serialized from / deserialize to. If not present (None) Marshmallow assumes that the object attribute have the same name as the schema field. Same current behavior.
load_from, dump_to arguments should be consolidated into one that represents the key/name on the Serialized Representation, with the mirror behavior of attribute but for the Serialized Representation side: if is not present (None) Marshmallow should assume that the key/name is he same as the schema field name.
If you still need different values in load_from and dump_to what you need is two different schemas
A key name could be too terse without a clear doc on what it represents. Other names could be: key_name, name (JSON), field (OpenAPI), field_name, field_key, etc.

tinproject on 14 Jan 2018

👍3

@tinproject, thanks for stating this clearly, in a perhaps less convoluted way than I did.

I agree with everything you said.

The only I part wasn't totally convinced about is why we need to decouple the schema from the model. I use the schema to decouple API I/O from model objects, but I don't see the need for a schema field with a name that differs from both the model and the API. From an architectural point of view, I mean. From a practical perspective, I just identified a use case. setattr allows you to set attributes with names that are not legal Python variable names (such as 'my-attribute'). I don't see a reason for doing so, but assuming you're dealing with such an object, having the attribute parameter comes in handy. For this, plus compatibility with existing code, I agree we might as well keep attribute.

The new parameter sort of overlaps with attribute, as

my_field = Field(attribute='field')

would be equivalent to

field = Field(key='my_field')

but it does not matter IMHO. Keeping both shouldn't be an issue in terms of code maintenance. I'm more worried about the load_from / dump_to parameters. They lead to confusion, and undefined cases (see an example here: https://github.com/marshmallow-code/apispec/issues/178), so these are the one I'd like to get rid of.

Regarding the name of the new attribute, I think name is as unintuitive as key. I like field. My only concern: the elements of the schemas are also called "fields", and this parameter does not expect a schema field name but the name of a field in the serialiazed representation. Yet, I don't think this is blocking and it doesn't take 20 lines of documentation to make it clear, so field would be fine for me.

lafrech on 14 Jan 2018

@lafrech

One good case to have the option to decouple the model from the schema is when refactor the model, not having to change the schema and the related tests, only the attribute names. Also if you have models with names coming directly from a database, and you want to have different names (more clear) in your schema to use in validations for example.

I believe that flexibility, and symmetry on design is well worth.

To simplify the code both attribute and key/name can be initialized with the name of the field at __init__ and then used directly in de/serialization without looking to the field name.

tinproject on 15 Jan 2018

👍2

Hmmm, at __init__, the field does not know its name, does it?

And I suppose people with enough imagination could come up with use cases where a field's name is changed during execution, or a Field instance is used in several places of a Schema.

I'm afraid this simplification is not possible. Too bad, because it sounded nice.

Or am I missing something?

lafrech on 18 Jan 2018

I just pushed a draft (#724).

I'm still unsure about the name. field is a bit disturbing as it is the name of the Marhmallow field, so you'd be writing field.field = ....

I would have liked to avoid key_name of field_name or ser_key (ser as "serialized") to keep symmetry with attribute.

Maybe key would be fine, after all. Any idea anyone?

lafrech on 18 Jan 2018

This is a great RFC. I really like the idea.

I would suggest data_key as a name. It's relatively terse, but is still more informative than just key, and is consistent with the argument names in the Schema method signatures:

    def dump(self, obj, many=None, update_fields=True, **kwargs):
    def load(self, data, many=None, partial=None):

I do think it makes sense to drop attribute, but maybe this is opinionated. In my view, the schema should match either the Python object or the serialized representation. It wouldn't really make much sense for the schema not to match one or the other. Given that, for my purposes at least, I've generally found it much easier to make the schema match the Python object.

As such, while something like this "key" parameter is very useful, attribute just makes it hard to do things like "find the field for a given attribute on my object", which is something that comes up more in my own code than the reverse of "find the attribute for a given data key" (which instead just happens via the deserialization process itself).

As an example, suppose I'm building a CRUD HTTP endpoint, and I want to enable filtering (?size=3) on one of the columns in the backing model. It's very convenient in that case to be able to map together model columns and schema fields. Explicitly mapping data keys to schema fields happens much less often.