Marshmallow: Two Way Serialization

Created on 18 Nov 2015 · 5 Comments · Source: marshmallow-code/marshmallow

Trying to cope with this without writing two schemas for the same stuff.

Let's say I'm taking in one API and it gives me something like this:

{ 
 "CamelCased": "Stuff",
 "somethingElse": "More stuff",
 "OhLookAList": [1,2,3,4]
}

And I'm repackaging it, doing some fun stuff internally, and then dumping it out the other side into my own API. Don't ask why I can't keep the same names.

{
 "camel_cased": "That's a lie now.",
 "something_else": "More stuff",
 "oh_look_a_list": [1,2,3,4]
}

Normally I'd just use dump_to/load_from and be done with it. However, I need to reverse the flow and send information from my API back to the external API. So someone will post:

{
 "camel_cased": "That's a lie now.",
 "something_else": "More stuff",
 "oh_look_a_list": [1,2,3,4]
}

And the external API will get...

{ 
 "CamelCased": "Stuff",
 "somethingElse": "More stuff",
 "OhLookAList": [1,2,3,4]
}
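For the renaming half of this round trip, one table can be enough. The following is a minimal plain-Python sketch, not marshmallow's API. `EXTERNAL_TO_INTERNAL` and `rename` are hypothetical names. The reverse direction is derived by inverting the dict, so each name pair is written down once. This assumes the mapping is one-to-one. Value changes, such as the rewritten `camel_cased` string, are out of scope here.

```python
# Hypothetical single source of truth: external key -> internal key.
EXTERNAL_TO_INTERNAL = {
    "CamelCased": "camel_cased",
    "somethingElse": "something_else",
    "OhLookAList": "oh_look_a_list",
}
# Derive the opposite direction instead of maintaining a second table
# (only valid while the mapping stays one-to-one).
INTERNAL_TO_EXTERNAL = {v: k for k, v in EXTERNAL_TO_INTERNAL.items()}


def rename(data, mapping):
    """Return a copy of `data` with keys renamed; unmapped keys pass through."""
    return {mapping.get(key, key): value for key, value in data.items()}


external = {"CamelCased": "Stuff", "somethingElse": "More stuff", "OhLookAList": [1, 2, 3, 4]}
internal = rename(external, EXTERNAL_TO_INTERNAL)
print(internal)
# {'camel_cased': 'Stuff', 'something_else': 'More stuff', 'oh_look_a_list': [1, 2, 3, 4]}
print(rename(internal, INTERNAL_TO_EXTERNAL) == external)  # True
```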

In the example text for marshmallow-enum I used a (nasty, in my opinion) pre_load/pre_dump workaround that looked at the schema context to see if the incoming data was the internal source or external source, and then switched a value on the field. However, I'm hesitant to do that for an _entire_ schema.

The best way I can think of to handle this with Marshmallow is some sort of schema factory that creates external/internal schemas on demand, potentially a less crappy version of this:

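from marshmallow import Schema, fields  # marshmallow 2.x API (load_from/dump_to)

def make_two_way_schema(context, *args, **kwargs):
    # context maps each field to the key it is read from and the key it is written to
    class SomeSchema(Schema):
        id = fields.Str(load_from=context['id']['read'], dump_to=context['id']['write'])
    return SomeSchema(*args, **kwargs)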

However, I'm nonplussed about this idea or the prospect of managing two sets of schemas. I could see manufacturing the schemas beforehand (using the above mechanism) and then choosing the correct one at dump/load time. Maybe something like...

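def make_two_way_schema(mapping):
    internal_to_external = ...  # built beforehand, e.g. with the factory above
    external_to_internal = ...  # built beforehand, e.g. with the factory above

    def schema_choose(*args, **kwargs):  # all the same args as Schema.__init__
        context = kwargs.get('context') or {}
        return internal_to_external if context.get('source') == 'internal' else external_to_internal
    return schema_choose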
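To see the "build both up front, choose at call time" idea run end to end without marshmallow, here is a plain-Python sketch. `make_translator`, `swap`, and `make_two_way_translator` are hypothetical helpers. Each translator stands in for one prebuilt schema and is driven by the same read/write spec shape as `context['id']['read']` / `context['id']['write']` in the factory above. The reverse direction comes from swapping read and write. Keys not listed in the spec are dropped.

```python
def make_translator(spec):
    """Stand-in for one schema: read each field from spec[f]['read'], write to spec[f]['write']."""
    def translate(data):
        return {names["write"]: data[names["read"]]
                for names in spec.values() if names["read"] in data}
    return translate


def swap(spec):
    """Reverse direction: read where the other side writes, and vice versa."""
    return {field: {"read": n["write"], "write": n["read"]} for field, n in spec.items()}


def make_two_way_translator(spec):
    # Build both directions once, up front.
    internal_to_external = make_translator(spec)
    external_to_internal = make_translator(swap(spec))

    def choose(context):
        # Same decision as schema_choose: pick by where the data came from.
        if context.get("source") == "internal":
            return internal_to_external
        return external_to_internal

    return choose


choose = make_two_way_translator({"id": {"read": "id", "write": "ID"}})
print(choose({"source": "internal"})({"id": 7}))  # {'ID': 7}
print(choose({"source": "external"})({"ID": 7}))  # {'id': 7}
```

Because both directions come from one spec, adding a field means touching one entry rather than two schema classes.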

This strikes me as the lesser of two evils but not by much... Any ideas?


justanr


The way I've handled this is to not always dump. load_from is an additional key to look for when loading, checked alongside the field name specified to the left of the equals sign.

SomeSchema

from marshmallow import Schema, fields  # marshmallow 2.x API

class SomeSchema(Schema):
    long_name = fields.String(load_from="longName", dump_to="longName")

Comes in as camelCase, and we want camelCase out

some_schema = SomeSchema()
data, errors = some_schema.load({"longName" : "Marshmallows for the masses"})
dump_data, errors = some_schema.dump(data)

# dump_data will return
# {"longName" : "Marshmallows for the masses"}

Comes in as camelCase, and we want snake_case out

some_schema = SomeSchema()
data, errors = some_schema.load({"longName" : "Marshmallows for the masses"})

# data will return
# {"long_name" : "Marshmallows for the masses"}

Comes in as snake_case, and we want camelCase out

some_schema = SomeSchema()
data, errors = some_schema.load({"long_name" : "Marshmallows for the masses"})
dump_data, errors = some_schema.dump(data)

# dump_data will return
# {"longName" : "Marshmallows for the masses"}

Comes in as snake_case, and we want snake_case out

some_schema = SomeSchema()
data, errors = some_schema.load({"long_name" : "Marshmallows for the masses"})

# data will return
# {"long_name" : "Marshmallows for the masses"}

Let me know if that helps!

jhitze on 25 Mar 2016


@jhitze I think your solution assumes that the transformation is just mapping between names, but @justanr is indicating that he needs to customize how field values are processed depending on which interface he is dumping to.

This is an interesting use case. It reminds me of services like Stripe that consume a third-party API and expose it as a Stripe API. Although your use case sounds mostly pass-through, you are still relying on some amount of processing between the APIs, assuming your schema enforces type validation.

| Direction | Legacy API | Schema_AB | Processing Layer | Schema_BC | New API |
| --- | --- | --- | --- | --- | --- |
| ⇝ | respond A_o | load A_o to B_i | process | dump B_o to C_i | receive C_i |
| ⇜ | receive A_i | dump B_o to A_i | process | load C_o to B_i | respond C_o |

This diagram represents 2 schemas that have the intermediate schema representation (B) in common.

Your use case seems to assume that the processing layer (B) will use the same naming convention as the new API (C), so the second schema feels redundant.

Classic Big Mac vs. Two Cheese Burgers dilemma :stuck_out_tongue_winking_eye:.

I think you are describing a serialization pipeline that separates data processing from schema naming. I am imagining a SchemaInterface class that represents one side of a schema naming and a SchemaPipeline that generates the intermediate Schemas from a sequence of SchemaInterfaces. A SchemaInterfaceMapping would have to be created to associate the field names and transform the data.

This would allow you to define one interface for your legacy API and one interface for your new API. The second interface would be reused for the processing layer, and a mapping would be created between the legacy interface and the processing interface.

That might be a more generic solution than you were looking for, but I think the solution probably requires some abstraction on top of Schema that knows how to reuse a common set of fields to construct two different schemas.
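One way the proposed abstraction could look, as a plain-Python sketch: `SchemaInterface` and `SchemaInterfaceMapping` borrow the hypothetical names from the comment above, but their fields and methods are assumptions, and the `SchemaPipeline` part is omitted. An interface maps logical field names to the keys one side uses. A mapping links two interfaces and can apply a different value transform in each direction. Here `str.lower`/`str.title` are arbitrary placeholder transforms. It assumes both interfaces declare the same logical fields.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

Transform = Callable[[Any], Any]


@dataclass
class SchemaInterface:
    """One side's naming: logical field -> key used by that API."""
    name: str
    keys: Dict[str, str]


@dataclass
class SchemaInterfaceMapping:
    """Links two interfaces; optionally transforms values per direction."""
    source: SchemaInterface
    target: SchemaInterface
    # logical field -> (source-to-target transform, target-to-source transform)
    transforms: Dict[str, Tuple[Transform, Transform]] = field(default_factory=dict)

    def _convert(self, data, src, dst, direction):
        out = {}
        for logical, src_key in src.keys.items():
            if src_key not in data:
                continue  # leave missing optional values missing
            value = data[src_key]
            if logical in self.transforms:
                value = self.transforms[logical][direction](value)
            out[dst.keys[logical]] = value
        return out

    def forward(self, data):
        return self._convert(data, self.source, self.target, 0)

    def backward(self, data):
        return self._convert(data, self.target, self.source, 1)


legacy = SchemaInterface("legacy", {"name": "CamelCased", "items": "OhLookAList"})
internal = SchemaInterface("internal", {"name": "camel_cased", "items": "oh_look_a_list"})
legacy_to_internal = SchemaInterfaceMapping(
    legacy, internal, transforms={"name": (str.lower, str.title)}
)

inbound = legacy_to_internal.forward({"CamelCased": "STUFF", "OhLookAList": [1, 2]})
print(inbound)  # {'camel_cased': 'stuff', 'oh_look_a_list': [1, 2]}
print(legacy_to_internal.backward(inbound))  # {'CamelCased': 'Stuff', 'OhLookAList': [1, 2]}
```

If the new API reuses the `internal` interface, as suggested above, it needs no second mapping. Only the legacy side carries renames and transforms.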

@justanr I'm curious if/how you ended up implementing your use case.

deckar01 on 28 Jun 2016

@jhitze Somehow missed your response. Translating between names is about 60% of what these current schemas do. But as @deckar01 pointed out, there's a hefty amount of transformation needed sometimes.

@deckar01 You hit the nail on the head with Stripe. I work on a team that aggregates a bunch of APIs (some friendlier than others) for our customer service team.

Right now we're using DRF serializers, which are all right, but our name translation is janky and doesn't handle optional or missing values that well, and any attempt to coax it otherwise breaks something else.

Normally I'd just say screw it and leave it be, but there's talk of open-sourcing some of our codebase, and the parts under consideration aren't necessarily tied to Django or DRF. So I suggested Marshmallow as a potential replacement, given my experience with (and meager contributions to) it.

I like your idea of a Schema-like class. I'll play with it and see how it turns out.

But I'll admit, since most of the contracts are static, maintaining two schemas isn't that terrible of an idea, especially when coupled with the idea that we should respect all data sent to us by our sources and only validate and reject data submitted by end users.
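That policy of trusting upstream sources while rejecting bad end-user input can be expressed as one translation with a strictness switch. A plain-Python sketch under that assumption: `load`, its `strict`/`required` parameters, and `ValidationFailed` are hypothetical and are not marshmallow's API. Lenient mode keeps unknown keys and tolerates gaps. Strict mode raises instead.

```python
class ValidationFailed(ValueError):
    """Hypothetical error raised when strict loading finds problems."""


def load(data, mapping, required=(), strict=False):
    """Rename keys via `mapping`; in strict mode reject unknown or missing keys."""
    unknown = [key for key in data if key not in mapping]
    missing = [key for key in required if key not in data]
    if strict and (unknown or missing):
        raise ValidationFailed(f"unknown={unknown} missing={missing}")
    # Lenient path: unknown keys pass through untouched.
    return {mapping.get(key, key): value for key, value in data.items()}


EXTERNAL_TO_INTERNAL = {"CamelCased": "camel_cased", "somethingElse": "something_else"}

# Upstream source: accepted as-is, extras preserved.
print(load({"CamelCased": "a", "Extra": 1}, EXTERNAL_TO_INTERNAL))
# {'camel_cased': 'a', 'Extra': 1}

# End-user submission: same data, but validated and rejected.
try:
    load({"CamelCased": "a", "Extra": 1}, EXTERNAL_TO_INTERNAL,
         required=["somethingElse"], strict=True)
except ValidationFailed as exc:
    print("rejected:", exc)
# rejected: unknown=['Extra'] missing=['somethingElse']
```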

justanr on 29 Jun 2016

Closing for now, as this has become stale. Feel free to reopen if further discussion is needed.