Pydantic: JSON-like serialization on .dict()

Created on 21 Apr 2020 · 15 Comments · Source: samuelcolvin/pydantic

Feature Request

pydantic version: 1.5
pydantic compiled: True
install path: C:\Work\repos\data-purge.venv\Lib\site-packages\pydantic
python version: 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)]
platform: Windows-10-10.0.18362-SP0
optional deps. installed: []

I'm using a framework (lambda) that expects the lambdas to return a simple dict to be passed along to another step. This is done through marshalling that I don't really have much ability to modify. I'm using pydantic models for all of my de/serialization, and it had been working great to just write something to the effect of return result.dict()

However, now that I've added a datetime field to the model, I get a failure on: Unable to marshal response: Object of type datetime is not JSON serializable

Now, I know this is a pretty standard Python issue with anything JSON-related. But given that pydantic is such a great library for managing all this stuff, and it's not uncommon for _another_ library to expect dicts rather than JSON strings, I think it would be nice if we had some way for .dict() to return a serializable format.

Essentially, I want to still have my pydantic model define the fields as a datetime, but I want the .dict step to convert those into a string format, rather than the datetime objects themselves. Is this at all possible/reasonable?

In [1]: from pydantic import BaseModel

In [2]: from datetime import datetime

In [3]: class Thing(BaseModel):
   ...:     x: datetime
   ...: 
In [5]: a = Thing(x="2020-01-01T00:00:00")

In [6]: a.dict()
Out[6]: {'x': datetime.datetime(2020, 1, 1, 0, 0)}

# preferred output:
Out[6]: {'x': '2020-01-01T00:00:00'}
...

Is this already possible? Could this be added in some kind of way? This isn't the first time I've hit this scenario, and I can't find an elegant way around it. If only Python were actually reasonable about datetimes, I wouldn't expect pydantic or something else to solve it :/

Labels: feature request, help wanted, serialization


All 15 comments

The thing is, dicts are intended to hold Python objects. All pydantic models have a .json() method which automatically serializes common types into something JSON-serializable, and you can use the json_encoders config dict to customize the format if you wish. By default, this is what you get with .json():

In [1]: from pydantic import BaseModel

In [2]: from datetime import datetime

In [3]: class Thing(BaseModel):
   ...:     x: datetime
   ...: 
In [4]: a = Thing(x="2020-01-01T00:00:00")

In [5]: a.json()
Out[5]: '{"x": "2020-01-01T00:00:00"}'

Which means that you can round-trip it back to a dict if you wish with:

In [6]: import json

In [7]: json.loads(a.json())
Out[7]: {'x': '2020-01-01T00:00:00'}

Which is how I handled things before exclude_none was a thing.

I can _probably_ accept that this is just a conceptual limitation in the way some of the underlying tools are defined. But I will say that, in principle, objects go through a transition of four states: Model > dict w/ Python objects > dict of serializable types > JSON string.

It's unfortunate to me that it's not possible to step directly to the third state without skipping ahead to the fourth and rolling back to the third by way of json.loads.

You can also define a custom type that validates a datetime and returns the string version, something like:

In [1]: from datetime import datetime
   ...: from pydantic import BaseModel
   ...: from pydantic.datetime_parse import parse_datetime
   ...:
   ...:
   ...: class StringDate(datetime):
   ...:     @classmethod
   ...:     def __get_validators__(cls):
   ...:         yield parse_datetime
   ...:         yield cls.validate
   ...:
   ...:     @classmethod
   ...:     def validate(cls, v: datetime):
   ...:         return v.isoformat()
   ...:
   ...: class Thing(BaseModel):
   ...:     x: StringDate
   ...:
   ...: a = Thing(x="2020-01-01T00:00:00")
   ...:
   ...: a.dict()
Out[1]: {'x': '2020-01-01T00:00:00'}

Or, even more compactly, defining a custom validator on the model:

In [1]: from datetime import datetime
   ...: from pydantic import BaseModel, validator
   ...:
   ...:
   ...: class Thing(BaseModel):
   ...:     x: datetime
   ...:
   ...:     @validator("x")
   ...:     def datetime_to_string(cls, v):
   ...:         return v.isoformat()
   ...:
   ...: a = Thing(x="2020-01-01T00:00:00")
   ...:
   ...: a.dict()
Out[1]: {'x': '2020-01-01T00:00:00'}

I've had a similar problem as well. My use-case is that I'm writing an HTTP handler that traffics in Pydantic models, but in a framework that has baked-in opinions about how it wants to do things like encode datetimes that I don't agree with but have to work with.

The current pattern I use is something like json.loads(model.json()), which yields an object that's unambiguous dicts/lists/strs etc. without any fancy types, and then the HTTP framework happily encodes that object into a JSON string (again) without doing anything surprising to my datetimes, which are now in string form. (If I were to just pass model.json(), it would escape the already-JSON string and produce nonsense... argh!)

Relatedly, there are cases where I want to dict() something to pass to other Python code to continue processing, but also take advantage of json_encoders to convert some string-wrapping newtype (e.g. Mongo's bson.ObjectId) into a normal object (str).

In short, it would be nice if there were a method halfway between dict and json that basically produced vanilla Python objects that were unambiguously convertible to JSON types but without the performance penalty of round-tripping through a JSON string, that is, semantically equivalent to json.loads(model.json()). Strawman proposal: model.json(objects=True).

Also, the conceptualization in https://github.com/samuelcolvin/pydantic/issues/1409#issuecomment-620141422 and "rolling back" to step 3 is on-point. That's exactly my problem.

Another notable awkwardness (again, not a _problem_, just a suggestion of why it feels inconsistent) is that if you have a sub-object that's a BaseModel, .dict() doesn't return it as a BaseModel: it dict-ifies every pydantic model in the tree but keeps Python objects for everything else.

So one might argue that if .dict() serializes the BaseModel all the way down the tree, there might be a way to do the same for other types.
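The asymmetry described above can be demonstrated directly (pydantic v1, model names are just for illustration):

```python
from datetime import datetime

from pydantic import BaseModel


class Inner(BaseModel):
    when: datetime


class Outer(BaseModel):
    inner: Inner


o = Outer(inner={"when": "2020-01-01T00:00:00"})
d = o.dict()

# the nested BaseModel was recursively converted to a plain dict...
assert isinstance(d["inner"], dict)
# ...but the datetime inside it was left as a Python object
assert isinstance(d["inner"]["when"], datetime)
```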

We had the same issue when writing into DynamoDB.

For this we use:

from datetime import datetime

from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel

class Thing(BaseModel):
    x: datetime

a = Thing(x="2020-01-01T00:00:00")

json_compatible_item_data = jsonable_encoder(a)

https://fastapi.tiangolo.com/tutorial/encoder/

This is not a new idea @tiangolo submitted a PR for this way back in 2018 #317. I refused that PR, but I think we should reconsider it.

The reasons I refused that PR are explained in https://github.com/samuelcolvin/pydantic/pull/317#issuecomment-443689941, but it comes down to basically: "ujson.loads(m.json()) will likely be faster" and "not everyone uses JSON and people will want different behaviour". I also want to be able to reuse the serialization logic in pydantic_encoder.

I think we should do the following:

  • add a serialize kwarg to dict() with type Union[bool, Callable] = False. When True, the model is returned as a dict containing only JSON types (using pydantic_encoder); when False (the default), the current behaviour is maintained; when a callable, that callable is called on every object that isn't a valid JSON type, letting you customise how things are serialised.
  • in v2, rename pydantic/json.py to pydantic/serialisation.py as it's not just related to JSON encoding.

We'll have to take some care over how we implement serialize=True to maintain good performance.
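Until a serialize kwarg exists, the pydantic_encoder mentioned above can already approximate serialize=True today, albeit with exactly the JSON round-trip cost the proposal would avoid (a sketch, not the proposed API):

```python
import json
from datetime import datetime

from pydantic import BaseModel
from pydantic.json import pydantic_encoder


class Thing(BaseModel):
    x: datetime


a = Thing(x="2020-01-01T00:00:00")

# dict() keeps the datetime object; dumps() with default=pydantic_encoder
# serializes it, and loads() gives back a dict of plain JSON types
plain = json.loads(json.dumps(a.dict(), default=pydantic_encoder))
print(plain)  # {'x': '2020-01-01T00:00:00'}
```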

I think it would be great to have some form of this 🎉

Just a note about the implementation: it could make sense to have an external function do the serialization instead of (or in addition to) a class method. That way it could support serializing a list of type List[SomeModel], and not only a subclass of BaseModel (e.g. SomeModel).

Being able to serialize a list of models would still be needed either way for fields of a type like that, e.g.:

from typing import List

from pydantic import BaseModel

class SubModel(BaseModel):
    name: str

class Model(BaseModel):
    sequence_field: List[SubModel]

Here Model.sequence_field would need that serialization of list of models. But being able to use it outside of fields could be potentially useful and I think it would probably involve a similar effort.
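As a workaround today, a bare list of models outside any field can be handled by passing pydantic_encoder as the default for json.dumps, since it knows how to encode BaseModel instances (a sketch, pydantic v1):

```python
import json
from typing import List

from pydantic import BaseModel
from pydantic.json import pydantic_encoder


class SubModel(BaseModel):
    name: str


models: List[SubModel] = [SubModel(name="a"), SubModel(name="b")]

# json.dumps handles the list natively and calls pydantic_encoder
# for each SubModel element it doesn't know how to serialize
plain = json.loads(json.dumps(models, default=pydantic_encoder))
print(plain)  # [{'name': 'a'}, {'name': 'b'}]
```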

Adding the serialize kwarg to dict() as described above would be a significant step forward in clarity and performance for a use case I frequently encounter: serializing to YAML. 🙂

It makes sense that pydantic should not attempt to include output serialization for all data formats (not reasonable or sustainable). Exposing pydantic's capability to serialize complex data types into basic Python types provides excellent utility to everyone desiring to serialize to some other output format.

@samuelcolvin excited to see this gaining some traction! If I might make a suggestion:

I think I would weak-prefer (as in, fine if it doesn't happen, but just stating a preference) a pattern of .dict(), .serialize(), .json() over .dict(serialize=...), .json(), as it more accurately reflects that a pydantic model essentially passes through a sequence of states: BaseModel -> dict w/ types specified by the model -> dict with serializable types -> JSON string. These transitions seem best represented by three independent functions, IMO.

Along with that, I might recommend the signature for serialize be something like:

def serialize(self, type_converters: Dict[Union[Type, Literal['_']], Callable[[Any], Any]] = default_json_encoder_func) -> Dict[str, Any]:
    # pseudocode
    for field in fields:
        if type(field) in type_converters:
            converted_field = type_converters[type(field)](field)
        elif '_' in type_converters:
            converted_field = type_converters['_'](field)
        else:
            converted_field = field

Where instead of specifying a function that converts the whole object, we might say that someone wants to apply a specific set of conversions to particular types. They could of course pass {"_": convert_all_types}, where "_" is implied to be the converter for any type not explicitly specified (matching the default pattern-matching indicator in PEP 622).

Now, some obvious caveats:

  • Perf concerns may make this kind of implementation a more complex refactor; I'm uncertain, as I haven't fully dug into the .dict and .json code yet.
  • Names are open to debate, obviously. You could argue that serialize implies something more than just a Python dict return.
  • Matching on type might be a perf concern as well; I'm not super confident in my understanding of Python perf nuances at that level, so maybe the single serialize callable is best 🤷
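The per-type dispatch sketched in the pseudocode above could look something like this as a standalone helper (the serialize_with name and signature are hypothetical, not pydantic API):

```python
from datetime import datetime
from typing import Any, Callable, Dict

from pydantic import BaseModel


def serialize_with(model: BaseModel,
                   type_converters: Dict[Any, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Hypothetical helper: apply a per-type converter to each field value,
    falling back to the "_" wildcard converter, then to the value itself."""
    out: Dict[str, Any] = {}
    for name, value in model.dict().items():
        if type(value) in type_converters:
            out[name] = type_converters[type(value)](value)
        elif "_" in type_converters:
            out[name] = type_converters["_"](value)
        else:
            out[name] = value
    return out


class Thing(BaseModel):
    x: datetime
    n: int


a = Thing(x="2020-01-01T00:00:00", n=1)
print(serialize_with(a, {datetime: lambda v: v.isoformat()}))
# {'x': '2020-01-01T00:00:00', 'n': 1}
```

Note this only dispatches on exact top-level field types; nested containers would need a recursive walk, which is part of the perf concern raised above.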

Update: I'm working on a PR for this; now that I understand pydantic_encoder better, I see why the dict structure isn't really relevant. Will go with accepting a callable.

I don't agree about adding another method, see #1001

Is anyone working on this? I would like to work on it if nobody is.

@Sheshtawy I have not had time to consider a new approach since my closed PRs, so go ahead.

@samuelcolvin In reference to your comments
Here:

We'll have to take some care over how we implement serialize=True to maintain good performance.

And here: https://github.com/samuelcolvin/pydantic/pull/1986#issuecomment-716188634

What concerns do you have regarding the implementation here? What else would you like to see addressed in a PR for this feature? I know performance is an important priority for the library, but what does that imply in terms of implementation?
