pydantic version: 1.5
pydantic compiled: True
install path: C:\Work\repos\data-purge.venv\Lib\site-packages\pydantic
python version: 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)]
platform: Windows-10-10.0.18362-SP0
optional deps. installed: []
I'm using a framework (lambda) that expects each lambda to return a simple dict to be passed along to the next step. This is done through marshaling that I don't have much ability to modify. I'm using pydantic models for all of my de/serialization, and it had been working great to simply write something like return result.dict()
However, now that I've added a datetime field to the model, I get a failure on: Unable to marshal response: Object of type datetime is not JSON serializable
Now, I know this is a pretty standard Python issue with anything JSON-related. But given that pydantic is such a great library for managing all this stuff, and it's not uncommon to have _another_ library expect to receive dicts, not JSON strings, I think it would be nice if we could have some way for .dict() to return a serializable format by default.
Essentially, I want to still have my pydantic model define the fields as a datetime, but I want the .dict() step to convert those into a string format rather than the datetime objects themselves. Is this at all possible/reasonable?
In [1]: from pydantic import BaseModel
In [2]: from datetime import datetime
In [3]: class Thing(BaseModel):
   ...:     x: datetime
   ...:
In [5]: a = Thing(x="2020-01-01T00:00:00")
In [6]: a.dict()
Out[6]: {'x': datetime.datetime(2020, 1, 1, 0, 0)}
# preferred output:
Out[6]: {"x": "2020-01-01T00:00:00"}
...
Is this already possible? Could it be added in some way? This isn't the first time I've hit this scenario, and I can't find an elegant way around it. If only Python itself were reasonable about datetimes, I wouldn't expect pydantic or something else to solve it :/
The thing is, dicts are intended to have python objects in them. All pydantic models do have a .json() function which will automatically serialize common items into something json-serializable, and you can use the json_encoders config dict to customize the format if you wish, but by default, this is what you get with .json():
In [1]: from pydantic import BaseModel
In [2]: from datetime import datetime
In [3]: class Thing(BaseModel):
   ...:     x: datetime
   ...:
In [4]: a = Thing(x="2020-01-01T00:00:00")
In [5]: a.json()
Out[5]: '{"x": "2020-01-01T00:00:00"}'
Which means that you can round-trip it back to a dict if you wish with:
In [6]: import json
In [7]: json.loads(a.json())
Out[7]: {'x': '2020-01-01T00:00:00'}
Which is how I handled things before exclude_none was a thing.
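For reference, the json_encoders customisation mentioned above looks like this (a minimal sketch; the strftime format is just an arbitrary example):
from datetime import datetime
from pydantic import BaseModel

class Thing(BaseModel):
    x: datetime

    class Config:
        # override the default ISO 8601 rendering used by .json()
        json_encoders = {datetime: lambda v: v.strftime("%Y-%m-%d %H:%M:%S")}

a = Thing(x="2020-01-01T00:00:00")
a.json()  # '{"x": "2020-01-01 00:00:00"}'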
I can _probably_ accept that this is just a conceptual limitation in the way some of the underlying tools are defined. But I will say that, in principle, objects go through a transition of 4 states: Model > dict w/ Python objects > dict of serializable types > JSON string of the dict.
It's unfortunate to me that it's not possible to actually step to the third state without doing something like skipping to step 4 and rolling back to step 3 by way of json.loads.
You can also define a custom type that validates a datetime and returns the string version, something like:
In [1]: from datetime import datetime
   ...: from pydantic import BaseModel
   ...: from pydantic.datetime_parse import parse_datetime
   ...:
   ...:
   ...: class StringDate(datetime):
   ...:     @classmethod
   ...:     def __get_validators__(cls):
   ...:         yield parse_datetime
   ...:         yield cls.validate
   ...:
   ...:     @classmethod
   ...:     def validate(cls, v: datetime):
   ...:         # return the ISO 8601 string rather than the datetime itself
   ...:         return v.isoformat()
   ...:
   ...:
   ...: class Thing(BaseModel):
   ...:     x: StringDate
   ...:
   ...:
   ...: a = Thing(x="2020-01-01T00:00:00")
   ...:
   ...: a.dict()
Out[1]: {'x': '2020-01-01T00:00:00'}
Or, even more compactly, defining a custom validator on the model:
In [1]: from datetime import datetime
   ...: from pydantic import BaseModel, validator
   ...:
   ...:
   ...: class Thing(BaseModel):
   ...:     x: datetime
   ...:
   ...:     @validator("x")
   ...:     def datetime_to_string(cls, v):
   ...:         # runs after the standard datetime parsing, so v is a datetime
   ...:         return v.isoformat()
   ...:
   ...:
   ...: a = Thing(x="2020-01-01T00:00:00")
   ...:
   ...: a.dict()
Out[1]: {'x': '2020-01-01T00:00:00'}
I've had a similar problem as well. My use-case is that I'm writing an HTTP handler that traffics in Pydantic models, but in a framework that has baked-in opinions about how it wants to do things like encode datetimes that I don't agree with but have to work with.
The current pattern I use is something like json.loads(model.json()), which yields an object that's unambiguous dicts/lists/strs etc. without any fancy types, and then the HTTP framework happily encodes that object into a JSON string (again) without doing anything surprising to my datetimes, which are now in string form. (If I were to just pass model.json(), it would escape the already-JSON string and produce nonsense... argh!)
Relatedly, there are cases where I want to dict() something to pass to other Python code to continue processing, but also take advantage of json_encoders to convert some string-wrapping newtype (e.g. Mongo's bson.ObjectId) into a normal object (str).
In short, it would be nice if there were a method halfway between dict and json that basically produced vanilla Python objects that were unambiguously convertible to JSON types but without the performance penalty of round-tripping through a JSON string, that is, semantically equivalent to json.loads(model.json()). Strawman proposal: model.json(objects=True).
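To pin down the semantics I mean, the proposed method would behave exactly like this helper (a sketch; jsonable_dict is a made-up name):
import json
from pydantic import BaseModel

def jsonable_dict(model: BaseModel) -> dict:
    # the very round trip the proposal would make unnecessary,
    # shown here only to define the desired output
    return json.loads(model.json())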
Also, the conceptualization in https://github.com/samuelcolvin/pydantic/issues/1409#issuecomment-620141422 of "rolling back" to step 3 is on-point. That's exactly my problem.
Another notable awkwardness (again, not a _problem_, just another reason it feels inconsistent): if you have a sub-object that's a BaseModel, .dict() doesn't return it as a BaseModel. It dict-ifies every pydantic model in the tree, but keeps Python objects for everything else.
So one might argue that if .dict() serializes the BaseModel all the way down the tree, there might be a way to do the same for other types.
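A quick illustration of that asymmetry (Inner/Outer are made-up names):
from datetime import datetime
from pydantic import BaseModel

class Inner(BaseModel):
    when: datetime

class Outer(BaseModel):
    inner: Inner

Outer(inner={"when": "2020-01-01T00:00:00"}).dict()
# {'inner': {'when': datetime.datetime(2020, 1, 1, 0, 0)}}
# the nested BaseModel was converted to a plain dict, but the
# datetime inside it is still a Python object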
We had the same issue when writing into DynamoDB.
For this we use:
from datetime import datetime
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel

class Thing(BaseModel):
    x: datetime

a = Thing(x="2020-01-01T00:00:00")
json_compatible_item_data = jsonable_encoder(a)
# {'x': '2020-01-01T00:00:00'}
This is not a new idea: @tiangolo submitted a PR for this way back in 2018 (#317). I refused that PR, but I think we should reconsider it.
The reasons I refused that PR are explained in https://github.com/samuelcolvin/pydantic/pull/317#issuecomment-443689941, but they come down to basically: "ujson.loads(m.json()) will likely be faster" and "not everyone uses JSON and people will want different behaviour". I also want to be able to reuse the serialization logic in pydantic_encoder.
I think we should do the following:
- Add a serialize kwarg to dict() which has type Union[bool, Callable] = False: when True, the model is returned as a dict with only JSON types (this will use pydantic_encoder); when False (the default), the current behaviour is maintained; when a callable, that callable will be called on all objects that aren't valid JSON to allow you to customise how things are serialised.
- Move pydantic/json.py to pydantic/serialisation.py, as it's not just related to JSON encoding.
We'll have to take some care over how we implement serialize=True to maintain good performance.
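To sketch the intended behaviour of serialize=True (a rough illustration only, not the eventual implementation; dict_with_json_types is a made-up name):
from pydantic import BaseModel
from pydantic.json import pydantic_encoder

def dict_with_json_types(model: BaseModel) -> dict:
    # walk the plain .dict() output and run pydantic_encoder on any
    # value that isn't already a JSON-native type
    def convert(value):
        if isinstance(value, dict):
            return {k: convert(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return [convert(v) for v in value]
        if value is None or isinstance(value, (str, int, float, bool)):
            return value
        return convert(pydantic_encoder(value))

    return convert(model.dict())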
I think it would be great to have some form of this :tada:
Just a note about the implementation: it could make sense to have an external function do the serialization instead of a class method (or in addition to one?). That way it could support serializing a list of type List[SomeModel], and not only a subclass of BaseModel (e.g. SomeModel).
Being able to serialize a list of models would still be needed either way for fields of a type like that, e.g.:
from typing import List
from pydantic import BaseModel

class SubModel(BaseModel):
    name: str

class Model(BaseModel):
    sequence_field: List[SubModel]
Here Model.sequence_field would need that serialization of a list of models. But being able to use it outside of fields could be potentially useful, and I think it would probably involve a similar effort.
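For what it's worth, jsonable_encoder from above already works as such an external function and accepts a bare list, e.g.:
from typing import List
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel

class SubModel(BaseModel):
    name: str

items: List[SubModel] = [SubModel(name="a"), SubModel(name="b")]
jsonable_encoder(items)
# [{'name': 'a'}, {'name': 'b'}]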
Adding the serialize kwarg to dict() as described above will be a significant step forward in clarity and performance for a use case I frequently encounter: serializing to YAML. 🙂
It makes sense that pydantic should not attempt to include output serialization for all data formats (that wouldn't be reasonable or sustainable). Exposing pydantic's capability to serialize complex data types into basic Python types provides excellent utility to everyone who wants to serialize to some other output format.
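As a concrete example, today's YAML workaround goes through a JSON string (assuming PyYAML), which is exactly the round-trip cost the serialize kwarg would remove:
import json
from datetime import datetime

import yaml  # PyYAML, assumed installed
from pydantic import BaseModel

class Thing(BaseModel):
    x: datetime

a = Thing(x="2020-01-01T00:00:00")
yaml.safe_dump(json.loads(a.json()))
# "x: '2020-01-01T00:00:00'\n"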
@samuelcolvin excited to see this gaining some traction! If I might make a suggestion:
I think I would weak-prefer (as in, fine if it doesn't happen, but just stating a preference) a pattern of .dict(), .serialize(), .json() over .dict(serialize=), .json(), as it's more functionally accurate: a pydantic model essentially passes through four states: BaseModel -> dict w/ types specified by the model -> dict with serializable types -> JSON string. These transitions seem best represented by three independent functions, IMO.
Along with that, I might recommend the signature for serialize be something like:
def serialize(self, type_converters: Dict[Union[Type, Literal['_']], Callable[[Type], Any]] = default_json_encoder_func) -> Dict[str, Any]:
    # pseudocode
    for field in fields:
        if type(field) in type_converters:
            converted_field = type_converters[type(field)](field)
        elif '_' in type_converters:
            converted_field = type_converters['_'](field)
        else:
            converted_field = field
Where instead of specifying a function that converts a whole object, we might say that someone wants to apply a specific set of conversions to particular types. They could of course pass {"_": convert_all_types}, where "_" is implied to be the converter for any types not explicitly specified (the _ matching the wildcard indicator in PEP-622); see the usage sketch below.
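Hypothetical usage of the proposed method (this won't run today, since serialize doesn't exist; thing is assumed to be a model instance with a datetime field):
from datetime import datetime

thing.serialize(type_converters={
    datetime: lambda v: v.isoformat(),
    "_": str,  # fallback for any type not listed explicitly
})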
Now, some obvious caveats:
- I haven't looked at the .dict and .json code yet.
- serialize implies something more than just a python dict return.
Update: I'm working on a PR for this; now that I understand pydantic_encoder better, I see why the dict structure isn't really relevant. Will go with accepting a callable.
I don't agree about adding another method, see #1001
Is anyone working on this? I'd like to work on it if nobody is.
@Sheshtawy I have not had time to consider a new approach since my closed PRs, so go ahead.
@samuelcolvin In reference to your comments
Here:
We'll have to take some care over how we implement serialize=True to maintain good performance.
And here: https://github.com/samuelcolvin/pydantic/pull/1986#issuecomment-716188634
What concerns do you have regarding implementation here? What other things would you like to see addressed in a PR implementing this feature? I know performance is an important priority for the library, but what does that imply in terms of implementation?