Pydantic: TypedDict support

Created on 17 Aug 2019 · 15 comments · Source: samuelcolvin/pydantic

Feature Request

It would be great if pydantic supported TypedDict.

OS: macOS
Python version (import sys; print(sys.version)): 3.7.4
Pydantic version (import pydantic; print(pydantic.VERSION)): 0.32.1

from pydantic import BaseModel
from mypy_extensions import TypedDict


class Data(TypedDict):
    a: int


class User(BaseModel):
    data: Data

if __name__ == '__main__':
    external_data = {
        'data': {
            'a': 'invalid',
        }
    }
    # should raise an exception ('invalid' is not an int)
    user = User(**external_data)

Labels: feature request, help wanted


roganov

👍17
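For context on why the example above needs pydantic at all: a TypedDict by itself performs no runtime validation. Calling it simply builds a dict from its arguments, so only a static type checker would catch the bad value. A minimal illustration, using `typing.TypedDict` (Python 3.8+) in place of `mypy_extensions`:

```python
from typing import TypedDict


class Data(TypedDict):
    a: int


# A static checker such as mypy would reject this call, but at runtime a
# TypedDict just builds a dict from its arguments: nothing is validated.
d = Data(a="invalid")
print(d)  # {'a': 'invalid'}
```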


This is very close to dataclasses or pydantic's own models. What is the use case where TypedDict would be preferable to dataclasses or models?

samuelcolvin on 17 Aug 2019

👍2

My primary use case is integrating pydantic into existing source code (which currently uses marshmallow) where plain dictionaries are used. Changing all the source code to use BaseModel instead of plain dicts is infeasible. I can obviously define a pydantic model and then call .dict(), but it will most definitely incur a significant performance penalty (the project processes an event stream and does hundreds or even thousands of validations per second).

roganov on 17 Aug 2019

👍2
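The "define a model, then convert back to a dict" workaround mentioned above can be sketched as follows, assuming the TypedDict is mirrored by a hand-written model. The names `DataModel` and `validate_data` are hypothetical. The model is defined once, each payload is validated through it, and the result is returned as a plain dict that is typed as the TypedDict via `cast`. It uses `dict(model)`, which works for flat models in both pydantic v1 and v2.

```python
from typing import Any, Dict, TypedDict, cast

from pydantic import BaseModel, ValidationError


class Data(TypedDict):
    a: int


class DataModel(BaseModel):
    # Hypothetical model, kept in step with Data by hand.
    a: int


def validate_data(raw: Dict[str, Any]) -> Data:
    """Validate a plain dict and return a plain dict typed as Data."""
    # dict(model) yields the top-level field values; fine for flat models.
    return cast(Data, dict(DataModel(**raw)))


print(validate_data({"a": 1}))  # {'a': 1}
try:
    validate_data({"a": "invalid"})
except ValidationError as exc:
    print("rejected:", len(exc.errors()), "error(s)")
```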

I see.

Most of the overhead of parsing data is not in calling .dict(), it's in the actual parsing and (to a lesser extent) building the model of what the data should look like.

As long as you're creating the models once and then just calling .dict() on them, the performance won't be very different from if we implemented support for TypedDict.

Raw models will be slightly faster: with dataclasses we create a hidden model to do the actual validation, and we would probably have to do the same for TypedDict. So you would get normal pydantic performance plus some overhead.

By the way dict(model) will be slightly faster than model.dict() since it doesn't have to worry about all the exclude/include logic.

samuelcolvin on 17 Aug 2019

👍1
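The two suggestions above (create the model once and reuse it; prefer `dict(model)` when exclude/include logic isn't needed) can be sketched as below. One caveat worth knowing: `dict(model)` is shallow, so a nested model comes back as a model instance rather than a dict, whereas `model.dict()` converts recursively. The model and function names are hypothetical.

```python
from pydantic import BaseModel


class Inner(BaseModel):
    x: int


class Outer(BaseModel):
    # Defined once at import time, so the model-building cost is paid only once.
    inner: Inner
    name: str


def to_plain(data: dict) -> dict:
    # Per-call work is only parsing/validation plus the shallow dict() conversion.
    return dict(Outer(**data))


result = to_plain({"inner": {"x": "1"}, "name": "n"})
print(type(result["inner"]).__name__)  # Inner: dict(model) does not recurse
```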

Given integration requirements with non-pydantic packages/APIs (where you would have to load into / dump from a pydantic model), I could also imagine TypedDict having some static type-checking benefits over subclasses of BaseModel (at least without the PyCharm plugin and/or the as-yet-unreleased mypy plugin), since replacing it with BaseModel would drop the static checking of keyword arguments.

dmontagu on 23 Aug 2019

I looked into this a little, and it looks like it may be difficult to determine at runtime whether a given type is actually a TypedDict. So far, the best check I can find is:

def is_typed_dict_type(type_: AnyType) -> bool:
    return bool(lenient_issubclass(type_, dict) and getattr(type_, '__annotations__', None))

@roganov If this is of critical importance to you, I think the same approach used to get validation for Literal types might work here. In particular, you'd need to write and incorporate TypedDict analogs of make_literal_validator and is_literal_type in the appropriate places. I don't currently have the time to implement this myself, but would review a pull request for it. (Though I would also understand if @samuelcolvin wanted to veto in favor of limiting scope creep of supported typing_extensions types.)

If you want to try implementing it yourself, here's a start, though I expect it may require some tweaks before it fully integrates into the field building process:

from typing import Any, Callable

from typing_extensions import TypedDict

from pydantic import BaseModel


def make_typed_dict_validator(type_: Any) -> Callable[[Any], Any]:
    # Build a hidden model with the same annotations as the TypedDict
    class TypedDictModel(BaseModel):
        __annotations__ = type_.__annotations__
    # Reuse the TypedDict's name so error messages refer to it
    TypedDictModel.__name__ = type_.__name__

    def typed_dict_validator(v: Any) -> Any:
        # Parse the input through the model and hand back a plain dict
        return TypedDictModel(**v).dict()

    return typed_dict_validator


def is_typed_dict_type(type_: Any) -> bool:
    # isinstance() guard: issubclass() raises TypeError for non-classes
    return isinstance(type_, type) and issubclass(type_, dict) and bool(getattr(type_, '__annotations__', None))


assert not is_typed_dict_type(BaseModel)
assert not is_typed_dict_type(dict)
assert not is_typed_dict_type(TypedDict)


class A(TypedDict):
    x: int


assert is_typed_dict_type(A)

validate_a = make_typed_dict_validator(A)

print(validate_a({"x": 1}))
# {'x': 1}
print(validate_a({"x": "x"}))
# raises:
# pydantic.error_wrappers.ValidationError: 1 validation error for A
# x
#   value is not a valid integer (type=type_error.integer)

Alternatively, you may be able to come up with a way to use code like the above to produce a custom type that you can use to validate your TypedDicts.

dmontagu on 23 Aug 2019
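A note on the detection check in the comment above: the annotation-based heuristic also accepts any ordinary `dict` subclass that happens to carry annotations. Python 3.10 added `typing.is_typeddict` for exactly this question. A sketch that uses it when available and otherwise falls back to a heuristic; the fallback is an assumption, not an API: TypedDict classes are `dict` subclasses with a `__total__` attribute, which a plain annotated `dict` subclass lacks.

```python
import typing
from typing import Any, TypedDict


def is_typed_dict_type(tp: Any) -> bool:
    """Return True if tp is a class created via TypedDict."""
    official = getattr(typing, "is_typeddict", None)  # Python 3.10+
    if official is not None:
        return official(tp)
    # Fallback heuristic for older interpreters (an assumption, not an API).
    return isinstance(tp, type) and issubclass(tp, dict) and hasattr(tp, "__total__")


class A(TypedDict):
    x: int


class PlainDict(dict):
    x: int  # annotated, but not a TypedDict


print(is_typed_dict_type(A), is_typed_dict_type(PlainDict), is_typed_dict_type(dict))
# True False False
```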

I can obviously define a pydantic model and then call .dict(), but it will most definitely incur a significant performance penalty (the project processes an event stream and does hundreds or even thousands of validations per second).

The nature of pydantic validation is that it is done via parsing -- I don't think that you'll be able to use pydantic to validate a TypedDict without essentially having it parse the dict as a model.

That said, thanks to cythonization, __slots__, and other performance-oriented design choices, using pydantic may not be much slower than hand-crafted checks anyway (at least if those checks are implemented in Python).

dmontagu on 23 Aug 2019
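To illustrate the "validation via parsing" point above: running a dict through a model yields a new dict whose values may have been coerced, rather than a yes/no verdict on the original. A minimal sketch with a model built by `create_model` (the field name `a` follows the issue's example):

```python
from pydantic import create_model

# Equivalent to `class DataModel(BaseModel): a: int`, built at runtime.
DataModel = create_model("DataModel", a=(int, ...))

raw = {"a": "1"}
parsed = dict(DataModel(**raw))

print(parsed)  # {'a': 1}: the string was coerced to an int
print(raw)     # {'a': '1'}: the input dict is left untouched
```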

Thanks @dmontagu I'll try implementing this hopefully next week.

roganov on 23 Aug 2019

This feature is now present in Python 3.8 (https://docs.python.org/3/library/typing.html#typing.TypedDict)

ValentinCalomme on 28 Nov 2019

👍2
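Since the original report used Python 3.7, code that has to run on both sides of the 3.8 boundary commonly falls back to the `typing_extensions` backport. A sketch, assuming `typing_extensions` is installed wherever Python 3.7 is used:

```python
import sys

if sys.version_info >= (3, 8):
    from typing import TypedDict
else:  # Python 3.7: use the backport package
    from typing_extensions import TypedDict


class Data(TypedDict):
    a: int


print(Data(a=1))  # {'a': 1}
```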

Sometimes python amazes me.

NamedTuple
dataclass
TypedDict

It doesn't really fit with the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

samuelcolvin on 28 Nov 2019

👍10 😄4 😕2
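For comparison, here is the same two-field record in each of the three standard-library forms listed above (the `Point` names are illustrative). They differ mainly in what exists at runtime: a tuple, a class instance, or a plain dict. None of them checks types at runtime.

```python
from dataclasses import asdict, dataclass
from typing import NamedTuple, TypedDict


class PointNT(NamedTuple):
    x: int
    y: int


@dataclass
class PointDC:
    x: int
    y: int


class PointTD(TypedDict):
    x: int
    y: int


nt = PointNT(1, 2)      # immutable tuple subclass: nt[0] == nt.x
dc = PointDC(1, 2)      # regular mutable object with attributes
td = PointTD(x=1, y=2)  # dict at runtime: td["x"]

print(isinstance(nt, tuple), isinstance(td, dict), asdict(dc) == td)
# True True True
```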

Hi all, this is a much-requested feature :)
Do you have any plans to implement it?

askerka on 18 Feb 2020

Happy to accept a PR to implement it.

I don't think in the short term I'll be building it myself.

samuelcolvin on 18 Feb 2020

Sorry for the noise, but this may be useful to someone as a quick solution (it was in my case).
Let's say we have the following data structure:

from typing import Dict, List, Optional, TypedDict
from uuid import UUID


class SessionUser(TypedDict):
    id: int
    name: str
    uuid: UUID
    email: str
    username: str


class SessionToken(TypedDict):
    user: SessionUser
    session_id: UUID

then it could be parsed as:

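from typing import Any, Dict, Type
from typing import _TypedDictMeta as TypedDictMeta  # private CPython detail, may change

from pydantic import BaseModel, create_model


# cache of generated models, one per TypedDict class
_models: Dict[Any, Type[BaseModel]] = {}


def parse_dict(typed_dict: TypedDictMeta) -> Type[BaseModel]:
    """Build a pydantic model mirroring a TypedDict (nested TypedDicts become nested models)."""
    annotations = {}
    for name, field in typed_dict.__annotations__.items():
        if isinstance(field, TypedDictMeta):
            # bare nested TypedDict; List[...]/Optional[...] wrappers are not converted
            annotations[name] = (parse_dict(field), ...)
        else:
            # look only in the class's own namespace so inherited dict
            # methods (e.g. a field called "items") are not taken as defaults
            default_value = vars(typed_dict).get(name, ...)
            annotations[name] = (field, default_value)

    return create_model(typed_dict.__name__, **annotations)


def as_typed_dict(
        json_dict: Dict[str, Any],
        typed_dict: TypedDictMeta,
) -> Dict[str, Any]:
    model = _models.get(typed_dict)
    if model is None:
        model = _models[typed_dict] = parse_dict(typed_dict)

    return model(**json_dict).dict()


as_typed_dict({...}, SessionToken)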

askerka on 21 Feb 2020

It would be super cool if one could get a TypedDict from a pydantic class.

Use case: I have code which uses dictionaries a lot. At some point I will convert them to the Pydantic class, but it would help for refactoring to have an in-between state where I use a TypedDict derived from a pydantic class.

MartinThoma on 6 Jul 2020

👍1
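The thread does not describe a built-in way to derive a TypedDict from a pydantic class, and a TypedDict created dynamically at runtime would not be visible to static type checkers anyway. A pragmatic in-between for the refactoring case above, sketched under that assumption, is to write the TypedDict by hand and assert in a test that it declares the same annotations as the model. All names below are hypothetical.

```python
from typing import TypedDict

from pydantic import BaseModel


class UserModel(BaseModel):
    id: int
    name: str


class UserDict(TypedDict):
    # Hand-written mirror of UserModel so mypy/pyright can check dict-based code.
    id: int
    name: str


def annotations_match(model: type, typed_dict: type) -> bool:
    """True if both classes declare the same field names and annotations."""
    # __annotations__ holds the class body's annotations for both kinds of class.
    return dict(model.__annotations__) == dict(typed_dict.__annotations__)


assert annotations_match(UserModel, UserDict)
```

Running such an assertion in the test suite flags the two definitions drifting apart during the in-between phase.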

I also have a similar use case. I use Celery for managing tasks. The biggest challenge with that is serialisation between tasks (since pickle isn't feasible): to keep everything JSON-serialisable I like to use TypedDict, but often I'd like to convert those into a pydantic model instead. Custom classes are obviously advantageous over dicts in many cases.

My challenge is to do this in a way that is both statically type-safe and performant. I feel there is a lot to gain from pydantic for this use case, but I haven't completely cracked it yet.

mr-bjerre on 24 Sep 2020

👍1
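For the Celery case above, one way to keep task arguments JSON-serialisable while working with a model inside the task is to convert at the boundary: dump the model to a plain dict before sending and re-validate it on arrival. A minimal sketch; `TaskPayload`, `TaskModel`, `send` and `receive` are hypothetical stand-ins (no Celery API is used), and the shallow `dict(model)` assumes fields that are already JSON-friendly primitives.

```python
import json
from typing import TypedDict, cast

from pydantic import BaseModel


class TaskPayload(TypedDict):
    user_id: int
    email: str


class TaskModel(BaseModel):
    user_id: int
    email: str


def send(model: TaskModel) -> str:
    # What would travel through the broker: JSON built from a plain dict.
    payload = cast(TaskPayload, dict(model))
    return json.dumps(payload)


def receive(message: str) -> TaskModel:
    # Re-validate on the worker side; a malformed message raises ValidationError.
    payload = cast(TaskPayload, json.loads(message))
    return TaskModel(**payload)


msg = send(TaskModel(user_id=1, email="a@example.com"))
print(receive(msg) == TaskModel(user_id=1, email="a@example.com"))  # True
```

The `cast` calls only inform the static checker; the runtime guarantee comes from re-parsing through `TaskModel`.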

Adding basic support for TypedDict is easy and doesn't cost much. Given the large number of upvotes, I reckon it can be added. I've opened a PR for this; feedback is more than welcome.
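For later readers: TypedDict fields did become supported natively (pydantic 1.8 added support, and v2 supports them too). A sketch of the issue's original example under that assumption. It imports `typing_extensions.TypedDict` because pydantic v2 requires that instead of `typing.TypedDict` on Python versions before 3.12.

```python
from typing_extensions import TypedDict

from pydantic import BaseModel, ValidationError


class Data(TypedDict):
    a: int


class User(BaseModel):
    data: Data


print(User(data={"a": "1"}).data)  # {'a': 1}: coerced, still a plain dict

try:
    User(**{"data": {"a": "invalid"}})
except ValidationError:
    print("invalid data rejected")
```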