Pydantic: Support Generic Container Types

Created on 30 Jan 2019  ·  25 comments  ·  Source: samuelcolvin/pydantic

Feature Request

We want to define our own generic container types (similar to List[T]), but do the type checking ourselves:

from typing import Collection, TypeVar

import numpy

T = TypeVar('T')


class Array(numpy.ndarray, Collection[T]):
    @classmethod
    def get_validators(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        # build the whole array in one call, using the concrete type
        # parameter (e.g. float) as the dtype
        return numpy.array(val, dtype=cls.__args__[0])

(We read very large lists from a JSON file, and pydantic's validation logic is a real bottleneck.)

class MyModel(BaseModel):
    values: Array[float]

Problem

For now, pydantic assumes that generic types come from a fixed set (Tuple, List, Mapping, plus a few others), and therefore this assertion error is raised here:

assert issubclass(origin, Mapping)

Possible solutions

Allowing this could generally be done in two ways:

  • Treat custom generic types no differently than other types; the type parameter is simply ignored by pydantic.
  • Give pydantic a general notion of generic types, and provide a validation interface that lets users interact with the type parameters (a hypothetical sketch follows this list).
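
For illustration only, here is roughly what the second option might look like from a user's perspective. This is a hypothetical interface, not pydantic API: the invented part is a validator hook that receives the resolved type parameters.

from typing import Collection, TypeVar

import numpy

T = TypeVar('T')


class Array(numpy.ndarray, Collection[T]):
    @classmethod
    def __get_validators_for_params__(cls, type_params):  # hypothetical hook
        # pydantic would pass the concrete parameters, e.g. (float,),
        # so the validator can build the array in one numpy call
        def validate(val):
            return numpy.array(val, dtype=type_params[0])
        yield validate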
Labels: feature request, help wanted


All 25 comments

Why can't you do this with validators and pre=True, whole=True and type List[float]? Or Sequence, as per #304.

I definitely think validation via the get_validators route should work even if the class inherits from Collection[T]. Pull request welcome, although I realise this might be quite tricky.

The issue is that pydantic's validation logic for lists kicks in and makes parsing the data ~10x slower:

parse_array: Avg 3.4680049419403076 s
parse_list: Avg 38.27087903022766 s

(Data is ~180MB in size).
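
The thread doesn't include the benchmark harness; a minimal sketch of how numbers like these could be produced (the payload size and names here are made up):

import time
from typing import List

import numpy
from pydantic import BaseModel


class ListModel(BaseModel):
    values: List[float]  # pydantic validates every element individually


def bench(fn, payload, repeats=3):
    # average wall-clock seconds over a few runs
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)


payload = list(range(10_000_000))  # stand-in for the ~180MB JSON payload
print('parse_array:', bench(lambda p: numpy.array(p, dtype=float), payload))
print('parse_list:', bench(lambda p: ListModel(values=p), payload))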

How about using something like

from pydantic import BaseModel, validator
import numpy


class Model(BaseModel):
    values: numpy.ndarray

    @validator('values', pre=True)
    def parse_values(cls, v):
        return numpy.array(v, dtype=float)

    class Config:
        arbitrary_types_allowed = True


m = Model(values=[1, 2, 3])
print(m)

as a work around?

I'll look into generics more in future, but won't be immediate.

Or, closer to what you originally had:

import numpy
from pydantic import BaseModel


class TypedArray(numpy.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        return numpy.array(val, dtype=cls.inner_type)

class ArrayMeta(type):
    def __getitem__(self, t):
        return type('Array', (TypedArray,), {'inner_type': t})

class Array(numpy.ndarray, metaclass=ArrayMeta):
    pass

class Model(BaseModel):
    values: Array[float]

m = Model(values=[1, 2, 3])
print(m)

Thanks for your suggestions!

The first one is a bit unergonomic in my view, since we want to reuse the Array type in several places.

The second one should work :) That said, I wouldn't be unhappy if pydantic gained improved support for generics in the future.
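
If reuse is the main objection to the first workaround, later pydantic v1 releases (1.2+) let one parsing function be attached to several models via allow_reuse; a sketch, assuming a plain numpy.ndarray field with arbitrary_types_allowed:

import numpy
from pydantic import BaseModel, validator


def to_float_array(v):
    # shared parser: build the float ndarray in a single numpy call
    return numpy.array(v, dtype=float)


class ArrayModel(BaseModel):
    class Config:
        arbitrary_types_allowed = True  # ndarray has no built-in validator


class ModelA(ArrayModel):
    values: numpy.ndarray
    _parse_values = validator('values', pre=True, allow_reuse=True)(to_float_array)


class ModelB(ArrayModel):
    readings: numpy.ndarray
    _parse_readings = validator('readings', pre=True, allow_reuse=True)(to_float_array)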

I think this is replaced by #556, which is more succinct. Let me know if that's significantly different.

+1 for this from me - I'm trying to figure out how to support association proxy fields from SQLAlchemy and map them to lists. The end result should be array-like, but the association proxy is an instance of Collection rather than list, so it throws an assertion error when passed as a value to my pydantic model:

import typing
from typing import List

from pydantic import BaseModel
from sqlalchemy.ext.associationproxy import association_proxy

# (Base, ChildOfParent and Session come from the surrounding SQLAlchemy setup)

class MySAModel(Base):
    children = association_proxy(
        'through_table_relation',
        'child_id',
        creator=lambda child_id: ChildOfParent(child_id=child_id)
    )

    ...

instance = Session().query(MySAModel).first()

instance.children  # << [1, 2, 3]
isinstance(instance.children, typing.Collection)  # << True
isinstance(instance.children, typing.List)  # << False


class MySerializer(BaseModel):
    children: List[int] = []

    class Config:
        orm_mode = True

MySerializer.from_orm(instance)

>   ???
E   pydantic.error_wrappers.ValidationError: 1 validation error for MySerializer
E   children
E     value is not a valid list (type=type_error.list)

Support for Collection here would be really handy. The second solution above, using metaclasses, does get round this, but the inner type of the container is lost when generating a schema, which is unfortunate.
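
On the schema point: pydantic v1 lets a custom type implement __modify_schema__ to put the item type back into the generated schema. A sketch layered on the metaclass solution above, with the dtype-to-JSON-type mapping hardcoded to 'number' for brevity:

import numpy


class TypedArray(numpy.ndarray):
    inner_type = None

    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        return numpy.array(val, dtype=cls.inner_type)

    @classmethod
    def __modify_schema__(cls, field_schema):
        # advertise an array of the inner type instead of an opaque value
        field_schema.update(type='array', items={'type': 'number'})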

@bharling if you are willing to get your hands dirty, you can override BaseModel._decompose_class. With a little massaging you should be able to modify the response so that the "children" key's value is converted to a list first.

(Although it starts with _ to prevent field name collisions, this method is intended to be part of the public API.)

I think in v1.0 the GetterDict interface will have improvements to make it easier to apply validators when initializing via from_orm, but you should be able to get this working now without too much effort.

ah ok great thanks for the quick response, I'll see how I get on.

@bharling If you want better support around this point, please leave some feedback on #821 or #822

Ah I see - this actually seems like a better solution as the issue only manifests during the from_orm phase - thanks for the pointers

Hmm - _decompose_class isn't actually helping much unfortunately. It doesn't seem to play well in any kind of derived class. I was hoping to tack it on as a mixin to BaseModels in my app but it never seems to get called

@bharling This is what I had in mind:

from typing import Any, Type, List

from pydantic import BaseModel
from pydantic.utils import GetterDict


class MySerializer(BaseModel):
    children: List[str]

    class Config:
        orm_mode = True

    class CustomGetterDict(GetterDict):
        def get(self, item: Any, default: Any) -> Any:
            attribute = getattr(self._obj, item, default)
            if item == "children":
                # coerce the association proxy / iterable to a real list
                attribute = list(attribute)
            return attribute

    @classmethod
    def _decompose_class(cls: Type['MySerializer'], obj: Any) -> GetterDict:
        return cls.CustomGetterDict(obj)

class MySerializerOrm:
    def __init__(self, children):
        self.children = children


non_list_children = (child for child in ["A", "B"])
print(non_list_children)
# <generator object <genexpr> at 0x10502bb88>
orm = MySerializerOrm(children=non_list_children)
print(MySerializer.from_orm(orm))
# MySerializer children=['A', 'B']
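
A field-agnostic variant of the same idea, assuming you would rather coerce every lazy iterable than special-case "children" (the str/bytes/dict/list exclusions are a judgment call):

from collections.abc import Iterable
from typing import Any

from pydantic.utils import GetterDict


class CoercingGetterDict(GetterDict):
    def get(self, item: Any, default: Any) -> Any:
        attribute = getattr(self._obj, item, default)
        # materialize association proxies, generators, etc. into real lists,
        # but leave strings, bytes and mappings untouched
        if isinstance(attribute, Iterable) and not isinstance(attribute, (str, bytes, dict, list)):
            attribute = list(attribute)
        return attribute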

@dmontagu do you know if there's any plan to add support for the Collection type prior to the 1.0 release? I am hitting the SQLAlchemy association problem described by @bharling a lot, and while providing a custom GetterDict works, it feels quite wacky.

Version 1 is released.

What happens if you use Sequence instead of List?

What happens if you use Sequence instead of List?

Same thing:

ValidationError: 1 validation error for User
teams
  value is not a valid sequence (type=type_error.sequence)

(teams is an association proxy)

I should add I am still using 0.32.2 - is the Sequence approach something that is expected to work in version 1.0?

No, was just wondering.

We might be able to support Collections in future, but I suspect GetterDict is probably the best solution.


(Quoting the TypedArray / ArrayMeta example from above.)

Hi, I am trying to get this exact thing working but I get this error:

E RuntimeError: no validator found for <class '…'>, see arbitrary_types_allowed in Config

Also, Array isn't recognized as a generic, so I just defined my class as:

from pydantic.dataclasses import dataclass  # assuming pydantic's dataclass here


@dataclass
class HasNpArray:
    values: Array

I don't care so much about the generics functionality; I just want to be able to serialize/deserialize numpy arrays.

from typing import Any
import numpy
from pydantic import BaseModel

class ArrayMeta(type):
    def __getitem__(self, t):
        # Array[t] returns a subclass that remembers the requested dtype
        return type('Array', (Array,), {'__dtype__': t})

class Array(numpy.ndarray, metaclass=ArrayMeta):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        dtype = getattr(cls, '__dtype__', Any)
        if dtype is Any:
            return numpy.array(val)
        else:
            return numpy.array(val, dtype=dtype)

class Model(BaseModel):
    int_values: Array[float]
    any_values: Array

m = Model(int_values=[1, 2, 3], any_values=[1, 'hello'])
print(m)
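
That handles parsing; for the serialization direction mentioned above, pydantic v1's json_encoders config can turn the array back into a JSON list. A minimal sketch, assuming .tolist() round-trips acceptably for your dtypes:

import numpy
from pydantic import BaseModel


class NpModel(BaseModel):
    values: numpy.ndarray

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {numpy.ndarray: lambda arr: arr.tolist()}


m = NpModel(values=numpy.array([1.0, 2.0, 3.0]))
print(m.json())  # {"values": [1.0, 2.0, 3.0]}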

Thank you. I was able to get this to work functionally; however, mypy gives this error on Array[float]:

error: "Array" expects no type arguments, but 1 given

I can live without the type checking though.

Removing ArrayMeta and switching to TypeVar / Generic seems to work fine (this is with Python 3.7), so that works for me. TY!
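
The comment doesn't show the TypeVar / Generic version; an assumed reconstruction that keeps the runtime behaviour while making Array[float] acceptable to mypy:

from typing import Generic, TypeVar

import numpy

T = TypeVar('T')


class Array(numpy.ndarray, Generic[T]):
    __dtype__ = None

    @classmethod
    def __class_getitem__(cls, t):
        # mypy type-checks this as an ordinary Generic subscript; at runtime
        # we return a subclass that remembers the dtype for the validator
        return type('Array', (cls,), {'__dtype__': t})

    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        return numpy.array(val, dtype=cls.__dtype__)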

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    Array = numpy.ndarray  # give mypy a real type instead of the array() function
else:
    class Array(numpy.ndarray, metaclass=ArrayMeta):
        ...

but you can experiment

A minor improvement to @samuelcolvin's code:

  • Added copy=False to the numpy.array call to avoid copying the array when a copy is not needed.
  • Made __dtype__ default to None to avoid an extra branch.
  • Added an optional shape to __dtype__, which is used to validate the shape.

import pydantic
import numpy


class _ArrayMeta(type):
    def __getitem__(self, t):
        return type('Array', (Array,), {'__dtype__': t})


class Array(numpy.ndarray, metaclass=_ArrayMeta):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        dtype = getattr(cls, '__dtype__', None)
        if isinstance(dtype, tuple):
            dtype, shape = dtype
        else:
            shape = tuple()

        result = numpy.array(val, dtype=dtype, copy=False, ndmin=len(shape))
        assert not shape or len(shape) == len(result.shape)  # ndmin guarantees this

        if any((shape[i] != -1 and shape[i] != result.shape[i]) for i in range(len(shape))):
            result = result.reshape(shape)
        return result


class Model(pydantic.BaseModel):
    int_values: Array[float]
    any_values: Array
    shaped1_values: Array[float, (-1, )]
    shaped2_values: Array[float, (2, 1)]
    shaped3_values: Array[float, (4, -1)]
    shaped4_values: Array[float, (-1, 4)]


m = Model(
    int_values=[1, 2, 3],
    any_values=[1, 'hello'],
    shaped1_values=numpy.array([1.1, 2.0]),
    shaped2_values=numpy.array([1.1, 2.0]),
    shaped3_values=numpy.array([1.1, 2.0, 2.0, 3.0]),
    shaped4_values=numpy.array([1.1, 2.0, 2.0, 3.0]),
)
print(m)

assert (m.int_values == numpy.array([1.0, 2.0, 3.0])).all()
assert (m.any_values == numpy.array(['1', 'hello'], dtype='<U21')).all()
assert (m.shaped1_values == numpy.array([1.1, 2. ])).all()
assert (m.shaped2_values == numpy.array([[1.1], [2.]])).all()
assert (m.shaped3_values == numpy.array([[1.1], [2.], [2.0], [3.0]])).all()
assert (m.shaped4_values == numpy.array([[1.1, 2., 2.0, 3.0]])).all()


class Model(pydantic.BaseModel):
    a: Array[float, (1, 10)]


# errors with ValidationError: cannot reshape array of size 3 into shape (1,10) (type=value_error)
# Model(a=[1, 1, 2])

Hello current and future pydantic users - this topic is near and dear to my heart, and I chose to run a bit with the solutions presented above. Here's something I came up with using typing.Generic instead of custom metaclasses; the latter approach wasn't passing type checks in my codebase. I didn't get around to shapes, because those tend to be a bit more dynamic for my use case. Hope it's useful to someone!

from typing import Generic, TypeVar

import numpy as np
from pydantic.fields import ModelField

DType = TypeVar('DType')


class TypedArray(np.ndarray, Generic[DType]):
    """Wrapper class for numpy arrays that stores and validates type information.
    This can be used in place of a numpy array, but when used in a pydantic BaseModel
    or with pydantic.validate_arguments, its dtype will be *coerced* at runtime to the
    declared type.
    """

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, val, field: ModelField):
        dtype_field = field.sub_fields[0]
        actual_dtype = dtype_field.type_.__args__[0]
        # If numpy cannot create an array with the requested dtype, an error
        # will be raised and correctly bubbled up.
        np_array = np.array(val, dtype=actual_dtype)
        return np_array

Full gist with tests to demonstrate functionality: https://gist.github.com/danielhfrank/00e6b8556eed73fb4053450e602d2434
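
Since validate reads field.sub_fields[0].type_.__args__[0], the type parameter itself has to be a subscripted type; typing.Literal fits, and numpy accepts dtype strings. A usage sketch under that assumption (the model and field names are made up):

from typing import Literal  # typing_extensions.Literal on Python 3.7

import numpy as np
import pydantic


class Model(pydantic.BaseModel):
    values: TypedArray[Literal['float32']]


m = Model(values=[1, 2, 3])
assert m.values.dtype == np.float32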
