We want to define our own generic container types (similar to List[T]), but we want to do the type checking ourselves:
from typing import Collection, TypeVar

import numpy as np

T = TypeVar('T')

class Array(np.ndarray, Collection[T]):
    @classmethod
    def get_validators(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        return np.array(val, dtype=cls.__args__[0])
(We read in very large lists from a JSON file, and pydantic's validation logic is a real bottleneck.)
class MyModel(BaseModel):
    values: Array[float]
For now, pydantic assumes that generic types come from a selected set (Tuple, List, Mapping, plus a few others), and therefore this assertion error is thrown here:
assert issubclass(origin, Mapping)
Allowing this could generally be done in two ways:
Why can't you do this with validators and pre=True, whole=True and the type List[float]? Or Sequence as per #304.
I definitely think validation via the get_validators route should work even if the class inherits from Collection[T]. Pull request welcome, although I realise this might be quite tricky.
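For concreteness, a minimal sketch of that suggestion (this uses the pre-1.0 validator API, where whole=True hands the validator the entire list at once; the model and field names here are illustrative, not from the original comment):

from typing import List
import numpy
from pydantic import BaseModel, validator

class MyModel(BaseModel):
    values: List[float]

    @validator('values', pre=True, whole=True)
    def coerce_with_numpy(cls, v):
        # One bulk numpy conversion before pydantic's own validation runs.
        return numpy.array(v, dtype=float)

Note that the declared List[float] validation still runs on the result afterwards, which is exactly what the next comment measures.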
The issue is that pydantic's validation logic for lists kicks in and makes parsing the data ~10x slower:
parse_array: Avg 3.4680049419403076 s
parse_list: Avg 38.27087903022766 s
(Data is ~180MB in size).
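For reference, a minimal sketch of how numbers of this kind could be produced (the harness and the payload below are illustrative, not the OP's actual benchmark code):

import time
from typing import List

import numpy
from pydantic import BaseModel

class ListModel(BaseModel):
    values: List[float]  # pydantic validates every element individually

def parse_array(data):
    return numpy.array(data, dtype=float)  # a single C-level conversion

def parse_list(data):
    return ListModel(values=data)          # per-item validation

def bench(fn, payload, repeats=3):
    # Average wall-clock time over a few runs.
    start = time.time()
    for _ in range(repeats):
        fn(payload)
    return (time.time() - start) / repeats

payload = list(range(1_000_000))  # stand-in for the real ~180MB JSON data
print('parse_array: Avg', bench(parse_array, payload), 's')
print('parse_list: Avg', bench(parse_list, payload), 's')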
How about using something like
from pydantic import BaseModel, validator
import numpy

class Model(BaseModel):
    values: numpy.ndarray

    @validator('values', pre=True)
    def parse_values(v):
        return numpy.array(v, dtype=float)

    class Config:
        arbitrary_types_allowed = True

m = Model(values=[1, 2, 3])
print(m)
as a workaround?
I'll look into generics more in future, but won't be immediate.
Or, closer to what you originally had:
import numpy
from pydantic import BaseModel

class TypedArray(numpy.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        return numpy.array(val, dtype=cls.inner_type)

class ArrayMeta(type):
    def __getitem__(self, t):
        return type('Array', (TypedArray,), {'inner_type': t})

class Array(numpy.ndarray, metaclass=ArrayMeta):
    pass

class Model(BaseModel):
    values: Array[float]

m = Model(values=[1, 2, 3])
print(m)
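To spell out what the metaclass buys (an illustrative check, not from the original comment): Array[float] invokes ArrayMeta.__getitem__, which creates a fresh subclass of TypedArray with inner_type attached, so every parameterization is an ordinary class whose __get_validators__ pydantic can discover.

FloatArray = Array[float]
print(FloatArray.__name__)                 # Array
print(FloatArray.inner_type)               # <class 'float'>
print(issubclass(FloatArray, TypedArray))  # True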
Thanks for your suggestions!
The first one is a bit unergonomic in my view, since we want to reuse the Array type in several places.
The second one should work :) That said, I wouldn't be unhappy if pydantic gained improved support for generics in the future.
I think this is replaced by #556, which is more succinct. Let me know if that's significantly different.
+1 for this from me - I'm trying to figure out how to support Association Proxy fields from SQLAlchemy and map them to Lists. The end result should be array-like, but the association proxy is an instance of Collection rather than list, so it throws an assertion error when passed as a value to my pydantic model:
class MySAModel(Base):
    children = association_proxy(
        'through_table_relation',
        'child_id',
        creator=lambda child_id: ChildOfParent(child_id=child_id)
    )
    ...

instance = Session().query(MySAModel).first()
instance.children                                 # << [1,2,3]
isinstance(instance.children, typing.Collection)  # << True
isinstance(instance.children, typing.List)        # << False

class MySerializer(BaseModel):
    children: List[int] = []

    class Config:
        orm_mode = True

MySerializer.from_orm(instance)

E   pydantic.error_wrappers.ValidationError: 1 validation error for MySerializer
E   children
E     value is not a valid list (type=type_error.list)
Support for Collection here would be really handy. The second solution using metaclasses above does get round this, but the inner type of the container is lost when generating a schema, which is unfortunate.
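One possible way to get the inner type back into the schema, sketched under the assumption of pydantic v1's __modify_schema__ hook and the TypedArray pattern above (the Python-to-JSON type mapping here is deliberately simplified):

import numpy

class TypedArray(numpy.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        return numpy.array(val, dtype=cls.inner_type)

    @classmethod
    def __modify_schema__(cls, field_schema):
        # Advertise the field as a typed array instead of an opaque object.
        json_type = {float: 'number', int: 'integer', str: 'string'}.get(
            getattr(cls, 'inner_type', None), 'number')
        field_schema.update(type='array', items={'type': json_type})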
@bharling if you are willing to get your hands dirty, you can override BaseModel._decompose_class. With a little massaging you should be able to modify the response so that the "children" key's value is converted to a list first.
(Although it starts with _ to prevent field name collisions, this method is intended to be part of the public API.)
I think in v1.0 the GetterDict interface will have improvements to make it easier to apply validators when initializing via from_orm, but you should be able to get this working now without too much effort.
Ah OK, great - thanks for the quick response; I'll see how I get on.
@bharling If you want better support around this point, please leave some feedback on #821 or #822
Ah I see - this actually seems like a better solution, as the issue only manifests during the from_orm phase - thanks for the pointers!
Hmm - _decompose_class isn't actually helping much, unfortunately. It doesn't seem to play well in any kind of derived class. I was hoping to tack it on as a mixin to BaseModels in my app, but it never seems to get called.
@bharling This is what I had in mind:
from typing import Any, Type, List

from pydantic import BaseModel
from pydantic.utils import GetterDict

class MySerializer(BaseModel):
    children: List[str]

    class Config:
        orm_mode = True

    class CustomGetterDict(GetterDict):
        def get(self, item: Any, default: Any) -> Any:
            attribute = getattr(self._obj, item, default)
            if item == "children":
                attribute = list(attribute)
            return attribute

    @classmethod
    def _decompose_class(cls: Type['Model'], obj: Any) -> GetterDict:
        return MySerializer.CustomGetterDict(obj)

class MySerializerOrm:
    def __init__(self, children):
        self.children = children

non_list_children = (child for child in ["A", "B"])
print(non_list_children)
# <generator object <genexpr> at 0x10502bb88>

orm = MySerializerOrm(children=non_list_children)
print(MySerializer.from_orm(orm))
# MySerializer children=['A', 'B']
@dmontagu do you know if there's any plan to add support for the Collection type prior to the 1.0 release? I am hitting the SQLAlchemy association problem described by @bharling a lot, and while providing a custom GetterDict works, it feels quite wacky.
Version 1 is released.
What happens if you use Sequence instead of List?
Same thing:
ValidationError: 1 validation error for User
teams
  value is not a valid sequence (type=type_error.sequence)
(teams is an association proxy)
I should add I am still using 0.32.2 - is the Sequence approach something that is expected to work in version 1.0?
No, was just wondering. We might be able to support Collections in future, but I suspect GetterDict is probably the best solution.
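For what it's worth, pydantic v1 also lets you plug a custom GetterDict in declaratively via Config.getter_dict instead of overriding _decompose_class; a sketch, assuming an association-proxy attribute named children:

from typing import Any, List
from pydantic import BaseModel
from pydantic.utils import GetterDict

class ProxyGetterDict(GetterDict):
    def get(self, name: Any, default: Any = None) -> Any:
        value = getattr(self._obj, name, default)
        if name == 'children':
            value = list(value)  # materialise the association proxy
        return value

class MySerializer(BaseModel):
    children: List[int] = []

    class Config:
        orm_mode = True
        getter_dict = ProxyGetterDict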
(Quoting the TypedArray / ArrayMeta suggestion from above.)
Hi, I am trying to get this exact thing working but I get this error:
E RuntimeError: no validator found for arbitrary_types_allowed in Config
Also, Array isn't recognized as a generic, so I just defined my class as:
@dataclass
class HasNpArray:
    values: Array
I don't care so much about the generics functionality, I just want to be able to serialize/deserialize numpy arrays.
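On the serialization half of that: pydantic v1 will not know how to JSON-encode an ndarray on its own, so a Config.json_encoders entry is needed alongside the validator. A sketch, assuming the Array type defined in the next comment (the tolist() encoder and the model name are illustrative):

import numpy
from pydantic import BaseModel

class NpModel(BaseModel):
    values: Array

    class Config:
        # numpy.array(...) returns a plain ndarray, so keying on ndarray works.
        json_encoders = {numpy.ndarray: lambda arr: arr.tolist()}

m = NpModel(values=[1, 2, 3])
print(m.json())  # {"values": [1, 2, 3]}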
from typing import Any

import numpy
from pydantic import BaseModel

class ArrayMeta(type):
    def __getitem__(self, t):
        return type('Array', (Array,), {'__dtype__': t})

class Array(numpy.ndarray, metaclass=ArrayMeta):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        dtype = getattr(cls, '__dtype__', Any)
        if dtype is Any:
            return numpy.array(val)
        else:
            return numpy.array(val, dtype=dtype)

class Model(BaseModel):
    int_values: Array[float]
    any_values: Array

m = Model(int_values=[1, 2, 3], any_values=[1, 'hello'])
print(m)
Thank you. I was able to get this to work functionally; however, mypy gives this error on Array[float]:
error: "Array" expects no type arguments, but 1 given
I can live without type assertions though.
Removing ArrayMeta and switching to TypeVar / Generic seems to work fine, though (this is with Python 3.7), so that works for me. TY!
if TYPE_CHECKING:
    Array = numpy.ndarray  # alias for static type checkers
else:
    class Array...

but you can experiment.
A minor improvement to @samuelcolvin's code:
- copy=False in numpy.array's call, to avoid copying the array when a copy is not needed.
- __dtype__ = None by default, to avoid an extra branch.
- An optional shape can be included in the __dtype__, which is used to validate the shape.

import pydantic
import numpy

class _ArrayMeta(type):
    def __getitem__(self, t):
        return type('Array', (Array,), {'__dtype__': t})

class Array(numpy.ndarray, metaclass=_ArrayMeta):
    @classmethod
    def __get_validators__(cls):
        yield cls.validate_type

    @classmethod
    def validate_type(cls, val):
        dtype = getattr(cls, '__dtype__', None)
        if isinstance(dtype, tuple):
            dtype, shape = dtype
        else:
            shape = tuple()

        result = numpy.array(val, dtype=dtype, copy=False, ndmin=len(shape))
        assert not shape or len(shape) == len(result.shape)  # ndmin guarantees this

        if any((shape[i] != -1 and shape[i] != result.shape[i]) for i in range(len(shape))):
            result = result.reshape(shape)
        return result

class Model(pydantic.BaseModel):
    int_values: Array[float]
    any_values: Array
    shaped1_values: Array[float, (-1, )]
    shaped2_values: Array[float, (2, 1)]
    shaped3_values: Array[float, (4, -1)]
    shaped4_values: Array[float, (-1, 4)]

m = Model(
    int_values=[1, 2, 3],
    any_values=[1, 'hello'],
    shaped1_values=numpy.array([1.1, 2.0]),
    shaped2_values=numpy.array([1.1, 2.0]),
    shaped3_values=numpy.array([1.1, 2.0, 2.0, 3.0]),
    shaped4_values=numpy.array([1.1, 2.0, 2.0, 3.0]),
)
print(m)
assert (m.int_values == numpy.array([1.0, 2.0, 3.0])).all()
assert (m.any_values == numpy.array(['1', 'hello'], dtype='<U21')).all()
assert (m.shaped1_values == numpy.array([1.1, 2. ])).all()
assert (m.shaped2_values == numpy.array([[1.1], [2.]])).all()
assert (m.shaped3_values == numpy.array([[1.1], [2.], [2.0], [3.0]])).all()
assert (m.shaped4_values == numpy.array([[1.1, 2., 2.0, 3.0]])).all()
class Model(pydantic.BaseModel):
    a: Array[float, (1, 10)]

# errors with ValidationError: cannot reshape array of size 3 into shape (1,10) (type=value_error)
# Model(a=[1, 1, 2])
Hello current and future pydantic users - this topic is near and dear to my heart, and I chose to run a bit with the solutions presented above. Here's something that I came up with, using typing.Generic instead of custom metaclasses - the latter method wasn't passing type checks in my codebase. I didn't get around to shapes because those tend to be a bit more dynamic for my use case. Hope it's useful to someone!
from typing import Generic, TypeVar

import numpy as np
from pydantic.fields import ModelField

DType = TypeVar('DType')

class TypedArray(np.ndarray, Generic[DType]):
    """Wrapper class for numpy arrays that stores and validates type information.

    This can be used in place of a numpy array, but when used in a pydantic BaseModel
    or with pydantic.validate_arguments, its dtype will be *coerced* at runtime to the
    declared type.
    """

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, val, field: ModelField):
        dtype_field = field.sub_fields[0]
        actual_dtype = dtype_field.type_.__args__[0]
        # If numpy cannot create an array with the requested dtype, an error will be
        # raised and correctly bubbled up.
        np_array = np.array(val, dtype=actual_dtype)
        return np_array
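Judging from the field.sub_fields[0].type_.__args__[0] lookup, the dtype parameter is expected to be something carrying __args__, such as a typing.Literal dtype string; an illustrative guess at usage under that assumption (the gist linked below has the authoritative tests):

from typing import Literal  # Python 3.8+; use typing_extensions on 3.7

from pydantic import BaseModel

class Model(BaseModel):
    values: TypedArray[Literal['float32']]

m = Model(values=[1, 2, 3])
print(m.values.dtype)  # float32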
Full gist with tests to demonstrate functionality: https://gist.github.com/danielhfrank/00e6b8556eed73fb4053450e602d2434