I'm wondering about making a major change to pydantic's internals in v2.
Validators are currently a list of functions that are called one after another; here's the actual code.
I want to switch to an "Onion" in much the same way many web frameworks implement middleware, here is a description from django's docs.
The idea is that validation would be done by one function which, when called, calls the next layer down, until you get to the inner-most function which just does some parsing/validation and returns the value.
Your custom validators can now look like
```python
@validator('foobar')
def foobar_validator(cls, v, handler):
    v = v * 2
    v = handler(v)
    return v.upper()
```
(or something more sane)
This replaces the choice between pre=True and pre=False, and removes the need for multiple validators if you want to do both.
The validator can also choose not to call handler and thereby skip heavy validation in some cases, or implement custom behaviour.
```python
@validator('ts')
def ts_validator(cls, v, handler):
    if v == 'now':
        return datetime.now()
    else:
        return handler(v)
```
This might be useful for the case of None which currently has lots of custom logic around not calling further validators when the value is None.
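A sketch of how the None case could look in this style (hypothetical code, not how pydantic currently works; `int` stands in for the inner validation layer):

```python
def optional_layer(v, handler):
    # skip all heavy inner validation when the value is None,
    # instead of the special-casing pydantic currently does
    if v is None:
        return None
    return handler(v)

print(optional_layer(None, int))  # None, the inner validator never runs
print(optional_layer('5', int))   # 5
```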
You could also catch exceptions from inner validation and modify the error or continue with some default value.
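For example, error handling could look like this (a hypothetical sketch; `inner_datetime_validator` stands in for pydantic's real datetime parsing, and the default value is arbitrary):

```python
from datetime import datetime, timezone

def inner_datetime_validator(v):
    # stand-in for pydantic's normal datetime parsing/validation
    if isinstance(v, datetime):
        return v
    if isinstance(v, (int, float)):
        return datetime.fromtimestamp(v, tz=timezone.utc)
    raise ValueError(f'invalid datetime: {v!r}')

def lenient_ts_validator(v, handler):
    # catch the inner layer's error and continue with a default value
    try:
        return handler(v)
    except ValueError:
        return datetime(1970, 1, 1, tzinfo=timezone.utc)

print(lenient_ts_validator(0, inner_datetime_validator))
print(lenient_ts_validator('junk', inner_datetime_validator))
```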
For very simple cases (which are very common) e.g. a plain string, validation would be as simple as calling one function.
This should be much faster than the current "iterate over a list of one element". Edit: or perhaps not, hard to say.
Code like
can be modified to
```python
@classmethod
def __validate__(cls, value: Any, **kwargs) -> Decimal:
    value = decimal_validator(value, **kwargs)
    value = number_size_validator(value, **kwargs)
    value = number_multiple_validator(value, **kwargs)
    # ... do custom decimal validation
    return value
```
Which cython should be able to compile to be much faster.
Instead of the current __get_validators__ interface, custom types could simply have a __validate__ method, which is called for validation. We could do this anyway I guess, but it would make more sense with the onion.
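A sketch of what that could look like (`NonEmptyStr` and the exact `__validate__` signature are hypothetical, not a settled v2 API):

```python
class NonEmptyStr(str):
    @classmethod
    def __validate__(cls, value, **kwargs):
        # one plain method, instead of yielding validators from __get_validators__
        if not isinstance(value, str):
            raise TypeError('string required')
        if not value:
            raise ValueError('must not be empty')
        return cls(value)

print(NonEmptyStr.__validate__('hello'))
```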
If we ever provide a way to get access to the stack trace from the exception when validation fails, it should be longer but clearer than the current one would be. (This is not currently a feature request; maybe I'm the only person who would want it.)
But it might not be too bad if we do it in a major version change and use it as an opportunity to rewrite a lot of fields.py, which I think could do with an improvement (e.g. the way we deal with None).
I guess for decorated validators we have to provide some backwards compatibility, just a wrapper function which calls the actual validator either before or after handler() depending on pre.
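Such a shim could be as simple as this (a sketch; the names and the stand-in legacy validator are illustrative):

```python
def wrap_legacy_validator(func, pre: bool):
    # run the legacy validator before or after the rest of the onion,
    # depending on its original pre flag
    if pre:
        def onion_validator(cls, v, handler):
            return handler(func(cls, v))
    else:
        def onion_validator(cls, v, handler):
            return func(cls, handler(v))
    return onion_validator

# a legacy pre=True validator; int stands in for the inner layers
def legacy(cls, v):
    return v.strip()

wrapped = wrap_legacy_validator(legacy, pre=True)
print(wrapped(None, ' 42 ', int))  # strips first, then the inner int() parses
```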
To keep the interface as simple as possible, it would be useful if handler was always an optional argument to a validator, so I guess this could stay for good.
Currently we have lots of clever (and ugly) logic so that all the keyword arguments to validators are optional.
As the decimal example above shows, this would be very confusing and brittle when validators are manually calling other validators.
I think best that all validators take config, fields and values (and perhaps context #1549) and pass them all on to other validators. Validators that don't need access to some arguments can just ignore them with **kwargs. This should remove a lot of fluff from _generic_validator_basic
Decorated validators are a special case and keyword arguments should still all be optional.
I guess this will be faster in some cases and slower in others. Will it make pydantic faster overall?
The main problem is avoiding dynamic functions which cython can't compile.
My first proposal for an implementation of Onion is very simple:
```python
from typing import Callable, Iterator

class Onion:
    def __init__(self, *functions: Callable):
        self.outer_layers = list(functions)
        self.inner = self.outer_layers.pop()
        self.layer_iter: Iterator[Callable]

    def __call__(self, v):
        self.layer_iter = iter(self.outer_layers)
        return self.call_layer(v)

    def call_layer(self, v):
        try:
            func = next(self.layer_iter)
        except StopIteration:
            return self.inner(v)
        else:
            return func(v, handler=self.call_layer)
```
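Usage would look something like this (the layer functions are illustrative; the class is repeated so the snippet runs on its own):

```python
from typing import Callable, Iterator

class Onion:
    # (the class from above, repeated so this snippet is self-contained)
    def __init__(self, *functions: Callable):
        self.outer_layers = list(functions)
        self.inner = self.outer_layers.pop()
        self.layer_iter: Iterator[Callable] = iter(())

    def __call__(self, v):
        self.layer_iter = iter(self.outer_layers)
        return self.call_layer(v)

    def call_layer(self, v):
        try:
            func = next(self.layer_iter)
        except StopIteration:
            return self.inner(v)
        else:
            return func(v, handler=self.call_layer)

def strip_layer(v, handler):
    # outer layer: pre-process, delegate to the next layer, return its result
    return handler(v.strip())

validate = Onion(strip_layer, int)
print(validate(' 42 '))
```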
Can we do much better than that performance-wise?
Most systems, I think, do something more like

```python
validator = partial(validator, handler=inner_validator)
```

Is that faster with cython?
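Building the chain with `partial` up front would look something like this (a sketch with illustrative layers; `build_chain` is a hypothetical helper, not existing pydantic code):

```python
from functools import partial

def strip_layer(v, handler):
    return handler(v.strip())

def bound_layer(v, handler):
    out = handler(v)
    if out > 100:
        raise ValueError('too big')
    return out

def build_chain(layers, inner):
    # bind each layer's handler once at build time, so calling the
    # outermost function walks the whole onion with no per-call iteration
    handler = inner
    for layer in reversed(layers):
        handler = partial(layer, handler=handler)
    return handler

validate = build_chain([strip_layer, bound_layer], int)
print(validate(' 42 '))
```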
More generally, does this sound like a good idea?
@PrettyWood @tiangolo @StephenBrown2 @dmontagu, anyone else?
First and foremost thank you for this clear and explicit proposal!
Onion ring model is the reason I switched from express to koa on an old project: writing custom middleware was way more enjoyable!
I reckon this is a great idea as it makes the whole validation workflow way more explicit. I went through the pros and cons but imo the cons are not really a problem. The biggest challenge is the "big rewrite" part but there were already big plans and a lot of code to change anyway.
I think best that all validators take config, fields and values (and perhaps context #1549) and pass them all on to other validators.
👍 Or simply just one ctx variable like node middleware ((ctx, next)), which would have the config, fields, values and other extra variables that could be modified on the fly when writing a custom validator?
My two main concerns are:

1. Whether we could have a pytest fixture-like API, with an optional `yield` in the middle of the validator instead of this handler function, to split pre and post validation. I need to take some time to think it through.
2. We agree that it would mean this example would be written like this?
```python
class DemoModel(BaseModel):
    square_numbers: List[int] = []
    cube_numbers: List[int] = []

    @validator('*')
    def validate_all(cls, v, handler):
        if isinstance(v, str):
            v = v.split('|')
        v = handler(v)
        if sum(v) > 42:
            raise ValueError('sum of numbers greater than 42')
        return v

    @validator('square_numbers', each_item=True)
    def check_squares(cls, v):
        assert v ** 0.5 % 1 == 0, f'{v} is not a square number'
        return v

    @validator('cube_numbers', each_item=True)
    def check_cubes(cls, v):
        assert v ** (1 / 3) % 1 == 0, f'{v} is not a cubed number'
        return v
```
I'm a bit confused by the proposal.
Does that mean that if I write a custom validator like:
```python
@validator('foobar')
def foobar_validator(cls, v, handler):
    v = v * 2
    return v.upper()
```
and I forget to add `v = handler(v)` (e.g. because I'm new to pydantic), then other validators will not be called? That seems like an issue.
I don't see the advantage over the existing pre=True approach. e.g. what if I want to validate a datetime? Currently I can assume that if a unix epoch time integer is passed in, by the time it gets to my validator, it will be a datetime. Right? Can I still assume that in the onion model? Or do I have to re-invent the wheel for every validator to coerce the type into what I expect?
Wouldn't an onion traceback be harder to read, since it will include validator functions which have already passed? So it would be full of red herrings?
This might be useful for the case of None which currently has lots of custom logic around not calling further validators when the value is None.
Couldn't the current model be extended by adding a new semantic of "SkipValidation" type, or an exception type, to avoid the ambiguity of the None usage?
For example,
```python
for validator in validators:
    try:
        v = validator(cls, v, values, self, self.model_config)
    except SkipValidation as e:
        return e.value
    except (ValueError, TypeError, AssertionError) as exc:
        return v, ErrorWrapper(exc, loc)
```
Or
```python
for validator in validators:
    try:
        v = validator(cls, v, values, self, self.model_config)
        if isinstance(v, SkipValidation):
            return v.value
    except (ValueError, TypeError, AssertionError) as exc:
        return v, ErrorWrapper(exc, loc)
return v, None
```
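For completeness, `SkipValidation` itself could be as simple as this (a sketch, not an existing pydantic class; `run_validators` is a simplified version of the loop above, without the error wrapping):

```python
class SkipValidation(Exception):
    # control-flow exception carrying the final value past remaining validators
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def run_validators(validators, v):
    for validator in validators:
        try:
            v = validator(v)
        except SkipValidation as e:
            return e.value
    return v

def none_shortcut(v):
    if v is None:
        raise SkipValidation(None)
    return v

print(run_validators([none_shortcut, str.upper], None))
print(run_validators([none_shortcut, str.upper], 'abc'))
```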
It's also worth investigating how this new onion model fits with the new Annotated feature introduced in Python 3.9 via PEP-593. Many of the examples are encoding the types with metadata which could be used at runtime (e.g., validation). In general, it's not completely clear to me how this new Python 3.9 feature plays nicely with Pydantic.
Borrowing an example from the PEP,
(Note, ValueRange isn't concretely defined, it's general metadata that can be added. In other words, Annotated[T, x], Annotated[T, x, y, ...])
```python
from typing import Annotated
from dataclasses import dataclass

@dataclass
class Record:
    x: Annotated[int, ValueRange(3, 10)]
```
Or
```python
from typing import Annotated
from dataclasses import dataclass

T = Annotated[int, ValueRange(-10, 5)]

@dataclass
class Record:
    x: T
    y: Annotated[T, ValueRange(-20, 3)]
```
As a result, y has an annotated type of Annotated[int, ValueRange(-10, 5), ValueRange(-20, 3)].
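Since the PEP leaves `ValueRange` abstract, runtime validation against `Annotated` metadata could be sketched like this (purely illustrative, not pydantic's actual handling; requires Python 3.9):

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin

@dataclass(frozen=True)
class ValueRange:
    lo: int
    hi: int

def check(annotation, value):
    # validate value against every ValueRange found in an Annotated type
    if get_origin(annotation) is Annotated:
        _base, *metadata = get_args(annotation)
        for meta in metadata:
            if isinstance(meta, ValueRange) and not (meta.lo <= value <= meta.hi):
                raise ValueError(f'{value} outside [{meta.lo}, {meta.hi}]')
    return value

T = Annotated[int, ValueRange(-10, 5)]
# nested Annotated flattens, so both ranges apply here
print(check(Annotated[T, ValueRange(-20, 3)], 2))
```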
I think Pydantic is a great project. It looks like the project is getting a lot of attention and the community is growing quite a bit in 2020.
https://star-history.t9t.io/#samuelcolvin/pydantic
For the overall health of the project, It might be useful to be clear about where this feature fits in the larger scope of the project and how it competes with the new changes to typing within Python, as well as the growing number of new feature requests and bugs in the backlog.
Great! :nerd_face: :rocket:

The things I like about this

Mainly the pros you already mentioned:

- A single validator order, instead of `pre=True`/`pre=False`: having the order just depend on the order of internal code calls seems like it could be more intuitive (but also a con below :pensive:).
- No need for `yield`, so, great!
- A simple `__validate__` instead of `__get_validators__`: I think this is one of the things I like the most. ✨ It would simplify a lot the simplest cases, that I would expect to be common, to make some object a "valid" Pydantic type.

The doubts I have:

Forgetting to call `handler(v)` (as @mdavis-xyz mentions). I like the simplicity of not having to understand `pre=True` and `pre=False`, but I see that could be a problem (maybe worse than learning `pre`?).

Not getting all the errors

In FastAPI, with the current behavior, I think it's great that the client gets all the errors, at the exact location, after a single request.

With the onion, if validation fails at one of the layers, it would return that single error, and the client would have to send the request again to see the next validation error from the next layer, and so on. A bit more irritating than getting all the things to fix in a single validation error response.

Solutions...?

One way to make it work with the onion is by returning a tuple of the value and errors, like fields do (and maybe optionally also raising to force skipping any further processing?). But that seems way more complex for the user/developer than both the current approach and the proposed onion. :pensive:

Other ideas

About the signature with `config`, `fields`, `context`: the way I'm doing it in Typer, similar to FastAPI, is passing the data by type annotations. So, if the user declares a parameter `config: Config`, they would receive the config in that parameter. The same for `fields: List[FieldInfo]`, `values: List[Any]`, and `value: Any` (or `value: str`, or whatever).

The pros of this approach:

- A renamed parameter like `configuration: Config` would still work.
- You wouldn't get the name wrong ("was it `configuration`, `configs`?"), because the editor can give completion for the class `Config`, etc. But the editor can't help guessing the exact parameter name `config`.

The cons:
Thank you so much for everyone's replies, it's really useful.
I'll try and reply to everyone, but let me know if I miss something.
@PrettyWood: 👍 Or simply just one ctx variable like node middleware
Agreed, this is a good idea anyway; it's described in #2034. However an onion isn't required for a single ctx argument (though a single ctx argument is required for the onion).
@PrettyWood: have a pytest fixture-like API with optional yield in the middle of the validator instead of this handler function to split pre and post validation. I need to take some time to think it through.
This is an interesting idea, I had a think about this and built a simple demo of it:
```python
import re
from typing import Generator

from devtools import debug


def int_validator(value: str) -> Generator[int, int, int]:
    if value == 'one':
        # here further validation is short circuited
        return 1
    validated_value = yield re.sub(r'[^\d.\-]', '', value)
    yield abs(validated_value)


def run_validator(raw_value: str) -> int:
    g = int_validator(raw_value)
    try:
        intermediate = next(g)
    except StopIteration as e1:
        return e1.value
    else:
        as_int = int(intermediate)
        return g.send(as_int)


debug(run_validator('xxx 123'))
debug(run_validator('one'))
debug(run_validator('-12'))
debug(run_validator('broken'))
```
This is sort of very cool. Pros:

- it uses `send()`, which is very rarely used; I get a geeky pleasure out of using an exotic language feature
- with a plain `return` and no `yield` you get a normal function, not a generator, and validation takes the normal course

Cons:

- it uses `send()`, which is very rarely used and confusing - I had some memory of it existing, but I had to google for a bit to even find it
- it's more confusing than just calling `handle(value)`

Overall this is cool, but I don't think it's a good way forward.
@mdavis-xyz: and I forget to add
v = handler(v)
Good catch, this could definitely be a problem. Two workarounds:

- an `allow_skip` option: if this is omitted or `False`, accidentally skipping further validation would raise a config error

@mdavis-xyz: Or do I have to re-invent the wheel for every validator to coerce the type into what I expect?

No you don't, that's what `handler()` does.
@mdavis-xyz: Wouldn't an onion traceback be harder to read, since it will include validator functions which have already passed? So it would be full of red herrings?
It would be longer and more complex, but more complete.
@mpkocher: Couldn't the current model be extended by adding a new semantic of "SkipValidation" type or an exception type to avoid the ambiguity of the None usage.
yes, that would be possible.
What would not be possible would be catching errors in other "deeper" validators and modifying the exception or returning a different value.
@mpkocher: new Annotated feature introduced in Python 3.9 via PEP-593
I don't think this feature/change has anything to do with PEP-593, maybe pydantic will get support for PEP-593 annotations (new issue please if you want this), but it's not related to this issue.
@mpkocher: For the overall health of the project, It might be useful to be clear about where this feature fits in the larger scope of the project and how it competes with the new changes to typing within Python, as well as the growing number of new feature requests and bugs in the backlog.
I'm not sure what you mean. Pydantic is popular, I'm trying to make it better, python typing is changing and not doing much to make runtime type checking easier. Again, I don't see how all that relates to this issue.
@tiangolo: A simple __validate__ instead of __get_validators__: I think this is one of the things I like the most. ✨
Agreed, but that's really related to #2034, not this. Still it makes more sense with this.
@tiangolo: Not getting all the errors
I think you're mistaken here. Multiple errors are only shown when you have a Union and validation against all the types in the union fails, that wouldn't change here.
@tiangolo: About the signature with config, fields, context, the way I'm doing it in Typer, similar to FastAPI, is passing the data by type annotations.
that's more or less what we're doing now, but to do it fast with cython, we have to have ugly logic like this.
But it also makes it much harder for one validator to call another validator, hence #2034.
Performance is still the big outstanding question. I think #2034 will indirectly improve performance a lot. I think using an onion (in addition to #2034) will not have a big effect on performance, but I need to do some work to make sure of that.
Overall my opinion is that this would be a good thing if we can achieve it without damaging performance.