ref #284 and #360
Pydantic is not a validation library, it's a parsing library.
It makes no guarantee whatsoever about checking the form of data you give to it; instead it makes a guarantee about the form of data you get out of it.
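As a rough illustration of that distinction, here is a plain-Python sketch. It is not pydantic's implementation; `parse_int` and `validate_int` are hypothetical helper names chosen for this example. The parser only promises that what comes out is an `int`, while the validator refuses anything that is not already one.

```python
from typing import Any


def parse_int(value: Any) -> int:
    """Lenient parsing: accept anything int() can convert, always return an int."""
    return int(value)


def validate_int(value: Any) -> int:
    """Strict validation: reject anything that is not already an int."""
    # bool is a subclass of int, so it is excluded explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    return value


print(parse_int("3"))   # 3
print(parse_int(3.14))  # 3 (the fractional part is dropped)
```

Calling `validate_int("3")` or `validate_int(3.14)` raises `TypeError` instead of converting, which is the behaviour people often expect when they read "validation".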
This sounds like an esoteric difference, but it has real practical consequences, e.g.

- if you pass "3" (which is not an int) to an int field, pydantic will convert it to an int
- if you pass 3.14 (which again is not an int) to an int field, pydantic will convert it to an int (thereby "losing information")

I think this is correct and I'm not interested in changing it, but we should be clear about what pydantic is/does. While I was annoyed by the manner of the question in #284 (sorry, wrong issue, I meant #360), I do understand the motivation for the question.
To fix this we should:
- rename validate_model to parse_model (this will require the old version to continue to work with a deprecation warning for 2 versions)
- rename __get_validators__ to __get_parser_functions__ or similar (again, this will require the old function to continue to work with a deprecation warning)

This is a big, backwards-incompatible change for no material benefit, but I think it's worth it for clarity.
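The deprecation path described above could look roughly like the following sketch. The function bodies are placeholders written for this example, not pydantic's real `validate_model`; only the pattern of keeping the old name as a warning wrapper is the point.

```python
import warnings
from typing import Any, Dict


def parse_model(data: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the proposed new name; the body is a placeholder."""
    return dict(data)


def validate_model(data: Dict[str, Any]) -> Dict[str, Any]:
    """Old name kept as a thin wrapper that warns and then delegates."""
    warnings.warn(
        "validate_model is deprecated, use parse_model instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return parse_model(data)
```

Existing callers keep working and see a `DeprecationWarning`; after the agreed number of releases the wrapper can be removed.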
I have no objection to this renaming. I also think it's a better fit for pydantic use-cases.
We use it as an advanced dataclass so that we can ensure that the data we want to save in our DB is consistent with the data model we have specified. We also use it to mangle messages sent on a queue into a data model that we can easily use in our code and store it as json documents that adhere to the specifications of the models we have written.
We were using cerberus but it was way harder to read the model, and we really didn't need to validate the incoming data as long as it was convertible into the data type we wanted in the model. We can even mangle the data in really advanced ways using for example pre-parsing.
tl;dr:
I support this.
Agreed. Makes sense.
This was probably implied, but since it wasn't specifically called out in the checkboxes, I wanted to add just in case:
validate should probably also be changed to say parse (namely validate_all and validate_always).

I've decided to chill out on this a lot. I'm going to add roughly the above warning to the docs, but avoid all the renaming.
"Validate" seems to be much more commonly used than "parse". Also, one step of parsing is validating, so, like it or not, pydantic does do validation. It's just that its primary purpose isn't validation.
@samuelcolvin As a casual Pydantic repo follower and _light_ user, this issue and some of these nuances caught me by surprise. I suspect some of the (not especially polite) tone in related issues stems from similar misunderstandings.
Just my 2¢, but I think changing some of the documentation language and package tagline would help tremendously here (much more so than breaking API changes). For example, the Google snippet for Pydantic starts "Data validation and settings management...", and the introductory section of the docs has validation/validate 4 times.
Adding a section that highlights "What type of validation does Pydantic perform?" or "How does Pydantic validate input data?" in the early introductory text would probably help a great deal too.
I hope that's a helpful suggestion, and thanks for the great library!
yes, I agree. That's basically what I decided after a long think.
Recently I read this blog post and it made me think of the line from the pydantic docs:
pydantic is primarily a parsing library, not a validation library
In case anyone finds their way to this thread in an attempt to understand the implications of the difference between parsing and validation, it might be an interesting read.
I have spent 1 hour trying to figure out if I could well use Pydantic for data validation or not. I guess the answer is YES. The following note (from the documentation): "pydantic is primarily a parsing library, not a validation library. Validation..." leads you to an UNNECESSARY rabbit hole that could well be avoided by writing something along these lines (in the same note section):
"Although validation is not the main purpose YOU CAN USE THIS LIBRARY for validation; go to the Validators section..."
The Validators section itself says: "Custom validation ... can be achieved". So my point is fair; this should be explicitly mentioned in that Note box (in the Models section).
For me the introduction in the docs was misleading:
Data validation and settings management using python type annotations.
pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid.
Define how data should be in pure, canonical python; validate it with pydantic.
I had the wrong expectations for the library, and I think clarifying this would greatly help in avoiding potential issues. Apart from being surprised by the implicit data parsing/conversion, I also made the mistake of not returning values in the validators (which are not only validators, but also parsers/deserializers), which led to some surprising behavior. The latter issue was resolved after reading the docs for validators, but it would have helped if I had had the mindset from the start that parsing is the main goal of the library, while (strict) validation is also possible.
For those who, like me, stumble on this discussion and want to determine if and how pydantic can be used for strict validation: after reading through various issues, examples/types_strict.py was what helped me a lot. IMHO it should be mentioned in the models/#data-conversion section.
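To make the "validators must return a value" pitfall concrete, here is a simplified plain-Python model of the behaviour described above. It is not pydantic's code; `run_validators`, `strip_name` and `check_only` are hypothetical names for this sketch. The key assumption is that whatever each function returns replaces the field value.

```python
from typing import Any, Callable, List


def run_validators(value: Any, validators: List[Callable[[Any], Any]]) -> Any:
    """Feed the value through each function; each return value replaces it."""
    for validator in validators:
        value = validator(value)
    return value


def strip_name(value: str) -> str:
    if not value.strip():
        raise ValueError("name must not be empty")
    return value.strip()  # the cleaned value is passed on


def check_only(value: str) -> None:
    if not value.strip():
        raise ValueError("name must not be empty")
    # no return statement, so the caller receives None


print(run_validators("  Ada ", [strip_name]))  # Ada
print(run_validators("  Ada ", [check_only]))  # None
```

The second call shows the surprise: a function that only checks and forgets to return silently turns the value into `None`.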
Suggestion:
This is a deliberate decision of *pydantic*, and in general it's the most useful approach, see
[here](https://github.com/samuelcolvin/pydantic/issues/578) for a longer discussion of the subject.
Nevertheless strict type checking is supported. See [examples/types_strict.py](docs/examples/types_strict.py).