Pydantic: Disable all validations

Created on 14 Oct 2019 · 10 Comments · Source: samuelcolvin/pydantic

Question

OS: Ubuntu 18.04
Python version (import sys; print(sys.version)):
3.7.3 (default, Apr 3 2019, 19:16:38)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
Pydantic version (import pydantic; print(pydantic.VERSION)): 0.32.2

Hello anyone,
Is it possible to disable all validation of incoming data?

import pydantic

class MyModel(pydantic.BaseModel):
    class Config:
        validation = False

...

feature request

prostomarkeloff

🚀2

All 10 comments

I am also very interested in this question, as in many cases in my code I know the values are correct and don't require validation. It would also be nice to have an API for initializing a model instance from known-valid inputs without needing to go through the various alias-lookups etc.

In general though I wouldn't want to disable the validation at the model level, but rather on a per-initialization basis. (So a separate classmethod for initializing from known-valid values would be nice.)

I think this is possible in one way or another; I'll see if I can make any progress.

@prostomarkeloff if you really do want "models" with validation completely disabled under all circumstances, you might try using vanilla dataclasses?

dmontagu on 14 Oct 2019
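
As a rough illustration of the dataclass suggestion above, the standard-library sketch below shows that a plain dataclass stores whatever it is given, because annotations are not enforced at runtime. The `Point` class and its fields are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Point:
    # Hypothetical model; annotations are hints only, never enforced at runtime.
    x: int
    y: int = 0
    tags: List[str] = field(default_factory=list)


# No validation or coercion happens: the string is stored unchanged.
p = Point(x="not an int")
print(type(p.x).__name__, p.y, p.tags)
```

Defaults still work as usual here; only the checking and coercion step is absent.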

Okay, yes, I got this working (based on BaseModel.validate):

The following method could be added to your model subclass if you want to do 100% unvalidated assignment (beyond checking field names and falling back to default values):

    # Needs: from copy import deepcopy; from typing import Any, Type
    @classmethod
    def unvalidated(__pydantic_cls__: "Type[Model]", **data: Any) -> "Model":
        # Fill in a default for every declared field missing from the input.
        for name, field in __pydantic_cls__.__fields__.items():
            try:
                data[name]
            except KeyError:
                if field.required:
                    raise TypeError(f"Missing required keyword argument {name!r}")
                if field.default is None:
                    # deepcopy is quite slow on None
                    value = None
                else:
                    value = deepcopy(field.default)
                data[name] = value
        # Create the instance without calling __init__, so no validators run.
        self = __pydantic_cls__.__new__(__pydantic_cls__)
        object.__setattr__(self, "__dict__", data)
        # Note: defaults filled in above are also recorded as "set" here.
        object.__setattr__(self, "__fields_set__", set(data.keys()))
        return self

I did some performance tests, and for data-heavy models the performance difference is enormous:

Benchmark script:

from contextlib import contextmanager
import time
from datetime import datetime
from typing import List, Dict, Iterator

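from pydantic import BaseModel

# Note: this script assumes the `unvalidated` classmethod shown above is
# available on every model (e.g. defined on a shared base class).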


class DoubleNestedModel(BaseModel):
    number: int
    message: str


class SubDoubleNestedModel(DoubleNestedModel):
    timestamps: List[datetime]


class NestedModel(BaseModel):
    number: int
    message: str
    double_nested: DoubleNestedModel


class SubNestedModel(NestedModel):
    timestamps: List[datetime]


class Model(BaseModel):
    nested: List[Dict[str, NestedModel]]


class SubModel(Model):
    other_nested: Dict[str, List[NestedModel]]
    timestamps: List[datetime]


def get_sub_model(timestamp: datetime) -> SubModel:
    timestamps = [timestamp] * 5
    sub_double_nested = SubDoubleNestedModel(number=1, message="a", timestamps=timestamps)
    sub_nested = SubNestedModel(number=2, message="b", double_nested=sub_double_nested, timestamps=timestamps)

    nested = [{letter: sub_nested for letter in 'abcdefg'}]
    other_nested = {letter: [sub_nested] * 5 for letter in 'abcdefg'}
    return SubModel(nested=nested, other_nested=other_nested, timestamps=timestamps)


def get_sub_model_unvalidated(timestamp: datetime) -> SubModel:
    timestamps = [timestamp] * 5
    sub_double_nested = SubDoubleNestedModel.unvalidated(number=1, message="a", timestamps=timestamps)
    sub_nested = SubNestedModel.unvalidated(number=2, message="b", double_nested=sub_double_nested,
                                            timestamps=timestamps)

    nested = [{letter: sub_nested for letter in 'abcdefg'}]
    other_nested = {letter: [sub_nested] * 5 for letter in 'abcdefg'}
    return SubModel.unvalidated(nested=nested, other_nested=other_nested, timestamps=timestamps)


@contextmanager
def basic_profile(label: str) -> Iterator[None]:
    t0 = time.time()
    yield
    t1 = time.time()
    print(f"{label}: {(t1 - t0):,.3f}s")


def run():
    n_warmup_runs = 1000
    n_runs = 10000

    timestamp = datetime.utcnow()
    sub_model = get_sub_model(timestamp)
    unvalidated_sub_model = get_sub_model_unvalidated(timestamp)

    assert sub_model == unvalidated_sub_model

    for _ in range(n_warmup_runs):
        get_sub_model(timestamp)
        get_sub_model_unvalidated(timestamp)

    with basic_profile("validated"):
        for _ in range(n_runs):
            get_sub_model(timestamp)

    with basic_profile("unvalidated"):
        for _ in range(n_runs):
            get_sub_model_unvalidated(timestamp)


run()

The result:

validated: 2.918s
unvalidated: 0.083s

That's a 35x speedup, which is not surprising given that validators are called on each list element and dict value. Given the performance benefits, I think it may be worth using an API like this if you know it's safe. Currently it's probably a bit risky due to the lack of type checking, but it would be very easy to use a mypy plugin to set the signature in the same way as is currently done in https://github.com/samuelcolvin/pydantic/pull/722 for the __init__ function, so I think this could be supported in a statically checked way. (Note: this would not call any custom validators or similar.)

Note: in my testing, removing the field checks and default value inference sped things up even further, to a ~43x speedup, but I think having the field and default checks is worthwhile.

@samuelcolvin do you think there is any room for a method like this on the BaseModel class in pydantic proper? (Maybe with a little more polish in case I'm mishandling some edge cases?) Or do you think this should be left to users to implement at their own risk?

dmontagu on 14 Oct 2019

🚀3
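
To make the mechanism behind the `unvalidated` classmethod easier to see in isolation, here is a minimal standard-library sketch. It does not use pydantic: the `Account` class, its fields, and its `defaults` table are hypothetical stand-ins, and the "validation" is just a few coercions in `__init__`. The point is only that allocating with `__new__` and filling `__dict__` directly skips whatever `__init__` would have done.

```python
from typing import Any, Dict


class Account:
    """Hypothetical stand-in for a validated model (not pydantic itself)."""

    defaults: Dict[str, Any] = {"active": True}

    def __init__(self, **data: Any) -> None:
        # "Validation": coerce and check every incoming value.
        self.owner = str(data["owner"])
        self.balance = int(data["balance"])
        self.active = bool(data.get("active", self.defaults["active"]))

    @classmethod
    def unvalidated(cls, **data: Any) -> "Account":
        # Allocate the instance without running __init__, then install the
        # raw values (plus defaults for missing keys) directly.
        self = cls.__new__(cls)
        self.__dict__.update({**cls.defaults, **data})
        return self


checked = Account(owner="ann", balance="10")              # "10" is coerced to 10
trusted = Account.unvalidated(owner="ann", balance="10")  # "10" is stored as-is
print(checked.balance, repr(trusted.balance), trusted.active)
```

The trade-off is visible in the last lines: the unvalidated path is cheaper because it does less, but a wrong input type goes through unnoticed.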

This is what construct is for:

https://github.com/samuelcolvin/pydantic/blob/6cda388c7a1af29d17973f72dfebd98cff626988/pydantic/main.py#L417

It definitely needs documenting but I don't think we need a new method.

If its signature needs changing to do what you want, we should do that pre v1 if possible.

samuelcolvin on 15 Oct 2019

👍4

@samuelcolvin I wasn't aware of that, that's nice.

It seems to me that 1) support for default values, and 2) not needing to manually specify the fields_set would go a long way toward making it more of a drop-in replacement for __init__ when the input data is known to be valid.

I think it would be easier to justify the refactor efforts (in a pydantic-using codebase where I want to use construct for performance reasons) if it was literally just a matter of adding .construct for the speed up, as opposed to the more potentially involved refactor currently required.

dmontagu on 15 Oct 2019

I would be in favor of a private method like _construct instead of what is currently called construct, and have construct use defaults if missing and set the fields_set based on the keyword arguments provided.

I would also understand if you didn't want to add a breaking API change this far into the 1.0 beta. I would definitely be in favor of adding the above-described functionality though, even if it needs to be a different method name or a private call. But I can always just define it on a custom base class, so it's not the end of the world.

dmontagu on 15 Oct 2019

👍1
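
The behaviour described above (use defaults when a value is missing, and derive the fields set from the keyword arguments actually passed) can be sketched without pydantic as follows. This is only an illustration of the described semantics, not pydantic's implementation: `FIELDS`, `REQUIRED`, and `construct_values` are hypothetical names introduced for this example.

```python
from copy import deepcopy
from typing import Any, Dict, Set, Tuple

# Hypothetical field table: name -> default value.
# REQUIRED marks fields that have no default.
REQUIRED = object()
FIELDS: Dict[str, Any] = {"id": REQUIRED, "name": "anonymous", "tags": []}


def construct_values(**data: Any) -> Tuple[Dict[str, Any], Set[str]]:
    """Return (values, fields_set) without validating anything."""
    fields_set = set(data)  # only what the caller actually passed
    values = dict(data)
    for name, default in FIELDS.items():
        if name not in values:
            if default is REQUIRED:
                raise TypeError(f"Missing required keyword argument {name!r}")
            values[name] = deepcopy(default)  # avoid sharing mutable defaults
    return values, fields_set


values, fields_set = construct_values(id=1)
print(values, fields_set)
```

Note the difference from the `unvalidated` classmethod earlier in the thread, which records every key, including filled-in defaults, as set.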

> @prostomarkeloff if you really do want "models" with validation completely disabled under all circumstances, you might try using vanilla dataclasses?

Thank you for the answer! As for dataclasses: firstly, I don't want to mix such different libraries in one project; secondly, dataclasses are slower than pydantic. Some models in my project need this feature.

prostomarkeloff on 15 Oct 2019

Awesome! I think it's what I need.

prostomarkeloff on 15 Oct 2019

> I would also understand if you didn't want to add a breaking API change this far into the 1.0 beta.

Hopefully this wouldn't be much of a breaking change, and it would be to an as-yet undocumented feature.

> I would be in favor of a private method like _construct instead of what is currently called construct, and have construct use defaults if missing and set the fields_set based on the keyword arguments provided.

Please have a look at #898 which is my proposal for how to fix this.

If you think that it would be useful, documentation for construct will also need to be added.

samuelcolvin on 15 Oct 2019

Hi,
So what's the final conclusion on this, @dmontagu? Do I have to use the class that you mentioned above to disable validation in my model, or is there a model config attribute that I can use?

Like the reporter mentioning a desired snippet at the top: