Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":
pydantic version: 1.3
pydantic compiled: False
install path: /Users/foo/git/bar//.venv/lib/python3.6/site-packages/pydantic
python version: 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
platform: Darwin-19.2.0-x86_64-i386-64bit
optional deps. installed: ['typing-extensions', 'email-validator']
The following code shows that attrs is much faster than pydantic:
attr - 0.07763974100816995
pydantic - 1.9739535469561815
I was reading your benchmarks page and I can't figure out why pydantic comes out slower than attrs here.
Anyway, performance is critical for my project, and I would like to see if I am missing something before choosing attrs over pydantic (which I really like and appreciate).
Here is my test code:
import attr
from typing import List, Optional
from pydantic import BaseModel
class DateColumnPydentic(BaseModel):
    """Date column from DP"""
    name: str
    type: str
    values: List[int]
    max_date: Optional[int]
    min_date: Optional[int]


@attr.s
class DateColumnAttr(object):
    """Date column from DP"""
    name = attr.ib(type=str)
    type = attr.ib(type=str)
    values = attr.ib(type=List[int])
    max_date = attr.ib(type=Optional[int])
    min_date = attr.ib(type=Optional[int])


def pyd_func():
    name = "avi"
    type_ = "type2"
    values = [1, 2, 3, 4, 5]
    max_date = 10
    min_date = 1
    return DateColumnPydentic(
        name=name,
        type=type_,
        values=values,
        max_date=max_date,
        min_date=min_date,
    )


def attr_func():
    name = "avi"
    type_ = "type2"
    values = [1, 2, 3, 4, 5]
    max_date = 10
    min_date = 1
    return DateColumnAttr(
        name=name,
        type=type_,
        values=values,
        max_date=max_date,
        min_date=min_date,
    )
import timeit
print(timeit.timeit(attr_func, number=100000))
print(timeit.timeit(pyd_func, number=100000))
Much of this difference is because your pydantic install is not compiled. If your actual application will run on Linux, pydantic will be much faster in production.
But also, the example is very simple, and there's no type coercion required. Have a look at the actual benchmark code for a more realistic example.
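For instance, feeding the same model string inputs, as JSON or form data would supply them, gives pydantic real coercion work that attrs simply skips. A minimal sketch, reusing the DateColumnPydentic model defined above (the values are illustrative):

col = DateColumnPydentic(
    name="avi",
    type="type2",
    values=["1", "2", "3"],  # coerced element-by-element to List[int]
    max_date="10",           # coerced to int
    min_date=None,
)
print(col.values, col.max_date)
#> [1, 2, 3] 10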
That said, attrs is fast, sometimes faster than pydantic. But if speed is critical, don't write the code in Python; write it in Rust, C++, or C. If instead readability and development time are critical, pydantic often delivers more readable code than attrs because it comes with more batteries included.
Hope that helps.
@samuelcolvin thanks for the detailed response.
Our product runs in production on top of the FastAPI web framework, so we use pydantic for data validation and OpenAPI integration.
The relative slowness of pydantic is actually painful, and we'll have to figure out how to handle it in production.
Is latency or throughput your challenge?
If latency, what's your limit time and expected time?
If throughput, how many requests are you processing per second? What proportion of your total SaaS/cloud cost is dedicated to executing Python (and input parsing/validation in particular)? What proportion of the total project cost is SaaS/cloud?
I've just gone through your example. It's completely meaningless since attrs isn't doing any validation:
@attr.s
class DateColumnAttr(object):
    """Date column from DP"""
    name = attr.ib(type=str)
    type = attr.ib(type=str)
    values = attr.ib(type=List[int])
    max_date = attr.ib(type=Optional[int])
    min_date = attr.ib(type=Optional[int])


def attr_func():
    name = 3
    type_ = {3, 4}
    values = 'string'
    max_date = {1: 2}
    min_date = 1.5
    return DateColumnAttr(
        name=name,
        type=type_,
        values=values,
        max_date=max_date,
        min_date=min_date,
    )
print(attr_func())
#> DateColumnAttr(name=3, type={3, 4}, values='string', max_date={1: 2}, min_date=1.5)
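For contrast, a minimal sketch of what the pydantic model from the original example does with those same values, reusing DateColumnPydentic from above: what can be coerced is coerced, and everything else produces a detailed ValidationError.

from pydantic import ValidationError

try:
    DateColumnPydentic(
        name=3,           # coerced to the string "3"
        type={3, 4},      # rejected: str type expected
        values='string',  # rejected: value is not a valid list
        max_date={1: 2},  # rejected: value is not a valid integer
        min_date=1.5,     # coerced to the int 1
    )
except ValidationError as e:
    print(e)
    # one error per invalid field: type, values and max_date all fail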
@avi3tal
As @samuelcolvin has pointed out, the vast majority of pydantic's overhead is due to validation -- without this, you wouldn't be able to automatically return the detailed 422 response if incorrectly formatted data is sent to your FastAPI endpoint. (The attrs code in particular wouldn't be able to automatically return this.)
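A minimal sketch of that behaviour, reusing the DateColumnPydentic model from the example above (the route path and response shape are illustrative):

from fastapi import FastAPI

app = FastAPI()

@app.post("/columns")
def create_column(column: DateColumnPydentic):
    # this body only runs when validation succeeds; on invalid input FastAPI
    # has already returned a 422 response listing each failing field
    return {"name": column.name, "n_values": len(column.values)}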
In most cases the overhead coming from the validation adds more benefits than costs, which is why it is the default behavior in FastAPI. However, even though FastAPI has been designed in such a way that performing all validation is the most convenient approach, it does not lock you into that pattern.
If you need maximum performance for a specific endpoint and want to use hand-written validation logic to minimize validation overhead, there are various ways to accomplish this. The most straightforward is just to read the request body directly off of the fastapi Request, and parse it yourself. You can even make use of the BaseModel.construct function to produce a pydantic BaseModel instance with similar performance to the attrs code you've provided (though like this attrs example it will not validate the inputs). You'd just be responsible for validating and raising errors as desired.
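For instance, a rough sketch of that pattern, again reusing DateColumnPydentic (the route path, the hand-written check, and the response shape are all illustrative):

from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

@app.post("/columns/fast")
async def create_column_fast(request: Request):
    data = await request.json()  # parse the body yourself, bypassing pydantic parsing
    # construct() performs no validation, so hand-check anything you rely on
    if not isinstance(data.get("values"), list):
        raise HTTPException(status_code=422, detail="values must be a list")
    # pydantic 1.3 signature; later 1.x releases accept keyword arguments instead
    column = DateColumnPydentic.construct(data, set(data))
    return {"name": column.name, "n_values": len(column.values)}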
As @samuelcolvin noted, it may also make sense to implement the validation logic in Cython/C/C++/Rust if you need to make it faster (e.g., if validating a long list). (Cython would probably be the easiest to use if you haven't done this before.)
In practice, for most Web APIs, only a small fraction of the endpoints need special treatment for performance. In such cases, I think hand-tuning a specific performance-sensitive endpoint is reasonable, and preferable to switching to more complex server infrastructure. However, if you find yourself needing to hand-tune many/most of your endpoints, or if you only have a small number of endpoints and are happy to invest deeply into optimizing them all, you may be better off using a Go/Rust web API framework.
Is latency or throughput your challenge? If latency, what's your limit time and expected time? If throughput, how many requests are you processing per second? What proportion of your total SaaS/cloud cost is dedicated to executing Python (and input parsing/validation in particular)? What proportion of the total project cost is SaaS/cloud?
@avi3tal I'm curious about the answer to this question.