Could you please help me find the Pydantic equivalent of Marshmallow's context-aware serialization?
https://marshmallow.readthedocs.io/en/stable/why.html#context-aware-serialization
In my case, I need some way to pass context data to custom validators.
So in your case this isn't related to serialisation, but rather to validation?
In that case, we don't have a direct equivalent to marshmallow's context (though I think it's an interesting idea). The best alternative would be to add another field to the model, context, which you can then use to change how validation is done:

    from devtools import debug
    from pydantic import BaseModel, validator


    class User(BaseModel):
        # context is declared before name on purpose: a validator only
        # sees fields that were validated before its own field
        context: str
        id: int
        name: str

        @validator('name')
        def anonymize_name(cls, value, values):
            if values.get('context') == 'anonymize':
                return '<anonymized>'
            return value


    external_data1 = {'id': 123, 'name': 'Fred'}
    external_data1.update(context='anonymize')
    user = User(**external_data1)
    debug(user.name)

    external_data2 = {'id': 123, 'name': 'Jane'}
    external_data2.update(context='standard')
    user = User(**external_data2)
    debug(user.name)
    """
    test.py:20 <module>
      user.name: '<anonymized>' (str) len=12
    test.py:26 <module>
      user.name: 'Jane' (str) len=4
    """

I'm liking Pydantic and its API a lot so far. I feel like the feature proposal above (setting context-data or passing in a context-dict that can be accessed in validation methods) would be a great addition.
Hmm, I'm reluctant to add new features like this unless they're absolutely necessary.
Between:

- the context field approach explained above
- threading.local()
- create_model() (or even parse_obj_as)

we have a lot of options for context-specific validation without needing to add a new high-level concept.
But I'm open to ideas.
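Of the options listed above, threading.local() is perhaps the least self-explanatory. The following is a minimal, pydantic-free sketch of the idea, assuming the validator body simply reads a module-level thread-local; the names _validation_state, set_validation_context, get_validation_context and clean_name are hypothetical and chosen only for illustration.

```python
import threading

# Hypothetical module-level holder for the validation context.
_validation_state = threading.local()


def set_validation_context(value):
    """Store the context for the current thread only."""
    _validation_state.context = value


def get_validation_context():
    """Return the current thread's context, or None if it was never set."""
    return getattr(_validation_state, "context", None)


def clean_name(value):
    """Stand-in for a validator body: anonymize when the context asks for it."""
    if get_validation_context() == "anonymize":
        return "<anonymized>"
    return value


results = {}


def worker(context, name):
    # Each thread sets its own context; other threads do not see it.
    set_validation_context(context)
    results[context] = clean_name(name)


threads = [
    threading.Thread(target=worker, args=("anonymize", "Fred")),
    threading.Thread(target=worker, args=("standard", "Jane")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

The main thread never calls set_validation_context, so get_validation_context() returns None there; each worker sees only the value it set itself.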
Threadlocals or contextvars would probably be a very heavy-handed approach for this. The wording "context" was maybe not ideal, because I think the feature idea doesn't necessarily have anything to do with multi-threaded use cases.
Both cases are not too unusual and don't have anything to do with threading/contextvars. I guess the context field approach is okay, with a couple of minor downsides:
- [...] context field even though the data for that is coming from the application and not from the API user.
- [...] context field in the results, even though it's not supposed to be part of the validated data.
- [...] context. But we cannot use the field name _context for this instead, because then it would be ignored (if I'm not mistaken), right?

But don't get me wrong, Pydantic looks super-nice and I like the focus on a clean API + static typing + performance. So thank you for all of that!
Thank you.
> a very heavy-handed approach
In what regard "heavy"? It's not computationally heavy, and it's not heavy in terms of code. In many contexts (no pun intended) I think it actually would be the cleanest approach:

    from contextlib import contextmanager
    from contextvars import ContextVar

    from devtools import debug
    from pydantic import BaseModel, validator

    # default=None so that validating outside set_context() does not raise LookupError
    validation_context_var = ContextVar('validation_context', default=None)


    class User(BaseModel):
        id: int
        name: str

        @validator('name')
        def anonymize_name(cls, value):
            if validation_context_var.get() == 'anonymize':
                return '<anonymized>'
            return value


    @contextmanager
    def set_context(state):
        token = validation_context_var.set(state)
        try:
            yield
        finally:
            # restore the previous value even if validation raised
            validation_context_var.reset(token)


    with set_context('anonymize'):
        external_data1 = {'id': 123, 'name': 'Fred'}
        user = User(**external_data1)
        debug(user.name)

    with set_context('standard'):
        external_data2 = {'id': 123, 'name': 'Jane'}
        user = User(**external_data2)
        debug(user.name)

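To show why the token returned by ContextVar.set() is worth keeping, here is a small stdlib-only sketch of how such a context manager behaves when contexts nest or when the body raises. It is an illustration rather than part of the example above; the default=None argument and the try/finally are assumptions added for the sketch.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Same idea as validation_context_var in the thread; the default is an assumption.
context_var = ContextVar("validation_context", default=None)


@contextmanager
def set_context(state):
    token = context_var.set(state)
    try:
        yield
    finally:
        # Restore whatever value was active before, even if the body raised.
        context_var.reset(token)


seen = []
with set_context("anonymize"):
    seen.append(context_var.get())
    with set_context("standard"):
        seen.append(context_var.get())
    seen.append(context_var.get())  # back to the outer value

try:
    with set_context("anonymize"):
        raise ValueError("validation failed")
except ValueError:
    pass

seen.append(context_var.get())  # reset despite the exception
print(seen)  # ['anonymize', 'standard', 'anonymize', None]
```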
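I found a working and clear solution for my case :)
Models should inherit from this modified BaseModel:

    from typing import Any

    from pydantic import BaseModel, root_validator


    class BaseModelWithContext(BaseModel):
        context: Any

        @root_validator
        def remove_context(cls, values):
            # pop() tolerates a missing key, unlike del
            values.pop('context', None)
            return values
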
In validators:

    @validator('some_field')
    def validate_some_field(cls, v, values):
        context = values.get('context')
        ...

Usage (the most beautiful and clear part):

    data = MyModel(**payload, context={'arg1': 'some_value', 'arg2': 100})

To me this looks a little better than the contextmanager approach in the previous example.
Maybe in your code you can be sure payload won't include context, in which case that's fine. Otherwise:

    payload = {'x': 'y', 'context': None}
    data = MyModel(**payload, context={'arg1': 'some_value', 'arg2': 100})

This will raise TypeError: __init__() got multiple values for keyword argument 'context' or similar. Hence the update(...) in my code.
But you can work around that. Both approaches work fine.
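The clash is ordinary Python keyword-argument behaviour and does not depend on pydantic. The sketch below reproduces it with a plain function; build is a hypothetical stand-in for MyModel(...) that just returns its keyword arguments, and the exact wording of the error message varies between Python versions.

```python
def build(**fields):
    """Hypothetical stand-in for MyModel(**fields); returns its keyword arguments."""
    return fields


payload = {"x": "y", "context": None}
app_context = {"arg1": "some_value", "arg2": 100}

error = None
try:
    # 'context' arrives twice: once from **payload, once explicitly.
    build(**payload, context=app_context)
except TypeError as exc:
    error = str(exc)
    print(error)

# Merging into a copy first avoids the clash; the application's value wins.
merged = dict(payload)
merged.update(context=app_context)
result = build(**merged)
print(result["context"])
```

Note that update(...) silently replaces whatever context the incoming data carried, which is the trade-off between the two approaches.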
In summary, I don't think we need to complicate pydantic's API with custom context tooling.
Yes, you are right regarding the TypeError. I posted a simplified example here; in my own case the name for the context is more complicated. On the other hand, the TypeError is also nice to see, because it makes sure we don't have conflicts between the context we set and a context key in the incoming data. A silent override may cause hidden problems later.
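If the fail-fast behaviour is what you want, it can also be made explicit instead of relying on the TypeError. A minimal sketch, assuming a hypothetical helper named with_context that is not part of pydantic:

```python
def with_context(payload, context, key="context"):
    """Return a copy of payload with the context attached, refusing to overwrite."""
    if key in payload:
        raise ValueError(f"incoming data already contains {key!r}")
    return {**payload, key: context}


data = with_context({"x": "y"}, {"arg1": "some_value", "arg2": 100})
print(data)

conflict = None
try:
    with_context({"x": "y", "context": None}, {"arg1": "some_value"})
except ValueError as exc:
    conflict = str(exc)
    print(conflict)
```

The resulting dict could then be passed on as MyModel(**data); the conflict is reported before any validation runs.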
One big problem with the "context" variable approach :(
The context is not shared with nested objects :(
To make it available at all levels of nested objects I need to add it everywhere, and my magic line doesn't work anymore:

    data = MyModel(**payload, context={'arg1': 'some_value', 'arg2': 100})

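For comparison, a ContextVar is visible at every nesting level without being passed along. The sketch below is pydantic-free and uses plain functions as stand-ins for an outer and a nested model; validate_user, validate_address and the "upper" flag are hypothetical names used only for illustration.

```python
from contextvars import ContextVar

context_var = ContextVar("validation_context", default=None)


def validate_address(data):
    """Nested 'model': reads the context without it being passed in."""
    ctx = context_var.get() or {}
    city = data["city"]
    return {"city": city.upper() if ctx.get("upper") else city}


def validate_user(data):
    """Outer 'model': delegates to the nested validator."""
    return {"name": data["name"], "address": validate_address(data["address"])}


payload = {"name": "Fred", "address": {"city": "Paris"}}

token = context_var.set({"upper": True})
try:
    with_ctx = validate_user(payload)
finally:
    context_var.reset(token)

without_ctx = validate_user(payload)

print(with_ctx)     # {'name': 'Fred', 'address': {'city': 'PARIS'}}
print(without_ctx)  # {'name': 'Fred', 'address': {'city': 'Paris'}}
```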
Unfortunately, my approach with BaseModelWithContext doesn't work well. The root_validator that removes the context variable runs before the root_validators of the model that inherits from BaseModelWithContext, so values['context'] is not accessible in those other root_validators. @samuelcolvin, is there a way to order root_validators in a hierarchical model structure? I would expect local root validators to run before the parent's root validators.
Maybe you could reverse __pre_root_validators__ on the model.
@samuelcolvin sorry :(, I'm not sure I understand how to do this correctly. Could you please post a small example here?

    from pydantic import BaseModel, root_validator


    class FooModel(BaseModel):
        name: str

        @root_validator(pre=True)
        def foo(cls, v):
            print('foo')
            return v


    class BarModel(FooModel):
        @root_validator(pre=True)
        def bar(cls, v):
            print('bar')
            return v


    # foo will be called first, then bar
    # BarModel(name='x')
    BarModel.__pre_root_validators__.reverse()
    # bar called before foo
    BarModel(name='x')

@samuelcolvin Thank you a lot! It works now!!!
but with __post_root_validators__ instead of __pre_root_validators__.
:))))
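The ordering problem and the effect of reversing the list can be illustrated without pydantic. The following is a toy model only, not pydantic's implementation: the root decorator, the root_validators attribute and the run method are all hypothetical, and validators are simply collected base-first as described in the thread.

```python
def root(func):
    """Hypothetical marker decorator for 'root validators' in this toy model."""
    func._is_root = True
    return func


class Base:
    root_validators = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        own = [v for v in vars(cls).values() if getattr(v, "_is_root", False)]
        # Inherited validators first, then the ones defined on this class.
        cls.root_validators = list(cls.root_validators) + own

    @classmethod
    def run(cls, values):
        values = dict(values)
        for validator in cls.root_validators:
            values = validator(cls, values)
        return values


class WithContext(Base):
    @root
    def remove_context(cls, values):
        values.pop("context", None)
        return values


class MyModel(WithContext):
    @root
    def check(cls, values):
        # Records whether the context was still available at this point.
        values["saw_context"] = "context" in values
        return values


data = {"x": 1, "context": {"arg1": "some_value"}}

before = MyModel.run(data)          # parent runs first: context already removed
MyModel.root_validators.reverse()   # child validators now run first
after = MyModel.run(data)

print(before)  # {'x': 1, 'saw_context': False}
print(after)   # {'x': 1, 'saw_context': True}
```

In both cases the context key is gone from the result; only the point at which it is removed changes.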
I'm not in complete agreement that something like this is needed. I've created a new issue #1549 to describe the problem (and hopefully solution in future) in a succinct way. Feedback welcome.