Pydantic: Pydantic equivalent of Marshmallow Context-aware serialization

Created on 16 Jan 2020 · 16 Comments · Source: samuelcolvin/pydantic

Could you please help me find the Pydantic equivalent of Marshmallow's context-aware serialization?
https://marshmallow.readthedocs.io/en/stable/why.html#context-aware-serialization

question

All 16 comments

In my case, I need some way to pass context data to custom validators.

So in your case this isn't related to serialisation, but rather to validation?

In that case, we don't have a direct equivalent to marshmallow's context (though I think it's an interesting idea). The best alternative would be to add another field to the model, context, which you can then use to change how validation is done:

from pydantic import BaseModel, validator
from devtools import debug


class User(BaseModel):
    context: str
    id: int
    name: str

    @validator('name')
    def anonymize_name(cls, value, values):
        if values.get('context') == 'anonymize':
            return '<anonymized>'
        return value


external_data1 = {'id': 123, 'name': 'Fred'}

external_data1.update(context='anonymize')
user = User(**external_data1)
debug(user.name)

external_data2 = {'id': 123, 'name': 'Jane'}

external_data2.update(context='standard')
user = User(**external_data2)
debug(user.name)
"""
test.py:20 <module>
    user.name: '<anonymized>' (str) len=12
test.py:26 <module>
    user.name: 'Jane' (str) len=4
"""

I'm liking Pydantic and its API a lot so far. I feel like the feature proposal above (setting context-data or passing in a context-dict that can be accessed in validation methods) would be a great addition.

Humm, I'm reluctant to add new features like this unless they're absolutely necessary.

Between:

  • context field approach explained above
  • contextvars
  • threading.local()
  • and even dynamic model creation with create_model() (or parse_obj_as)

We have a lot of options for context-specific validation without needing to add a new high-level concept.

But I'm open to ideas.
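For comparison with the contextvars option, here is a minimal standard-library sketch of the threading.local() option from the list: each thread stores its own validation context, and a validator-like function reads it. This is not pydantic code; _local, worker and anonymize_name are hypothetical names, and a real pydantic validator would perform the same lookup inside its body.

```python
import threading

# Hypothetical per-thread storage for the validation context.
_local = threading.local()


def anonymize_name(value):
    # getattr default: threads that never set a context get None.
    if getattr(_local, "context", None) == "anonymize":
        return "<anonymized>"
    return value


results = {}


def worker(state, name):
    _local.context = state  # visible only to the current thread
    results[state] = anonymize_name(name)


threads = [
    threading.Thread(target=worker, args=("anonymize", "Fred")),
    threading.Thread(target=worker, args=("standard", "Jane")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))  # [('anonymize', '<anonymized>'), ('standard', 'Jane')]
print(anonymize_name("Bob"))    # Bob: the main thread never set a context
```

One design difference worth noting: a threading.local value is shared by every coroutine running on the same thread, whereas a ContextVar is copied per asyncio task.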

Thread-locals or contextvars would probably be a very heavy-handed approach for this. The word "context" was perhaps not ideal, because I think the feature idea doesn't necessarily have anything to do with multi-threaded use cases.

  • Imagine a straightforward multi-tenant application, where the model validates incoming data depending on the user of the current request. Then you might simply want to pass the user instance into the pydantic model (without the hassle of setting up a thread-local just for validation).
  • Imagine an endpoint where the API consumer picks one of a set of choices, but the allowed choices depend on a one-to-many relation on a parent object. Then you'd want to pass in the parent instance or the list of allowed choices.

Neither case is unusual, and neither has anything to do with threading/contextvars. I guess the context field approach is okay, with a few minor downsides:

  • The model will validate the context field even though the data for that is coming from the application and not from the API user.
  • The model will return the context field in the results, even though it's not supposed to be part of the validated data.
  • Potential naming collision with a normal field that happens to be called context. But we cannot use the field name _context for this instead, because then it would be ignored (if I'm not mistaken), right?

But don't get me wrong, Pydantic looks super-nice and I like the focus on a clean API + static typing + performance. So thank you for all of that!
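As a point of comparison outside pydantic, the standard library's dataclasses module has a construct aimed at the first two downsides: an InitVar is accepted by the generated __init__ and passed to __post_init__, but it is not stored as a field and does not appear in asdict(). The sketch below uses plain dataclasses, not pydantic, and performs no type validation; it only illustrates the "init-only context" idea.

```python
from dataclasses import InitVar, asdict, dataclass, fields
from typing import Optional


@dataclass
class User:
    id: int
    name: str
    # Init-only: passed to __post_init__, never stored as a field.
    context: InitVar[Optional[str]] = None

    def __post_init__(self, context):
        if context == "anonymize":
            self.name = "<anonymized>"


user = User(id=123, name="Fred", context="anonymize")
print(asdict(user))                    # {'id': 123, 'name': '<anonymized>'}
print([f.name for f in fields(User)])  # ['id', 'name']
```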

> But don't get me wrong, Pydantic looks super-nice and I like the focus on a clean API + static typing + performance. So thank you for all of that!

Thank you.

> a very heavy-handed approach

In what regard "heavy"? It's not computationally heavy, and it's not heavy in terms of code. In many contexts (no pun intended) I think it actually would be the cleanest approach:

from contextlib import contextmanager
from contextvars import ContextVar
from devtools import debug
from pydantic import BaseModel, validator

# default=None so .get() doesn't raise LookupError outside set_context()
validation_context_var = ContextVar('validation_context', default=None)


class User(BaseModel):
    id: int
    name: str

    @validator('name')
    def anonymize_name(cls, value):
        if validation_context_var.get() == 'anonymize':
            return '<anonymized>'
        return value


@contextmanager
def set_context(state):
    token = validation_context_var.set(state)
    try:
        yield
    finally:
        # Restore the previous value even if validation raises
        validation_context_var.reset(token)


with set_context('anonymize'):
    external_data1 = {'id': 123, 'name': 'Fred'}
    user = User(**external_data1)
    debug(user.name)

with set_context('standard'):
    external_data2 = {'id': 123, 'name': 'Jane'}
    user = User(**external_data2)
    debug(user.name)
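To make the concurrency point concrete, here is a minimal standard-library sketch (no pydantic) of why a ContextVar is safe for per-request context: asyncio gives each task its own copy of the context, so two concurrent "requests" that set different states do not see each other's value. handle_request and anonymize_name are hypothetical stand-ins for a request handler and a validator body.

```python
import asyncio
from contextvars import ContextVar

# default=None means .get() returns None instead of raising LookupError.
validation_context_var = ContextVar("validation_context", default=None)


def anonymize_name(value):
    # Hypothetical stand-in for the body of a pydantic validator.
    if validation_context_var.get() == "anonymize":
        return "<anonymized>"
    return value


async def handle_request(state, name):
    # Each task runs in a copy of the context taken when the task was
    # created, so this set() is not visible to the other task.
    validation_context_var.set(state)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return anonymize_name(name)


async def main():
    # gather() wraps each coroutine in its own task.
    return await asyncio.gather(
        handle_request("anonymize", "Fred"),
        handle_request("standard", "Jane"),
    )


results = asyncio.run(main())
print(results)                       # ['<anonymized>', 'Jane']
print(validation_context_var.get())  # None: the caller's context is untouched
```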

I found a working and clear solution for my case :)

Models should inherit this modified BaseModel:

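from typing import Any

from pydantic import BaseModel, root_validator


class BaseModelWithContext(BaseModel):
    context: Any

    @root_validator
    def remove_context(cls, values):
        # pop() rather than del: 'context' may be missing if it failed validation
        values.pop('context', None)
        return values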

In validators

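@validator('some_field')
def validate_some_field(cls, v, values):
    # 'context' is in values because the base class declares it first
    context = values.get('context')
    ...  # use context to validate v
    return v  # a validator must return the (possibly modified) value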

Usage (the most beautiful and clear part)

data = MyModel(**payload, context={'arg1': 'some_value', 'arg2': 100})

To me, this looks a little better than the contextmanager approach in the last example.

Maybe in your code you can be sure payload won't include context, in which case that's fine; otherwise

payload = {'x': 'y', 'context': None}
data = MyModel(**payload, context={'arg1': 'some_value', 'arg2': 100})

will raise TypeError: __init__() got multiple values for keyword argument 'context' or similar. Hence the update(...) in my code.

But you can work around that. Both approaches work fine.

In summary, I don't think we need to complicate pydantic's API with custom context tooling.
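The difference between the two styles can be reproduced without pydantic: any Python call that receives the same keyword both from **payload and as an explicit argument raises TypeError before the callee runs, whereas merging with dict.update() first (as in the context field example) lets the explicit value silently win. make_model is a hypothetical stand-in for a model constructor.

```python
def make_model(**kwargs):
    # Hypothetical stand-in for MyModel(**kwargs); it just echoes its kwargs.
    return kwargs


payload = {"x": "y", "context": None}
ctx = {"arg1": "some_value", "arg2": 100}

# 1) Same keyword from **payload and an explicit argument: rejected at call time.
try:
    make_model(**payload, context=ctx)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # ...got multiple values for keyword argument 'context'

# 2) Merge first with dict.update(): the explicit value silently wins.
merged = dict(payload)
merged.update(context=ctx)
print(make_model(**merged)["context"])  # {'arg1': 'some_value', 'arg2': 100}
```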

Yes, you are right about the TypeError. I posted a simplified example here; in my own code the name for the context is more complicated. On the other hand, the TypeError is also nice to see, because it guarantees there is no conflict between the context we set and a context in the incoming data. A quiet override may cause hidden problems later.

One big problem with the "context" variable approach :(
The context is not shared with nested objects.
To make it available at every level of nesting I need to add it everywhere, and my magic line no longer works:

data = MyModel(**payload, context={'arg1': 'some_value', 'arg2': 100})
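One way to see why the contextvars approach sidesteps this: code at any depth of nested construction can read the variable, so nothing has to be threaded through each level. Below is a minimal standard-library sketch with plain dataclasses, not pydantic; Address, Person, from_dict, use_context and request_context are all hypothetical names.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass

# Hypothetical context variable; None means "no special context".
request_context = ContextVar("request_context", default=None)


@contextmanager
def use_context(ctx):
    token = request_context.set(ctx)
    try:
        yield
    finally:
        request_context.reset(token)


@dataclass
class Address:
    city: str

    def __post_init__(self):
        # The nested object reads the context directly; nobody passed it in.
        ctx = request_context.get() or {}
        if ctx.get("anonymize"):
            self.city = "<anonymized>"


@dataclass
class Person:
    name: str
    address: Address

    @classmethod
    def from_dict(cls, data):
        # Only the payload is forwarded to the nested level, not the context.
        return cls(name=data["name"], address=Address(**data["address"]))


payload = {"name": "Fred", "address": {"city": "Berlin"}}
with use_context({"anonymize": True}):
    inside = Person.from_dict(payload)
outside = Person.from_dict(payload)

print(inside.address.city)   # <anonymized>
print(outside.address.city)  # Berlin
```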

Unfortunately, my approach with BaseModelWithContext doesn't work well. The root_validator that removes the context variable runs before the root_validators of models that inherit from BaseModelWithContext, so those root_validators cannot access values['context']. @samuelcolvin is there a way to order root_validators in a hierarchical model structure? I would expect a model's own root validators to run before its parents' root validators.

@samuelcolvin sorry :( I'm not sure I understand how to do it properly. Could you please post a small example here?

from pydantic import BaseModel, root_validator

class FooModel(BaseModel):
    name: str

    @root_validator(pre=True)
    def foo(cls, v):
        print('foo')
        return v

class BarModel(FooModel):
    @root_validator(pre=True)
    def bar(cls, v):
        print('bar')
        return v

# foo will be called first, then bar
# BarModel(name='x')

BarModel.__pre_root_validators__.reverse()

# bar called before foo
BarModel(name='x')

@samuelcolvin Thank you so much! It works now!!!
(with __post_root_validators__ instead of __pre_root_validators__)
:))))
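The reason reversing helps can be modelled without pydantic. The sketch below assumes only what the example above shows about pydantic v1: inherited root validators come first in the list and a subclass's own validators are appended after them. It is a library-free imitation, not pydantic's implementation; HookedBase, post_hook, run_hooks and _post_hooks are hypothetical names.

```python
def post_hook(func):
    # Hypothetical marker standing in for @root_validator.
    func._is_post_hook = True
    return func


class HookedBase:
    _post_hooks = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Parent hooks first, then hooks defined directly on this class.
        own = [f for f in vars(cls).values() if getattr(f, "_is_post_hook", False)]
        cls._post_hooks = list(cls._post_hooks) + own

    @classmethod
    def run_hooks(cls, values):
        for hook in cls._post_hooks:
            values = hook(cls, values)
        return values


class BaseWithContext(HookedBase):
    @post_hook
    def remove_context(cls, values):
        values = dict(values)
        values.pop("context", None)
        return values


class MyModel(BaseWithContext):
    @post_hook
    def check(cls, values):
        values = dict(values)
        values["saw_context"] = "context" in values
        return values


data = {"x": 1, "context": {"arg1": "some_value"}}
before = MyModel.run_hooks(data)
print(before["saw_context"])  # False: the parent hook removed it first

# Reverse only this class's list so its own hook runs before the parent's.
MyModel._post_hooks.reverse()
after = MyModel.run_hooks(data)
print(after["saw_context"])   # True
```

Note that reversing flips the whole list, so in a deeper hierarchy the order of every level is inverted, not just the last two.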

I'm not in complete agreement that something like this is needed. I've created a new issue #1549 to describe the problem (and hopefully solution in future) in a succinct way. Feedback welcome.
