Pydantic: How to have an “optional” field but if present required to conform to non None value?

Created on 12 Feb 2020 · 16 comments · Source: samuelcolvin/pydantic

How can I have an optional field where None is not allowed? Meaning a field may be missing but if it is present it should not be None.

from pydantic import BaseModel

class Foo(BaseModel):
    count: int
    size: float = None  # how to make this an optional float? 

 >>> Foo(count=5)
 Foo(count=5, size=None)  # GOOD - "size" is not present, value of None is OK

 >>> Foo(count=5, size=None)
 Foo(count=5, size=None) # BAD - if field "size" is present, it should be a float

 # BONUS
 >>> Foo(count=5)
 Foo(count=5)  # BEST - "size" is not present and is not required to be present, so we don't care about validating it at all.  Foo.json(exclude_unset=True) handles this for us, which is fine.

I cross posted SO:
https://stackoverflow.com/questions/60191270/python-pydantic-how-to-have-an-optional-field-but-if-present-required-to-con

question

All 16 comments

use a validator

from typing import Optional

from pydantic import BaseModel, validator

class Foo(BaseModel):
    count: int
    size: Optional[float] = None

    @validator('size')
    def prevent_none(cls, v):
        assert v is not None, 'size may not be None'
        return v

Just to add onto @samuelcolvin's answer, this works because, by default, validators aren't called on arguments that are not provided (though there is a keyword argument on @validator that can make this happen). (This is described in the docs linked above.)
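To make that concrete, here is a minimal sketch of the validator approach in action (assuming pydantic v1 semantics):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, validator

class Foo(BaseModel):
    count: int
    size: Optional[float] = None

    @validator('size')
    def prevent_none(cls, v):
        assert v is not None, 'size may not be None'
        return v

print(Foo(count=5).size)  # None - the validator is not called for the unset field

try:
    Foo(count=5, size=None)
except ValidationError:
    print('explicit None was rejected')
```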

(Another route would be to use a sentinel value (e.g., object()) as the default (this might require some Config changes to make it work, I'm not 100% sure), and add an always=True validator that converts the exact default value to None, and raises an error if None was provided.)
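A rough sketch of that sentinel-value route (the _UNSET name is made up here; pydantic v1 semantics assumed):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, validator

_UNSET = object()  # hypothetical sentinel default, never a valid user value

class Foo(BaseModel):
    count: int
    size: Optional[float] = _UNSET  # default is the sentinel, not None

    @validator('size', pre=True, always=True)
    def unset_to_none(cls, v):
        if v is _UNSET:
            return None  # the field was never provided; fall back to None
        if v is None:
            raise ValueError('size may not be None when provided')
        return v

print(Foo(count=5, size=1.5).size)  # 1.5
```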


Note that the partial lack of idempotency may cause trouble with certain frameworks (like FastAPI) which may convert the model to dict form (which may have the None), and then call the initializer again on the dumped data. In particular, as of now you are likely to run into this issue if you specify the model as a response_model to a FastAPI endpoint.

There may be a way to achieve a similar pattern that can track whether the root source of the value was default initialization (defeating the idempotency issues), but you'd have to avoid the use of None. (Something that you ensure JSON-encodes to None might work though, if that's the context you are working in.)
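The dump-then-reparse failure mode can be sketched like this (pydantic v1 assumed; the model reuses the validator workaround suggested earlier in the thread):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, validator

class Foo(BaseModel):
    count: int
    size: Optional[float] = None

    @validator('size')
    def prevent_none(cls, v):
        assert v is not None, 'size may not be None'
        return v

foo = Foo(count=5)   # fine: size is simply unset
data = foo.dict()    # {'count': 5, 'size': None} - the None is now materialized
try:
    Foo(**data)      # what a framework doing dump-then-reparse effectively does
except ValidationError:
    print('round trip failed: the dumped None is rejected')
```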

This is irking me at the moment. Passing in a field as None is fundamentally different from not passing in a field at all. Right now, pydantic conflates the two by using the Optional type for both use-cases. In code:

Foo(x=1, y=None)

is different (with regards to the intentions of the programmer) from

Foo(x=1)

There might be more use-cases, but my own and I think the most obvious one is that sometimes I want to do a partial update: pass in some values, validate them (with pydantic) and then update them elsewhere (say: in my datastore).

But. I don't want the values to be null. I want them to be either some type or I don't want them to exist at all. Having to use a custom validator for this wherever I need it is a lot of extra effort.
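For illustration, a partial-update flow with the current workaround might look like this (a sketch; UserUpdate and update_in_store are made-up names, not pydantic API):

```python
from typing import Optional

from pydantic import BaseModel

class UserUpdate(BaseModel):
    name: Optional[str] = None
    age: Optional[int] = None

def update_in_store(user_id: int, changes: dict) -> dict:
    # stand-in for e.g. an UPDATE statement that only touches `changes`
    return changes

patch = UserUpdate(age=31)
# exclude_unset drops fields the caller never sent - but note it cannot
# distinguish "age explicitly set to None" from "age missing" by type alone.
changes = patch.dict(exclude_unset=True)
print(changes)  # {'age': 31}
update_in_store(1, changes)
```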

At a library level, it seems there are two ways of dealing with this.

One is to allow BaseModel's to have a special configuration option that changes the behaviour to be something like this:

from pydantic import Required

class Foo(BaseModel):
    a: Required[int]
    b: int
    c: Optional[int]
    d: Required[Optional[int]]
    class Config:
        require_by_default = False  # default: True

This would result in these requirements:

  • a to be present and only an int
  • b to be an int (but not necessarily present)
  • c does not need to be present, and if it is, it can be int or None
  • d needs to be present, and can be int or None

The other option would be to add a custom type that supports b, so that you don't need a custom config option.

from pydantic import NotNone

class Foo(BaseModel):
    a: int
    b: NotNone[int]
    c: Optional[int]

The problem with this solution is that it does not support use case d, which seems like a good use-case to support.

However, in both cases, b demonstrates the behaviour desired in this issue.

I would actually prefer it if the first option was the default behavior for pydantic, but at this point clearly that is not on the table.

@samuelcolvin @dmontagu Would there be any willingness to add this functionality to pydantic? I would be willing to start a PR if so. I personally am a big fan of option 1's functionality, as it allows for all possible combinations of providing data to a pydantic class, and I think it is a better reflection of what Optional[x] truly is (just Union[x, None]). The current way of doing it is straightforward, but as can be seen in this issue it falls short for some use-cases in terms of developer intent.

The set of fields actually passed into instantiation is already stored anyway, so doesn't seem like this would hurt much perf wise.

Some alternatives to calling the "must be present" pydantic type Required could be: Present, Provided, Needed, Mandatory, etc.

I wouldn't necessarily have a problem adding such functionality, but I'll warn you that if your goal is to use this with FastAPI, you may well run into problems until FastAPI is updated to make less use of dump-then-reparse in various places.


After some review of the specs, I think the approach described in your first bullet is a substantially better reflection of OpenAPI/JSON Schema semantics. Because of that, I'm more open to a config-flag-based approach than I might otherwise be.

However, I see two major obstacles:

1) JSON Schema and OpenAPI's interpretation differs from the interpretation of the word Optional in the majority of statically typed languages, including mypy-compatible python (I think this stems from the fact that fields in structs can't just be missing, unlike in JSON/python objects). So I think we probably can't/shouldn't adopt this naming convention (i.e., using Optional and Nullable generic types with these semantics), convenient as it would be for OpenAPI/JSON Schema compatibility.

As a result, I think we should avoid the use of Optional alongside Nullable and Required, since its interpretation differs depending on context.

2) Mypy, the built-in dataclasses library, and basically all python static analysis tools (including IDEs, etc.) treat types as "required" by default, rather than "optional" by default (where I am using the JSON schema notion of required vs. optional here). As a result, these semantics are likely to either work poorly with existing development tools, or add an enormous maintenance burden for plugins (mypy, pycharm, vscode, etc.).

As a result, I think we need to keep annotated fields required by default.

Between the above two points, I think it could make sense to add Unrequired and Nullable as new generic types with the following semantics:

  • Unrequired[X] is equivalent to Optional[X] plus a validator ensuring the value is not None when it is specified.
  • Nullable[X] is like Optional[X], but the value must be specified. (This is similar to how Optional[X] works with the dataclasses package.)
  • Unrequired[Nullable[X]] would be equivalent to the semantics currently represented by Optional[X].
  • Nullable[Unrequired[X]] is undefined/disallowed; in general, Unrequired should never occur as a parameter of another generic. So ideally something like List[Unrequired[int]] would raise a TypeError during the model class creation.

Note that adding another config flag, _especially_ one that modifies fundamental behavior like whether a type is required/optional by default, is likely to introduce many subtle bugs, add a lot of maintenance burden, and make it difficult to refactor things as we discover better approaches to implementation. So despite the comment I made above that I am more open to a config-based approach than I would normally be, I think in this case we should really, really try to avoid it if possible.

But I think maybe the approach using the Unrequired and Nullable generics described above might make this unnecessary.


Note that adding support for Unrequired and Nullable is likely to require a non-trivial amount of effort to add good support for PyCharm (and other IDEs) and mypy. That said, I think adding such support would at least be straightforward, if not quick and easy (unlike, for example, modifying the behavior of pydantic's GenericModel).

Thanks for the great thoughts @dmontagu! Need to go over in more detail, but I just had an idea that I thought I'd throw out there (and is somewhat inspired by your previous comment in the thread) before I go to bed and forget it:

What if, instead of Pydantic loading in missing fields as None, it loaded them in as some sort of sentinel object+type, let's say for simplicity an object/type called Missing. Then, Optional[X] would work for the "can't be missing but can be None" use case, while for "can be missing but not None", we'd introduce an Unrequired[X] (or otherwise named) generic type that under the hood is (similar to Optional[X] before it) just Union[X, Missing].

Then, for the "can be missing or None", you could either have Optional[Unrequired[X]], but perhaps more ergonomically the developer could just use Union[X, None, Missing]. Perhaps you could even have a shorthand for that which would just be Disposable[X] or something.

This would presumably alleviate some of the burden with regard to type checking with mypy and the like, no?

That way you don't technically even need generic types like Nullable, Unrequired or Disposable – you could just load missing fields in as the missing type and leave the pydantic user to do Union[X, Missing], etc.

Technically that information is already tracked -- you can look inside instance.__fields_set__ to see whether the value was missing or not. So I think things still boil down to what semantics we actually want, and given a decision there, what's the easiest way to implement that.
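For example (pydantic v1 semantics assumed):

```python
from typing import Optional

from pydantic import BaseModel

class Foo(BaseModel):
    count: int
    size: Optional[float] = None

# __fields_set__ records exactly which fields were passed at init,
# so "missing" and "explicitly None" are distinguishable after the fact.
print(Foo(count=5).__fields_set__)             # {'count'}
print(Foo(count=5, size=None).__fields_set__)  # {'count', 'size'}
```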

The obvious challenges I see with adding a new type to represent Missing are:

1) It would likely throw mypy/IDEs for a loop (without extensive plugin work)
2) It would likely add a lot of complexity related to converting Missing to None when dumping the data to a dict and/or JSON
3) It would be a big departure from the way things work now, and I think it's probably a bad idea to break too much existing code, even with a major version bump. Maybe it would be possible to introduce in such a way that existing code was unlikely to break, but it seems like it would be challenging.

Note that adding additional generic types wouldn't require changing the behavior of any existing code. To the extent that we needed to change existing logic to avoid serializing Unrequired properties, that too would likely be a much more localized change than what would need to happen if we used a different approach to represent unspecified values.

I think this is a duplicate of #1223 (at least the latter part of this discussion).

I'm inclined to stick to the current logic that:

class Model(base):
    a: Optional[int]  # this field is required but can be given None (to be CHANGED in v2)
    b: Optional[int] = None  # field is not required, can be given None or an int (current behaviour)
    c: int = None  # this field isn't required but must be an int if it is provided (current behaviour)

The only other thing I would consider (but am currently opposed to having read through #1223) is Nullable[] or RequiredOptional[] which is equivalent to Optional[] but requires a value to be provided, e.g. same as a above without a breaking change.

Unrequired is (no pun intended) not required as far as I can see, since it's the same as c above.

Oh, I see that c would break mypy. Given that this case seems very rare, can't we stick with the validator approach?

If not, I think Unrequired is a confusing name, how about DisallowNone? Though I'm very far from convinced it needs to be added to pydantic, it would work perfectly well as a custom type.
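For reference, the "required but nullable" case (a above, or the Nullable/RequiredOptional idea) can already be written in pydantic v1 by giving the Optional field an explicit ... (Ellipsis) default:

```python
from typing import Optional

from pydantic import BaseModel, ValidationError

class Model(BaseModel):
    a: Optional[int] = ...  # must be provided, but None is an acceptable value

print(Model(a=None).a)  # None

try:
    Model()  # a was never provided
except ValidationError:
    print('a is required')
```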

The issue is that c is not very rare – it comes up any time someone wants to validate a partial update of some data with pydantic. Which for me happens a lot. Why pass through an entire representation of the data in question when I can instead only pass in the subset of data that I want to update? In fact, I guarantee this is a fairly common pattern for many web apps at the very least.

The deserialization/serialization lib I used before pydantic (marshmallow) handled this by having a required param for fields that can't be missing and an allow_none param for "can be None". So the default behavior was actually c.

I disagree.

Since c can already be None when it's not provided, there's no harm in also allowing it to be None when it is provided – i.e. the a or b case.

I therefore continue to hold the opinion that "X is not required, but if it is supplied it may not be None" is not particularly common.

I'll wait to be proved wrong by :+1: on this issue or duplicate issues.


Since there are two workarounds (validators or a custom type), I'm not that interested in continuing this conversation or adding DisallowNone on the basis of opinion alone.

Let's wait to see if others agree with you.

The only reason c can be None is because pydantic returns missing fields as None and conflates disparate behaviours.

How common it is does not change the fact that explicitly passing in some field as None to a pydantic BaseModel is different from not passing in a value at all. Different inputs should have different outputs in the final object.

There is clearly a lot of confusion around how Optional behaves in pydantic. There are many issues in this repo related to that fact. I think a big reason for this is due to:

  1. more than one way to do some things
  2. all permutations not actually being easily possible.

If pydantic actually returned a Missing type for missing fields, you wouldn't need unintuitive magic syntax like here to allow for "Required Optional fields". The syntax would just be Optional[X] for that behavior, which is intuitive and makes sense given that Optional is just Union[X, None].

I recognize that arguably the real reason for all this is because Python's typing module decided to go with calling it Optional instead of Nullable which is not what most other languages would call it and becomes confusing when you throw more libs into the mix.

@dmontagu I'm not sure what you mean by concern (1): How is this not already covered by the functionality of type checkers? Pydantic would return either Missing or X for "CanBeMissing" fields and None or X for "Optional" fields. This seems well within what type checkers are already doing on their own, no?
As to (2), why not just not dump fields that weren't present at init in the first place, like other data serialization libraries do. If a field is missing, you just don't dump it to a dict or json, because if it wasn't in the incoming data (and you intentionally specified that you are OK with it being "missing") why would you put it in the output? The only time you would put it in the output is if you specified some default value, in which case you still wouldn't have a problem.

@acnebs I agree that it is unfortunate that both missing or specified-as-null are conflated. But for better or worse I think this is ultimately a fairly pythonic convention -- if you have a keyword argument with a default value, you can't tell whether the keyword argument provided was specified as the default value or just not provided. Some may see this as a flaw in the language's design (e.g. Rust users), but at this point it's certainly conventional/idiomatic python.

How common it is does not change the fact that explicitly passing in some field as None to a pydantic BaseModel is different from not passing in a value at all. Different inputs should have different outputs in the final object.

The way you are describing this makes me think you might not be aware that you can obtain the precise set of fields that were set during initialization by using model.dict(exclude_unset=True) or checking model.__fields_set__. If your point is that the current functionality is too un-ergonomic, that's a reasonable perspective, but I just want to make it clear that this capability does currently exist. (Note that it isn't especially well-supported by FastAPI right now though for FastAPI-specific reasons.)

If pydantic actually returned a Missing type for missing fields, you wouldn't need unintuitive magic syntax like here to allow for "Required Optional fields". The syntax would just be Optional[X] for that behavior, which is intuitive and makes sense given that Optional is just Union[X, None].

Personally I am inclined to agree that it might have resulted in less confusion to have x: Optional[X] in a pydantic model behave similarly to other types, and similar to how it would work in a dataclass (required unless a default (e.g. None) is explicitly specified). But in practice this has rarely been an issue. At this point I could go either way in terms of this behavior in v2; it would be a large enough breaking change that I could see an argument against, despite the fact that I personally find it to be the more intuitive/pythonic approach. (Also, perhaps not everyone agrees with that perspective anyway.)

I recognize that arguably the real reason for all this is because Python's typing module decided to go with calling it Optional instead of Nullable which is not what most other languages would call it and becomes confusing when you throw more libs into the mix.

You seem to be approaching the problem from a very JSON-centric perspective, but I would argue pydantic should be somewhat more concerned with type safety than following JSON conventions, and the approach used by pydantic is the approach used by mypy.

Also, I would contest the claim that "most other languages would call this concept Nullable" -- when type-checked with mypy, python's Optional types have essentially the same semantics as Rust's Option, Kotlin's Option, Swift's Optional, C++'s std::optional, etc. (more info here). The only languages I'm familiar with that prefer the term Nullable are C# and TypeScript, and I'm not familiar with any language/type-system besides that of TypeScript that even distinguishes between undefined and null as field values. (And I think most developers coming from other languages would sooner consider this a wart of JavaScript than a feature, despite the minor benefits around simplifying efficient serialization.) But I admittedly don't claim to be an expert on these issues.

At any rate, not everyone is using pydantic strictly for JSON-oriented parsing, so I'm not sure it makes sense to prioritize those conventions here.

@dmontagu I'm not sure what you mean by concern (1): How is this not already covered by the functionality of type checkers? Pydantic would return either Missing or X for "CanBeMissing" fields and None or X for "Optional" fields. This seems well within what type checkers are already doing on their own, no?

Yes, this is true, but the vast majority of existing pydantic code has been written to assume that missing Optional values translate to None, rather than some auxiliary type. And as I said above, this is idiomatic python for optional keyword arguments.

While it could certainly be type-safe (arguably more so than the current approach) to use a fundamentally different type to represent an unspecified value, it would add a large amount of boilerplate any time you didn't want to handle the cases differently, which I would argue is the case for most real applications.

As to (2), why not just not dump fields that weren't present at init in the first place, like other data serialization libraries do. If a field is missing, you just don't dump it to a dict or json, because if it wasn't in the incoming data (and you intentionally specified that you are OK with it being "missing") why would you put it in the output? The only time you would put it in the output is if you specified some default value, in which case you still wouldn't have a problem.

As I said above, this is possible now using exclude_unset=True. As far as I'm aware, outside of working with JSON, it isn't really conventional to remove "unset" fields from an unstructured representation of a class instance. I'd argue the use of the exclude_unset keyword argument is a fairly convenient compromise here.

Thank you for pointing out exclude_unset as a possible solution. Unfortunately I am aware that it exists, but my issue is that it is very unergonomic and excessively verbose to have to dump the object or manually inspect an internal attr like model.__fields_set__ if you want to get at only the fields that were specified. You should not have to dump the object before doing anything useful with it.

I'm not sure "the approach used by pydantic is the approach used by mypy". Mypy doesn't concern itself with things that don't exist at all, so it's not really comparable. Mypy can only type check things that exist – pydantic had to make a choice about how to handle missing fields which aren't even there.

This is a bit different from idiomatic python as well, because in idiomatic python (at least in the past), the reason that unspecified optional kwargs defaulted to None was because there wasn't actually a concept of optional kwargs – there were just kwargs that you could pretend were optional if you specified their default value as None. But for most purposes, this wasn't too different from specifying any other value as the default for the kwarg.

I think the real reason for my confusion is that to my mind it doesn't make much sense for a default to be arbitrarily chosen as None when a field is Optional[X]. Why not choose a random (probably falsey) value of X as the default? If I specify a field as Optional[int] and don't specify the field in construction, where is the logic in loading/dumping it as None rather than calling it 0? As the programmer, I have given no preference for one or the other. If it's Optional[str], why not default to '' instead of None? With the idea that Optional[X] means None or X but not missing, this arbitrary decision disappears. And then if you wanted that behavior, you could do the pythonic thing of setting the default as None, i.e. with Optional[X] = None. I think that is overall much more pythonic than the current behaviour, no?

I will admit that I am using pydantic for JSON-centric parsing, but I think I have made it clear that this is also more of a general objection to conflation rather than a complaint that "this doesn't perfectly mirror JSON's behavior". I understand that this is a generic data library (builtin methods for dumping to JSON notwithstanding). I'm more looking at things like marshmallow (what I was using before in Python and which is very widely used), which is also not JSON-centric but easily allows for the behavior we are talking about (in fact it is the default).

How about this idea?

Create a singleton for a missing property, so that the property is allowed to be Missing but not None.

from pydantic import BaseModel
from typing import Union

class MissingType:
    def __repr__(self):
        """Just for pretty printing"""
        return 'Missing'

Missing = MissingType()


class Foo(BaseModel):
    count: int
    size: Union[float, MissingType] = Missing

    class Config:
        arbitrary_types_allowed = True

foo = Foo(count=5)
print(foo)  # count=5 size=Missing
print(foo.size is Missing)  # True

foo = Foo(count=5, size=Missing)
print(foo)  # count=5 size=Missing
print(foo.size is Missing)  # True

foo = Foo(count=5, size=None)  # raises ValidationError: none is not an allowed value (type=type_error.none.not_allowed)


Actually, this kind of Missing object could be substituted for _missing = object() in pydantic.main, so that users could import Missing from pydantic.

Even better, I think that it could be great if pydantic.Field had an explicit required parameter (in line with OpenAPI) so that each field could be left Missing if there is no default value and if required==False, independent of the type hint. For example, if we want the field to be either None, float or not set at all, we could write

from typing import Optional

from pydantic import BaseModel, Field, Missing  # hypothetical: Missing and required= don't exist in pydantic today

class Foo(BaseModel):
    field: Optional[float] = Field(required=False)

def check_foo(foo: Foo):
    if foo.field is Missing:
        # field is not set at all
        pass
    elif foo.field is None:
        # field is explicitly set to None
        pass
    else:
        # do something with the field
        pass

foo = Foo()
check_foo(foo)

A better Missing would also override its __new__ to be a true singleton. Otherwise, deepcopying a structure with embedded Missings will probably create additional instances of MissingType which won't pass the is Missing test.
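A sketch of such a copy-safe singleton (plain Python, no pydantic involved):

```python
import copy

class MissingType:
    _instance = None

    def __new__(cls):
        # always hand back the same instance, so `is Missing` checks survive copying
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return 'Missing'

    def __copy__(self):
        return self

    def __deepcopy__(self, memo):
        return self

Missing = MissingType()

data = {'size': Missing}
print(copy.deepcopy(data)['size'] is Missing)  # True
```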
