Pydantic: [Feature Request] Provide a discriminated union type (OpenAPI 3)

Created on 25 Jun 2019 · 45 Comments · Source: samuelcolvin/pydantic

Feature Request

Pydantic currently has decent support for union types through the typing.Union type from PEP 484, but it does not cover all the cases covered by the JSON Schema and OpenAPI specifications, most likely because the two specifications diverge on those points.

OpenAPI supports something similar to tagged unions, where a certain field is designated to serve as a "discriminator", which is then matched against literal values to determine which of multiple schemas to use for payload validation. In order to allow Pydantic to support those, I suppose there would have to be a specific type similar to typing.Union to specify which discriminator field to use and how to match it. Such a type would then be rendered into a schema object (oneOf) with an OpenAPI discriminator object built into it, and would correctly validate incoming JSON into the correct type based on the value of the discriminator field. This change would only impact OpenAPI, as JSON Schema (draft 7 onwards) uses conditional types instead, which would probably need to be the topic of a different feature request, as both methods appear mutually incompatible.

Implementation ideas

I'd imagine the final result to be something like this.

MyUnion = Union[Foo, Bar]
MyTaggedUnion = TaggedUnion(Union[Foo, Bar], discriminator='type', mapping={'foo': Foo, 'bar': Bar})
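TaggedUnion is only a proposed name. As a minimal stdlib-only sketch of the intended behaviour (dataclasses stand in for pydantic models, and TaggedUnion/parse here are hypothetical, not pydantic API), the discriminator value is looked up in mapping and only the matching variant is constructed:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Foo:
    type: str
    x: int


@dataclass
class Bar:
    type: str
    y: str


class TaggedUnion:
    """Hypothetical tagged union: select exactly one variant by a tag field."""

    def __init__(self, discriminator: str, mapping: Dict[str, type]) -> None:
        self.discriminator = discriminator
        self.mapping = mapping

    def parse(self, data: Dict[str, Any]) -> Any:
        if self.discriminator not in data:
            raise ValueError(f"missing discriminator field {self.discriminator!r}")
        tag = data[self.discriminator]
        variant = self.mapping.get(tag)
        if variant is None:
            raise ValueError(
                f"unknown {self.discriminator} {tag!r}, expected one of {sorted(self.mapping)}"
            )
        # Only the selected variant is constructed, so any error comes from it alone.
        return variant(**data)


MyTaggedUnion = TaggedUnion(discriminator="type", mapping={"foo": Foo, "bar": Bar})
print(MyTaggedUnion.parse({"type": "bar", "y": "hello"}))  # Bar(type='bar', y='hello')
```

Because the tag is read before anything else, an unknown or missing tag produces one targeted error instead of one error per variant.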

Python doesn't have a feature like TypeScript's to let you statically ensure that the discriminator exists as a field on all variants of the union, though that shouldn't be a problem, since a missing discriminator would be caught during validation regardless.

discriminator and mapping could also simply be added to Schema, though I'm not sure about whether it's a good idea to add OpenAPI-specific extensions there.

PEP 593 would also have been a nice alternative, since it would hypothetically allow tagged unions to be implemented as a regular union with annotations specific to Pydantic for that purpose; however, it is still only a draft and most likely won't land before Python 3.9 (if at all).

Schema feature request help wanted

Source

sm-Fifteen

👍47

The validation component of this can already be accomplished via the const kwarg to Schema, and it will be possible with Literal annotations once #582 is released.

This allows you to add a field which is constrained to one or more values, then use that field to discriminate between two models.

However at the moment this doesn't extend to a discriminator object in the schema.

Perhaps it would be possible to either:

automatically spot the discriminator when building schema, or
add a property to Config or a kwarg to Schema to tell pydantic about it

samuelcolvin on 25 Jun 2019

Guessing what the discriminator field may be based on const or literal values might lead to unexpected behavior; using Config properties or schema parameters would probably be preferable.

A union field could have a discriminator parameter on its schema object to indicate which field to match against, which would then have to be const/literal values on each of the variants. Each literal value could then be rendered as a separate key/value pair in the mapping dictionary, which would raise an error in case of collision.

This would leave the default case (the first key in the discriminator mapping is considered the default option if nothing matches, IIRC) undefined, though...
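To make the collision rule concrete, here is a minimal sketch using only the standard library (dataclasses stand in for models; build_mapping is a hypothetical helper, not pydantic API). It reads each variant's Literal annotation for the named discriminator field and refuses to build a mapping when two variants claim the same value:

```python
from dataclasses import dataclass
from typing import Dict, Literal, get_args, get_origin, get_type_hints


@dataclass
class Cat:
    petType: Literal["cat", "robot_cat"]
    name: str


@dataclass
class Dog:
    petType: Literal["dog", "robot_dog"]
    bark: str


def build_mapping(discriminator: str, *variants: type) -> Dict[str, type]:
    """Collect each variant's Literal values for `discriminator` into value -> variant."""
    mapping: Dict[str, type] = {}
    for variant in variants:
        hint = get_type_hints(variant).get(discriminator)
        if get_origin(hint) is not Literal:
            raise TypeError(f"{variant.__name__}.{discriminator} must be annotated with Literal")
        for value in get_args(hint):
            if value in mapping:
                # Two variants claim the same tag, so the union would be ambiguous.
                raise TypeError(
                    f"{value!r} is claimed by both {mapping[value].__name__} and {variant.__name__}"
                )
            mapping[value] = variant
    return mapping


print(build_mapping("petType", Cat, Dog))
```

A Literal with several values simply contributes several mapping keys, which matches the idea of rendering each literal value as its own key/value pair.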

I'll see if I can try coming up with something that can be easily validated against while not being too cumbersome to work with.

sm-Fifteen on 27 Jun 2019

I've taken another look at the OpenAPI spec on this and tried a number of potential syntaxes.

The Literal syntax for Python 3.8 combined with an inheritance model flows really well. Considering discriminator objects are only allowed as fields of non-inlined schema objects in OpenAPI, there should be no issue with allowing them as BaseModel properties only. I'm not entirely sure, however, of how well mypy will tolerate shadowing with narrower types like that.

from pydantic import BaseModel, Schema
from typing import Literal

class Pet(BaseModel):
    petType: str = Schema(...)

    # One or the other
    __discriminator__ = 'petType'

    class Config:
        discriminator = 'petType'

class Cat(Pet):
    petType: Literal['cat', 'robot_cat'] # Should render as a string enum
    name: str = Schema(...)

class Dog(Pet):
    petType: Literal['dog', 'robot_dog']
    bark: str = Schema(...)

There would however be the problem of dealing with versions of Python without support for Literal, which could be accomplished with const as you've suggested, but this would mean having no more than one matched value per type. "Mapping keys MUST be string values, but tooling MAY convert response values to strings for comparison." would indicate that the discriminator property has to be of type string and that the values for subclasses can be validated as enums (of strings) as well.

The OpenAPI spec also mentions that the schema names can act as implicit discriminators, whether or not an explicit mapping is present (so 'Dog' and 'Cat' could technically be valid if there is no other constraint), however I don't really believe supporting that use case is of much concern.

sm-Fifteen on 30 Jun 2019

Literal is supported in Python 3.7 (and 3.6 if I recall correctly) by importing it from the typing_extensions package.

dmontagu on 30 Jun 2019

Ah, well, that's sure to make things a lot simpler, then.

That leaves the question of whether to use model config or __dunder__ attributes to define the discriminator property. Pydantic has examples of both styles, and I can't find any guidance in the style guidelines on when one should be preferred over the other.

EDIT: All __dunder__ attributes and methods are considered reserved by the interpreter and can break without warning, so adding more such attributes may not be such a good idea after all.

I might try filing a PR for this in the coming days if I have time.

sm-Fifteen on 2 Jul 2019

I will propose a PR to include additional schema (JSON Schema) data for models.

This will allow you to create the validation required using, e.g. Literal or the Generics functionality @dmontagu added. Then you can do the validation in your model using standard Pydantic validators.

And then you can describe it in the extra schema data (as JSON Schema/OpenAPI schema) as an OpenAPI discriminator, etc. purely for documentation.

I think adding discriminator support directly to Pydantic wouldn't be convenient, as the discriminator ideas are very specific to OpenAPI, are a bit constrained (not that generalizable), and have conflicts with some ideas in JSON Schema (if, else, etc).

But combining these things I described above (with the extra schema I'll PR) you should be able to achieve what you need, with the OpenAPI documentation you expect @sm-Fifteen .
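For the documentation-only half of this suggestion, the dict below sketches the kind of OpenAPI 3 fragment one might inject via schema_extra: a oneOf over component references plus a discriminator object. The helper name and the assumption that every variant is published under #/components/schemas/ are illustrative, and nothing here performs validation:

```python
import json
from typing import Dict


def discriminated_one_of(property_name: str, mapping: Dict[str, str]) -> dict:
    """Build an OpenAPI 3 `oneOf` schema with a `discriminator` object.

    `mapping` maps discriminator values to component schema names.
    This only describes the union for documentation purposes.
    """
    refs = {value: f"#/components/schemas/{name}" for value, name in mapping.items()}
    # Each referenced schema appears once in oneOf, even if several values map to it.
    one_of = [{"$ref": ref} for ref in dict.fromkeys(refs.values())]
    return {
        "oneOf": one_of,
        "discriminator": {"propertyName": property_name, "mapping": refs},
    }


schema = discriminated_one_of("petType", {"cat": "Cat", "robot_cat": "Cat", "dog": "Dog"})
print(json.dumps(schema, indent=2))
```

The actual value-to-model dispatch would still live in a validator, as suggested above; this fragment only keeps the generated docs in sync with it.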

tiangolo on 13 Jul 2019

@sm-Fifteen I think you can now perform the validation as you need in a validator and generate the schema the way you want it using schema_extra. Could you check if that solves your use case?

tiangolo on 7 Aug 2019

Thinking about this more (and running into a similar problem myself), I think some kind of discriminator field is required:

To speed up parsing so that validation only needs to be attempted against one model
To make the error message less verbose - currently if you try to validate against multiple models, all but one are likely to have multiple errors. All those errors get added to the error output making it extremely verbose.

Personally I think this should be done by adding a discriminator argument to Field / Schema (#577) rather than creating a custom Union type that will never play well with mypy or IDEs.
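A toy comparison of the two points above, using plain dataclasses and a hand-rolled "missing field" check rather than pydantic's validators (every helper name here is hypothetical): trying every member collects errors from models the input was never meant to match, while reading the tag first reports only the relevant ones.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Literal


@dataclass
class Foo:
    model_type: Literal["foo"]
    a: int


@dataclass
class Bar:
    model_type: Literal["bar"]
    b: str
    c: str


VARIANTS = {"foo": Foo, "bar": Bar}


def missing_fields(cls: type, data: Dict[str, Any]) -> List[str]:
    """Toy validator: report required fields that are absent from `data`."""
    return [f"{cls.__name__}.{f.name}: field required" for f in fields(cls) if f.name not in data]


def errors_try_each(data: Dict[str, Any]) -> List[str]:
    # Simplified current Union behaviour: every variant is attempted
    # and all of their errors are collected.
    errors: List[str] = []
    for tag, cls in VARIANTS.items():
        if data.get("model_type") != tag:
            errors.append(f"{cls.__name__}.model_type: unexpected value")
        errors.extend(missing_fields(cls, data))
    return errors


def errors_discriminated(data: Dict[str, Any]) -> List[str]:
    # Discriminated behaviour: read the tag first, then check one variant only.
    cls = VARIANTS.get(data.get("model_type"))
    if cls is None:
        return ["model_type: no match for discriminator"]
    return missing_fields(cls, data)


bad = {"model_type": "bar", "b": "x"}
print(errors_try_each(bad))
print(errors_discriminated(bad))
```

With two variants the difference is three errors versus one; it grows with every member added to the union.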

samuelcolvin on 11 Aug 2019

👍13

So usage would be something like:

from typing import Literal, Union

from pydantic import BaseModel, Field

class Foo(BaseModel):
    model_type: Literal['foo']

class Bar(BaseModel):
    model_type: Literal['bar']

class MyModel(BaseModel):
    foobar: Union[Foo, Bar] = Field(..., discriminator='model_type')

samuelcolvin on 11 Aug 2019

👍14

I think adding discriminator support directly to Pydantic wouldn't be convenient, as the discriminator ideas are very specific to OpenAPI, are a bit constrained (not that generalizable), and have conflicts with some ideas in JSON Schema (if, else, etc).

@tiangolo: I don't know if we could have a solution that would work for both OpenAPI and JSON Schema without losing the benefits of mypy validation. I don't even know if JSON Schema's fully conditional validation system can cleanly be mapped to a type system at all. Being able to specify fields unknown to Pydantic in your generated schema is nice, but discriminators affect validation logic, and not being able to get mypy to tell apart the subtypes would be unfortunate.

So Usage would be something like

class Foo(BaseModel):
    model_type: Literal['foo']

class Bar(BaseModel):
    model_type: Literal['bar']

class MyModel(BaseModel):
    foobar: Union[Foo, Bar] = Field(..., discriminator='model_type')

@samuelcolvin: I like the idea and the proposed syntax, though I see the "running into a similar problem" issue was closed after you posted your reply, so I'm not sure if you still think the addition is warranted?

sm-Fifteen on 19 Aug 2019

I still definitely want a discriminator argument to Field.

samuelcolvin on 19 Aug 2019

but it might need to be a function or a field name.

samuelcolvin on 19 Aug 2019

maybe determinant would be a better name than discriminator? The docs should probably mention both, at least.

(I've just spent some time looking for this issue as I looked for "determinant" rather than "discriminator")

samuelcolvin on 27 Aug 2019

maybe determinant would be a better name than discriminator? The docs should probably mention both, at least.

(I've just spent some time looking for this issue as I looked for "determinant" rather than "discriminator")

Considering this would mainly be there to map to OpenAPI's discriminator field, I figure it would make more sense to call it that, unless you're trying to figure out how to make it work the same way with JSON Schema's model?

Considering discriminator objects are only allowed as fields of non-inlined schema objects in OpenAPI, there should be no issue with allowing them as BaseModel properties only.

I figured I should restate that part, since it would probably affect the resulting API design.

sm-Fifteen on 1 Sep 2019

👍2

My interest in using discriminators is not related to OpenAPI or JSON Schema; I want a way of using Unions where I can:

Avoid having to run validation for all union types where it's trivial to work out what type the object should have
Avoid the massively long error messages that currently result from failed validation against unions with many options.

I'm fine with discriminator as a name, although I actually think determinant would be a more appropriate name, as long as both are used in the docs.

Considering discriminator objects are only allowed as fields of non-inlined schema objects in OpenAPI, there should be no issue with allowing them as BaseModel properties only.

Can't say I entirely understand what this means, but sounds good to me.

samuelcolvin on 1 Sep 2019

👍13 👀4

Never mind that; I thought the spec essentially said something along the lines of "Only schemas that are in components/schemas can have a discriminator field.", which would have required an explicit superclass for all of those Unions.

The spec actually says something more like "The discriminator can only map towards schemas that have IDs (i.e., the subclasses must appear in components/schemas).", which makes a lot more sense considering that the discriminator mapping can only contain references.

MyResponseType:
  discriminator:
    propertyName: petType
    mapping:
      # Notice how there's no $ref, it's just a direct reference to the target type
      dog: '#/components/schemas/Dog'
      monster: 'https://gigantic-server.com/schemas/Monster/schema.json'
  oneOf:
  - $ref: '#/components/schemas/Cat'
  - $ref: '#/components/schemas/Dog'
  - $ref: '#/components/schemas/Lizard'
  - $ref: 'https://gigantic-server.com/schemas/Monster/schema.json'
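A small sketch of how a consumer might resolve petType against the schema above. It assumes, as described in this thread, that explicit mapping entries are checked first and that local component names also act as implicit values; the lookup order and the choice to reach external refs (the Monster URL) only through the explicit mapping are simplifying assumptions, and resolve is a hypothetical name.

```python
from typing import Dict, List, Optional

ONE_OF: List[str] = [
    "#/components/schemas/Cat",
    "#/components/schemas/Dog",
    "#/components/schemas/Lizard",
    "https://gigantic-server.com/schemas/Monster/schema.json",
]
MAPPING: Dict[str, str] = {
    "dog": "#/components/schemas/Dog",
    "monster": "https://gigantic-server.com/schemas/Monster/schema.json",
}
LOCAL_PREFIX = "#/components/schemas/"


def resolve(value: str) -> Optional[str]:
    """Return the $ref selected by a petType value, or None if nothing matches."""
    # Explicit mapping entries are consulted first.
    if value in MAPPING:
        return MAPPING[value]
    # Otherwise fall back to the implicit rule: the component schema name
    # (last path segment of a local $ref) is itself a valid value.
    for ref in ONE_OF:
        if ref.startswith(LOCAL_PREFIX) and ref[len(LOCAL_PREFIX):] == value:
            return ref
    return None
```

Note that resolve("cat") returns None here: the lowercase value is neither mapped nor a component name, which illustrates how easy it is to get the implicit rules wrong.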

sm-Fifteen on 7 Sep 2019

This feature is highly desired in my team's implementation :)

ashears on 4 Oct 2019

👍10

I don't get how schema_extra can solve this problem. Is there any workaround that lets me override the BaseModel.parse_raw behavior?

Congee on 20 Dec 2019

I found a workaround https://github.com/samuelcolvin/pydantic/issues/854

Congee on 20 Dec 2019

So Usage would be something like

class Foo(BaseModel):
    model_type: Literal['foo']

class Bar(BaseModel):
    model_type: Literal['bar']

class MyModel(BaseModel):
    foobar: Union[Foo, Bar] = Field(..., discriminator='model_type')

How will this work with inheritance? Or will it?

For example, this would be awesome to achieve:

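from typing import Literal

from pydantic import BaseModel

class Vehicle(BaseModel):
    vehicle_type: str
    make: str

    # Hypothetical hook naming the discriminator field (a string, not the bare name)
    __determinant__ = 'vehicle_type'

class Car(Vehicle):
    vehicle_type: Literal["car"]
    convertible: bool

class Motorcycle(Vehicle):
    vehicle_type: Literal["motorcycle"]
    chain_length: float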

And then you could pass {"vehicle_type": "motorcycle", "chain_length": 54.54, "make": "Honda"} to Vehicle and get back a Motorcycle

This is similar to how polymorphism works in SQLAlchemy.

I don't know the proper type for vehicle_type in Vehicle because it should be extendable by further models without knowing them ahead of time.
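A stdlib-only sketch of the registration idea (dataclasses instead of pydantic models; _registry and parse are hypothetical names). It uses __init_subclass__ so that each subclass registers its own Literal tag and Vehicle.parse can return the right subclass, while the base class keeps vehicle_type as a plain str so further subclasses can be added without knowing them ahead of time:

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, Literal, get_args, get_type_hints


@dataclass
class Vehicle:
    vehicle_type: str
    make: str

    # Hypothetical registry, filled in automatically as subclasses are defined.
    _registry: ClassVar[Dict[str, type]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Register every Literal value of the subclass's narrowed vehicle_type.
        for tag in get_args(get_type_hints(cls).get("vehicle_type")):
            Vehicle._registry[tag] = cls

    @classmethod
    def parse(cls, data: Dict[str, Any]) -> "Vehicle":
        subclass = Vehicle._registry.get(data.get("vehicle_type"))
        if subclass is None:
            raise ValueError(f"unknown vehicle_type {data.get('vehicle_type')!r}")
        return subclass(**data)


@dataclass
class Car(Vehicle):
    vehicle_type: Literal["car"]
    convertible: bool


@dataclass
class Motorcycle(Vehicle):
    vehicle_type: Literal["motorcycle"]
    chain_length: float


bike = Vehicle.parse({"vehicle_type": "motorcycle", "chain_length": 54.54, "make": "Honda"})
print(type(bike).__name__)  # Motorcycle
```

The registry is global to Vehicle, which is exactly the "accept all Vehicles" behaviour asked for, but it also means every subclass ever imported becomes acceptable.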

f0ff886f on 15 Jan 2020

👍5

@f0ff886f I think better would be to have something like DiscriminatedUnion[Foo, Bar] as a type that has the same typing semantics as Union, except that pydantic knows to look for __discriminator__ (or similar) in the model Config to determine the name of the discrimination key.

Then you could just assign a type alias like FooBar = DiscriminatedUnion[Foo, Bar] and use that as the type annotation anywhere you want to allow either of those types.

I think this is preferable because a discriminated union may be desirable in circumstances where you want to accept unrelated models and don't want to enforce any inheritance requirements.

Moreover, the inheritance-based approach would require you to use some funny metaclass/__init_subclass__ business to automatically register the subclasses that is likely to add cognitive burden to newcomers, not to mention an on-going maintenance burden to pydantic itself.

I also think it would make things substantially more fragile in the future for an API if you wanted to change things to allow only a specific subset of the inheriting subclasses for different endpoints.
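A rough sketch of the DiscriminatedUnion idea, using __class_getitem__ so the subscription syntax works. Dataclasses stand in for pydantic models, and reading the key from an inner Config.discriminator follows the suggestion above, but every name here is hypothetical. Note that it returns a parser object rather than a type, so unlike the proposal it gives mypy nothing to check:

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal, Tuple, get_args, get_type_hints


class DiscriminatedUnion:
    """Hypothetical: DiscriminatedUnion[A, B] builds a parser for A or B."""

    def __init__(self, members: Tuple[type, ...]) -> None:
        keys = {m.Config.discriminator for m in members}
        if len(keys) != 1:
            raise TypeError(f"members disagree on the discriminator key: {keys}")
        self.key = keys.pop()
        self.mapping: Dict[Any, type] = {}
        for m in members:
            for tag in get_args(get_type_hints(m)[self.key]):
                self.mapping[tag] = m

    def __class_getitem__(cls, members: Any) -> "DiscriminatedUnion":
        # __class_getitem__ is implicitly a classmethod.
        if not isinstance(members, tuple):
            members = (members,)
        return cls(members)

    def parse(self, data: Dict[str, Any]) -> Any:
        return self.mapping[data[self.key]](**data)


@dataclass
class Foo:
    kind: Literal["foo"]
    x: int

    class Config:
        discriminator = "kind"


@dataclass
class Bar:
    kind: Literal["bar"]
    y: str

    class Config:
        discriminator = "kind"


FooBar = DiscriminatedUnion[Foo, Bar]
```

Any subset of unrelated models can be combined this way, with no shared base class required.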

dmontagu on 16 Jan 2020

❤1

Fair points, and yes, the DiscriminatedUnion would also work well. I wanted to avoid the need to manually keep some Union (Discriminated or not) updated with a list of types that are acceptable, versus saying that "I want to accept all Vehicles", but, that is a rather minor nitpick.

f0ff886f on 16 Jan 2020

👍2

PEP 593, arriving in Python 3.9, might be of some interest here, this example emulating a tagged union in particular. I see that PEP 593 was mentioned above; it has since been accepted for inclusion.
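A minimal sketch of how PEP 593 metadata could carry the discriminator (requires Python 3.9+ for typing.Annotated; the Discriminator marker and parse function are hypothetical, not pydantic API, and dataclasses stand in for models). Type checkers treat Annotated[T, ...] as plain T, so the union keeps its ordinary Union semantics:

```python
from dataclasses import dataclass
from typing import Annotated, Any, Dict, Literal, Union, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Discriminator:
    """Hypothetical marker object carried in Annotated metadata."""
    field: str


@dataclass
class Foo:
    type: Literal["foo"]
    x: int


@dataclass
class Bar:
    type: Literal["bar"]
    y: str


FooOrBar = Annotated[Union[Foo, Bar], Discriminator("type")]


def parse(annotation: Any, data: Dict[str, Any]) -> Any:
    """Dispatch on the Discriminator found in an Annotated[Union[...], ...] hint."""
    if get_origin(annotation) is not Annotated:
        raise TypeError("expected an Annotated[...] hint")
    union, *metadata = get_args(annotation)
    marker = next(m for m in metadata if isinstance(m, Discriminator))
    tag = data.get(marker.field)
    for member in get_args(union):
        # Pick the member whose Literal annotation contains the tag value.
        if tag in get_args(get_type_hints(member)[marker.field]):
            return member(**data)
    raise ValueError(f"no member of {union} matches {marker.field}={tag!r}")
```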

layday on 17 Jan 2020

👍1

I would love to see a TaggedUnion type in python as I find the equivalent very ergonomic in Swift and Rust. Without a language-level switch statement I'm not sure it would feel as clean, but I'd still be interested.

And it seems that pydantic support for such a type would not only address @f0ff886f 's use case, but would also enable the more general case through dynamic generation (which could be done, e.g., by DiscriminatedUnion.__class_getitem__).

I'm not sure I love the implementation described in PEP 593, but I'm curious about other people's thoughts.

dmontagu on 20 Jan 2020

There seem to be ongoing discussions among the OpenAPI spec maintainers regarding the possible deprecation and eventual removal of discriminator. See OAI/OpenAPI-Specification#2143.

sm-Fifteen on 26 Feb 2020

@sm-Fifteen Thanks for sharing this. I'm having a hard time parsing the conversation in that issue, but from what I gather the point is that they view the discriminator field as superfluous since even without explicitly specifying a discriminator field, you can ensure that you'll only pass validation for a single subtype (even if the way you accomplish it is essentially equivalent to using a discriminator field)?

If that's the case, it makes sense, though I think it would be simpler to write (performant) tooling if a discriminator field is specified (though basically no one seems to have done so yet).

Regardless, I think it might still make sense to handle this via some sort of special case in pydantic to ensure high-performance validation and clear validation error messages.

In the absence of a discriminator keyword, I think the current approach is probably the most appropriate way to perform validation in general, since one may not explicitly intend to use literal fields as a discriminator. But for large tagged unions, pydantic's current approach could add enormous overhead, since it would try each variant in sequence (and generate error information for each). Not to mention the ugliness of the error messages.

Curious whether people would still want a DiscriminatedUnion type if the discriminator keyword is dropped from the OpenAPI spec.

Another (related) option would be the following: When building any Union field, see if all of the members happen to have the same-labeled Literal-valued field with different specific literal values for each member of the union. If so, modify the validation logic to first look for and validate that field, then validate only against the specific model valid for that literal value.

This would be substantially easier to use from a user perspective, and would be more performant. And I don't think it would be very hard to implement, but it may come with a small amount of additional field-creation-time overhead for unions (though this is usually not important for runtime performance since it typically only runs a constant number of times during start up). Also it may seem somewhat surprising or ad-hoc if the discriminator pattern wasn't your goal (though it's hard for me to imagine a scenario where you wouldn't want to handle a union type with a discriminator field for each type in this way).

I could imagine a long tail of requests related to extending the set of circumstances where a discriminator can be used, which would be annoying to handle (e.g., Union[Type1, Type2, List[Union[Type1, Type2]]]). I'd argue that anything complex should be out of scope, but that wouldn't stop the requests from being made.

I'm not sure whether this change should be considered breaking -- the same set of things would fail validation, but the errors could change substantially. Probably best to wait for v2 if we wanted to go down a route like this. You could prevent the breaking change by using a new type (e.g., DiscriminatedUnion), but I think there would be substantial discoverability/ease-of-use benefits if we just used this behavior by default for all Unions.
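A rough sketch of the detection step described above, using stdlib typing introspection on dataclasses rather than pydantic internals (find_discriminator is a hypothetical name). It would run once when the field is built, so its cost is paid at start-up rather than per validation. When several fields qualify, it simply picks the alphabetically first one, which is an arbitrary choice:

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional, Union, get_args, get_origin, get_type_hints


def find_discriminator(union: Any) -> Optional[str]:
    """Return a field that can act as a tag for every member of `union`, else None.

    A field qualifies when every member annotates it with a Literal and no
    literal value appears in more than one member.
    """
    members = get_args(union)
    if len(members) < 2:
        return None
    hints = [get_type_hints(m) for m in members]
    shared = set(hints[0]).intersection(*hints[1:])
    for name in sorted(shared):  # sorted only to make the choice deterministic
        if not all(get_origin(h[name]) is Literal for h in hints):
            continue
        values = [v for h in hints for v in get_args(h[name])]
        if len(values) == len(set(values)):
            return name
    return None


@dataclass
class Foo:
    model_type: Literal["foo"]
    kind: Literal["x"]  # same value in both members, so it cannot discriminate
    a: int


@dataclass
class Bar:
    model_type: Literal["bar"]
    kind: Literal["x"]
    b: str


print(find_discriminator(Union[Foo, Bar]))  # model_type
```

If None comes back, validation would fall back to today's try-each behaviour, so nothing that validates now would stop validating.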

Curious about @samuelcolvin 's opinion here.

dmontagu on 26 Feb 2020

❤1 👍1

@dmontagu

Curious whether people would still want a DiscriminatedUnion type if the discriminator keyword is dropped from the OpenAPI spec.

Yes, for my team's implementation, not having this was the biggest downside of using pydantic.

ashears on 26 Feb 2020

👍5

Thanks for sharing this. I'm having a hard time parsing the conversation in that issue, but from what I gather the point is that they view the discriminator field as superfluous since even without explicitly specifying a discriminator field, you can ensure that you'll only pass validation for a single subtype (even if the way you accomplish it is essentially equivalent to using a discriminator field)?

The discriminator field is definitely an odd one, and I agree with some of the oddities they highlight. For instance, given the schema I used above:

MyResponseType:
  discriminator:
    propertyName: petType
    mapping:
      # Notice how there's no $ref, it's just a direct reference to the target type
      dog: '#/components/schemas/Dog'
      monster: 'https://gigantic-server.com/schemas/Monster/schema.json'
  oneOf:
  - $ref: '#/components/schemas/Cat'
  - $ref: '#/components/schemas/Dog'
  - $ref: '#/components/schemas/Lizard'
  - $ref: 'https://gigantic-server.com/schemas/Monster/schema.json'

The valid discriminator values for petType are dog, Dog, monster, Monster, Cat and Lizard, because the name of each possible type implicitly counts as a valid discriminator value whether or not it is part of the mapping. From reading the discussion, I understand the maintainers' point of view on discriminator to be that it's not only poorly specced, but also redundant with what more recent iterations of JSON Schema can offer.

However, I believe Pydantic follows a different paradigm from how they assume OpenAPI to be used, since OpenAPI is usually thought of more as a schema-first format (written by hand and then used to generate client and possibly server code stubs). Pydantic and FastAPI do something pretty unique where they dynamically generate OpenAPI definitions and validators based on type signatures, which I understand to be pretty tricky to pull off in any other language I can think of. Making type information available the way PEP 3107 and PEP 526 annotations do (without having to resort to bespoke static analysis, that is) is not really something I've seen anywhere else (though I just looked it up and there's apparently this TSOA thing for Typescript), and is the thing that's making all of this possible, so I figure that's not exactly a common use case for OpenAPI.

sm-Fifteen on 27 Feb 2020

Another (related) option would be the following: When building _any_ Union field, see if all of the members happen to have the same-labeled Literal-valued field with different specific literal values for each member of the union. If so, modify the validation logic to first look for and validate that field, then validate only against the specific model valid for that literal value.

This would be substantially easier to use from a user perspective, and would be more performant. And I don't think it would be very hard to implement, but it may come with a small amount of additional field-creation-time overhead for unions (though this is usually not important for runtime performance since it typically only runs a constant number of times during start up). Also it may seem somewhat surprising or ad-hoc if the discriminator pattern wasn't your goal (though it's hard for me to imagine a scenario where you _wouldn't_ want to handle a union type with a discriminator field for each type in this way).

@dmontagu My guess is that using Literal as a discriminator is quite a common case, and making pydantic more performant just by handling the basic case of a flat Union structure would be a very nice start.

(And hopefully it will also solve the issue I'm facing with hard to read error messages in FastAPI request validation errors 😁)

jmagnusson on 6 Mar 2020

Another (related) option would be the following: When building _any_ Union field, see if all of the members happen to have the same-labeled Literal-valued field with different specific literal values for each member of the union. If so, modify the validation logic to first look for and validate that field, then validate only against the specific model valid for that literal value.

(forgot to address that part initially)

Using literal fields to generate discriminators would technically be incorrect, since it would just be "promoting" simple union validation to discriminators. Part of the discussion on the OpenAPI spec repo is that, technically, you can just have literal or enum matching on each member of your union (like what you're proposing), and it essentially achieves the same goal as discriminators. The advantage of discriminators is that they specifically mark one field as your "pivot" of sorts, making it clear that this is a proper tagged union, not just a union to validate against a bunch of models, hoping the input data is only going to match one.

sm-Fifteen on 6 Mar 2020

👍2

(Edit: The first version of this comment was extensively wrong-headed; I've edited it heavily since. I had a hand-wavey idea of what would be nice; but it was too hyper-focused on my very specific use case, and was almost certainly too confusing to be illustrative.)

As a random anecdote from a new user, I'm currently trying to parse an API I don't control the design of, and it makes extensive use of tagged/discriminated unions. It does not use or care about OpenAPI/jsonSchema.

It has a global "Message" format with a "name" (the discriminator/tag) and a "value" field. There are at least 200 names!

Creating models for each type individually and then creating a Union of all 200+ types is going to be difficult to keep updated and synchronized. Whenever one arm in the Union is incorrect, it becomes increasingly impossible to diagnose failures.

The API I am working my way backwards to writing a Schema for makes extensive use of Messages that take this format:

{
    "name": "Hello",
    "value": { "data": "World" }
}

So I have been using Pydantic in the following way:

from enum import Enum
from typing import Literal, Union

from pydantic import BaseModel

class MessageName(str, Enum):
    # Enum class for all valid names: Useful for iterating and dispatch tables in the client
    hello = "Hello"

class BaseValue(BaseModel):
    # Empty class for all 'value' field types to inherit from.
    pass

class BaseMessage(BaseModel):
    # Abstract type describing all possible Messages
    name: MessageName
    value: BaseValue

class HelloValue(BaseValue):
    # Concrete type: This is the format of `value` for `Hello` messages
    data: Literal['World']

class HelloMessage(BaseMessage):
    # Concrete type: This is the format of a `Hello` message.
    name: Literal[MessageName.hello]
    value: HelloValue

# Every other *Message model gets added here (200+ in practice).
Message = Union[HelloMessage]

This has been working reasonably well, but with some shortcomings:

I have to repeat the name and value keys in an awful lot of places
For every type I add, I have to amend the Union. When you approach 200+ names, this actually becomes non-trivial.
The BaseMessage type isn't actually useful for parsing; it's only a descriptive type that I can use in my code to accept "Any Message" and then do dynamic dispatch against.
The BaseValue class is almost entirely useless, except as a common base type for the dispatch table when handling generic Messages.
If I want to parse these as a root object (which does come up during debugging, especially when the failure messages for such a large tagged Union are so hard to read), I need to create a further model that is only useful as a root type:

class MessageDebugRoot(BaseModel):
    __root__: Message

It'd be nice to address at least some of these difficulties:

Being able to parse the BaseMessage and have it parse the correct subtype based on the discriminator
Limiting error messages to only the relevant branch
Allow me the option to imply the Union type so I don't have to keep it synchronized manually

samuelcolvin's design looks like it has two difficulties:

It confines tagged unions to a field, on which we discriminate in the child. I think that means for my case I would need to use a special __root__ class which discriminates at the upper level, but then I can't use this type as a field later. I need to re-declare the discriminator in the parent everywhere I want to do that discrimination.
It still requires me to keep the Union manually updated, which I would really like to avoid.

I really like f0ff886f's design (It matches some of my intuition about what would be helpful). It's a good brick-by-brick way to dynamically build a tagged union. I like the implication that I can use it both as a field and as a root model. I think it fits well with client code that will want to do dynamic dispatch based on subtype.

dmontagu's seems quite nice for mixing and matching possible allowed branches in different contexts, which is pretty flexible. Consider a large pool of messages that have different overlapping subsets of which branches you want to allow in different contexts: this design works well here for that case. It also doesn't require the use of a shared base type/class, which might allow some extra flexibility in refactoring if you don't need these values to share a common type. (I think I do, though, but...) My strongest nitpicks are having to manage your own Union, and (presumably) having to declare your own __root__ model to parse these directly from file.

So, well, long story short -- +1 for a tagged union of one kind or another. Whatever form it winds up taking syntactically, it's going to be an immense help for parsing data already logistically organized like this. The reduction in error messages is vital for debugging failures, and for extremely large branches, it's going to be a massive improvement in speed as well.

jnsnow on 11 Mar 2020

but it might need to be a function or a field name.

@samuelcolvin A function would be the most flexible in handling the complex cases, with the added value of transferring the task of discriminating between complex models to the users who are goofing around with said complex models.

One use case I've encountered:

from typing import Any, Literal, Type, Union
from pydantic import BaseModel, Field

class DomainA(BaseModel):
    domain: Literal["A"]

class DomainB(BaseModel):
    domain: Literal["B"]

class FoofromdomainA(DomainA):
    identifier: Literal["foo"]

class BarfromdomainA(DomainA):
    identifier: Literal["bar"]

class FoofromdomainB(DomainB):
    identifier: Literal["foo"]

class BarfromdomainB(DomainB):
    identifier: Literal["bar"]

class Goober(BaseModel):
    model: Union[FoofromdomainA, BarfromdomainA, FoofromdomainB, BarfromdomainB]

In this case, it is not a single property that determines the type of Goober.model, but the domain/identifier pair. The function could look something like this:

def discriminator(obj: Any)-> Union[
    Type[FoofromdomainA],
    Type[BarfromdomainA], 
    Type[FoofromdomainB],
    Type[BarfromdomainB],
]:
    mapper = {
        "A": {"foo": FoofromdomainA, "bar": BarfromdomainA}, 
        "B": {"foo": FoofromdomainB, "bar": BarfromdomainB},
    }
    return mapper[obj['domain']][obj['identifier']]


class Goober(BaseModel):
    model: Union[
        FoofromdomainA, BarfromdomainA, FoofromdomainB, BarfromdomainB
    ] = Field(..., discriminator=discriminator)

The signature would be receiving the object that needs validation as an argument and it would return the correct type/model to apply the validation. If pydantic receives an exception or None, it proceeds with the current behaviour. Otherwise it applies validation using only the returned type/model.
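The proposed contract (return the model to use, or return None/raise to fall back) can be sketched without pydantic. Here dataclasses stand in for models and a TypeError from the constructor stands in for a validation error, both simplifying assumptions; validate_union and by_kind are hypothetical names:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence

Discriminator = Callable[[Dict[str, Any]], Optional[type]]


def validate_union(
    data: Dict[str, Any],
    members: Sequence[type],
    discriminator: Optional[Discriminator] = None,
) -> Any:
    """Build one member of a union from `data`, guided by an optional function."""
    if discriminator is not None:
        try:
            chosen = discriminator(data)
        except Exception:
            chosen = None  # an exception means "could not decide": fall back
        if chosen is not None:
            # Only the returned model is tried, so only its errors surface.
            return chosen(**data)
    # Fallback mirroring the existing behaviour: try each member in order.
    errors: List[str] = []
    for member in members:
        try:
            return member(**data)
        except TypeError as exc:  # stand-in for a validation error
            errors.append(f"{member.__name__}: {exc}")
    raise ValueError("; ".join(errors))


@dataclass
class Foo:
    kind: str
    a: int


@dataclass
class Bar:
    kind: str
    b: int


def by_kind(data: Dict[str, Any]) -> Optional[type]:
    # Raises KeyError for unknown kinds, which triggers the fallback.
    return {"foo": Foo, "bar": Bar}[data["kind"]]
```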

Cons:

users have another vector to introduce unexpected behaviour
discriminator is out-of-schema (similar to a model validator)

Pros:

advanced users can program advanced behaviour, especially:
- improved performance
- better errors
independent of future changes to OpenAPI
Uses the Union type, rather than introducing a new type

vdwees on 24 Apr 2020

For the specific case I mentioned above, you could also solve it with discriminator accepting both a string (as suggested above) or a tuple of strings corresponding to the attributes that enable unique identification of the model. Something like this:

class Goober(BaseModel):
    model: Union[
        FoofromdomainA, BarfromdomainA, FoofromdomainB, BarfromdomainB
    ] = Field(..., discriminator=("domain", "identifier"))
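
The tuple-of-field-names variant could be resolved without a hand-written function by reading each member's Literal annotations. The sketch below is an assumption about how that might work, not pydantic behaviour: plain dataclasses stand in for the models (they do not validate types), and `build_lookup` and `parse` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Literal, Sequence, Tuple, get_args, get_origin, get_type_hints


# Dataclasses stand in for the pydantic models; they do not validate types.
@dataclass
class FoofromdomainA:
    domain: Literal["A"]
    identifier: Literal["foo"]


@dataclass
class BarfromdomainA:
    domain: Literal["A"]
    identifier: Literal["bar"]


@dataclass
class FoofromdomainB:
    domain: Literal["B"]
    identifier: Literal["foo"]


@dataclass
class BarfromdomainB:
    domain: Literal["B"]
    identifier: Literal["bar"]


MEMBERS = [FoofromdomainA, BarfromdomainA, FoofromdomainB, BarfromdomainB]
Lookup = Dict[Tuple[Any, ...], type]


def build_lookup(members: Sequence[type], fields: Tuple[str, ...]) -> Lookup:
    """Map each tuple of Literal values (in `fields` order) to its member."""
    lookup: Lookup = {}
    for member in members:
        hints = get_type_hints(member)
        key = []
        for name in fields:
            hint = hints.get(name)
            # Each discriminator field must be a single-valued Literal.
            if get_origin(hint) is not Literal or len(get_args(hint)) != 1:
                raise TypeError(f"{member.__name__}.{name} is not a single-value Literal")
            key.append(get_args(hint)[0])
        if tuple(key) in lookup:
            raise TypeError(f"{member.__name__} duplicates tag {tuple(key)}")
        lookup[tuple(key)] = member
    return lookup


def parse(obj: Dict[str, Any], lookup: Lookup, fields: Tuple[str, ...]) -> Any:
    key = tuple(obj.get(name) for name in fields)
    try:
        member = lookup[key]
    except (KeyError, TypeError):  # TypeError: unhashable input values
        raise ValueError(f"no union member for {dict(zip(fields, key))}") from None
    return member(**obj)


FIELDS = ("domain", "identifier")
LOOKUP = build_lookup(MEMBERS, FIELDS)
```

Here `parse({"domain": "B", "identifier": "bar"}, LOOKUP, FIELDS)` returns `BarfromdomainB(domain='B', identifier='bar')`. Building the table up front also surfaces ambiguous unions (two members with the same tag tuple) at definition time rather than during validation.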

vdwees on 24 Apr 2020

Some of these solutions could also solve https://github.com/samuelcolvin/pydantic/issues/1439, if the type hint for the array is Dict[Any] with a dynamically constructed discriminator (frozen before the first execution).

jqqqqqqqqqq on 7 May 2020

Here is a decent, though not perfect, workaround for the current pydantic version:

from pydantic import BaseModel, Field, validator

from typing import Union, List
from typing_extensions import Literal

class T1(BaseModel):
    v: Literal['t1']
    name: str

class T2(BaseModel):
    v: Literal['t2']
    name: str

class T3(BaseModel):
    v: Literal['t3']
    name: str

class U(BaseModel):
    t: Union[T1, T2, T3] = Field(...)

    @validator('t', pre=True)
    def validate_t(cls, value):
        if isinstance(value, BaseModel):
            return value
        if not isinstance(value, dict):
            raise ValueError('value must be dict')
        v = value.get('v')
        if v == 't1':
            return T1(**value)
        elif v == 't2':
            return T2(**value)
        elif v == 't3':
            return T3(**value)
        else:
            raise ValueError('Unknown v %s' % v)

m = U(**{
    't': {
        'v': 't1',
        'name': 'test1'
    }
})

print(m)

m = U(**{
    't': {
        'v': 't2',
        'name': 'test2'
    }
})

print(m)

m = U(**{
    't': {
        'v': 't3',
        'name': 'test3'
    }
})

print(m)

# raises ValidationError: field required (type=value_error.missing)
m = U(**{
    't': {
        'v': 't3'
    }
})

print(m)

ghostbody on 29 May 2020

I wonder if a "short path" based on the presence of Literal fields in a Union would be a good start? What I have in mind is that Pydantic would inspect Unions, and if the following criteria are met:

1. all members of the Union are subclasses of BaseModel, and
2. all members of the Union share one or more fields of type Literal at root level, and
3. the shared Literal fields alone are sufficient to differentiate the members of the Union,

Pydantic would then validate the input against only the matching member of the Union. If the input matches none of the models, it would fall back to the existing behaviour (or perhaps even limit the validation error to the shared Literal fields, although that would be a slight API change).

This extra check would cover most of the use cases described above, would not extend/change the existing API, would improve performance in many cases, and would clean up the validation error output.
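
A sketch of the check described above, under the assumption that plain dataclasses stand in for BaseModel subclasses, so criterion 1 becomes "every member is a dataclass". `short_path_table` and `pick_member` are hypothetical helpers; a None result means "use the existing Union behaviour". In a real implementation the table would presumably be built once per Union rather than on every call.

```python
import dataclasses
from dataclasses import dataclass
from itertools import product
from typing import Any, Dict, List, Literal, Optional, Sequence, Tuple, get_args, get_origin, get_type_hints

Table = Dict[Tuple[Any, ...], type]


@dataclass
class T1:
    v: Literal["t1"]
    name: str


@dataclass
class T2:
    v: Literal["t2"]
    name: str


@dataclass
class T3:
    v: Literal["t3"]
    name: str


def literal_fields(cls: type) -> Dict[str, Tuple[Any, ...]]:
    """Root-level fields annotated with Literal, mapped to their allowed values."""
    return {
        name: get_args(hint)
        for name, hint in get_type_hints(cls).items()
        if get_origin(hint) is Literal
    }


def short_path_table(members: Sequence[type]) -> Optional[Tuple[List[str], Table]]:
    """Return (shared Literal fields, value-tuple -> member) if criteria 1-3 hold."""
    # Criterion 1: every member is a model (dataclasses stand in for BaseModel).
    if not members or not all(dataclasses.is_dataclass(m) for m in members):
        return None
    per_member = [literal_fields(m) for m in members]
    # Criterion 2: Literal fields present on every member.
    shared = sorted(set.intersection(*(set(f) for f in per_member)))
    if not shared:
        return None
    table: Table = {}
    for member, fields in zip(members, per_member):
        # Multi-valued Literals contribute every combination of their values.
        for key in product(*(fields[name] for name in shared)):
            # Criterion 3: each combination must identify exactly one member.
            if key in table:
                return None
            table[key] = member
    return shared, table


def pick_member(obj: Dict[str, Any], members: Sequence[type]) -> Optional[type]:
    """The single member to validate against, or None to use the existing behaviour."""
    plan = short_path_table(members)
    if plan is None:
        return None
    shared, table = plan
    try:
        return table.get(tuple(obj.get(name) for name in shared))
    except TypeError:  # unhashable input value: cannot match any Literal
        return None
```

With the T1/T2/T3 models from the workaround earlier in the thread, `pick_member({"v": "t2", "name": "x"}, [T1, T2, T3])` returns T2, while an unknown tag returns None and leaves validation to the existing Union logic.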

vdwees on 21 Jun 2020

+1, it would be amazing to see this addition to Pydantic.

abaveja313 on 22 Jun 2020

Regarding the syntax possibilities, I just checked: the typing.Annotated syntax from PEP 593 could be an option now that python/cpython#18260 and python/mypy#8371 have been merged into the Python 3.9 standard library and the mypy typing backports, respectively.
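
As a rough illustration of the PEP 593 route, the sketch below attaches a hypothetical `Discriminator` marker to a plain Union with `typing.Annotated` (Python 3.9+, or `typing_extensions` on older versions) and reads it back with `get_type_hints(..., include_extras=True)`. `Discriminator`, `discriminator_for`, and `parse_owner` are illustrative names, not part of pydantic, and dataclasses stand in for models.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Dict, Literal, Optional, Tuple, Union, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Discriminator:
    """Hypothetical marker: names the field whose Literal value picks the member."""

    field: str


@dataclass
class Cat:
    pet_type: Literal["cat"]
    meows: int


@dataclass
class Dog:
    pet_type: Literal["dog"]
    barks: float


@dataclass
class Owner:
    # Type checkers see a plain Union[Cat, Dog]; they ignore the metadata.
    pet: Annotated[Union[Cat, Dog], Discriminator("pet_type")]


def discriminator_for(cls: type, name: str) -> Optional[Tuple[Any, Discriminator]]:
    """Return (union, marker) if the field carries a Discriminator annotation."""
    hint = get_type_hints(cls, include_extras=True)[name]
    if get_origin(hint) is not Annotated:
        return None
    union, *metadata = get_args(hint)
    for item in metadata:
        if isinstance(item, Discriminator):
            return union, item
    return None


def parse_owner(data: Dict[str, Any]) -> Owner:
    found = discriminator_for(Owner, "pet")
    if found is None:
        raise TypeError("Owner.pet has no Discriminator annotation")
    union, marker = found
    # Map each member's single Literal tag value to the member class.
    by_tag = {get_args(get_type_hints(m)[marker.field])[0]: m for m in get_args(union)}
    pet = data["pet"]
    return Owner(pet=by_tag[pet[marker.field]](**pet))
```

The appeal of this route is that the field is still an ordinary Union for static type checkers, while the validation library can opt into tag-based dispatch by looking for its own metadata object.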

sm-Fifteen on 23 Jun 2020

@samuelcolvin would you accept a PR for the solution outlined in https://github.com/samuelcolvin/pydantic/issues/619#issuecomment-647135829?

vdwees on 25 Jun 2020

+1 this would be amazing and extremely helpful. Happy to help on a PR as well.

dgasmith on 23 Jul 2020

+1, this is really needed. Is there anything I can help with?

Syniex on 2 Sep 2020

Hello, I am a total newbie to pydantic, but we need this too! (In fact, I came here hopefully from dataclasses-json, which also doesn't support polymorphism.) The Union and Literal approach works OK and is very straightforward to understand, but as others have pointed out, it produces an awful lot of noise when validation fails...

@vdwees I like your solution, but my only concern is that it tries to "guess" what the user wants. Explicit is better than implicit... do you think it would make sense to introduce a boolean flag somewhere so that the user can turn this behaviour on explicitly? We then wouldn't need to write a bunch of complex logic to check criteria 1, 2 and 3, because it would be up to the user to make sure they are met.

harrybiddle on 25 Sep 2020

Hello, for anyone interested: until this feature is implemented, you can use a custom data type whose validator classmethod returns an instance of the appropriate type. It's hacky, but it does the job for now.

kontsaki on 21 Oct 2020

@kontsaki could you provide a small working example by chance?

dgasmith on 21 Oct 2020

@dgasmith

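from typing import List

from pydantic import BaseModel

class MalformedAction(ValueError):
    # Not defined in the original snippet; assumed here. Subclassing ValueError
    # lets pydantic report it as a normal validation error.
    pass

class ActionModel(BaseModel):
    class Config:
        # make "action" a constant field and reject unknown keys
        fields = {"action": dict(const=True)}
        extra = "forbid"

class Something(ActionModel):
    action = "something"

# maps each "action" value to the model that validates it
ACTIONS = {
    "something": Something,
}

class Action:
    @classmethod
    def __get_validators__(cls):
        yield cls.return_action

    @classmethod
    def return_action(cls, values):
        try:
            action = values["action"]
        except KeyError:
            raise MalformedAction(
                f"Missing required 'action' field for action: {values}"
            )
        try:
            model = ACTIONS[action]
        except KeyError:
            raise MalformedAction(f"Incorrect action: {action}")
        # validate with the selected model only
        return model(**values)

class Flow(BaseModel):
    actions: List[Action]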

kontsaki on 21 Oct 2020
