Pydantic: Is it possible to change json schema generation logic for Enum?

Created on 3 Jul 2019 · 18 Comments · Source: samuelcolvin/pydantic


Question

For bugs/questions:

  • OS: Mac OS X
  • Python version import sys; print(sys.version): 3.7.3
  • Pydantic version import pydantic; print(pydantic.VERSION): 0.29

When generating JSON schema for Enum, pydantic uses Enum.value.
https://github.com/samuelcolvin/pydantic/blob/010ba38dc197f0c37adc32c9a74029cef061a72a/pydantic/schema.py#L742
I use this JSON schema for Swagger for my JSON API. The problem is that, in my opinion, the API should expect enum.name and not the value (the value should be used only internally, for example when we write enum.value to the database). I can add some custom validation to the model so that passing enum.name when I create an instance of the pydantic model doesn't raise validation errors, but the generated JSON schema still lists the values.

Labels: Feedback Wanted, Schema, feature request, help wanted

All 18 comments

I think it's too late to change the default behaviour of enums now, however I guess we could allow some way to modify the schema, perhaps on a per field basis.

@tiangolo what do you think?

@Yolley are you aware of a precedent for using name over value? In my opinion it feels more natural to use value rather than name for this purpose, for a few reasons:

First, using name instead of value prevents you from including certain characters in the externally-facing value (e.g., hyphens). While this is probably not an issue if you control the API, it could be a nuisance when interfacing with an external API that does include such characters.

Second, and more importantly, I would expect the name attribute to follow a naming style consistent with the python codebase. In particular, I would expect a multi-word name to use snake case for its name, but I typically would NOT expect / impose the use of snake case externally. Using name instead of value would limit this flexibility, and external APIs would impose a restriction on your naming conventions.

Obviously with some effort you can configure your way around these edge cases, but I would expect that using the name would get in the way more frequently than using the value.

@Yolley if your concern is that you don't want to have the name and value duplicated in your enum definitions, see https://docs.python.org/3/library/enum.html#using-automatic-values for a way to ensure values are automatically generated in a manner consistent with your naming. (If this were not possible, I would find the use of name more compelling, but I just subclass Enum with a modified _generate_next_value_ and inherit from that everywhere and haven't had issues.)
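A minimal sketch of the `_generate_next_value_` approach mentioned above, following the pattern in the Python `enum` documentation: `auto()` asks the base class for each member's value, and here that value is simply the member name. `AutoNameEnum` and `Color` are illustrative names, not part of any library.

```python
from enum import Enum, auto


class AutoNameEnum(Enum):
    # auto() calls this hook to compute each member's value;
    # returning the member name keeps name and value in sync without duplication
    @staticmethod
    def _generate_next_value_(name, start, count, last_values):
        return name


class Color(AutoNameEnum):
    RED = auto()
    DARK_BLUE = auto()


print([member.value for member in Color])  # ['RED', 'DARK_BLUE']
```

Because the values now equal the names, anything that emits enum values (such as pydantic's schema generation) shows the member names.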

In my case I have an enum like this:

class SomeStatus(IntEnum):
    SUCCESS = 0
    ERROR = 1
    NEUTRAL = 2

So I store enum.value in my database (integer statuses perform better in my DB than strings), and when I return this status to the user via the JSON API, I return enum.name instead of the value, because the user doesn't know what the number 1 means. In my opinion it is also more understandable for the user to send "NEUTRAL" rather than 2 in a change-status request. As of now, I can configure my API to validate against enum names, but JSON schema generation in pydantic still returns only the list of enum.value. The only workaround I see right now would be to change the enum.value property so that it returns enum._name_, and then add a custom property that returns enum._value_ for use when writing to the database.
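A minimal stdlib sketch of the split described above: the integer value goes to the database, while the member name is what API clients see and send. The helper functions `from_api`, `to_db`, `from_db` and `to_api` are hypothetical names used only for illustration.

```python
from enum import IntEnum


class SomeStatus(IntEnum):
    SUCCESS = 0
    ERROR = 1
    NEUTRAL = 2


def from_api(name: str) -> SomeStatus:
    # Clients send the member name, e.g. "NEUTRAL"; unknown names raise KeyError
    return SomeStatus[name]


def to_db(status: SomeStatus) -> int:
    # The database column stores the compact integer value
    return status.value


def from_db(raw: int) -> SomeStatus:
    # Look the member up by value; unknown integers raise ValueError
    return SomeStatus(raw)


def to_api(status: SomeStatus) -> str:
    # Responses expose the readable name instead of the number
    return status.name


# What a name-based JSON schema "enum" list would contain
api_enum = [member.name for member in SomeStatus]

print(to_db(from_api("NEUTRAL")))  # 2
print(to_api(from_db(1)))          # ERROR
print(api_enum)                    # ['SUCCESS', 'ERROR', 'NEUTRAL']
```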

So, i store enum.value in my database (because it is better for performance of my db to store integers as statuses instead of strings)

Depending on what database you're using, you might be able to create an enum in the database, then you get the best of both worlds.

More generally I agree with you that this would be useful. The other thing missing at the moment from choices is human readable/translatable descriptions, so using gender as an example we might have

| DB value | code name | human readable description |
|----------|-----------|----------------------------|
| 1 | male | Male |
| 2 | female | Female |
| 3 | not_given | I'd rather not say |

(I don't know whether JSON Schema has an option for human-readable descriptions.)

I think the best way to implement these things would be:

  • a use_names decorator for enums that tells pydantic to use the names not values
  • a method eg. get_description for getting description.
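
from enum import IntEnum
from gettext import gettext as _


def use_names(cls):
    # Proposed decorator (not part of pydantic); placeholder so the snippet runs
    return cls


@use_names
class Genders(IntEnum):
    male = 1
    female = 2
    not_given = 3

    def get_description(self):
        return {
            1: _('Male'),
            2: _('Female'),
            3: _("I'd rather not say"),
        }[self.value]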

Thoughts?
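One way the proposed `use_names` marker could be consumed, sketched without pydantic: the decorator only tags the class, and a hypothetical `enum_schema` helper emits names or values accordingly and collects `get_description()` results. `use_names`, `enum_schema` and the non-standard `enumNames` key are assumptions for illustration, not pydantic API; translation via `_()` is left out here.

```python
from enum import IntEnum


def use_names(cls):
    # Hypothetical decorator: only marks the enum class, nothing else changes
    cls.__use_names__ = True
    return cls


@use_names
class Genders(IntEnum):
    male = 1
    female = 2
    not_given = 3

    def get_description(self):
        return {1: 'Male', 2: 'Female', 3: "I'd rather not say"}[self.value]


def enum_schema(enum_cls):
    # Hypothetical schema builder: list names if the class is marked, else values
    use_names_flag = getattr(enum_cls, '__use_names__', False)
    schema = {
        'title': enum_cls.__name__,
        'enum': [m.name if use_names_flag else m.value for m in enum_cls],
    }
    if hasattr(enum_cls, 'get_description'):
        # Non-standard extension key holding one human-readable label per member
        schema['enumNames'] = [m.get_description() for m in enum_cls]
    return schema


print(enum_schema(Genders))
```

Keeping the marker separate from the schema builder means the enum itself stays an ordinary `IntEnum`; only the code that produces the schema has to know about it.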

For descriptions, there is in my opinion a pretty nice implementation that stores the description in __doc__ (https://stackoverflow.com/a/50473952/6051871), like so:

from enum import Enum


class DocEnum(Enum):
    def __new__(cls, value, doc=None):
        self = object.__new__(cls)  # calling super().__new__(value) here would fail
        self._value_ = value
        if doc is not None:
            self.__doc__ = doc
        return self


class Gender(DocEnum):
    """ Genders """
    MALE = 1, "Male"
    FEMALE = 2, "Female"
    NOT_GIVEN = 3, "I'd rather not say"

It may not be the best way, because we override a magic method (__new__) here, but I still like the idea of using the __doc__ attribute of each enum member rather than writing a custom function that needs an additional mapping.
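Continuing the `DocEnum` idea, the per-member `__doc__` can be collected into a list of labels, for instance to feed a non-standard `enumNames`-style key. This is a sketch; `enum_labels` is a hypothetical helper.

```python
from enum import Enum


class DocEnum(Enum):
    def __new__(cls, value, doc=None):
        # object.__new__ because a plain Enum has no data type to delegate to
        self = object.__new__(cls)
        self._value_ = value
        if doc is not None:
            self.__doc__ = doc  # per-member description stored on the instance
        return self


class Gender(DocEnum):
    """Genders"""
    MALE = 1, "Male"
    FEMALE = 2, "Female"
    NOT_GIVEN = 3, "I'd rather not say"


def enum_labels(enum_cls):
    # Hypothetical helper: human-readable label per member, in definition order
    return [member.__doc__ for member in enum_cls]


print(Gender(2).__doc__)  # Female
print(enum_labels(Gender))
```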

A use_names decorator would be okay, I guess; I just can't think of a better way to implement this.

Thanks for your patience. Yes, JSON Schema only declares a way to define enum values.

I agree with @dmontagu's rationale to have values instead of names as the enums...


But you can extend the JSON Schema with custom data/metadata.

For example, these React widgets from Mozilla: https://github.com/mozilla-services/react-jsonschema-form support the non-standard enumNames as a list of the names to show for the enum.

Then you can write your model like:

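from enum import IntEnum

from pydantic import BaseModel, Schema  # pydantic < 1.0; Schema was renamed to Field in 1.0


class SomeStatus(IntEnum):
    SUCCESS = 0
    ERROR = 1
    NEUTRAL = 2


class SomeModel(BaseModel):
    status: SomeStatus = Schema(
        ..., title="Status", enumNames=["Success", "Error", "Neutral"]
    )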

Also, I'll make a PR proposing to allow adding custom extra JSON schema to a model. That would allow you to do that or similar things, extending the generated JSON Schema from the models.

BTW, the PR with overrides for the schema is already merged and released; you can find it in the schema section of the docs under schema_extra.
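The kind of post-processing that schema_extra allows can be sketched as a plain function over the schema dict. The dict below is hand-written to stand in for a generated schema (its exact shape differs between pydantic versions), and `add_enum_names` is a hypothetical helper that keeps the standard `enum` values and adds `enumNames` labels derived from the member names.

```python
from enum import IntEnum


class SomeStatus(IntEnum):
    SUCCESS = 0
    ERROR = 1
    NEUTRAL = 2


def add_enum_names(schema, prop, enum_cls):
    # Mutates the schema in place, the way a schema-editing hook would
    enum_values = schema['properties'][prop]['enum']
    # Map each listed value back to its member and label it with the title-cased name
    schema['properties'][prop]['enumNames'] = [enum_cls(v).name.title() for v in enum_values]


# Hand-written stand-in for a generated model schema
schema = {
    'title': 'SomeModel',
    'type': 'object',
    'properties': {'status': {'title': 'Status', 'enum': [0, 1, 2], 'type': 'integer'}},
    'required': ['status'],
}
add_enum_names(schema, 'status', SomeStatus)
print(schema['properties']['status']['enumNames'])  # ['Success', 'Error', 'Neutral']
```

The standard `enum` list is left untouched, so validators that only understand JSON Schema still see the real values.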

What about a schema_extra (along with support for schema_extra being a callable) being a kwarg for Field?

see #1065, but yes that might be possible too.

Fixed by #1054, #1125 and #1065.

Hi @Yolley, was this ever solved? I am unable to get this to work. I am dealing with an API that accepts the name of an enumeration member as well as its value. See below.

import json
from aenum import Enum
from pydantic import BaseModel, Field, validator, ValidationError

class Gender(Enum):
    """ Some colors """
    _init_ = 'value __doc__'
    MALE   = 1, 'Male'
    FEMALE = 2, 'Female'
    NOT_GIVEN  = 3, 'I\'d rather not say'

class Stop(BaseModel):
    gender: Gender = Field(..., title="Gender", enumNames=["Male", "Female", "I\'d rather not say"])
    stop_sequence: int = None

try:
    print(Stop.schema_json(indent=2))
    print(f'{Gender.MALE.value}')
    stop = Stop(gender=Gender.MALE, stop_sequence=0)
    stop_ret = Stop.parse_raw(json.dumps({'gender': 'Male'}))
except ValidationError as e:
    print(e)

If you run this code you will get a validation error:

1 validation error for Stop
gender
  value is not a valid enumeration member; permitted: 1, 2, 3 (type=type_error.enum; enum_values=[<Gender.MALE: 1>, <Gender.FEMALE: 2>, <Gender.NOT_GIVEN: 3>])

The API I am using this for accepts both the enumeration value and the key name as valid inputs, but I don't see a way to make this work except adding alias members like the ones below everywhere:

class Gender(Enum):
    """ Some colors """
    _init_ = 'value __doc__'
    MALE   = 1, 'Male'
    FEMALE = 2, 'Female'
    NOT_GIVEN  = 3, 'I\'d rather not say'
    MALE_Alias = 'Male'
    FEMALE_Alias = 'Female'

I'm not sure what the question is, and the code isn't that easy to read. Could you please correct the formatting and clarify the question?

@samuelcolvin, I updated the question to make it more readable. I also have one additional question. I am working with an API that allows "null" as well as integers. See the original field definition below, and then what I changed it to in order to allow "null" as well as integers. Is there a better way to do this?

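from typing import Any
from pydantic import BaseModel

# Original definition
class RoutingProfile(BaseModel):
    elevation_limit: int = None

# Changed so that "null" is accepted as well as integers
class RoutingProfile(BaseModel):
    elevation_limit: Any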

As described in the documentation, use:

from enum import Enum

class Gender(Enum):
    MALE = 'Male'
    FEMALE = 'Female'
    NOT_GIVEN = 'I\'d rather not say'

Is there a reason that doesn't work for you?


Regarding allowing null, that's an unrelated question, but you can use Optional[int].

@samuelcolvin, that doesn't work because the API also accepts integers (0, 1, 2) as valid inputs.


Which doesn't work? The enum? Then you could add a validator to your model to map ints to the gender values.
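A stdlib-only sketch of that mapping, using `Enum._missing_` (a hook the enum machinery calls when a lookup by value fails) instead of a model validator. The labels are the values, as suggested above, and integer codes are translated. The `GENDER_CODES` mapping (1 to 3) is a hypothetical assumption taken from the earlier example; the API in question may number members differently, for example starting from 0. Pydantic v1 validates enum fields by calling the enum class with the input, so the hook should apply there as well, but treat that as an assumption.

```python
from enum import Enum

# Hypothetical integer codes used by the external API
GENDER_CODES = {1: 'Male', 2: 'Female', 3: "I'd rather not say"}


class Gender(Enum):
    MALE = 'Male'
    FEMALE = 'Female'
    NOT_GIVEN = "I'd rather not say"

    @classmethod
    def _missing_(cls, value):
        # Called by Gender(value) only after the normal lookup by value fails;
        # bools are excluded because True == 1 and False == 0
        if isinstance(value, int) and not isinstance(value, bool) and value in GENDER_CODES:
            return cls(GENDER_CODES[value])
        return None  # None lets Enum raise its usual ValueError


print(Gender('Female').name)  # FEMALE
print(Gender(3).name)         # NOT_GIVEN
```

Unknown inputs such as `Gender(7)` still raise `ValueError`, so invalid data is rejected the same way as before.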

But that's not directly linked to this issue; you need to try to get help elsewhere, or read the documentation thoroughly.

Pydantic already has all the features necessary for such a bidirectional conversion from the public representation to the DB representation and vice versa.

Consider the following example.

1. Versions

* OS: **Ubuntu 19.04**
* Python version `import sys; print(sys.version)`: **3.7.3 (default, Oct  7 2019, 12:56:13) [GCC 8.3.0]**
* Pydantic version `import pydantic; print(pydantic.VERSION)`: **1.2**

2. Suppose we want the JSON key "status" to take the values "deactivated" or "activated", but store these values in the DB as False or True in the status column.

3. Script

from dataclasses import dataclass
from enum import Enum

from pydantic import BaseModel  # pydantic v1 API (orm_mode, from_orm)

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base


@dataclass
class Status:
    # Public name -> DB value
    deactivated: bool = False
    activated: bool = True


class EnumLookupByKeyMixin:
    """Accepts the public name ("activated") and returns the DB value (True)."""

    @classmethod
    def __get_validators__(cls):
        cls.lookup = {name: member.value for name, member in cls.__members__.items()}
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            return cls.lookup[v]
        except KeyError:
            # Validators must raise ValueError/TypeError; pydantic wraps them in ValidationError
            raise ValueError(f'invalid value: {v!r}')


class EnumLookupReturnNameMixin:
    """Accepts the DB value (False) and returns the public name ("deactivated")."""

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        # cls(v) raises ValueError (not KeyError) for an unknown value,
        # which pydantic reports as a validation error
        return cls(v).name


StatusToDBEnum = Enum("StatusToDBEnum", type=EnumLookupByKeyMixin, names=Status().__dict__)


StatusFromDBEnum = Enum("StatusFromDBEnum", type=EnumLookupReturnNameMixin, names=Status().__dict__)


class TestSchemaToDB(BaseModel):
    status: StatusToDBEnum


class TestSchemaFromDB(BaseModel):
    status: StatusFromDBEnum

    class Config:
        orm_mode = True


DeclarativeBase = declarative_base()


class TestModel(DeclarativeBase):
    __tablename__ = "test_model"

    id = sa.Column('rid', sa.Integer(), primary_key=True)
    status = sa.Column('status', sa.Boolean(), nullable=False)

    def __init__(self, id=None, status=None):
        self.id = id
        self.status = status

# Request: public name in, DB value out
request_body_json = {"status": "activated"}

request = TestSchemaToDB(**request_body_json)

# Response: DB value in, public name out
orm_object = TestModel(status=False)

response = TestSchemaFromDB.from_orm(orm_object)

4. Results

In [2]: request.status
Out[2]: True

In [3]: response.status
Out[3]: 'deactivated'

Our use case was a bit different. We want to enter JSON data using enum keys, while we want pydantic to return enum members, not keys or values. Hence, the JSON schema should also list enum keys. The reason is that enum values could be integers, which are not very user-friendly when inspecting JSON files manually; an enum key is much more descriptive and easier to understand.

Our solution: we have implemented a custom Enum base class and a configuration context manager:

import enum


class JsonEnumCfg:
    """
    Temporarily changes the behavior of JsonEnumBase so that the pydantic JSON schema generator
    uses enum keys, not values, in the generated schemas.
    Usage:
        with JsonEnumCfg():
            # create JSON schema
    """
    # Class-level flag, so this is global (not thread-safe) state
    valueIsName: bool = False

    def __enter__(self):
        JsonEnumCfg.valueIsName = True
        return self

    def __exit__(self, etype, value, traceback):
        JsonEnumCfg.valueIsName = False


class JsonEnumBase(enum.Enum):
    """
    This class is used while parsing JSON data to make pydantic handle enum keys.
    Pydantic by default uses enum values, but we want to use keys for better portability between
    versions and JSON readability.
    Usage: enums that need to be serialized/deserialized to/from JSON should be derived from
        this class.
    """

    def __getattribute__(self, item):
        # Calling the super class to avoid recursion
        if item == '_value_':
            if JsonEnumCfg.valueIsName:
                return super().__getattribute__("_name_")

        return super().__getattribute__(item)

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        try:
            return cls[value]
        except KeyError:
            # pydantic only turns ValueError/TypeError/AssertionError into validation errors
            raise ValueError(f'{value!r} is not a valid {cls.__name__} key')

One can then generate the schema:

with JsonEnumCfg():
    schema = model.schema_json(indent=2)

... or parse a JSON file as one normally would with parse_*.

Although this solution works for us and covers our current use cases, it is a hack; we would much rather see this as a pydantic configuration option, with separate settings for schema creation and JSON file parsing. As soon as we decide to use IntEnum, we are back to creating intermediate enum base classes.

Examples of proposed solutions:

class Config:
    use_enum_keys = True

or

schema = model.schema_json(indent=2, enum_as_keys=True)
data = model.parse_file(filePath, enum_as_keys=True)

@samuelcolvin What do you think of this?
