Pydantic: [Feature Request] Auto-Docstring generation for Schema bearing Models

Created on 3 Jul 2019 · 8 Comments · Source: samuelcolvin/pydantic


Feature Request

This request is to have Pydantic models auto-generate their docstrings for their parameters, reading the parameters' Schema objects for more information. The end result would be model classes whose __doc__ provides details about the parameters the model has. This would be useful for people who generate docs for their models through a program like Sphinx, since the more complete docstring would be created automatically.

This can, and probably should, be optional: something the user sets or calls, since it requires overwriting the __doc__ attribute.

This may turn out to be too dependent on individual preferences in docstring style for pydantic to officially support any particular flavor(s), but I wanted to propose it anyway.

Below is a crude toy implementation with examples to show the outputs. I have tested it in Python 3.6 and 3.7 with pydantic versions 0.26 and 0.29, and it should run with no external dependencies beyond pydantic itself.

Foreseeable difficulties:

  • Hard to catch every combination and case to ensure doc is formatted correctly
  • Very dependent on the internal pydantic representation structure (See code where I have to check for pre 0.28 and post 0.28)
  • Requires overwriting the __doc__ variable
  • Documentation style could be in debate (e.g. NumPy, Google, reStructuredText, Epytext, etc.)
  • Not sure how this would affect speed
  • How to handle model nesting
  • How to handle models that mix variables documented through Schema with variables that are not

Known issues with toy implementation:

  • Very crude
  • Does not support mixed Schema vs. non-Schema parameters
  • Nesting pydantic models requires the nested model to have its own Description
  • Have not tested all combinations of parameters
  • Formats to NumPy Docstring style
  • Formats some things in Sphinx cross-reference style (e.g. a nested model is rendered as :class:`TargetClass` instead of any further docstring description; in Sphinx's RST output this becomes a link to that class in the docs, which is not exactly helpful in all cases)
from enum import Enum
from textwrap import dedent, indent
from typing import Tuple, Dict

from pydantic import BaseModel, Schema, confloat, BaseSettings, validator, ValidationError

####################################
# Start of Auto-Doc Generation block
####################################


class _JsonRefModel(BaseModel):
    """
    Reference model for Json replacement fillers

    Matches style of:

    ``'allOf': [{'$ref': '#/definitions/something'}]}``

    and will always be a length 1 list
    """
    allOf: Tuple[Dict[str, str]]

    @validator("allOf", whole=True)
    def all_of_entries(cls, v):
        value = v[0]
        if len(value) != 1:
            raise ValueError("Dict must be of length 1")
        elif '$ref' not in value:
            raise ValueError("Dict needs to have key $ref")
        elif not isinstance(value["$ref"], str) or not value["$ref"].startswith('#/'):
            raise ValueError("$ref should be formatted as #/definitions/...")
        return v


def doc_formatter(target_object):
    """
    Set the docstring for a Pydantic object automatically based on the parameters

    This could use improvement.
    """
    doc = target_object.__doc__

    # Handle non-pydantic objects
    if doc is None:
        new_doc = ''
    elif 'Parameters\n' in doc or not (issubclass(target_object, BaseSettings) or issubclass(target_object, BaseModel)):
        new_doc = doc
    else:
        type_formatter = {'boolan': 'bool',
                          'string': 'str',
                          'integer': 'int',
                          'number': 'float'
                          }
        # Add the white space
        if not doc.endswith('\n\n'):
            doc += "\n\n"
        new_doc = dedent(doc) + "Parameters\n----------\n"
        target_schema = target_object.schema()
        # Go through each property
        for prop_name, prop in target_schema['properties'].items():
            # Catch lookups for other Pydantic objects
            if '$ref' in prop:
                # Pre 0.28 lookup
                lookup = prop['$ref'].split('/')[-1]
                prop = target_schema['definitions'][lookup]
            elif 'allOf' in prop:
                # Post 0.28 lookup
                try:
                    # Validation, we don't need output, just the object
                    _JsonRefModel(**prop)
                    lookup = prop['allOf'][0]['$ref'].split('/')[-1]
                    prop = target_schema['definitions'][lookup]
                except ValidationError:
                    # Doesn't conform, pass on
                    pass
            # Get common properties
            prop_type = prop["type"]
            new_doc += prop_name + " : "
            prop_desc = prop['description']

            # Check for enumeration
            if 'enum' in prop:
                new_doc += '{' + ', '.join(prop['enum']) + '}'

            # Set the name/type of object
            else:
                if prop_type == 'object':
                    prop_field = prop['title']
                else:
                    prop_field = prop_type
                new_doc += f'{type_formatter[prop_field] if prop_field in type_formatter else prop_field}'

            # Handle Classes so as not to re-copy pydantic descriptions
            if prop_type == 'object':
                if not ('required' in target_schema and prop_name in target_schema['required']):
                    new_doc += ", Optional"
                prop_desc = f":class:`{prop['title']}`"

            # Handle non-classes
            else:
                if 'default' in prop:
                    default = prop['default']
                    try:
                        # Get the explicit default value for enum classes
                        if issubclass(default, Enum):
                            default = default.value
                    except TypeError:
                        pass
                    new_doc += f", Default: {default}"
                elif not ('required' in target_schema and prop_name in target_schema['required']):
                    new_doc += ", Optional"

            # Finally, write the detailed doc string
            new_doc += "\n" + indent(prop_desc, "    ") + "\n"

    # Assign the new doc string
    target_object.__doc__ = new_doc

########################
# Start of Example block
########################


class FruitEnum(str, Enum):
    apple = "apple"
    orange = "orange"


class Taxes(BaseModel):
    """The State and Federal Taxes charged for operation"""
    state: float = 0.06
    federal: float = 0.08
    city: float = None


class FruitStandNoDoc(BaseModel):
    """
    My fruit stand that I sell various things from
    """
    fruit: FruitEnum = FruitEnum.apple
    stock: int
    price: confloat(ge=0) = 0.6
    advertising: str = None
    currently_open: bool = False
    taxes: Taxes = Taxes()


class FruitStand(BaseModel):
    """
    My fruit stand that I sell various things from
    """
    fruit: FruitEnum = Schema(
        FruitEnum.apple,
        description="The fruit which I have available at my stand"
    )
    stock: int = Schema(
        ...,
        description="How many of each fruit to keep on hand"
    )
    price: float = Schema(
        0.60,
        description="Price per piece of fruit",
        ge=0
    )
    advertising: str = Schema(
        None,
        description="Advertising message to display"
    )
    currently_open: bool = Schema(
        False,
        description="Is the fruit stand open or not?"
    )
    taxes: Taxes = Schema(
        Taxes(),
        description="Taxes charged by the state and local level"
    )


print(FruitStandNoDoc.__doc__)
print('-'*20)
print(FruitStand.__doc__)
print('-'*20)
doc_formatter(FruitStand)
print(FruitStand.__doc__)

Outputs the following lines:

    My fruit stand that I sell various things from

--------------------

    My fruit stand that I sell various things from

--------------------

My fruit stand that I sell various things from


Parameters
----------
fruit : {apple, orange}, Default: apple
    The fruit which I have available at my stand
stock : int
    How many of each fruit to keep on hand
price : float, Default: 0.6
    Price per piece of fruit
advertising : str, Optional
    Advertising message to display
currently_open : boolean, Default: False
    Is the fruit stand open or not?
taxes : Taxes, Optional
    :class:`Taxes`
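
As far as I can tell, the toy implementation reads prop['description'] unconditionally, so a field declared without a Schema description (such as stock in FruitStandNoDoc) has no 'description' key and the lookup raises KeyError. Its type table also spells one key 'boolan', which is why currently_open shows up as boolean rather than bool in the output above. Below is a minimal sketch of a more tolerant renderer. To stay runnable without pydantic it works on a plain JSON-Schema-like dict; render_parameters, TYPE_NAMES and example_schema are hypothetical names, and the dict is hand-written to resemble what a model's schema() call returns, not actual pydantic output.

```python
from textwrap import indent

# JSON Schema type names mapped to Python type names
TYPE_NAMES = {"boolean": "bool", "string": "str", "integer": "int", "number": "float"}


def render_parameters(schema: dict) -> str:
    """Render a NumPy-style Parameters section from a JSON-Schema-like dict."""
    required = set(schema.get("required", []))
    lines = ["Parameters", "----------"]
    for name, prop in schema.get("properties", {}).items():
        if "enum" in prop:
            type_text = "{" + ", ".join(str(v) for v in prop["enum"]) + "}"
        else:
            raw_type = prop.get("type", "Any")
            type_text = TYPE_NAMES.get(raw_type, raw_type)
        if "default" in prop:
            type_text += f", Default: {prop['default']}"
        elif name not in required:
            type_text += ", Optional"
        lines.append(f"{name} : {type_text}")
        # Fall back to a placeholder instead of raising KeyError
        lines.append(indent(prop.get("description", "(no description)"), "    "))
    return "\n".join(lines) + "\n"


# Hand-written stand-in for a model schema (an assumption, not pydantic output)
example_schema = {
    "title": "FruitStand",
    "type": "object",
    "properties": {
        "stock": {"title": "Stock", "type": "integer",
                  "description": "How many of each fruit to keep on hand"},
        "currently_open": {"title": "Currently_Open", "type": "boolean",
                           "default": False},
    },
    "required": ["stock"],
}
print(render_parameters(example_schema))
```

Using .get() with a placeholder keeps mixed models usable; whether undocumented fields should be listed at all is a style decision this sketch does not settle.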

Labels: Feedback Wanted, Schema, feature request

All 8 comments

I'm not opposed to it, my questions/feedback would be:

  • how useful would this actually be? I think I only ever look at docstrings in code
  • how much will it slow things down? Big Python applications can become noticeably slow to load, and we wouldn't want to slow things down by default
  • could we add a util function to create/set a docstring, so that it has to be called manually?
  • we could have a config parameter, defaulting to no, that could be no / if_missing / always

Before we add it, would anyone else want this?
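
The last two points, a manually called utility plus a no / if_missing / always switch, could be sketched roughly as follows. This is plain Python rather than pydantic code: DocMode, set_docstring and list_fields are hypothetical names, and the generator here only lists annotation names as a placeholder for a real docstring builder.

```python
from enum import Enum
from typing import Callable


class DocMode(str, Enum):
    no = "no"
    if_missing = "if_missing"
    always = "always"


def set_docstring(cls: type, generate: Callable[[type], str],
                  mode: DocMode = DocMode.no) -> None:
    """Set cls.__doc__ from generate(cls), depending on mode."""
    if mode is DocMode.no:
        return
    if mode is DocMode.if_missing and cls.__doc__:
        # A hand-written docstring exists, so leave it alone
        return
    cls.__doc__ = generate(cls)


def list_fields(cls: type) -> str:
    """Placeholder generator: list the annotated field names."""
    return "Fields: " + ", ".join(getattr(cls, "__annotations__", {}))


class Documented:
    """Hand-written docstring."""
    stock: int


class Undocumented:
    stock: int


for target in (Documented, Undocumented):
    set_docstring(target, list_fields, DocMode.if_missing)

print(Documented.__doc__)
print(Undocumented.__doc__)
```

With the default of no, nothing is generated and nothing is overwritten unless the user opts in.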

I think this would be pretty popular, as it would interface with the canonical Sphinx documentation tooling. In effect, the docs would be generated from the Schema so that you do not need to write them twice.

For speed, we could use the @property decorator so that the doc string would only be evaluated when called (often during docs generation or Jupyter notebooks).
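
A rough sketch of the lazy idea is below. One caveat, to the best of my understanding: a plain @property placed on a class only takes effect for instances, so reading the attribute on the class itself would return the property object. The sketch therefore uses a small custom descriptor, which CPython does invoke when __doc__ is read from the class. LazyDoc and describe are hypothetical names, and the class is a plain Python class rather than a pydantic model.

```python
class LazyDoc:
    """Descriptor that builds a class docstring on first access and caches it."""

    def __init__(self, generate):
        self._generate = generate
        self._cache = {}

    def __get__(self, instance, owner):
        # Called for both Class.__doc__ and instance.__doc__
        if owner not in self._cache:
            self._cache[owner] = self._generate(owner)
        return self._cache[owner]


calls = []  # records how often the generator actually runs


def describe(cls):
    calls.append(cls.__name__)
    fields = ", ".join(cls.__annotations__)
    return f"{cls.__name__} with fields: {fields}"


class FruitStand:
    __doc__ = LazyDoc(describe)
    stock: int
    price: float


print(FruitStand.__doc__)  # generated here, on first access
print(FruitStand.__doc__)  # served from the cache
print(calls)
```

Nothing is computed at import time, so the cost is only paid by tools that actually read the docstring.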

I'm quite in favor of this. I've begun creating docs with pydoc-markdown for an API client I'm writing, which uses Pydantic for data validation, parsing and coercion, and having to rewrite my docstrings, especially for inherited models, is a bit tedious. I would gladly switch all my definitions to Schema()/Field() defs if I could get auto-generated docs for each attribute.

See an example of the autogenerated docs here. This is something that we would still be quite interested in getting into Pydantic. It is fairly straightforward to generate lazily through metaclasses here, so that there are no runtime performance penalties.

@dgasmith For what it's worth, you could implement it without modifying metaclass by making use of __init_subclass__; that would probably be preferable (at least if we followed a similar approach in pydantic), in order to prevent downstream metaclass conflicts.

So ProtoModel would become:

class ProtoModel(BaseModel):
    def __init_subclass__(cls) -> None:
        cls.__doc__ = AutoPydanticDocGenerator(cls, always_apply=True)

and you could drop the metaclass.
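
For readers without AutoPydanticDocGenerator at hand (it is not defined in this thread), here is a self-contained sketch of the same __init_subclass__ pattern. AutoDocBase is a hypothetical stand-in for a model base class, not pydantic's BaseModel, and the generated text is built from plain annotations only.

```python
class AutoDocBase:
    """Stand-in base class (hypothetical; not pydantic's BaseModel)."""

    def __init_subclass__(cls, **kwargs) -> None:
        # Cooperate with other base classes that define this hook
        super().__init_subclass__(**kwargs)
        summary = (cls.__doc__ or "").strip()
        fields = "\n".join(
            f"{name} : {getattr(tp, '__name__', tp)}"
            for name, tp in getattr(cls, "__annotations__", {}).items()
        )
        cls.__doc__ = f"{summary}\n\nParameters\n----------\n{fields}\n"


class FruitStand(AutoDocBase):
    """My fruit stand that I sell various things from"""
    stock: int
    price: float


print(FruitStand.__doc__)
```

Because the hook runs once per subclass at class creation and involves no metaclass, it should not conflict with whatever metaclass the base class already uses.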

@dmontagu Thanks! I was not aware of this.

@dmontagu's comment was very helpful for finding a way to autogenerate my own documentation. I really like the __init_subclass__ solution. After quite a bit of experimentation, I feel like this is a feature that doesn't _have to_ be part of pydantic. Instead, what might be extremely useful is to simply add a little section to pydantic's documentation showing how a user could achieve this. A small example would suffice. If you decide this is a good approach, I can submit a pull request.

Just a note that the environ-config library provides a generate_help method; see https://environ-config.readthedocs.io/en/stable/tutorial.html#debugging (with the implementation at https://github.com/hynek/environ-config/blob/0bd960a602878be39cdc24f2d52d2d767d5056a4/src/environ/_environ_config.py#L352 )
