Sphinx: Warning about 'optional' in Numpy-style docstrings with sphinx.ext.napoleon

Created on 25 Nov 2019  Â·  13Comments  Â·  Source: sphinx-doc/sphinx

Describe the bug

When using sphinx.ext.napoleon with Numpy-format docstrings and nitpicky mode, a warning is emitted about optional.

To Reproduce

Create an empty directory and add the following files:

  • conf.py:
import sys
sys.path.insert(0, '.')
extensions = ['sphinx.ext.autodoc',
              'sphinx.ext.napoleon',
              'sphinx.ext.intersphinx']
napoleon_google_docstring = False
intersphinx_mapping = {'python': ('https://docs.python.org/3.7', None)}
nitpicky = True
  • mycode.py:
def my_func(arg1, arg2):
    """
    Parameters
    ----------
    arg1 : str
        Description of `arg1`
    arg2 : int, optional
        Description of `arg2`, defaults to 0
    """
    pass
  • index.rst:
 .. autofunction:: mycode.my_func

Then run:

sphinx-build -b html -d _build/doctrees   . _build/html

The output includes a warning about the optional string which should not be interpreted as a type:

Running Sphinx v2.2.1
making output directory... done
loading intersphinx inventory from https://docs.python.org/3.7/objects.inv...
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] index                                                                                               
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index                                                                                                
/home/tom/tmp/debug/mycode.py:docstring of mycode.my_func:: WARNING: py:class reference target not found: optional
generating indices...  genindexdone
writing additional pages...  searchdone
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 1 warning.

The HTML pages are in _build/html.

Expected behavior

No warning should be emitted since adding , optional is valid by the numpy doc spec.

Environment info

  • OS: Ubuntu 19.04
  • Python version: 3.7.3
  • Sphinx version: 2.2.1
  • Sphinx extensions: sphinx.ext.napoleon, sphinx.ext.autodoc, sphinx.ext.intersphinx
bug napoleon help wanted

Most helpful comment

Here's what I came up with, which seems to work fine for my own case. Of course, if others on this issue agree to a fix like this, it should be merged in directly. For now I am just monkey-patching in my conf.py:

# monkey-patch napoleon to better handle optional parameters in numpydoc
# docstrings; see https://github.com/sphinx-doc/sphinx/issues/6861

def _fixup_napoleon_numpydoc():
    from sphinx.locale import _
    from sphinx.ext.napoleon import NumpyDocstring

    def _process_optional_params(self, fields):
        """
        Split a fields list into separate lists of positional parameters and
        keyword parameters.

        Possibly moves some fields out of their original documented order,
        though in practice, in most cases, optional/keyword parameters should
        always be listed after positional parameters.

        For Numpydoc, a parameter is treated as a keyword parameter if its type
        list ends with the keyword "optional".  In this case, the "optional" is
        removed from its type list, and instead the text "(optional)" is
        prepended to the field description.
        """

        positional = []
        keyword = []

        for name, type_, desc in fields:
            types = [t.strip() for t in type_.split(',')]
            optional = types and types[-1].lower() == 'optional'
            if optional:
                type_ = ', '.join(types[:-1])

                if not desc:
                    desc = ['']
                desc[0] = ('*(optional)* – ' + desc[0]).rstrip()

            if optional or name.startswith(r'\*\*'):
                keyword.append((name, type_, desc))
            else:
                positional.append((name, type_, desc))

        return positional, keyword

    def _parse_parameters_section(self, section):
        fields = self._consume_fields()
        pos_fields, kw_fields = self._process_optional_params(fields)
        if self._config.napoleon_use_param:
            lines = self._format_docutils_params(pos_fields)
        else:
            lines = self._format_fields(_('Parameters'), pos_fields)

        if self._config.napoleon_use_keyword:
            if self._config.napoleon_use_param:
                lines = lines[:-1]
            lines.extend(self._format_docutils_params(
                kw_fields, field_role='keyword', type_role='kwtype'))
        else:
            lines.extend(self._format_fields(
                _('Keyword Arguments'), kw_fields))

        return lines

    def _parse_other_parameters_section(self, section):
        fields = self._consume_fields()
        pos_fields, kw_fields = self._process_optional_params(fields)
        return self.format_fields(
                _('Other Parameters'), pos_fields + kw_fields)

    NumpyDocstring._process_optional_params = _process_optional_params
    NumpyDocstring._parse_parameters_section = _parse_parameters_section
    NumpyDocstring._parse_other_parameters_section = \
            _parse_other_parameters_section


_fixup_napoleon_numpydoc()

I think that, now that this adds all keyword arguments to a separate section, keeping the "(optional)" text is a bit redundant. But I keep it anyways, in the original spirit of the numpydoc format.

For example, here is an actual docstring from my project, and the resulting output:

    def simulate_scenarios(self, scenario_params, n_cpus=1, verbose=False):
        """
        Return an iterator over simulated SNPs given a scenario params table
        (see `Simulator.load_scenario_params` for the format of this table).

        This method should iterate over all scenarios in the simulation
        (possibly generating the simulation as well, or reading it from an
        existing simulation), which are then each passed to
        `Simulator.simulate_scenario` for each scenario.

        The items returned from the iterator should be a 3-tuple in the format
        ``(scenario_idx, rep_idx, SNPSample(snp=SNPs, pos=positions)``, where
        ``scenario_idx`` is the index into the scenario params table for the
        parameters that were used to produce this simulation; ``rep_idx`` is
        the replicate index, in the case where multiple replicates are
        generated for each scenario, if not it can just be ``0``; the final
        element is an `.SNPSample` instance containing the SNP matrix and
        positions array generated by the simulator.

        Parameters
        ----------
        scenario_params : `pandas.DataFrame`
            The scenario params table for the scenarios to simulate.
        n_cpus : int, optional
            If ``1``, scenarios are simulated in serial; for ``n_cpus > 1`` a
            process of pool of size ``n_cpus`` is used.  If ``n_cpus = 0`` or
            `None`, use the default number of CPUs used by
            `multiprocessing.pool.Pool`.
        """

Screenshot from 2020-03-26 15-08-17

All 13 comments

I noticed that doing:

def my_func(arg1, arg2):
    """
    Parameters
    ----------
    arg1 : :obj:`str`
        Description of `arg1`
    arg2 : :obj:`int`, optional
        Description of `arg2`, defaults to 0
    """
    pass

works fine, but the example I provided above is lifted straight from the sphinx.ext.napoleon docs, and many projects don't write :obj: and instead rely on it being interpreted as obj by default. I think the bug here is that if :obj or :class: etc is not specified, :obj is then applied to all items including optional.

I also get warnings about array_like which is a valid type according to the numpydoc spec

And I also get warnings for the following syntax which is valid:

        format : {'auto', 'ascii', 'latex'}, optional

The docstring is expanded to following reST document inside napoleon:

.. py:function:: my_func(arg1, arg2)
   :module: example

   :param arg1: Description of `arg1`
   :type arg1: str
   :param arg2: Description of `arg2`, defaults to 0
   :type arg2: int, optional

But py:function directive expects arguments of :type: field is a list of types. So it considers "optional" is also kind of type. I think this is interface mismatch between napoleon and python domain. I understand "optional" is a special keyword for numpydoc. So it should be treated not as reference. But I don't understand how to fix this...

I also get warnings about array_like which is a valid type according to the numpydoc spec

I don't think numpydoc says "array_like" is a valid type. It does not mention about validity. It seems it takes free formatted text. It also uses "data-type" and "iterable object" for example. But it is hard to resolve these type-like text. I think no-nitpicky mode is used for such ambiguous types. -1 for supporting these types.

On the other hand, it would be better to suppress warnings for set of values like {'auto', 'ascii', 'latex'} if possible.

Regarding array_like, here's the text I was referring to:

array_like : For functions that take arguments which can have not only a type ndarray, but also types that can be converted to an ndarray (i.e. scalar types, sequence types), those arguments can be documented with type array_like.

So I think that is also a special term we should allow without reference.

And I'm on board with not trying to allow things like data-type and iterable object - for those one can always add nitpick exceptions if really needed.

The biggest blocker for me at the moment is the set of values, because everything else can be worked around with exceptions.

I think this has to be fixed in NumpyDocstring._consume_field: we'd need to create a function that parses numpydoc type specifications and converts something like

int, optional

to the python domain:

:class:`int`, optional

which I believe should silence any nit-picky warnings?

I had a look into this, and while NumpyDocstring._consume_field might be part of the picture, it certainly isn't the whole picture.

The main problem, as @tk0miya pointed out, is that Napoleon will output something like:

.. py:function:: my_func(arg1, arg2)
   :param arg1: arg1 description
   :type arg1: int, optional

At this point, rendering of the py:function directive and how it handles :type <param>: fields is totally given over to the default renderer for that directive, and it no longer has any connection to Google docstrings or Numpy docstrings, etc.

The second problem is that the available Info field options do not have an option for specifying if a parameter is optional, or what its default value is. I'm picturing that a possible solution could involve adding something like this. For example, for a function my_func(arg1, arg2=1):

.. py:function:: my_func(arg1, [arg2=1])
   :param arg1: arg1 description
   :type arg1: int
   :default arg1: 1

Perhaps this would be redundant, since the fact that an argument is optional, and its default, are already reflected in the function signature line. But an explicit :default <arg>: option could indicate explicitly that you want the default value documented in the parameter description list as well. Currently we usually do this manually by adding something like "(default: 1") at the end of the parameter description. But this could take care of doing that automatically. Optionally, a :default arg1: without specifying a value would be shorthand for just marking the parameter "(optional)".

Then, on the Napoleon, NumpyDocstring end of things, it could detect "optional" at the end of a type list, remove that from the type list, and instead output :default <argname>: for that field.

Digging a little deeper, it seems that Napoleon does already monkeypatch the Python domain to add a distinct "Keyword Arguments" field to the docs, and keyword arguments can be added to it by marking them :keyword <argname>:, :kwarg <argname>:, or :kwparam <argname>: instead of `:param :, etc.

However, it only really creates :kwparam:s when parsing a "keyword arguments" section in the docstring, which is used in Google docstrings but not numpydoc docstrings. At the very least, for numpydoc, it could detect "optional" in a field's type list, and mark that field as a :kwparam: instead of a plain :param:. I'm going to attempt that.

Thanks for working on this, @embray.

Just for reference, numpydoc accepts a mapping that translates between type entries – so we could map mapping to the standard library's dict-like term or array-like / array_like to whatever it officially points to in numpy – and a list that allows ignoring certain words (like of, and and maybe also or?).

That would push most of the issue to the user but we'd still need to find a way to support sets of values. However, at least for optional / default the fix you described sounds cleaner to me.

Here's what I came up with, which seems to work fine for my own case. Of course, if others on this issue agree to a fix like this, it should be merged in directly. For now I am just monkey-patching in my conf.py:

# monkey-patch napoleon to better handle optional parameters in numpydoc
# docstrings; see https://github.com/sphinx-doc/sphinx/issues/6861

def _fixup_napoleon_numpydoc():
    from sphinx.locale import _
    from sphinx.ext.napoleon import NumpyDocstring

    def _process_optional_params(self, fields):
        """
        Split a fields list into separate lists of positional parameters and
        keyword parameters.

        Possibly moves some fields out of their original documented order,
        though in practice, in most cases, optional/keyword parameters should
        always be listed after positional parameters.

        For Numpydoc, a parameter is treated as a keyword parameter if its type
        list ends with the keyword "optional".  In this case, the "optional" is
        removed from its type list, and instead the text "(optional)" is
        prepended to the field description.
        """

        positional = []
        keyword = []

        for name, type_, desc in fields:
            types = [t.strip() for t in type_.split(',')]
            optional = types and types[-1].lower() == 'optional'
            if optional:
                type_ = ', '.join(types[:-1])

                if not desc:
                    desc = ['']
                desc[0] = ('*(optional)* – ' + desc[0]).rstrip()

            if optional or name.startswith(r'\*\*'):
                keyword.append((name, type_, desc))
            else:
                positional.append((name, type_, desc))

        return positional, keyword

    def _parse_parameters_section(self, section):
        fields = self._consume_fields()
        pos_fields, kw_fields = self._process_optional_params(fields)
        if self._config.napoleon_use_param:
            lines = self._format_docutils_params(pos_fields)
        else:
            lines = self._format_fields(_('Parameters'), pos_fields)

        if self._config.napoleon_use_keyword:
            if self._config.napoleon_use_param:
                lines = lines[:-1]
            lines.extend(self._format_docutils_params(
                kw_fields, field_role='keyword', type_role='kwtype'))
        else:
            lines.extend(self._format_fields(
                _('Keyword Arguments'), kw_fields))

        return lines

    def _parse_other_parameters_section(self, section):
        fields = self._consume_fields()
        pos_fields, kw_fields = self._process_optional_params(fields)
        return self.format_fields(
                _('Other Parameters'), pos_fields + kw_fields)

    NumpyDocstring._process_optional_params = _process_optional_params
    NumpyDocstring._parse_parameters_section = _parse_parameters_section
    NumpyDocstring._parse_other_parameters_section = \
            _parse_other_parameters_section


_fixup_napoleon_numpydoc()

I think that, now that this adds all keyword arguments to a separate section, keeping the "(optional)" text is a bit redundant. But I keep it anyways, in the original spirit of the numpydoc format.

For example, here is an actual docstring from my project, and the resulting output:

    def simulate_scenarios(self, scenario_params, n_cpus=1, verbose=False):
        """
        Return an iterator over simulated SNPs given a scenario params table
        (see `Simulator.load_scenario_params` for the format of this table).

        This method should iterate over all scenarios in the simulation
        (possibly generating the simulation as well, or reading it from an
        existing simulation), which are then each passed to
        `Simulator.simulate_scenario` for each scenario.

        The items returned from the iterator should be a 3-tuple in the format
        ``(scenario_idx, rep_idx, SNPSample(snp=SNPs, pos=positions)``, where
        ``scenario_idx`` is the index into the scenario params table for the
        parameters that were used to produce this simulation; ``rep_idx`` is
        the replicate index, in the case where multiple replicates are
        generated for each scenario, if not it can just be ``0``; the final
        element is an `.SNPSample` instance containing the SNP matrix and
        positions array generated by the simulator.

        Parameters
        ----------
        scenario_params : `pandas.DataFrame`
            The scenario params table for the scenarios to simulate.
        n_cpus : int, optional
            If ``1``, scenarios are simulated in serial; for ``n_cpus > 1`` a
            process of pool of size ``n_cpus`` is used.  If ``n_cpus = 0`` or
            `None`, use the default number of CPUs used by
            `multiprocessing.pool.Pool`.
        """

Screenshot from 2020-03-26 15-08-17

Now, this is fixed by @keewis via #7690. Thanks a lot!

Was this page helpful?
0 / 5 - 0 ratings