Sphinx: autosummary table gets confused by long / complex type hints

Created on 18 Apr 2019  Â·  11Comments  Â·  Source: sphinx-doc/sphinx

Describe the bug

While creating the table for .. autosummary::, the function

https://github.com/sphinx-doc/sphinx/blob/15bc5a32bb0b78b432803e55fffa10c801182c75/sphinx/ext/autosummary/__init__.py#L430

will not produce the expected results, some type hints remain, albeit broken up. Here is the long function signature in question:

from pathlib import Path
# ...
def filter_file(path: Union[Path, str], pattern: str,
                repl: Union[Callable[[Match], str], str], count: int = 0,
                flags: int = 0, backup_extension: str = ".orig",
                line_based: bool = False, demand_different: bool = True,
                encoding: Optional[str] = None) -> Path:

In the .. autosummary:: table, the following is produced:

filter_file(path, str], pattern, repl, str], …)

The rendered docs can be found online here, but the root of the problem can be seen by the path and repl parameters (first and third parameter).

Stepping through, what actually happens is that everything just gets split on ', ':

https://github.com/sphinx-doc/sphinx/blob/15bc5a32bb0b78b432803e55fffa10c801182c75/sphinx/ext/autosummary/__init__.py#L448-L454

So when we have something like Union[Path, str] or the (rather ugly) more complicated Union[Callable[[Match], str], str], splitting on ', ' will give us something like this (splitting each argument on line to make clear):

(Pdb) p args
[
    'path: Union[pathlib.Path',
    'str]',
    'pattern: str',
    'repl: Union[Callable[[Match[AnyStr]]',
    'str]',
    # ... remaining arguments ...
]

This is actually a pretty hard problem to solve with regular expressions, if it's even possible to do so (unless I'm missing something, it's kind of late xD). I think we would have to start counting how many [ we have seen and match them with ] (after having seen a :).

So I was playing around with alternatives, found a nice SO answer that has some relevant stuff, here's a rough prototype:

#!/usr/bin/env python3

import ast
import re
class HintDestroyer(ast.NodeTransformer):
    # See: https://stackoverflow.com/a/42734810/3814202
    # we only need to visit function since we only create function
    def visit_FunctionDef(self, node):
        node.returns = None
        if node.args.args:
            for arg in node.args.args:
                # just for a glimpse ...
                print("Removing annotation for {}: {}".format(arg.arg, arg.annotation.__dict__))
                arg.annotation = None
        return node

# sample input that breaks
sig = "(path: Union[pathlib.Path, str], pattern: str, repl: Union[Callable[[Match[AnyStr]], str], str], count: int = 0, flags: int = 0, backup_extension: str = '.orig', line_based: bool = False, demand_different: bool = True, encoding: Optional[str] = None) -> pathlib.Path"

# create a valid function signature to parse with ast module.
function = "def foo{sig}: pass\n".format(sig=sig)

# parse and transform
parsed_function = ast.parse(function)
transformed = HintDestroyer().visit(parsed_function)
# we only created a single "function", there should be a single module with a
# function as its body.
transformed_function = transformed.body[0]

# is this what we want?
args = ", ".join([item.arg for item in transformed_function.args.args])
print(args)
# output:
# path, pattern, repl, count, flags, backup_extension, line_based, demand_different, encoding

Right now I _believe_ what args and opts in the current autosummary code are doing is trying to shove opts at the end _if there is space_ (since there's a maximum length here). I'm not sure why this happens, since all of the keyword arguments have to be last anyway right?

Basically we can use ast and just limited join on the transformed thing if we don't care about necessarily preserving existing behavior. But I'm not sure about the implications of all this ... so thought I would open up the issue for discussion of other possible solutions or maybe downsides to use ast (it definitely doesn't come for free, but it's also a simple parse).

To Reproduce
Steps to reproduce the behavior:

$ git clone https://github.com/svenevs/ci_exec.git
$ cd ci_exec
$ tox -e docs
$ open .tox/docs/tmp/html/index.html

# ... or ...

$ pip install -r docs/requirements.txt
$ cd docs
$ make html 
$ open _build/html/index.html

Expected behavior
No stray str] end up as a "parameter" in the autosummary table.

Your project
https://github.com/svenevs/ci_exec

Screenshots

over_parse

Environment info

  • OS: linux
  • Python version: 3.7.2
  • Sphinx version: 2.0.1
  • Sphinx extensions: sphinx.ext.autosummary
  • Extra tools: n/a
autosummary bug

All 11 comments

Redacted previous comment, sorry for noise. I need to spend more time looking at the function grammar, I made assumptions about ordering that I apparently cannot. I think ast is the way to go here, please speak up if patches using ast module will not be accepted so I don't keep spending time on it ;) We don't need to do any transforming, but we do need to parse a def foo{sig}: pass\n like thing. I just need to go learn more about python than I expected, the rules of where *args and other things are a lot more complicated than I thought x0

Sorry for late response. I just posted #6344 to fix this.

I think ast is the way to go here, please speak up if patches using ast module will not be accepted so I don't keep spending time on it ;)

Unfortunately, sig is not a valid python code. It sometimes takes an inspected code like (a=<object object at 0x1047a60f0>). So ast module is not good for this case...

Ah interesting. I should be able to test the PR tonight but it looks good to me from inspection!

Unfortunately, sig is not a valid python code. It sometimes takes an inspected code like (a=<object object at 0x1047a60f0>). So ast module is not good for this case...

In this case, we could cheat the system if we can rely on <angle brackets> for inspected stuff. Just replace <anything> with object() or some dummy value that will satisfy ast? I think #6344 can be patched to work as needed, but I may bring this up in the context of autodoc soon too. I'll reach out to the mailing list when I figure out exactly what I am trying to accomplish, the goal would be to enable both extensions to use the same type hint stripping.

Sharing the test code that I was playing with back then for ast, the idea would be to make raw_signature available outside of mangle so that we can have a generic way to remove type hints. So maybe rename raw_signature to remove_type_hints (so that users could wield it with autodoc-process-signature if they wanted). The outstanding question is creating test cases where it's an inspected object.

As said previously, I think the easy solution is to find out how to fix #6344. I can revisit this in the future when I have more time. I just figured I'd make this code available publicly in case anybody wants to step in and see if they can get it to work with inspected objects. We'd need to pre-process sig to remove <object at 0xdeadbeef> stuff, replacing it with something that ast will be ok with e.g. object()

#!/usr/bin/env python3

import ast
import itertools


def raw_signature(sig):
    # create a valid function signature to parse with ast module.
    function = "def foo{sig}: pass\n".format(sig=sig)
    parsed = ast.parse(function)
    # parsed will have a body that is the module, with one element: the function
    parsed_function = parsed.body[0]

    # Gather all function parameter names without any annotations.  Store the
    # col_offset as the first index so that argument order can be preserved by
    # sorting.
    all_args = []  # type: List[Tuple[int, str]]
    # args.args: list of positional arguments
    if parsed_function.args.args:
        for item in parsed_function.args.args:
            all_args.append((item.col_offset, item.arg))
    # args.vararg: variadic pack (e.g., *args), * is removed from argument name
    # so we add it back.
    if parsed_function.args.vararg:
        all_args.append((
            parsed_function.args.vararg.col_offset,
            "*{0}".format(parsed_function.args.vararg.arg)
        ))
    # args.kwonlyargs: keyword only args (e.g., after a '*,')
    if parsed_function.args.kwonlyargs:
        for item in parsed_function.args.kwonlyargs:
            all_args.append((item.col_offset, item.arg))
    # args.kwarg: variadic keyword pack (e.g., **kwargs), ** is removed from
    # argument name so we add it back.
    if parsed_function.args.kwarg:
        all_args.append((
            parsed_function.args.kwarg.col_offset,
            "**{0}".format(parsed_function.args.kwarg.arg)
        ))

    # Sort based off the col_offset, but keep the name.
    ordered_arguments = [item[1] for item in sorted(all_args)]
    COMPARE = True
    if COMPARE:
        unordered = [item[1] for item in all_args]
        print("unordered: ", ", ".join(unordered))
    # TODO: this becomes a limited_join
    return ", ".join(ordered_arguments)


if __name__ == "__main__":
    print("==> no parameters:")
    print(raw_signature("()"))

    print("==> no annotations:")
    print(raw_signature("(boom, blam, blah)"))

    print("==> (stage: str, *args, **kwargs)")
    print(raw_signature("(stage: str, *args, **kwargs)"))
    # TODO: what are the rules????
    print(raw_signature("(stage: str, x=12, *args, x=11, **kwargs)"))

    print("==> *, thingy")
    print(raw_signature("(x=12, *, y=13, **kwargs)"))
    print(raw_signature("(foo: str, *, alpha: int = 2, beta: str = 'hi')"))

    sig = "(path: Union[pathlib.Path, str], pattern: str, repl: Union[Callable[[Match[AnyStr]], str], str], count: int = 0, flags: int = 0, backup_extension: str = '.orig', line_based: bool = False, demand_different: bool = True, encoding: Optional[str] = None) -> pathlib.Path"
    print("==> complicated example:")
    print(raw_signature(sig))

I found other case.
https://github.com/sphinx-doc/sphinx/blob/165897a74951fb03e497d6e05496ce02e897f820/tests/test_ext_autosummary.py#L43-L44

I don't know where such signatures come from. But it has been supported by mangle_signature().

IMO, current approach is not good. autosummary already have raw python function (or method) object once. So we can generate better signature from it. We don't need to modify signature string.

Note: Users can override the signature of functions (or methods) by docstring when autodoc_docstring_signature is set.

@pv do you happen to know what these test cases for autosummary are?

TL;DR this thread is trying to find a way to remove [ and ] from type hints.

https://github.com/sphinx-doc/sphinx/blame/165897a74951fb03e497d6e05496ce02e897f820/tests/test_ext_autosummary.py#L43

If we removed them, would that be OK or do they serve a specific purpose? There's a couple other tests like this that don't seem to be "valid python function signatures", we're a little confused about what they represent.

You can do it by overriding signature via docstring (when autodoc_docstring_signature enabled) like following:

def foo():
    """foo(x[, y, z])
    docstring.
    """

I don't know why this is useful. But it is current behavior.

@svenevs: that's the output produced by numpy ufunc and f2py docstring generators. The notation means positional-only arguments --- at the time I think Python had not converged on a common convention on these. I'm not immediately sure if this has been changed in numpy/f2py by now.

Supplying the signature on the first line of the docstring is necessary for extension modules (eg written in C), since Python usually cannot inspect the signatures of the functions provided by them.

So, it would be useful to not break this if there's no pressing need to.

Fixed by #6361.
Thank you for reporting.

Yay! Thank you very much for the help and information, I learned about some interesting new features :slightly_smiling_face:

Was this page helpful?
0 / 5 - 0 ratings