Describe the bug
While creating the table for .. autosummary::, the function
will not produce the expected results, some type hints remain, albeit broken up. Here is the long function signature in question:
from pathlib import Path
# ...
def filter_file(path: Union[Path, str], pattern: str,
repl: Union[Callable[[Match], str], str], count: int = 0,
flags: int = 0, backup_extension: str = ".orig",
line_based: bool = False, demand_different: bool = True,
encoding: Optional[str] = None) -> Path:
In the .. autosummary:: table, the following is produced:
filter_file(path, str], pattern, repl, str], …)
The rendered docs can be found online here, but the root of the problem can be seen by the path and repl parameters (first and third parameter).
Stepping through, what actually happens is that everything just gets split on ', ':
So when we have something like Union[Path, str] or the (rather ugly) more complicated Union[Callable[[Match], str], str], splitting on ', ' will give us something like this (splitting each argument on line to make clear):
(Pdb) p args
[
'path: Union[pathlib.Path',
'str]',
'pattern: str',
'repl: Union[Callable[[Match[AnyStr]]',
'str]',
# ... remaining arguments ...
]
This is actually a pretty hard problem to solve with regular expressions, if it's even possible to do so (unless I'm missing something, it's kind of late xD). I think we would have to start counting how many [ we have seen and match them with ] (after having seen a :).
So I was playing around with alternatives, found a nice SO answer that has some relevant stuff, here's a rough prototype:
#!/usr/bin/env python3
import ast
import re
class HintDestroyer(ast.NodeTransformer):
# See: https://stackoverflow.com/a/42734810/3814202
# we only need to visit function since we only create function
def visit_FunctionDef(self, node):
node.returns = None
if node.args.args:
for arg in node.args.args:
# just for a glimpse ...
print("Removing annotation for {}: {}".format(arg.arg, arg.annotation.__dict__))
arg.annotation = None
return node
# sample input that breaks
sig = "(path: Union[pathlib.Path, str], pattern: str, repl: Union[Callable[[Match[AnyStr]], str], str], count: int = 0, flags: int = 0, backup_extension: str = '.orig', line_based: bool = False, demand_different: bool = True, encoding: Optional[str] = None) -> pathlib.Path"
# create a valid function signature to parse with ast module.
function = "def foo{sig}: pass\n".format(sig=sig)
# parse and transform
parsed_function = ast.parse(function)
transformed = HintDestroyer().visit(parsed_function)
# we only created a single "function", there should be a single module with a
# function as its body.
transformed_function = transformed.body[0]
# is this what we want?
args = ", ".join([item.arg for item in transformed_function.args.args])
print(args)
# output:
# path, pattern, repl, count, flags, backup_extension, line_based, demand_different, encoding
Right now I _believe_ what args and opts in the current autosummary code are doing is trying to shove opts at the end _if there is space_ (since there's a maximum length here). I'm not sure why this happens, since all of the keyword arguments have to be last anyway right?
Basically we can use ast and just limited join on the transformed thing if we don't care about necessarily preserving existing behavior. But I'm not sure about the implications of all this ... so thought I would open up the issue for discussion of other possible solutions or maybe downsides to use ast (it definitely doesn't come for free, but it's also a simple parse).
To Reproduce
Steps to reproduce the behavior:
$ git clone https://github.com/svenevs/ci_exec.git
$ cd ci_exec
$ tox -e docs
$ open .tox/docs/tmp/html/index.html
# ... or ...
$ pip install -r docs/requirements.txt
$ cd docs
$ make html
$ open _build/html/index.html
Expected behavior
No stray str] end up as a "parameter" in the autosummary table.
Your project
https://github.com/svenevs/ci_exec
Screenshots

Environment info
Redacted previous comment, sorry for noise. I need to spend more time looking at the function grammar, I made assumptions about ordering that I apparently cannot. I think ast is the way to go here, please speak up if patches using ast module will not be accepted so I don't keep spending time on it ;) We don't need to do any transforming, but we do need to parse a def foo{sig}: pass\n like thing. I just need to go learn more about python than I expected, the rules of where *args and other things are a lot more complicated than I thought x0
Sorry for late response. I just posted #6344 to fix this.
I think ast is the way to go here, please speak up if patches using ast module will not be accepted so I don't keep spending time on it ;)
Unfortunately, sig is not a valid python code. It sometimes takes an inspected code like (a=<object object at 0x1047a60f0>). So ast module is not good for this case...
Ah interesting. I should be able to test the PR tonight but it looks good to me from inspection!
Unfortunately, sig is not a valid python code. It sometimes takes an inspected code like (
a=<object object at 0x1047a60f0>). So ast module is not good for this case...
In this case, we could cheat the system if we can rely on <angle brackets> for inspected stuff. Just replace <anything> with object() or some dummy value that will satisfy ast? I think #6344 can be patched to work as needed, but I may bring this up in the context of autodoc soon too. I'll reach out to the mailing list when I figure out exactly what I am trying to accomplish, the goal would be to enable both extensions to use the same type hint stripping.
Sharing the test code that I was playing with back then for ast, the idea would be to make raw_signature available outside of mangle so that we can have a generic way to remove type hints. So maybe rename raw_signature to remove_type_hints (so that users could wield it with autodoc-process-signature if they wanted). The outstanding question is creating test cases where it's an inspected object.
As said previously, I think the easy solution is to find out how to fix #6344. I can revisit this in the future when I have more time. I just figured I'd make this code available publicly in case anybody wants to step in and see if they can get it to work with inspected objects. We'd need to pre-process sig to remove <object at 0xdeadbeef> stuff, replacing it with something that ast will be ok with e.g. object()
#!/usr/bin/env python3
import ast
import itertools
def raw_signature(sig):
# create a valid function signature to parse with ast module.
function = "def foo{sig}: pass\n".format(sig=sig)
parsed = ast.parse(function)
# parsed will have a body that is the module, with one element: the function
parsed_function = parsed.body[0]
# Gather all function parameter names without any annotations. Store the
# col_offset as the first index so that argument order can be preserved by
# sorting.
all_args = [] # type: List[Tuple[int, str]]
# args.args: list of positional arguments
if parsed_function.args.args:
for item in parsed_function.args.args:
all_args.append((item.col_offset, item.arg))
# args.vararg: variadic pack (e.g., *args), * is removed from argument name
# so we add it back.
if parsed_function.args.vararg:
all_args.append((
parsed_function.args.vararg.col_offset,
"*{0}".format(parsed_function.args.vararg.arg)
))
# args.kwonlyargs: keyword only args (e.g., after a '*,')
if parsed_function.args.kwonlyargs:
for item in parsed_function.args.kwonlyargs:
all_args.append((item.col_offset, item.arg))
# args.kwarg: variadic keyword pack (e.g., **kwargs), ** is removed from
# argument name so we add it back.
if parsed_function.args.kwarg:
all_args.append((
parsed_function.args.kwarg.col_offset,
"**{0}".format(parsed_function.args.kwarg.arg)
))
# Sort based off the col_offset, but keep the name.
ordered_arguments = [item[1] for item in sorted(all_args)]
COMPARE = True
if COMPARE:
unordered = [item[1] for item in all_args]
print("unordered: ", ", ".join(unordered))
# TODO: this becomes a limited_join
return ", ".join(ordered_arguments)
if __name__ == "__main__":
print("==> no parameters:")
print(raw_signature("()"))
print("==> no annotations:")
print(raw_signature("(boom, blam, blah)"))
print("==> (stage: str, *args, **kwargs)")
print(raw_signature("(stage: str, *args, **kwargs)"))
# TODO: what are the rules????
print(raw_signature("(stage: str, x=12, *args, x=11, **kwargs)"))
print("==> *, thingy")
print(raw_signature("(x=12, *, y=13, **kwargs)"))
print(raw_signature("(foo: str, *, alpha: int = 2, beta: str = 'hi')"))
sig = "(path: Union[pathlib.Path, str], pattern: str, repl: Union[Callable[[Match[AnyStr]], str], str], count: int = 0, flags: int = 0, backup_extension: str = '.orig', line_based: bool = False, demand_different: bool = True, encoding: Optional[str] = None) -> pathlib.Path"
print("==> complicated example:")
print(raw_signature(sig))
I found other case.
https://github.com/sphinx-doc/sphinx/blob/165897a74951fb03e497d6e05496ce02e897f820/tests/test_ext_autosummary.py#L43-L44
I don't know where such signatures come from. But it has been supported by mangle_signature().
IMO, current approach is not good. autosummary already have raw python function (or method) object once. So we can generate better signature from it. We don't need to modify signature string.
Note: Users can override the signature of functions (or methods) by docstring when autodoc_docstring_signature is set.
@pv do you happen to know what these test cases for autosummary are?
TL;DR this thread is trying to find a way to remove [ and ] from type hints.
If we removed them, would that be OK or do they serve a specific purpose? There's a couple other tests like this that don't seem to be "valid python function signatures", we're a little confused about what they represent.
You can do it by overriding signature via docstring (when autodoc_docstring_signature enabled) like following:
def foo():
"""foo(x[, y, z])
docstring.
"""
I don't know why this is useful. But it is current behavior.
@svenevs: that's the output produced by numpy ufunc and f2py docstring generators. The notation means positional-only arguments --- at the time I think Python had not converged on a common convention on these. I'm not immediately sure if this has been changed in numpy/f2py by now.
Supplying the signature on the first line of the docstring is necessary for extension modules (eg written in C), since Python usually cannot inspect the signatures of the functions provided by them.
So, it would be useful to not break this if there's no pressing need to.
Fixed by #6361.
Thank you for reporting.
Yay! Thank you very much for the help and information, I learned about some interesting new features :slightly_smiling_face: