~Currently~ TorchVision uses different styles for documenting functions. Here are a few:
It's worth selecting one style, standardizing all pydocs and putting a CI test to ensure new PR follows the standard.
Edit: A series of PRs have manually resolved many of the inconsistencies described here. What's missing is adding a CI test to verify that new PRs follow the same standard.
PyTorch seems to have done work towards this direction, so it's worth investigating if we should adopt their solution. See comment: https://github.com/pytorch/vision/pull/3259#issuecomment-765315036
Another alternative is to adopt an open-source docstring linter such as darglint. It seems that it is capable of validating google-style docstrings, it's configurable via setup.cfg and available via pip.
pip install darglint
# setup.cfg
[darglint]
strictness=short
ignore=DAR003,DAR004,DAR005,DAR103,DAR104,DAR105,DAR203,DAR301,DAR302,DAR401,DAR402,DAR501
enable=DAR001,DAR002,DAR101,DAR102,DAR201,DAR202
Unfortunately it's quite slow at the moment:
Not parallelised run:
darglint torchvision
325.77 sec
Parallelised run:
SECONDS=0 ; find torchvision -name "*.py" | xargs -n 1 -P 8 darglint ; echo $SECONDS
119 sec
cc @seemethere @vfdev-5 @fmassa
Good catch!
We should follow the documentation style from PyTorch, which uses Args: most of the time, although even PyTorch is not consistent there
cc @brianjo
I'll see if I can find any guidance we have for this and I'll write some up, if I can't find anything. Thanks!
I've been investigating this a bit and here's what I'm thinking. We want to:
Args: format is used (i.e. google style)1) can be done by adding
napoleon_numpy_docstring = False
napoleon_google_docstring = True
in docs/conf.py. I made some experiments in https://github.com/pytorch/vision/pull/3278 and sphinx will properly issue a warning if we use the following syntax:
----------
We can turn sphinx warnings into error as suggested in https://github.com/pytorch/vision/pull/3259#issuecomment-765315036 so that the CI can enforce such checks. Note that turning sphinx warnings into errors means that the CI will start failing with other kinds of doc-related issues: this isn't a big deal IMHO because there are few of these: only 37 sphinx warnings are raised as-of now, so they don't appear too often.
2) can be solved by using pydocstyle and by adding the following line in setup.cfg:
[pydocstyle]
ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414,D415,D416
This deactivate all checks but this one: D417 | Missing argument descriptions in the docstring. Shoudl we want to, we may want to stop ignoring other stuff in the future (?)
pydocstyle takes <= 5 seconds to run on torchvision and if it ever becomes too slow we can always just run it on the diff on each PR.
May I suggest to proceed with the following:
conf.py as abovepydocstyle D417 issues (there are just 20 of them so one PR should be enough) and introduce a new CI check using pydocstyle that would then run on all PRsThoughts @datumbox @fmassa ? I'm happy to follow up with PRs if you're happy with this solution
@NicolasHug Great analysis and proposals, thanks for looking into it.
@mattip Any thoughts on the above?
+1 for changing the conf.py and fixing the sphinx errors, then tun on the nitpicky mode. What is the overlap between the warnings from sphinx and the warnings from pydocstyle? If sphinx makes the pydocstyle tool redundant, then maybe it is not needed?
There are some redundancy (like checking for unexpected indentation), but as far as I can tell sphinx doesn't issue a warning when a parameter is missing from a function's docstring (item 2 above), which is what pydocstyle can be useful for
Most helpful comment
I'll see if I can find any guidance we have for this and I'll write some up, if I can't find anything. Thanks!