Vision: Standardize Python Documentation styles

Created on 13 Nov 2020  路  7Comments  路  Source: pytorch/vision

馃摎 Documentation

~Currently~ TorchVision uses different styles for documenting functions. Here are a few:

https://github.com/pytorch/vision/blob/973db14523d338fa4e772574fe4d49763fb25245/torchvision/io/image.py#L116-L130

https://github.com/pytorch/vision/blob/973db14523d338fa4e772574fe4d49763fb25245/torchvision/io/image.py#L135-L144

https://github.com/pytorch/vision/blob/973db14523d338fa4e772574fe4d49763fb25245/torchvision/models/mobilenet.py#L15-L25

https://github.com/pytorch/vision/blob/973db14523d338fa4e772574fe4d49763fb25245/torchvision/transforms/transforms.py#L195-L210

It's worth selecting one style, standardizing all pydocs and putting a CI test to ensure new PR follows the standard.


Edit: A series of PRs have manually resolved many of the inconsistencies described here. What's missing is adding a CI test to verify that new PRs follow the same standard.

PyTorch seems to have done work towards this direction, so it's worth investigating if we should adopt their solution. See comment: https://github.com/pytorch/vision/pull/3259#issuecomment-765315036

Another alternative is to adopt an open-source docstring linter such as darglint. It seems that it is capable of validating google-style docstrings, it's configurable via setup.cfg and available via pip.

pip install darglint

# setup.cfg
[darglint]
strictness=short
ignore=DAR003,DAR004,DAR005,DAR103,DAR104,DAR105,DAR203,DAR301,DAR302,DAR401,DAR402,DAR501
enable=DAR001,DAR002,DAR101,DAR102,DAR201,DAR202

Unfortunately it's quite slow at the moment:

Not parallelised run:
darglint torchvision

325.77 sec

Parallelised run:
SECONDS=0 ; find torchvision -name "*.py" | xargs -n 1 -P 8 darglint ; echo $SECONDS

119 sec

cc @seemethere @vfdev-5 @fmassa

enhancement ci documentation

Most helpful comment

I'll see if I can find any guidance we have for this and I'll write some up, if I can't find anything. Thanks!

All 7 comments

Good catch!

We should follow the documentation style from PyTorch, which uses Args: most of the time, although even PyTorch is not consistent there

cc @brianjo

I'll see if I can find any guidance we have for this and I'll write some up, if I can't find anything. Thanks!

I've been investigating this a bit and here's what I'm thinking. We want to:

  1. make sure that the Args: format is used (i.e. google style)
  2. make sure that a function's signature is in sync with its documented parameters (i.e. that all parameters are properly documented)

1) can be done by adding

napoleon_numpy_docstring = False
napoleon_google_docstring = True

in docs/conf.py. I made some experiments in https://github.com/pytorch/vision/pull/3278 and sphinx will properly issue a warning if we use the following syntax:

----------

We can turn sphinx warnings into error as suggested in https://github.com/pytorch/vision/pull/3259#issuecomment-765315036 so that the CI can enforce such checks. Note that turning sphinx warnings into errors means that the CI will start failing with other kinds of doc-related issues: this isn't a big deal IMHO because there are few of these: only 37 sphinx warnings are raised as-of now, so they don't appear too often.

2) can be solved by using pydocstyle and by adding the following line in setup.cfg:

[pydocstyle]
ignore=D100,D101,D102,D103,D104,D105,D106,D107,D200,D201,D202,D203,D204,D205,D206,D207,D208,D209,D210,D211,D212,D213,D214,D215,D300,D301,D302,D400,D401,D401,D402,D403,D404,D405,D406,D407,D408,D409,D410,D411,D412,D413,D414,D415,D416

This deactivate all checks but this one: D417 | Missing argument descriptions in the docstring. Shoudl we want to, we may want to stop ignoring other stuff in the future (?)

pydocstyle takes <= 5 seconds to run on torchvision and if it ever becomes too slow we can always just run it on the diff on each PR.

May I suggest to proceed with the following:

  • Fix current sphinx warnings and update conf.py as above
  • Use sphinx picky mode and start raising errors on warnings
  • Fix all of the pydocstyle D417 issues (there are just 20 of them so one PR should be enough) and introduce a new CI check using pydocstyle that would then run on all PRs

Thoughts @datumbox @fmassa ? I'm happy to follow up with PRs if you're happy with this solution

@NicolasHug Great analysis and proposals, thanks for looking into it.

@mattip Any thoughts on the above?

+1 for changing the conf.py and fixing the sphinx errors, then tun on the nitpicky mode. What is the overlap between the warnings from sphinx and the warnings from pydocstyle? If sphinx makes the pydocstyle tool redundant, then maybe it is not needed?

There are some redundancy (like checking for unexpected indentation), but as far as I can tell sphinx doesn't issue a warning when a parameter is missing from a function's docstring (item 2 above), which is what pydocstyle can be useful for

Was this page helpful?
0 / 5 - 0 ratings