A while ago, we added opinionated autoformatting to our frontend code
with Prettier (see #2466). We’re happy with the results! But, much to my
and @davidsoergel’s chagrin, we still don’t have any autoformatting for
the backend.
The standard Python formatter is Black. It’s maintained by
the Python Software Foundation. It’s used by other Google projects, too,
like the Google Cloud SDK.
Black and Prettier have similar philosophies, and also have similar
style verdicts. As with any autoformatter, the style itself isn’t that
important, but the consistency is a nice bonus.
Here is an online playground for Black: https://black.now.sh/
(Same as at #2466.) Most importantly, you don’t have to think about
formatting while writing, reviewing, or reading code.
Consistent style also makes diffs smaller, and improves greppability
(e.g., you don’t have to account for two quote forms).
The diff is not that large—much smaller than the Prettier diff both in
number of changed lines and total delta in number of lines:
$ git ls-files -z '*.py' | xargs -0 wc -l | tail -n 1
69088 total
$ black . >/dev/null 2>/dev/null
$ git ls-files -z '*.py' | xargs -0 wc -l | tail -n 1
70718 total
$ git diff --shortstat
283 files changed, 18766 insertions(+), 17136 deletions(-)
(Note: The above is running black-2spaces with an 80-character
line limit.)
We can chunk this by package, as with the Prettier change.
Running Black on our entire codebase takes 2.5 seconds on my machine.
The other formatter that we could use is YAPF, which is code
that happens to be owned by Google. Unfortunately, YAPF does not appear
to have any configuration option to avoid column alignment without using
literal tab characters (which we presumably want to avoid). IMHO, this
is such a serious failure that it knocks YAPF out of consideration.
What is column alignment and why is it bad?
Consider three renderings of the same code:
# Black style
response = requests.post(
endpoint,
data=post_body,
timeout=_REQUEST_TIMEOUT_SECONDS,
headers={"User-Agent": "tensorboard/%s" % version.VERSION},
)
# Hanging indent
response = requests.post(
endpoint,
data=post_body,
timeout=_REQUEST_TIMEOUT_SECONDS,
headers={"User-Agent": "tensorboard/%s" % version.VERSION})
# Columnar alignment (never do this)
response = requests.post(endpoint,
data=post_body,
timeout=_REQUEST_TIMEOUT_SECONDS,
headers={"User-Agent": "tensorboard/%s" % version.VERSION})
As with Prettier, Black’s continuation style is nicer on the blame for
the same reason that trailing commas are. It’s also consistent with the
formatting for multi-line lists and dicts. The second style, using
hanging indents, is also okay, though you have the “no trailing comma”
blame pollution problem.
But columnar alignment is the worst of all worlds. It forces code far
to the right, decreasing effective line length. It is brittle with
respect to changes on the left: imagine changing requests.post to
request.get, which is one character shorter. You’d have to either
totally break alignment of the following lines or pollute an
unbounded segment of blame by fixing all of their indentations. Columnar
alignment also creates lines starting at arbitrary columns rather than
nice tab-stop boundaries.
Needless to say, Black never employs this style.
There is also intrinsic merit in selecting the style that is officially
blessed by the PSF, and toward which the ecosystem seems to be moving.
Here’s what we need to do to make this happen.
line-length = 80 withblack --check . to our CI, and reformat all code.CONTRIBUTING.md to show how to configure format-on-save.I’ll post a more detailed analysis of the actual style changes and a
preview of the diff when I get a chance.
👍
Also, +1 to using the standard unforked Black. The 4-space indentations will break blame for our entire codebase, but I think adherence to the established standard and consistency going forward is worth it.
Running Black on our entire codebase takes 2.5 seconds on my machine.
Amendment: This is true, with caveats:
black --check . takes onlyI created #2967 so that folks can inspect the diff for themselves.
The biggest two changes are indentation and quote forms. (Black
standardizes on double quotes; this is not configurable.)
I’ve spot-checked the diff, and it seems fine to me. There are some
small discrepancies from Google style: e.g., when a class has no
docstring and starts with a method definition, Black does not insert a
newline, whereas Google style asks for one. (When there’s a docstring,
Black does add a newline, so the styles are consistent.) Nothing seems
major.
We can be confident that this change is safe because black(1) has a
safety check that verifies that the AST is identical before and after
the formatting. So the only questions are whether it has any
implications on the sync (it doesn’t appear to: http://cl/285013555)
and whether there’s anything so objectionable in the formatting that we
really can’t live with it, bearing in mind the usual caveats that having
one consistent style is far more important than what that style
actually is.
Please comment on this PR if you have comments or objections. If
everyone’s satisfied, I’ll plan to merge this on Saturday to minimize
interruptions.
Most helpful comment
Also, +1 to using the standard unforked Black. The 4-space indentations will break blame for our entire codebase, but I think adherence to the established standard and consistency going forward is worth it.