Black: Performance optimization discussion

Created on 21 Jun 2018 · 7 comments · Source: psf/black

This is similar to #349 and #109. I am unfortunate enough to deal with 100+ KB Python files, on
a system where black runs at only about 60 KB/s. As in #349, black is not viable as a save hook, except that here the problem is single-file throughput rather than startup time.

[Low throughput is common to the other Python formatters I've tried (yapf, autopep8). Profiling immediately reveals that, while much of the time is spent in other libraries (especially in lib2to3_parse), at least 10-20% falls within black itself. There are specific solutions to the save-hook performance problem (e.g., detecting which parts of the file have changed and reformatting only those), but they would all make the program logic much more complicated.]
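For reference, a rough way to reproduce this kind of profile, assuming a recent Black where format_str and FileMode are importable (big_module.py is just a placeholder for one of the large files mentioned above):

```python
# Illustrative only: profiling Black's library entry point with cProfile.
import cProfile
import pstats
from pathlib import Path

import black

src = Path("big_module.py").read_text()

profiler = cProfile.Profile()
profiler.enable()
black.format_str(src, mode=black.FileMode())  # format once, under the profiler
profiler.disable()

# Show the 20 most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```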

Would it be feasible to move away from black's single-file approach and progressively migrate stable, basic types (like Line or BracketTracker) to a compiled library reachable from the main code through Python's C API? This would complicate installation, but it would finally provide a path to speed comparable to, e.g., gofmt and clang-format.

design question


All 7 comments

The raw Python C API is not a silver bullet. We'd have to rewrite parts of the program in straight C, and since most of it operates on blib2to3.pytree, we'd need that in C, too.

My not-so-secret plan is to cythonize blib2to3 and go from there. But that's something I'll get to after Black becomes more stable and most issues currently open are solved.
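As a rough illustration of what that plan could look like (purely hypothetical; Black does not ship anything like this), cythonizing blib2to3 would boil down to a build script along these lines:

```python
# Hypothetical setup.py sketch for compiling blib2to3 with Cython.
# The package name and file list are illustrative, not Black's actual layout.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="blib2to3-compiled",
    ext_modules=cythonize(
        ["blib2to3/pytree.py", "blib2to3/pgen2/*.py"],
        compiler_directives={"language_level": "3"},
    ),
)
```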

In the meantime, Black does safety checks that other formatters don't. If you haven't encountered any problems, use --fast, which is over 2× faster than --safe.
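A quick, unscientific way to measure the difference on one large file (big_module.py is a placeholder; feeding the source on stdin avoids touching the file or Black's cache):

```python
# Compare --safe vs --fast wall-clock time for a single large file.
import subprocess
import time
from pathlib import Path

src = Path("big_module.py").read_bytes()

def time_black(flag: str) -> float:
    start = time.perf_counter()
    subprocess.run(
        ["black", flag, "-"],          # "-" reads from stdin, writes to stdout
        input=src,
        stdout=subprocess.DEVNULL,
        check=True,
    )
    return time.perf_counter() - start

print(f"--safe: {time_black('--safe'):.2f}s")
print(f"--fast: {time_black('--fast'):.2f}s")
```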

I ran black on a big file at work and was confused as to why it's so slow when the file needs many changes and so fast when there are no changes. Naively, one would think you load a full syntax tree, run some transformers, dump it, and write the result only if the new string differs from the old one. If that were the case, it should take roughly the same time with many changes as with none, so where is my naive thinking incorrect?

There is a cache. If the file hasn't changed since the last time we ran Black, we don't do any processing.
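The general idea is something like the following sketch (not Black's actual implementation; the file location and names here are illustrative): each formatted file is remembered by its (mtime, size), and a file whose stats match the cache is skipped entirely.

```python
# Minimal sketch of an mtime/size-based formatting cache.
import pickle
from pathlib import Path

CACHE_FILE = Path(".format_cache.pickle")  # hypothetical location

def load_cache() -> dict:
    try:
        return pickle.loads(CACHE_FILE.read_bytes())
    except (FileNotFoundError, pickle.UnpicklingError):
        return {}

def is_cached(path: Path, cache: dict) -> bool:
    st = path.stat()
    return cache.get(str(path)) == (st.st_mtime, st.st_size)

def update_cache(path: Path, cache: dict) -> None:
    st = path.stat()
    cache[str(path)] = (st.st_mtime, st.st_size)
    CACHE_FILE.write_bytes(pickle.dumps(cache))
```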

It would be amazing to compile Black with @mypyc/mypyc. If I understand correctly, @msullivan is interested in making it happen. FTR, if this will require changes to the Black codebase, I'm open to that as long as they are not sweeping or overly disruptive.

I can start sending up PRs to support mypyc in the next week, building on work done by my intern from last summer, @SanjitKal.

The most disruptive part of doing this is that blib2to3 needs to have its type annotations merged into it so it can be typechecked and compiled. If we are doing that, do we want to blacken the blib2to3 source?

I have a draft branch that makes black able to compile and run with mypyc: https://github.com/msullivan/black/tree/mypyc

(It requires https://github.com/python/mypy/pull/7481 to land pickling support in mypyc before it works right)
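For anyone curious how such a build is typically wired up, mypyc's documented setuptools hook is mypycify; a minimal sketch (the file list here is illustrative, not what the branch actually compiles) looks like:

```python
# Hedged sketch of a setup.py that compiles modules with mypyc.
from setuptools import setup
from mypyc.build import mypycify

setup(
    name="black-compiled",
    ext_modules=mypycify(["black.py", "blib2to3/pytree.py"]),
)
```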

Thanks so much for working on this @msullivan.

I say we should blacken blib2to3 too if we are going to add annotations to it anyway. @zsol @ambv do you agree?
