Black: comprehensions poor formatting

Created on 4 Mar 2019  路  4Comments  路  Source: psf/black

black 18.9b

class Test:
    def test_list_fields(self, big, big_j, param):
        assert [asdict(y) for x in big for y in getattr(x, param)] == [
            y for x in big_j for y in x[param]
        ]

I don't know if something better can be done but thats not very readable

design

Most helpful comment

I appreciate that this is a complex topic, with lots of edge cases, and I'm sure opinions abound. I'll also pre-declare that I'm not familiar with the internals of black, so if there are existing terms of art that I'm missing, I apologize.

However, I humbly submit a proposal:

Proposal

Comprehensions should always be treated as multiline expressions, analogous to long lists/dicts that won't fit on a single line, with each "clause" in the expression being an item in the list. By this rule:

[A for C in D if E]
{A for C in D for E in F if G}
{A: B for C in D if E]

becomes

[
    A 
    for C in D 
    if E
]
{
    A 
    for C in D 
    for E in F
    if G
}
{
    A: B 
    for C in D 
    if E
]

The only exception to this would be if the comprehension is "simple"; "simple" expressions would be collapsed to a single line of code, formatted as originally presented; or, as

[
    A for C in D if E
]

if necessary given the context in which the comprehension is being used.

I appreciate that "simple" is a bit of a handwave-y statement; but as a starting point for discussion, I'll assert a comprehension ceases to be "simple" if:

  • It has more than one attribute access in any single term of the overall expression
  • It has more than one operator in any single term of the overall expression
  • A single term in the expression contains both an attribute access and an operator
  • The entire expression has a length exceeding 40 chars
  • It has an if clause
  • It has more than 1 for clause

I won't claim this is a perfect definition of "simple", but it's a starting point.

Using this strategy, the example from the original post would become:

class Test:
    def test_list_fields(self, big, big_j, param):
        assert [
            asdict(y) 
            for x in big 
            for y in getattr(x, param)
        ] == [
            y 
            for x in big_j 
            for y in x[param]
        ]

Yes, this uses a lot more lines of code. However, I would also argue that it is a very dense and complex expression, and the extra whitespace significantly improves readability (although I'll acknowledge that is subjective)

By the rule I've expressed here, if the "for x" clauses weren't needed, both sides would be considered "simple" comprehensions - and the entire statement would fit on one line.

        assert [asdict(y) for y in getattr(x, param)] == [y for y in x[param]]

Underlying reasoning

One of the goals of Black is to produce small, easily readable diffs on future edits of the code. The simplest list/set comprehension is a dense expression containing at least 3 sub-expressions; the simplest dict comprehension has at least 4 sub-expressions. Each one of those sub-expressions could potentially be the subject of a meaningful code change. If there is any complexity in those terms, the code (and, by extension, a diff of that code) immediately becomes harder to read.

A key-value reverse comprehension:

{v: k for k, v in mydata.items()}

is nearing the simplest possible dictionary expression you can write, but to my eyes is also nearing the practical limit of what can be reasonably consumed as a single line (especially if that expression is inlined as an argument to a function or something similar).

The slightly more complex expression:

{v.upper(): k.upper() for k, v in mydata.items()}

would exceed the 40 character "complexity" limit, and be transformed into:

{
    v.upper(): k.upper() 
    for k, v in mydata.items()
}

This approach doesn't require any analysis of "symmetry" or similar aesthetic concerns. It only aims to recognize that dense comprehensions are, in fact, complex statements, and a little whitespace can considerably improve readability.

All 4 comments

I agree this does not look great. However, there's no way for Black to handle all those situations correctly without making it way more complex than it is today. It would need to start understanding concepts like "symmetry" to format this better automatically.

If I tweak the current heuristic to solve this particular situation better, it will suck for other kinds of asserts. I will look into this but this is hard to get right.

If the auto-formatter fails to format very complex expressions, I'd suggest maybe refactoring it slightly:

class Test:
    def test_list_fields(self, big, big_j, param):
        expected = [asdict(y) for x in big for y in getattr(x, param)]
        actual = [y for x in big_j for y in x[param]]
        assert expected == actual

I'm not sure whether this is possible, but this transformation looks very pretty

items = [item for iterable in iterables for item in iterable if item < 5]

To

items = [
    item 
    for iterable in iterables 
        for item in iterable 
            if item < 5
]

The lack of formatting for comprehensions is my sole pain using black.
As comprehensions are such central feature of python, formatting them is an absolute must.
Hence please make this the top priority for new features to implement.

On the implementation

Simple comprehensions, i.e. those not featuring conditionals or nesting are fine just being put on single line. However, once you deal with complex comprehensions proper formatting becomes vital to maintain readability.

The formatting convention suggests putting statements on separate line, like:

```py
values = [
expression
for value in collection
if condition
]
````

For nested comprehensions indention like the one suggested by @devxpy in his comment certainly also helps.

It might also be worthwhile considering keeping each level of iteration on the same level indentation.
Modifying devxpy 's example would then give us:

py items = [ item for iterable in iterables for item in iterable if item < 5 ]

I appreciate that this is a complex topic, with lots of edge cases, and I'm sure opinions abound. I'll also pre-declare that I'm not familiar with the internals of black, so if there are existing terms of art that I'm missing, I apologize.

However, I humbly submit a proposal:

Proposal

Comprehensions should always be treated as multiline expressions, analogous to long lists/dicts that won't fit on a single line, with each "clause" in the expression being an item in the list. By this rule:

[A for C in D if E]
{A for C in D for E in F if G}
{A: B for C in D if E]

becomes

[
    A 
    for C in D 
    if E
]
{
    A 
    for C in D 
    for E in F
    if G
}
{
    A: B 
    for C in D 
    if E
]

The only exception to this would be if the comprehension is "simple"; "simple" expressions would be collapsed to a single line of code, formatted as originally presented; or, as

[
    A for C in D if E
]

if necessary given the context in which the comprehension is being used.

I appreciate that "simple" is a bit of a handwave-y statement; but as a starting point for discussion, I'll assert a comprehension ceases to be "simple" if:

  • It has more than one attribute access in any single term of the overall expression
  • It has more than one operator in any single term of the overall expression
  • A single term in the expression contains both an attribute access and an operator
  • The entire expression has a length exceeding 40 chars
  • It has an if clause
  • It has more than 1 for clause

I won't claim this is a perfect definition of "simple", but it's a starting point.

Using this strategy, the example from the original post would become:

class Test:
    def test_list_fields(self, big, big_j, param):
        assert [
            asdict(y) 
            for x in big 
            for y in getattr(x, param)
        ] == [
            y 
            for x in big_j 
            for y in x[param]
        ]

Yes, this uses a lot more lines of code. However, I would also argue that it is a very dense and complex expression, and the extra whitespace significantly improves readability (although I'll acknowledge that is subjective)

By the rule I've expressed here, if the "for x" clauses weren't needed, both sides would be considered "simple" comprehensions - and the entire statement would fit on one line.

        assert [asdict(y) for y in getattr(x, param)] == [y for y in x[param]]

Underlying reasoning

One of the goals of Black is to produce small, easily readable diffs on future edits of the code. The simplest list/set comprehension is a dense expression containing at least 3 sub-expressions; the simplest dict comprehension has at least 4 sub-expressions. Each one of those sub-expressions could potentially be the subject of a meaningful code change. If there is any complexity in those terms, the code (and, by extension, a diff of that code) immediately becomes harder to read.

A key-value reverse comprehension:

{v: k for k, v in mydata.items()}

is nearing the simplest possible dictionary expression you can write, but to my eyes is also nearing the practical limit of what can be reasonably consumed as a single line (especially if that expression is inlined as an argument to a function or something similar).

The slightly more complex expression:

{v.upper(): k.upper() for k, v in mydata.items()}

would exceed the 40 character "complexity" limit, and be transformed into:

{
    v.upper(): k.upper() 
    for k, v in mydata.items()
}

This approach doesn't require any analysis of "symmetry" or similar aesthetic concerns. It only aims to recognize that dense comprehensions are, in fact, complex statements, and a little whitespace can considerably improve readability.

Was this page helpful?
0 / 5 - 0 ratings