Wemake-python-styleguide: Forbid slow `in` inside comprehensions

Created on 9 Oct 2019 · 16Comments · Source: wemake-services/wemake-python-styleguide

Rule request

Thesis

len with comprehensions

Iterators have no len, and sometimes I forgetting it.

Bad:

len(1 for el in a if el in b)

Better:

len([1 for el in a if el in b])

`in` inside comprehensions

Good:

sum(1 for el in a if el in b)

Twice slower, but also ok:

sum(el in b for el in a)

Reasoning

Detect runtime TypeError in advance. We could also detect len from yield-like iterators, but resolving symbols in python always is a non-trivial thing, unfortunately.

Hacktoberfest help wanted starter rule request

Source

orsinium

Most helpful comment

Would like to take this! Can u plz assign this to me

ManishAradwad on 9 Oct 2019

👍2

All 16 comments

Good idea!

But, we don't assume types as the best practice. mypy catches this case perfectly:

# ex.py
len(1 for el in a if el in b)

Output:

ex.py:1: error: Argument 1 to "len" has incompatible type "Generator[int, None, None]"; expected "Sized"

But, sum is the whole new case. That's about performance and the best practice.
That's something I want to have.

We can forbid to use x in y inside value part of the comprehension. And force people to write 1 or True and if x in y.

sobolevn on 9 Oct 2019

We need to measure the speed for different types with timeit
And make a decision based on the results.

sobolevn on 9 Oct 2019

From worst to best:

from random import randint 
elements = list(randint(-100000, 100000) for _ in range(1000000))  

%timeit sum(a > 0 for a in elements) 
# 130 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(True for a in elements if a > 0) 
# 92.1 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
s = 0
for a in elements:
  if a > 0:
    s += 1
# 73.1 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(1 for a in elements if a > 0)  
# 71.8 ms ± 720 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit len([1 for a in elements if a > 0])  
# 62.4 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

orsinium on 9 Oct 2019

The last one has a big std deviation. So, I'm not sure is it really the best or some optimization for repetitive list creation.

orsinium on 9 Oct 2019

Would like to take this! Can u plz assign this to me

ManishAradwad on 9 Oct 2019

👍2

From worst to best:

from random import randint 
elements = list(randint(-100000, 100000) for _ in range(1000000))  

%timeit sum(a > 0 for a in elements) 
# 130 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(True for a in elements if a > 0) 
# 92.1 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
s = 0
for a in elements:
  if a > 0:
    s += 1
# 73.1 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(1 for a in elements if a > 0)  
# 71.8 ms ± 720 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit len([1 for a in elements if a > 0])  
# 62.4 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So, which one should be considered as a best practice??

ManishAradwad on 13 Oct 2019

sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) win!

It means that it is better to use sum(1 for a in elements if a in b) then sum(a in b for a in elements). And this rule is perfectly valid.

sobolevn on 13 Oct 2019

So, I'm done with the local setup and ready to implement the changes. But, I'm quite confused about how should I approach the issue...
I suppose I should create a new visitor and violation, or is there another simpler way to do this??
Any help is appreciated, Thanks!

ManishAradwad on 19 Oct 2019

@ManishAradwad you need to create a new violation in best_practices, then we can create a new visitor here: https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py

We need to visit:

ast.ListComp
ast.SetComp
ast.DictComp
ast.GeneratorExp

You can add a new method in the visitor: self._check_slow_in_expression(node)

Here's how our bad node ((a > 0 for a in elements)) looks like (just one example):

GeneratorExp(elt=Compare(left=Name(id='a', ctx=Load(), lineno=1, col_offset=4), ops=[Gt()], comparators=[Num(n=0, lineno=1, col_offset=8)], lineno=1, col_offset=4), generators=[comprehension(target=Name(id='a', ctx=Store(), lineno=1, col_offset=14), iter=Name(id='elements', ctx=Load(), lineno=1, col_offset=19), ifs=[], is_async=0)], lineno=1, col_offset=4)

Then you write the required logic, test it, and submit a PR. I am here to help.

sobolevn on 19 Oct 2019

👍1

@ManishAradwad sorry for misleading you. This is a refactoring violation.

sobolevn on 19 Oct 2019

sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) win!

We are using both sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) to find the length of the list, right??

Then what do u mean by

It means that it is better to use sum(1 for a in elements if a in b) then sum(a in b for a in elements). And this rule is perfectly valid.

ManishAradwad on 20 Oct 2019

Also are you saying that I should add the function self._check_slow_in_expression(node) in the Wrong ComprehensionVisitor??

ManishAradwad on 20 Oct 2019

We don't check sum or len function. We check a comprehension inside them or inside any other python code: 1 for a in elements if a in b. And yes, we check it inside WrongComprehensionVisitor

sobolevn on 20 Oct 2019

Hi @ManishAradwad, how's it going? Do you need any help?

sobolevn on 4 Nov 2019

This should be joined with https://github.com/wemake-services/wemake-python-styleguide/issues/1008

sobolevn on 15 Nov 2019

We don't check sum or len function.

I think some clarification on exactly what is forbidden is needed here. I've just read through this twice, and I'm still not clear.

Are we forbidding slow sum/len calls (in which case we do need to check the functions), or something in all comprehensions?

Because some of the bad examples don't make sense to forbid outside of a len/sum call. e.g. [a > 0 for a in elements] might be used as a sequence of True/False values, there's no reason to forbid this unless you know only the True values are actually used (e.g. in sum()).

Dreamsorcerer on 8 Feb 2021

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Forbid too long call chains

sobolevn · 4Comments

WPS115 (Found upper-case constant in a class) in enum.Enum.

orsinium · 5Comments

Add wemake-python-styleguide to awesome-python

sobolevn · 3Comments

[docs] Violation set of pages is broken: code numbers missing

webknjaz · 3Comments

Forbid to use ternary inside `if` clause