Iterators have no len, and sometimes I forgetting it.
Bad:
len(1 for el in a if el in b)
Better:
len([1 for el in a if el in b])
in inside comprehensionsGood:
sum(1 for el in a if el in b)
Twice slower, but also ok:
sum(el in b for el in a)
Detect runtime TypeError in advance. We could also detect len from yield-like iterators, but resolving symbols in python always is a non-trivial thing, unfortunately.
Good idea!
But, we don't assume types as the best practice. mypy catches this case perfectly:
# ex.py
len(1 for el in a if el in b)
Output:
ex.py:1: error: Argument 1 to "len" has incompatible type "Generator[int, None, None]"; expected "Sized"
But, sum is the whole new case. That's about performance and the best practice.
That's something I want to have.
We can forbid to use x in y inside value part of the comprehension. And force people to write 1 or True and if x in y.
We need to measure the speed for different types with timeit
And make a decision based on the results.
From worst to best:
from random import randint
elements = list(randint(-100000, 100000) for _ in range(1000000))
%timeit sum(a > 0 for a in elements)
# 130 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sum(True for a in elements if a > 0)
# 92.1 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
s = 0
for a in elements:
if a > 0:
s += 1
# 73.1 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit sum(1 for a in elements if a > 0)
# 71.8 ms ± 720 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit len([1 for a in elements if a > 0])
# 62.4 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The last one has a big std deviation. So, I'm not sure is it really the best or some optimization for repetitive list creation.
Would like to take this! Can u plz assign this to me
From worst to best:
from random import randint elements = list(randint(-100000, 100000) for _ in range(1000000)) %timeit sum(a > 0 for a in elements) # 130 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) %timeit sum(True for a in elements if a > 0) # 92.1 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) %%timeit s = 0 for a in elements: if a > 0: s += 1 # 73.1 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) %timeit sum(1 for a in elements if a > 0) # 71.8 ms ± 720 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) %timeit len([1 for a in elements if a > 0]) # 62.4 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, which one should be considered as a best practice??
sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) win!
It means that it is better to use sum(1 for a in elements if a in b) then sum(a in b for a in elements). And this rule is perfectly valid.
So, I'm done with the local setup and ready to implement the changes. But, I'm quite confused about how should I approach the issue...
I suppose I should create a new visitor and violation, or is there another simpler way to do this??
Any help is appreciated, Thanks!
@ManishAradwad you need to create a new violation in best_practices, then we can create a new visitor here: https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py
We need to visit:
ast.ListCompast.SetCompast.DictCompast.GeneratorExpYou can add a new method in the visitor: self._check_slow_in_expression(node)
Here's how our bad node ((a > 0 for a in elements)) looks like (just one example):
GeneratorExp(elt=Compare(left=Name(id='a', ctx=Load(), lineno=1, col_offset=4), ops=[Gt()], comparators=[Num(n=0, lineno=1, col_offset=8)], lineno=1, col_offset=4), generators=[comprehension(target=Name(id='a', ctx=Store(), lineno=1, col_offset=14), iter=Name(id='elements', ctx=Load(), lineno=1, col_offset=19), ifs=[], is_async=0)], lineno=1, col_offset=4)
Then you write the required logic, test it, and submit a PR. I am here to help.
@ManishAradwad sorry for misleading you. This is a refactoring violation.
sum(1 for a in elements if a in b)andlen([1 for a in elements if a > 0])win!
We are using both sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) to find the length of the list, right??
Then what do u mean by
It means that it is better to use
sum(1 for a in elements if a in b)thensum(a in b for a in elements). And this rule is perfectly valid.
Also are you saying that I should add the function self._check_slow_in_expression(node) in the Wrong ComprehensionVisitor??
We don't check sum or len function. We check a comprehension inside them or inside any other python code: 1 for a in elements if a in b. And yes, we check it inside WrongComprehensionVisitor
Hi @ManishAradwad, how's it going? Do you need any help?
This should be joined with https://github.com/wemake-services/wemake-python-styleguide/issues/1008
We don't check
sumorlenfunction.
I think some clarification on exactly what is forbidden is needed here. I've just read through this twice, and I'm still not clear.
Are we forbidding slow sum/len calls (in which case we do need to check the functions), or something in all comprehensions?
Because some of the bad examples don't make sense to forbid outside of a len/sum call. e.g. [a > 0 for a in elements] might be used as a sequence of True/False values, there's no reason to forbid this unless you know only the True values are actually used (e.g. in sum()).
Most helpful comment
Would like to take this! Can u plz assign this to me