Pylint: Checker to replace 'any/all(not condition)' by 'not any/all(condition)' when 'not condition' is slower than 'condition'

Created on 24 Feb 2021  路  7Comments  路  Source: PyCQA/pylint

The following function are equivalent but one is faster because it cut the execution tree at the first not instead of checking all the values.

import time
from timeit import timeit


def not_all(n, step=1):
    if not all(range(-n, n, step)):
        ...


def any_not(n, step=1):
    if any(x for x in range(-n, n, step)):
        ...


assert not_all(3, 1) == any_not(3, 1)
assert not_all(3, 2) == any_not(3, 2)
assert not_all(5, 1) == any_not(5, 1)

start = time.time()
for i in range(20):
    any_not(1000000)
end = time.time()
print(end - start)
start = time.time()
for i in range(20):
    not_all(1000000)
end = time.time()
print(end - start)

print(
    timeit(
        "import numpy as np; arr = np.ones(int(1e8)); arr[3]=0; np.any(arr<=0)",
        number=10,
    )
)
print(
    timeit(
        "import numpy as np; arr = np.ones(int(1e8)); arr[3]=0; np.all(arr>0)",
        number=10,
    )
)

Results:

1.52587890625e-05
0.28366875648498535
2.1408786789979786
1.6885079700005008

Notice that changing from one generator to two generators with a casting to bool is slower but just changing the condition is faster.

Describe the solution you'd like

New checker for detecting not all(values) and any(not v for v in values) to advise to convert it to the most efficient version of the check.

checkers proposal

All 7 comments

After thinking about it it seems the example I gave work because of the dataset used, not all(...) is not inherently faster than any(not ...) (unless you know something about the likeliness of something being False or True).

After thinking about it it seems the example I gave work because of the dataset used, not all(...) is not inherently faster than any(not ...) (unless you know something about the likeliness of something being False or True).

Not exactly, there is a performance difference although you need a dataset with a lot of True values before the first False one. The reason is that not all(...) calls UNARY_NOT only once whereas any(not ...) calls it on every value.

This example illustrates that

lst = [True] * 99999 + [False]

def not_all_2():
    it = (v for v in lst)
    return not all(it)

def any_not_2():
    it = (not v for v in lst)
    return any(it), it

start = time.time()
for i in range(1000):
    any_not_2()
end = time.time()
print(end - start)
start = time.time()
for i in range(1000):
    not_all_2()
end = time.time()
print(end - start)

The result on my system

4.846180200576782
3.75608229637146

not all(...) is faster, albeit not by much.

So interestingly, there is a performance difference, just in the opposite direction than originally suggested.
I guess it is probably not worth an extra check though.

The number of call to UNARY_NOT makes a lot of sense, thanks for pointing that out. If we makes a checker for literal any(not x for x in iterator) or all(not x for x in iterator) then there is probably a lot of case where reversing the conditional (from > to <= for example) is possible, and then the difference becomes a (insignificant?) single UNARY_NOT聽call.

It is not exactly the original question, but perhaps, some time could also be gained if pylint could suggest one to use generators instead of evaluating a condition on a whole sequence and then running all or any? The gain could be significant if condition evaluation takes computation effort, and this kind of check could possibly be automated?

The number of call to UNARY_NOT makes a lot of sense, thanks for pointing that out. If we makes a checker for literal any(not x for x in iterator) or all(not x for x in iterator) then there is probably a lot of case where reversing the conditional (from > to <= for example) is possible, and then the difference becomes a (insignificant?) single UNARY_NOT聽call.

@Pierre-Sassoulas I'm not sure I understand you correctly, but maybe it helps if I explain it a bit differently. The not all(...) case is usually faster, for small lists it's basically random but for large ones (where the one False entry is at the end) it's noticeable.

For the example from above lst = [True] * 99999 + [False] it would mean that not all(...) only does one UNARY_NOT call at the end, whereas for any(not ...) Python would do one call for each list entry (so about 100.000). In the example the UNARY_NOT operation is quite fast and thus the difference is barely noticeable. There might however be cases where the __bool__ isn't just a variable load and therefore takes longer. In these even smaller lists would see an impact.

Besides the performance impact, I find the not all(...) call easier to reason about (although that is personal preference).
In conclusion, there would be a value in adding a checker for it.

It is not exactly the original question, but perhaps, some time could also be gained if pylint could suggest one to use generators instead of evaluating a condition on a whole sequence and then running all or any? The gain could be significant if condition evaluation takes computation effort, and this kind of check could possibly be automated?

@Alexander-Serov There was a checker added for just this case, recently.
I believe it is included 2.7.0: consider-using-generator (R1728) - Documentation related MR : https://github.com/PyCQA/pylint/pull/3309

Was this page helpful?
0 / 5 - 0 ratings

Related issues

glmdgrielson picture glmdgrielson  路  3Comments

sambarluc picture sambarluc  路  3Comments

jrial picture jrial  路  3Comments

GergelyKalmar picture GergelyKalmar  路  3Comments

pylint-bot picture pylint-bot  路  3Comments