Pylint: len-as-sequence with numpy.arrays

Created on 12 Feb 2018  路  7Comments  路  Source: PyCQA/pylint

For

import numpy

a = numpy.array([0])

if len(a) > 0:
    print('a')

if a:
    print('b')

pylint suggests

C:  5, 3: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)

However, there _is_ a difference between those two conditions above; that's because numpy.array([0]) evaluates to False. (As opposed to numpy.array([1.121]), for example.)

Now, pylint cannot know everything about all Python modules, but numpy is so popular and the above behavior so devious that I'm wondering if len-as-condition should be removed entirely. In any case, the warning message should explicitly except numpy arrays.

bug false-positive

Most helpful comment

Closing, there should no longer false positive following #3821. However, generator and list comprehension inside functions are now "false negative" that would require an evolution in astroid to be fixed, the follow up issue is: #4002

All 7 comments

Another issue with len-as-condition is with pandas which actually throws a ValueError when you do if pd.Series(): pass but if len(pd.Series()): pass is fine

I'm thinking pylint should only apply this check for builtin containers such as dictionaries and lists and sets

Feel free to disable the rule locally for such constructs. If a rule works most of the time but there are libraries such as numpy that decided to have a different behaviour, then this is not an argument against the rule, but against outliers in behaviour such as these libraries.

If a rule works most of the time but there are

That's what I meant: Numpy isn't just "a library" -- it's so immensely popular that I have problems naming a Python software that _doesn't_ use it. One can disable the rule alright if you know about the behavior, but I'd argue that most people don't. Suggesting to replace len(arr)>0 without any warning about numpy might do the community a disservice.

To be fair, pylint doesn't recommend itself as being one size fits it all. If something does not work for a particular library, it's okay to just disable the check in that area of the code of across the entire project. Supporting numpy's idiosyncrasies might be worth it, but this can get easily out of hand if we're talking about adapting every rule for every popular library out there. Also adapting a check is a completely different topic than completely removing it, as your initial comment suggested, I wouldn't go as far as removing checks just for satisfying irregularities of other software.

Sorry to revive an old discussion, but I guess this is better than opening a new issue. This was the MWE I intended to post:

"""Demonstrate inappropriate len() warning for Pandas data frames warning."""
import pandas as pd

df = pd.DataFrame()

if len(df):
    print("this works, but pylint tells me not to use len() without comparison")

if len(df) > 0:
    print("this works and pylint likes it, but it's not the solution intended by PEP-8")

if df:
    print("this does not work (truth value of dataframe is ambiguous)")

On additional argument (to the ones pointed out above) would be that PEP 8 says

For sequences, (strings, lists, tuples), use the fact that empty sequences are false:

So while pylint's behavior may be in line with PEP 8 for these types, I don't see PEP 8 recommend applying the same rule to other types, so pylint may be extending its unnecessarily to other libraries. A point to support this may be that the list of sequence types is well-defined elsewhere: https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range
So one solution would be to enable the warning only for variables that pylint is certain are of the above type. But I can see why one would not want that.

Another solution would be for pylint to disable the check if the object has a __bool__() method, which is a pretty strong indicator for "the fact that empty sequences are false" to be wrong (because if that was intended, the developer could have achieved that by implementing only a __len___() method). This way, pylint would apply the warning to as many types as reasonably possible; and would be able to not show a warning for if len(df) or if len(array) because it understands that if df and if array mean something else, and because it knows that if len(x) > 0 is not intended. Would this be doable?

Hello, thank you for bringing new arguments to the table @bersbersbers.

I'm reopening for three reasons:

  • It is true that numpy/pandas are very popular, so I suppose that this is a false positive that is affecting a lot of users. And we want to make pylint friendlier for newcomers with less false positives as discussed in #3512.
  • Disabling len-as-condition is certainly not something that I want to do as it make the code a lot clearer in code base from ex-C/C++ developers
  • Even if we don't want to maintain specific libraries (that would be a slippery slope), the proposed solution of checking for an implementation of __bool__ is not specific to numpy and seems sensible.

This might require change in astroid for libs that use system libraries like numpy.

Closing, there should no longer false positive following #3821. However, generator and list comprehension inside functions are now "false negative" that would require an evolution in astroid to be fixed, the follow up issue is: #4002

Was this page helpful?
0 / 5 - 0 ratings