Pandas: BUG: skipna parameter in series.any() returns wrong result

Created on 12 Oct 2018 · 6 Comments · Source: pandas-dev/pandas

# Import pandas and NumPy
import pandas as pd
import numpy as np

data = pd.DataFrame({'A': [1, 2, 3, 4, 0, np.nan, 3],
                     'B': [3, 1, 4, 5, 0, np.nan, 5]})

data.any(axis=1, skipna=True)

Expected output:
0 True
1 True
2 True
3 True
4 False
5 True
6 True
dtype: bool

Returned output:

0 True
1 True
2 True
3 True
4 False
5 False
6 True
dtype: bool

As written in the documentation, if an entire row/column is NA, the result will be NA.
But NA isn't returned in either case (with skipna set to True or to False).
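
For reference, here is a minimal sketch of what the two settings return on the data above. It reflects my understanding of the behavior (with skipna=False, NaN is evaluated as-is and counts as truthy because it is not equal to zero), and the exact output may differ between pandas versions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 0, np.nan, 3],
                     'B': [3, 1, 4, 5, 0, np.nan, 5]})

# skipna=True: NaNs are dropped first, so the all-NaN row 5 is reduced
# over an empty set of values.
with_skip = data.any(axis=1, skipna=True)

# skipna=False: NaNs are kept and evaluated as-is.
without_skip = data.any(axis=1, skipna=False)

print(with_skip.tolist())
print(without_skip.tolist())

# Check whether either result contains a missing value.
print(with_skip.isna().any(), without_skip.isna().any())
```

Either way the result is a plain boolean Series; the only difference is how row 5 is evaluated.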

Docs good first issue

All 6 comments

"As written in documentation, if an entire row/column is NA, the result will be NA"

I think the docs at https://github.com/pandas-dev/pandas/blob/12a0dc49ac63b63bcc5f93bccf71bce51c60bdad/pandas/core/generic.py#L9728-L9730 are incorrect. With skipna=True, the result should be the same as the operation applied to the values with NAs removed (is that right @jorisvandenbossche?).

Wouldn't this be a better approach? ↓
If skipna is None, which is the default, then it returns NA for all-NaN rows/columns.
If it is True/False, then it returns True/False respectively for all-NA rows/columns.

Because if the docs are incorrect, then there is probably no way to return NA for null values.

min_count, I think.
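
To illustrate what that parameter does (a sketch only): sum and prod accept min_count, which makes a reduction over too few non-NA values return NaN instead of the identity value. As far as I know, any does not take this parameter, so this only shows the analogous knob on other reductions.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan])

# Default: NaNs are skipped, and the sum of nothing is 0.
default_sum = s.sum()

# min_count=1 requires at least one non-NA value; otherwise the result is NaN.
guarded_sum = s.sum(min_count=1)

print(default_sum, guarded_sum)
```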

My opinion: I think the problem is the documentation; the result is actually correct. If you ask if any of an empty set of statements is True, the answer is no. This is consistent with numpy:

In [1]: import numpy as np

In [2]: np.any([])
Out[2]: False

"Skipna should be the same as the operation on the values with NAs removed (is that right @jorisvandenbossche?)."

I think so as well. any/all can be seen as reductions like sum or prod, so we should probably follow their design.
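
A small sketch of that analogy, assuming skipna=True (the default): on an all-NA Series, each reduction should give the same answer as on the empty Series left after dropping the NAs, which is the identity element of the operation.

```python
import numpy as np
import pandas as pd

all_na = pd.Series([np.nan, np.nan])
empty = all_na.dropna()  # nothing is left once the NAs are removed

# Compare each reduction on the all-NA Series with the same
# reduction on the explicitly emptied Series.
results = {
    name: (getattr(all_na, name)(), getattr(empty, name)())
    for name in ("any", "all", "sum", "prod")
}

for name, (skipped, dropped) in results.items():
    print(name, skipped, dropped)
```

Seen this way, False for any on an all-NA row is the same kind of result as 0 for sum.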

So I think @dsaxton is right that it is only the documentation that is incorrect.

@jorisvandenbossche What would you say is the appropriate fix for this? If the documentation is a general statement about the skipna parameter, maybe it makes sense to just remove the claim that the result will be NA (since it's not true for any, but presumably would be true in other contexts)?
