Pandas: mean with skipna either True or False on groupby gives error

Created on 14 Mar 2020 · 3 Comments · Source: pandas-dev/pandas

Problem description

Calling .mean(skipna=True) or .mean(skipna=False) on a groupby object raises an error:

UnsupportedFunctionCall: numpy operations are not valid with groupby. Use .groupby(...).mean() instead

skipna is a crucial parameter for analysis. Its default is True, so omitting it gives the expected output, but passing it explicitly, as either True or False, raises the error above.

Code Sample:

import pandas as pd
df = pd.DataFrame({'elements':[144,214,166,166,145,144,214],
                  'points':[1,2,3,None,1,1,1]})

df.groupby('elements').mean()              # works
df.groupby('elements').mean(skipna=True)   # raises UnsupportedFunctionCall
df.groupby('elements').mean(skipna=False)  # raises UnsupportedFunctionCall

Expected Output

with mean(skipna = True):

elements  points
144       1.0
145       1.0
166       3.0
214       1.5

with mean(skipna = False):

elements  points
144       1.0
145       1.0
166       NaN
214       1.5

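A workaround I tried, which gives the expected output:

def custom_mean(series):
    # agg passes each group's 'points' column as a Series, whose mean() supports skipna
    return series.mean(skipna=False)
df.groupby('elements').agg({'points': custom_mean})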
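
The same idea can produce both expected outputs side by side. This is a minimal sketch rather than an official API: it reuses the df from the code sample, and the column names skipna_true and skipna_false are made up for illustration. Each lambda receives one group's points values as a Series, where skipna is supported.

```python
import pandas as pd

df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})

points = df.groupby('elements')['points']

# Series.mean accepts skipna, so call it once per group through agg.
comparison = pd.DataFrame({
    'skipna_true': points.agg(lambda s: s.mean(skipna=True)),
    'skipna_false': points.agg(lambda s: s.mean(skipna=False)),
})
print(comparison)
```

Because the lambdas run in Python once per group, this is likely to be slower than the built-in reduction on data with many groups.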

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0.post20200309
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.5.4
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0

Quite similar to issue #19806

Duplicate Groupby Reductions

Most helpful comment

Quick workaround

One possible alternative to df.groupby('elements').mean(skipna = False):

>>> import numpy as np
>>> df.replace(np.nan, np.inf).groupby('elements').mean().replace(np.inf, np.nan)

          points
elements        
144          1.0
145          1.0
166          NaN
214          1.5

Warning: any np.inf values already present in the data will also be turned into NaN.

More solutions here: https://stackoverflow.com/q/54106112/
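
One way to avoid the sentinel value altogether is to compute the default mean and then blank out every group that contains a missing value. This is a sketch under the assumption that the df from the original code sample is used; the names means, has_nan and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})

# Default groupby mean: NaN values are skipped.
means = df.groupby('elements')['points'].mean()

# True for every group that holds at least one NaN in 'points'.
has_nan = df['points'].isna().groupby(df['elements']).any()

# Replace the mean with NaN wherever the group had a missing value.
result = means.mask(has_nan)
print(result)
```

Since nothing is swapped for np.inf, the workaround itself leaves existing np.inf values alone, and both steps stay on the built-in groupby reductions.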

More examples related to this problem

groupby() + max() + skipna=False

skipna=False raises no error but is silently ignored:

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).max(skipna=False)

A    11.0
B    22.0
dtype: float64

Expected output:

A    NaN
B    NaN
dtype: float64
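
Until the keyword is honored, the expected output can be obtained by delegating to Series.max inside agg. A minimal sketch, assuming the same data as above; the index is built with list('AABB') instead of iter('AABB') purely to keep the example conservative.

```python
import numpy as np
import pandas as pd

s = pd.Series([11, np.nan, 22, np.nan], index=list('AABB'))

# Series.max honors skipna, so NaN propagates within each group.
result = s.groupby(level=0).agg(lambda g: g.max(skipna=False))
print(result)
```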

groupby() + median() + skipna

skipna works as expected:

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).median(skipna=False)

A    NaN
B    NaN
dtype: float64

>>> pd.Series([11, np.nan, 22, np.nan], index=iter('AABB')) \
        .groupby(level=0).median(skipna=True)

A    11.0
B    22.0
dtype: float64

All 3 comments

The skipna keyword is indeed not yet implemented for groupby reductions. An enhancement request to add it is tracked in #15675, so I'm closing this issue as a duplicate.

Duplicate of #15675
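
Code that has to run on pandas versions both with and without the keyword could try the native call first and fall back to a per-group Series.mean. This is only a sketch: group_mean is a hypothetical helper, not part of pandas, and it assumes that a version without support rejects skipna with a TypeError or a ValueError (UnsupportedFunctionCall is a ValueError subclass) rather than silently ignoring it.

```python
import pandas as pd


def group_mean(frame, by, column, skipna=True):
    """Hypothetical helper: per-group mean with an explicit skipna flag."""
    grouped = frame.groupby(by)[column]
    try:
        # Succeeds only if this pandas version's groupby mean accepts skipna.
        return grouped.mean(skipna=skipna)
    except (TypeError, ValueError):
        # Keyword rejected: fall back to Series.mean on each group.
        return grouped.agg(lambda s: s.mean(skipna=skipna))


df = pd.DataFrame({'elements': [144, 214, 166, 166, 145, 144, 214],
                   'points': [1, 2, 3, None, 1, 1, 1]})

strict = group_mean(df, 'elements', 'points', skipna=False)
lenient = group_mean(df, 'elements', 'points', skipna=True)
print(strict)
print(lenient)
```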
