Pandas: Using agg with groupby, as_index=False still returning group variable as index

Created on 29 Jan 2019  ·  4 Comments  ·  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

Code sample:

# Import packages
import pandas as pd
import numpy as np
# Set up test DataFrame
test_array = np.arange(50) + 100
test_matrix = test_array.reshape((10,5))
test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
test_df.loc[0:5,'shouldnt be index'] = 3
test_df.loc[5:8,'shouldnt be index'] = 4
# groupby and agg
end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
print(end_result)

execution:

>>> # Import packages
... import pandas as pd
>>> import numpy as np
>>> # Set up test DataFrame
... test_array = np.arange(50) + 100
>>> test_matrix = test_array.reshape((10,5))
>>> test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
>>> # Make groupby data more grouped
... test_df.loc[0:5,'shouldnt be index'] = 3
>>> test_df.loc[5:8,'shouldnt be index'] = 4
>>> # groupby and agg
... end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
>>> print(end_result)
                     1               2               3               4
                   min  max count  min  max count  min  max count  min  max count
shouldnt be index
3                  101  121     5  102  122     5  103  123     5  104  124     5
4                  126  141     4  127  142     4  128  143     4  129  144     4
145                146  146     1  147  147     1  148  148     1  149  149     1

Problem description

I'm trying to use groupby with as_index=False and then do an aggregate statement. I included an example where the groupby variable ends up as an index rather than staying as a column. My understanding is that this should result in the groupby variable being a column and not an index (as below), but perhaps I am mistaken.

This is my first time creating an issue, so my apologies if this is operator error or I didn't include important information. Please let me know if this is the case.

Maybe this is related to #22546?

Expected Output

Calling reset_index on the result shows the output I expected:

>>> end_result.reset_index()
  shouldnt be index    1               2               3               4
                     min  max count  min  max count  min  max count  min  max count
0                 3  101  121     5  102  122     5  103  123     5  104  124     5
1                 4  126  141     4  127  142     4  128  143     4  129  144     4
2               145  146  146     1  147  147     1  148  148     1  149  149     1

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0
pytest: 3.8.0
pip: 19.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Duplicate

All 4 comments

Thanks for the report. I think the problem here is a conflict between the as_index keyword and how we are piecing together the result of multiple agg function applications.

Specifically, this is fine:

end_result = test_df.groupby('shouldnt be index',as_index=False).agg(min) 

but this would reproduce the error you are seeing:

end_result = test_df.groupby('shouldnt be index',as_index=False).agg([min])

Investigation and PRs would certainly be welcome
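To make the contrast above easy to check locally, here is a minimal sketch. It assumes the same data as the report but uses a shorter, hypothetical column name ("key") and passes the aggregation as the string "min" rather than the builtin. Where the key ends up in the list case depends on the installed pandas version, so the sketch inspects the result instead of assuming an outcome.

```python
import numpy as np
import pandas as pd

# Same data layout as the report; "key" is a hypothetical, shorter column name.
df = pd.DataFrame(np.arange(50).reshape((10, 5)) + 100).rename(columns={0: "key"})
df.loc[0:4, "key"] = 3
df.loc[5:8, "key"] = 4

grouped = df.groupby("key", as_index=False)

# Single aggregation: the group key stays an ordinary column.
single = grouped.agg("min")
single_has_key_column = "key" in single.columns

# List of aggregations: the columns become a MultiIndex. Where the key lands
# depends on the pandas version, so inspect it rather than assume.
multi = grouped.agg(["min", "max", "count"])
multi_key_in_index = "key" in multi.index.names
multi_key_in_columns = "key" in multi.columns.get_level_values(0)

print("single agg keeps key as column:", single_has_key_column)
print("list agg puts key in index:", multi_key_in_index)
print("list agg puts key in columns:", multi_key_in_columns)
```

If the second line prints True, the installed version shows the behavior described in this issue.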

Curious if there is any update on this. I just ran into this issue. I appreciate all the work being done on this great project!

closing as duplicate of #13217. ping me if I'm missing something.

Hi guys, I am running into the same issue, and I just found out that calling .reset_index() on the result instead of passing as_index=False solves it for me. Thanks :)
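A minimal sketch of that workaround, assuming the same data as the report and a shorter, hypothetical column name ("key"): group with the default as_index=True, aggregate, and then call reset_index to move the group key back into a column.

```python
import numpy as np
import pandas as pd

# Same data layout as the report; "key" is a hypothetical, shorter column name.
df = pd.DataFrame(np.arange(50).reshape((10, 5)) + 100).rename(columns={0: "key"})
df.loc[0:4, "key"] = 3
df.loc[5:8, "key"] = 4

# Leave as_index at its default, aggregate, then turn the index into a column.
result = df.groupby("key").agg(["min", "max", "count"]).reset_index()

# The columns are a MultiIndex, so the key column is labelled ("key", "").
print(result[("key", "")].tolist())
print(result[(1, "min")].tolist())
```

Because the aggregated columns are a MultiIndex, the key column gets an empty string as its second-level label after reset_index.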
