df = pd.DataFrame([[0,1],[0,1],[0,2],[1,1]], columns=['a','b'])
df
   a  b
0  0  1
1  0  1
2  0  2
3  1  1
df.groupby('a').b.value_counts().reset_index()
ValueError: cannot insert b, already exists
#### Expected Output
In version 0.18.0, the output was:
   a  b  0
0  0  1  2
1  0  2  1
2  1  1  1
dtype: int64
The difference is that groupby().value_counts() now returns a Series with the same name as the column on which value_counts() was computed, so reset_index() tries to insert a second column named b.
df.groupby('a').b.value_counts()
0.18.0
a  b
0  1    2
   2    1
1  1    1
dtype: int64
0.18.1 (including 0.18.1+367.g6b7857b)
a  b
0  1    2
   2    1
1  1    1
Name: b, dtype: int64
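To isolate the name collision from groupby itself, here is a minimal sketch that builds the two Series by hand rather than through value_counts(). The variable names (`index`, `named`, `unnamed`) are made up for this illustration; the hand-built Series only stand in for the 0.18.0 and 0.18.1 results shown above.

```python
import pandas as pd

# Hand-built stand-ins for the result of df.groupby('a').b.value_counts():
# both have index levels ('a', 'b'); only the Series name differs.
index = pd.MultiIndex.from_tuples([(0, 1), (0, 2), (1, 1)], names=["a", "b"])
unnamed = pd.Series([2, 1, 1], index=index)          # 0.18.0-style: no name
named = pd.Series([2, 1, 1], index=index, name="b")  # 0.18.1-style: named 'b'

# Without a name, the values column falls back to the label 0.
flat = unnamed.reset_index()
print(list(flat.columns))

# With name 'b', the values column would collide with index level 'b'.
try:
    named.reset_index()
    collided = False
except ValueError as err:
    collided = True
    print(err)
```

The failure therefore seems to depend only on the Series name matching one of the index level names, not on how the Series was produced.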
This change in behavior is not completely unexpected, given that outside of groupby(), value_counts() has historically returned a Series carrying the name of the column it was computed on:
df.a.value_counts()
0 3
1 1
Name: a, dtype: int64
A manual workaround would be to rename the Series before reset_index() as follows:
g = df.groupby('a').b.value_counts()
g.name = 0
g.reset_index()
   a  b  0
0  0  1  2
1  0  2  1
2  1  1  1
However, the one-line functionality was much appreciated. Could value_counts() accept a new name to solve this issue?
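Until something like that exists, the three-line workaround can probably be collapsed into a single chain with Series.rename, which accepts a scalar to set the Series name. This is a sketch rather than a proposed API; the label "counts" is an arbitrary choice for this example.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [0, 1], [0, 2], [1, 1]], columns=["a", "b"])

# Rename the counts Series inline so it no longer clashes with
# index level 'b', then flatten the index into columns.
out = df.groupby("a").b.value_counts().rename("counts").reset_index()
print(out)
```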
#### output of `pd.show_versions()`
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1 (also verified with 0.18.1+367.g6b7857b)
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None
Probably a result of https://github.com/pydata/pandas/issues/12363 fixing groupby sometimes losing the name.
In this case I'd say that
In [37]: df.groupby('a').b.value_counts().reset_index(name='counts')
Out[37]:
   a  b  counts
0  0  1       2
1  0  2       1
2  1  1       1
is even clearer than your original. Thoughts?
Even better, thanks!
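For reference, a self-contained version of the reset_index(name='counts') suggestion. The second expression, groupby(['a', 'b']).size(), is not from this thread; it is included only as an assumed alternative that produces the same counts for this data.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [0, 1], [0, 2], [1, 1]], columns=["a", "b"])

# Suggested form: name the values column while flattening the index.
by_value_counts = df.groupby("a").b.value_counts().reset_index(name="counts")

# Alternative (not from the thread): count rows per (a, b) pair directly.
by_size = df.groupby(["a", "b"]).size().reset_index(name="counts")

print(by_value_counts)
print(by_size)
```

The two agree on this frame, but row order can differ in general: value_counts() orders by count within each group, while size() follows the sorted group keys.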