Pandas: ffill with groupby removes index on version >= 0.25.0

Created on 9 Dec 2019 · Source: pandas-dev/pandas

```python
import numpy as np
import pandas as pd

data = [[1, 2, 2],
        [1, 2, np.nan],
        [1, 2, np.nan],
        [3, 4, 4],
        [3, 4, np.nan],
        [3, 0, np.nan]]
data = pd.DataFrame(data, columns=["A", "B", "C"])

# Forward-fill C within each (A, B) group
data.groupby(["A", "B"]).ffill().reset_index()
```

Problem description

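Since 0.25.0, `groupby(...).ffill()` returns only the non-key columns. The group keys `A` and `B` appear neither in the index nor as columns, so `reset_index()` only adds back the original integer index next to `C`.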
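A quick way to see what 0.25 and later actually return is to inspect the result before `reset_index()`. This is a minimal sketch assuming a current pandas release behaves as described in this report; the name `result` is only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [[1, 2, 2], [1, 2, np.nan], [1, 2, np.nan],
     [3, 4, 4], [3, 4, np.nan], [3, 0, np.nan]],
    columns=["A", "B", "C"],
)

# Inspect the groupby forward-fill result before calling reset_index()
result = data.groupby(["A", "B"]).ffill()

print(list(result.columns))             # only the non-key column(s) remain
print(result.index.equals(data.index))  # rows keep the input's index labels
```

Because the rows keep the input's index labels, `reset_index()` has no group keys to restore, which matches the output table below.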

Mock data:

A | B | C
-- | -- | --
1 | 2 | 2.0
1 | 2 | NaN
1 | 2 | NaN
3 | 4 | 4.0
3 | 4 | NaN
3 | 0 | NaN

Output (pandas 0.25.3)

index | C
-- | --
0 | 2.0
1 | 2.0
2 | 2.0
3 | 4.0
4 | 4.0
5 | NaN

Expected output (as returned by pandas 0.24.2)

index | A | B | C
-- | -- | -- | --
0 | 1 | 2 | 2.0
1 | 1 | 2 | 2.0
2 | 1 | 2 | 2.0
3 | 3 | 4 | 4.0
4 | 3 | 4 | 4.0
5 | 3 | 0 | NaN
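Until the keys come back, one possible workaround (a sketch, not an official fix) relies on the filled frame keeping the original row index: join the key columns back from the source frame. The name `filled` is hypothetical.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    [[1, 2, 2], [1, 2, np.nan], [1, 2, np.nan],
     [3, 4, 4], [3, 4, np.nan], [3, 0, np.nan]],
    columns=["A", "B", "C"],
)

# groupby().ffill() returns only the non-key columns, aligned on the
# original index, so the key columns can be joined back from `data`.
filled = data[["A", "B"]].join(data.groupby(["A", "B"]).ffill())
print(filled)
```

With a single value column, `data.assign(C=data.groupby(["A", "B"])["C"].ffill())` gives the same frame without the join; the join form scales better when several columns need filling.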

Note: in pandas 0.24.2, `data.groupby(["A", "B"]).fillna(method='ffill').reset_index()` already behaves the way `ffill()` does in 0.25.3 (the keys are missing).

Output of `pd.show_versions()`


INSTALLED VERSIONS

commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.0.0-36-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.3
pytz : 2018.9
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : 0.29.6
pytest : 4.3.1
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.0
xlsxwriter : 1.1.5
lxml.etree : 4.3.2
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10
IPython : 7.4.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.2
matplotlib : 3.0.3
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.1
tables : 3.5.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.5