Shifting a variable within groups sometimes crashes my kernel when the group column contains NaNs. The operation occasionally completes successfully, but it crashes more than half the time.
from numpy import nan
import pandas as pd
from pandas import Timestamp
test = pd.DataFrame(data=[
(Timestamp('2003-01-15 00:00:00'), 1, 1),
(Timestamp('2003-01-15 00:00:00'), nan, nan),
(Timestamp('2003-02-14 00:00:00'), 1, 2),
], columns=['Date','ID','var'])
test.groupby('ID')['var'].shift(1) #crashes kernel sometimes
test.dropna(subset=['ID']).groupby('ID')['var'].shift(1) #does not crash kernel
test.groupby('ID')['var'].apply(lambda x: x) #does not crash kernel
0 NaN
1 NaN
2 1.0
Name: var, dtype: float64
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 24.0.2
Cython: 0.23.4
numpy: 1.11.1
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
This is a duplicate of #13813, which was fixed in #13819. The fix will be in 0.19.0.
Thanks, I didn't find it because I only searched open issues... my mistake.
I have all the latest versions as of today, and the kernel crashes because one of the records in the groupby has a NaN. I had to work around it by filtering first with .notnull().
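For anyone landing here, the filter-first workaround might look roughly like the sketch below. It reuses the frame from the original report. The column name `var_lag` is made up for illustration, and this only shows the filtering pattern; it is not a statement about which versions crash.

```python
import numpy as np
import pandas as pd

# Same data as in the original report.
test = pd.DataFrame({
    "Date": pd.to_datetime(["2003-01-15", "2003-01-15", "2003-02-14"]),
    "ID": [1, np.nan, 1],
    "var": [1, np.nan, 2],
})

# Keep only rows whose group key is present, then shift within each group.
has_id = test["ID"].notnull()
shifted = test.loc[has_id].groupby("ID")["var"].shift(1)

# Align back to the original index; rows with a missing ID end up as NaN.
# "var_lag" is a hypothetical column name used only for this example.
test["var_lag"] = shifted.reindex(test.index)
```

Reindexing at the end keeps the result the same length as the original frame, so it can be assigned back as a column without dropping the rows that have no ID.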
@rojour if you are having an issue, you can create a new issue with a reproducible example.