**Describe the bug**
`cudf.DataFrame.groupby().apply()` fails to produce a result; see the reproducer code below for details.
**Steps/Code to reproduce bug**
```python
import numpy as np
import cudf

issue_df = cudf.DataFrame({'A': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                           'B': [0.01, np.nan, 0.03, 0.04, np.nan,
                                 0.06, 0.07, 0.08, 0.09, 1.0]})
print(issue_df)
```

```
   A     B
0  1  0.01
1  1  <NA>
2  2  0.03
3  2  0.04
4  3  <NA>
5  3  0.06
6  4  0.07
7  4  0.08
8  5  0.09
9  5   1.0
```
```python
def map_fun(x):
    x = x[~x['B'].isna()]
    ticker = x.shape[0]
    num = 10
    full = ticker / num
    print(full)
    return full

result = issue_df.groupby('A').apply(lambda x: map_fun(x))
```

```
0.1
0.2
0.1
0.2
0.2
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-146-f9f92077b053> in <module>
      7     return full
      8
----> 9 result = issue_df.groupby('A').apply(lambda x: map_fun(x))

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/groupby/groupby.py in apply(self, function)
    362             grouped_values[s:e] for s, e in zip(offsets[:-1], offsets[1:])
    363         ]
--> 364         result = cudf.concat([function(chk) for chk in chunks])
    365         if self._sort:
    366             result = result.sort_index()

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/reshape.py in concat(objs, axis, ignore_index, sort)
    199                 typs.add(cudf.Series)
    200             else:
--> 201                 raise ValueError(f"cannot concatenate object of type {type(o)}")
    202
    203         allowed_typs = {cudf.Series, cudf.DataFrame}

ValueError: cannot concatenate object of type <class 'float'>
```
**Expected behavior**
It should return a new result object; pandas produces the correct result for the same code.
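For reference, here is a minimal sketch of the same reproducer under pandas, which handles scalar return values from `groupby().apply()` and yields a float-valued Series keyed by `A` (this is the expected behavior the cuDF call should match):

```python
import numpy as np
import pandas as pd

issue_df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                         'B': [0.01, np.nan, 0.03, 0.04, np.nan,
                               0.06, 0.07, 0.08, 0.09, 1.0]})

def map_fun(x):
    # Drop rows where B is missing, then compute the per-group ratio
    x = x[~x['B'].isna()]
    return x.shape[0] / 10

# pandas collects the scalar returned per group into a Series indexed by A
result = issue_df.groupby('A').apply(map_fun)
print(result)
```

Groups 1 and 3 each lose one NaN row, so they yield 0.1; the remaining groups yield 0.2.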
**Environment overview (please complete the following information)**
- Environment location: Docker
- Method of cuDF install: Docker
  - `docker pull` & `docker run` commands used:

```shell
sudo docker run --gpus all -it -v /data/project/demo/:/rapids/notebooks/demo/ -p 8889:8888 nvcr.io/nvidia/rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04
```
Please let me know if this is enough to reproduce the issue, thanks!
@dominicshanshan Thanks for the reproducer. I have identified the root cause, will issue a fix.
@galipremsagar figured out a trick to get it working in the meantime (the trick is wrapping the scalar result in a one-row DataFrame before returning it):

```python
def map_cal(x):
    df = cudf.DataFrame(columns=['full_coverage'])
    x = x[~x['signal'].isna()]
    ticker = x.shape[0]
    num = 4000
    full = ticker / num
    coverage = df.append({'full_coverage': full}, ignore_index=True)
    return coverage

df_cal = cudf.DataFrame(signal_data_gpu.groupby('tradeDate').apply(lambda x: map_cal(x))).reset_index(drop=True)
print(df_cal)

tradeDate = cudf.DataFrame(signal_data_gpu.tradeDate.unique(), columns=['tradeDate']).reset_index(drop=True)
print(tradeDate)

coverage_data = df_cal.join(tradeDate, sort=True)
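The same pattern applied to the original reproducer, sketched in pandas for illustration (column names and the divisor come from the reproducer above, not from the workaround's `signal` data): because each group now returns a one-row DataFrame instead of a float, the internal `concat` sees only frames and succeeds.

```python
import numpy as np
import pandas as pd

issue_df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                         'B': [0.01, np.nan, 0.03, 0.04, np.nan,
                               0.06, 0.07, 0.08, 0.09, 1.0]})

def map_fun(x):
    x = x[~x['B'].isna()]
    full = x.shape[0] / 10
    # Wrap the scalar in a one-row DataFrame so concat receives frames, not floats
    return pd.DataFrame({'full': [full]})

result = issue_df.groupby('A').apply(map_fun).reset_index(drop=True)
print(result)
```

Building the one-row frame directly (rather than via `DataFrame.append`, which newer pandas versions have removed) keeps the sketch portable.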
really helped, thanks a lot
Bug fixed in the nightly version, thank you three thousand!