Cudf: [BUG] rapidsai-nightly/label/cuda10.0 cudf.fillna(0) fail

Created on 21 Jun 2019 · 6 Comments · Source: rapidsai/cudf

Describe the bug
I am using cudf 0.8 from rapidsai-nightly/label/cuda10.0. When I call df.fillna(0), I encounter:
/databricks/python/lib/python3.6/site-packages/numba/cuda/compiler.py:234: UserWarning: Could not autotune, using default tpb of 128
  warnings.warn('Could not autotune, using default tpb of 128')
TypeError: fill_value must be a string or a string series

Steps/Code to reproduce bug
Expected behavior
df.fillna(0) should run without error.
Environment details (please complete the following information):


bug cuDF (Python)

All 6 comments

It looks like you have some string dtype columns which can't just be filled with 0 naively. Pandas uses Python objects under the hood whereas we use an actual UTF8 string type.

@randerzander @beckernick any views on how this should best be handled from your perspective? Would you expect it to fill with "0" or throw an error as it currently does?

IMO it's best to keep it an error as is.

This feels like one of those places where automatic type conversion (0 -> "0") is highly likely to induce confusion downstream.

Consider the scenario where CSV dtype inference reads a string when the user expected numerics. Prior to running some algorithm, they do:
for col in df.columns: df[col] = df[col].fillna(0)

At best, simple numeric operators surface the problem early.

At worst, a numba kernel or call to fit gives an inscrutable error, and significant time gets wasted trying to figure out what's going wrong.
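One way to surface that scenario early is to check dtypes before filling. This is only a minimal sketch: the helper name `find_string_columns` is hypothetical, and it assumes string columns report their dtype name as "object" (or "string"/"str"), which should be verified for the cuDF version in use.

```python
def find_string_columns(dtypes, expected_numeric):
    """Return the expected-numeric columns whose dtype is actually a string dtype.

    `dtypes` maps column name -> dtype (or dtype name), for example built with
    {col: df[col].dtype for col in df.columns}.
    """
    # Assumption: these are the names a string dtype prints as.
    string_names = {"object", "str", "string"}
    return [col for col in expected_numeric if str(dtypes[col]) in string_names]


# Plain dtype names stand in for a real dataframe's dtypes here.
dtypes = {"price": "float64", "qty": "object", "name": "object"}
bad = find_string_columns(dtypes, expected_numeric=["price", "qty"])
if bad:
    print("columns parsed as strings, fix before fillna(0):", bad)
```

Failing loudly at this point keeps the problem next to the CSV read rather than inside a later kernel or fit call.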

If I write df.fillna("0") instead, I also run into an error.

@polarbeargo trying to run fillna on the full dataframe object is going to be problematic for you since you have mixed dtypes. Here you're trying to fill numeric columns with a string which isn't allowed and we don't want to automatically typecast all of your numeric columns to strings.
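For mixed dtypes, the per-column loop quoted earlier in the thread can pick a fill value that matches each column. The sketch below is an illustration, not cuDF's own API: `pick_fill_values` and `fillna_by_dtype` are hypothetical helpers, the string-dtype names are an assumption, and every non-string column is treated as numeric, which is too coarse for datetime or categorical columns.

```python
def pick_fill_values(dtypes, numeric_fill=0, string_fill=""):
    """Map each column name to a fill value that matches its dtype."""
    # Assumption: string columns report one of these dtype names.
    string_names = {"object", "str", "string"}
    return {
        col: (string_fill if str(dtype) in string_names else numeric_fill)
        for col, dtype in dtypes.items()
    }


def fillna_by_dtype(df, numeric_fill=0, string_fill=""):
    """Fill nulls column by column; `df` is any dataframe exposing
    `columns`, `df[col].dtype`, and `df[col].fillna(value)`."""
    dtypes = {col: df[col].dtype for col in df.columns}
    fills = pick_fill_values(dtypes, numeric_fill, string_fill)
    for col, value in fills.items():
        df[col] = df[col].fillna(value)
    return df


# Plain dtype names stand in for a real dataframe's dtypes here.
fills = pick_fill_values({"a": "int64", "b": "object", "c": "float64"})
print(fills)
```

Numeric columns keep their dtype and string columns get a string, so no column is silently typecast.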

Thank you @kkraus14, I really appreciate your guidance :)!
