>>> import cudf
>>> df = cudf.DataFrame()
>>> df['a'] = range(10)
>>> df['b'] = 1234324233.13
>>> df['c'] = None
>>> df
a b c
0 0 1.234324e+09 None
1 1 1.234324e+09 None
2 2 1.234324e+09 None
3 3 1.234324e+09 None
4 4 1.234324e+09 None
5 5 1.234324e+09 None
6 6 1.234324e+09 None
7 7 1.234324e+09 None
8 8 1.234324e+09 None
9 9 1.234324e+09 None
>>> df['c'][df.a < 10] = df.b[df.a < 10]/10
>>> df
a b c
0 0 1.234324e+09 123432423.3
1 1 1.234324e+09 123432423.3
2 2 1.234324e+09 123432423.3
3 3 1.234324e+09 123432423.3
4 4 1.234324e+09 123432423.3
5 5 1.234324e+09 123432423.3
6 6 1.234324e+09 123432423.3
7 7 1.234324e+09 123432423.3
8 8 1.234324e+09 123432423.3
9 9 1.234324e+09 123432423.3
>>> df['c'] = df['c'].astype('int64')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/series.py", line 1445, in astype
    raise e
  File "/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/series.py", line 1441, in astype
    data=self._column.astype(dtype, **kwargs)
  File "/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/column.py", line 840, in astype
    return self.as_numerical_column(dtype, **kwargs)
  File "/conda/envs/rapids/lib/python3.7/site-packages/cudf/core/column/string.py", line 2073, in as_numerical_column
    type due to presence of non-integer values."
ValueError: Could not convert strings to integer type due to presence of non-integer values.
After the line df['c'][df.a < 10] = df.b[df.a < 10]/10, the dtype of column 'c' is still 'object' and not 'float64'.
This only happens when assigning through a boolean mask.
This only happens on the latest branch-0.14 nightly.
@mlahir1 this change was introduced recently in PR #5054. You can observe that the same cast fails in pandas as well, because a string such as 123432423.3 is not a valid integer:
In [13]: import pandas as pd
In [14]: df = pd.DataFrame({"a":["1.1", "2.2", "3.3"]})
In [15]: df['a'] = df['a'].astype('int64')
pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()
ValueError: invalid literal for int() with base 10: '1.1'
You can try something like this:
In [3]: df['c'] = df['c'].astype('float64').astype('int64')
In [4]: df
Out[4]:
a b c
0 0 1.234324e+09 123432423
1 1 1.234324e+09 123432423
2 2 1.234324e+09 123432423
3 3 1.234324e+09 123432423
4 4 1.234324e+09 123432423
5 5 1.234324e+09 123432423
6 6 1.234324e+09 123432423
7 7 1.234324e+09 123432423
8 8 1.234324e+09 123432423
9 9 1.234324e+09 123432423
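The reason the two-step cast works can be sketched in plain Python, without cudf or pandas. This is only an illustration of the parsing rule behind the error message, not the library's actual implementation; the helper names `to_int_direct` and `to_int_via_float` are made up for this example.

```python
def to_int_direct(text):
    """Parse text straight to int, roughly what a string -> int64 cast attempts."""
    return int(text)


def to_int_via_float(text):
    """Parse text as a float first, then truncate toward zero."""
    return int(float(text))


value = "123432423.3"

# A string with a fractional part is not a valid integer literal.
try:
    to_int_direct(value)
    direct_ok = True
except ValueError:
    direct_ok = False

# Going through float first succeeds and drops the fractional part.
truncated = to_int_via_float(value)

print(direct_ok, truncated)  # False 123432423
```

Note that the intermediate float step truncates rather than rounds, so 123432423.3 becomes 123432423, matching the output shown above.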
Thanks, I am using the same workaround.
Sorry, closed by mistake.
I think there is no other solution apart from the suggested workaround, since the behavior follows pandas.
@mlahir1 is there any other expectation?
I'm confused as to why df['c'] is a string when trying to typecast here.
I'm guessing df['c'][df.a < 10] = df.b[df.a < 10]/10 isn't changing the type to float as expected. Does pandas keep this as an object type?
df['c'] remains object in pandas as well. Since the assignment only updates a slice/mask of the elements in a column, it makes sense to keep the dtype the same.
In [40]: df['c']
Out[40]:
0 1.23432e+08
1 1.23432e+08
2 1.23432e+08
3 1.23432e+08
4 1.23432e+08
5 1.23432e+08
6 1.23432e+08
7 1.23432e+08
8 1.23432e+08
9 1.23432e+08
Name: c, dtype: object
In [41]: type(df)
Out[41]: pandas.core.frame.DataFrame
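For anyone who wants to check the pandas behavior shown above, here is a minimal sketch using a standalone Series instead of the chained df['c'][mask] assignment, which newer pandas versions handle differently. The variable names are illustrative, and the snippet assumes a reasonably recent pandas install; it only exercises pandas, not cudf.

```python
import pandas as pd

# Column 'c' starts out as all None, which pandas stores with object dtype.
c = pd.Series([None] * 10, dtype="object")
a = pd.Series(range(10))
b = pd.Series([1234324233.13] * 10)

# Boolean mask that happens to select every row, as in the report.
mask = a < 10

# Masked assignment fills in float values but does not change the dtype.
c[mask] = (b / 10)[mask]
print(c.dtype)  # object

# The workaround from the thread: cast to float64 first, then to int64.
c_int = c.astype("float64").astype("int64")
print(c_int.dtype)  # int64
```

The point of the sketch is only that the masked write leaves the column's dtype untouched, so any later cast starts from object rather than float64.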
@kkraus14
df['c'][df.a < 10] = df.b[df.a < 10]/10 isn't changing it to float. I just checked, and pandas also doesn't change it to float.
@rgsl888prabhu
I reported it because something that was working yesterday wasn't working today. If you think it is in line with pandas, you can close the issue. I will put the workaround in my code.