Is your feature request related to a problem? Please describe.
As a user, I'd like to be able to round a Series to a specified number of decimal places, like in Pandas.
Describe the solution you'd like
I'd like to call series.round(n) and round the values in the series to n decimal places.
Describe alternatives you've considered
The alternative is to define a rounding UDF or go to the CPU.
Additional context
It looks like the // operator does the right thing. Presumably the functionality exists to do this today.
In [1]: import cudf
In [2]: df = cudf.DataFrame({'x': [1.2345, 6.7890]})
In [3]: print(df)
        x
0  1.2345
1   6.789
In [4]: print(df // 0.01 * 0.01)
      x
0  1.23
1  6.78
In [5]: df.to_pandas().round(2)
Out[5]:
      x
0  1.23
1  6.79
Floor division often works, but it always rounds down, so it gives the wrong answer whenever the value should round up at that decimal place (above, 6.789 becomes 6.78 instead of 6.79).
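To make the difference concrete, here is a minimal plain-Python sketch (not cudf code) comparing the two approaches on the values from the example above. The helper names `floor_to` and `round_to` are hypothetical, and the half-away-from-zero rule in `round_to` is an assumption for illustration; pandas and NumPy use their own rounding rules for exact ties.

```python
import math


def floor_to(x, decimals):
    """Round down (toward negative infinity) at the given decimal place."""
    scale = 10 ** decimals
    return math.floor(x * scale) / scale


def round_to(x, decimals):
    """Round to nearest at the given decimal place, ties away from zero."""
    scale = 10 ** decimals
    return math.copysign(math.floor(abs(x) * scale + 0.5), x) / scale


values = [1.2345, 6.7890]
floored = [floor_to(v, 2) for v in values]  # [1.23, 6.78]
rounded = [round_to(v, 2) for v in values]  # [1.23, 6.79]
```

The two only agree when the digit being dropped is small enough that rounding down happens to be the nearest value, which is why floor division looks right on 1.2345 but not on 6.789.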
Stopgap Python layer functionality for this feature has been merged with #1745. Explicitly backlogging this and deferring to your prioritization on the libcudf side, @harrism.
@beckernick I think the reason that C/C++ doesn't have a round(x, decimals) to round floating point values to a number of decimal places is that binary floating point numbers don't have decimal digits, by definition. Rounding can produce values that are not exactly representable and thus give less accuracy than the decimals parameter suggests... In C++, people usually round on printing, rather than changing the stored value.
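A small sketch of that point using only the Python standard library (nothing here is cudf or libcudf API): rounding to two decimals stores the nearest binary double, which is not exactly 6.79, whereas rounding at print time leaves the stored value alone.

```python
from decimal import Decimal

original = 6.789

# Rounding the stored value: the result is the closest double to 6.79.
stored = round(original, 2)

# Decimal(float) exposes the exact binary value held in memory,
# which differs from the decimal number 6.79.
exact_value = Decimal(stored)
is_exactly_6_79 = exact_value == Decimal("6.79")  # False

# Rounding only when formatting for display keeps the data untouched.
printed = f"{original:.2f}"  # "6.79"
```

Printing `exact_value` shows the full binary expansion, which is one way to see why a rounded column can still compare unequal to the decimal literal a user has in mind.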
But if you insist...
@harrism that makes sense to me. As long as the end user has the dataframe/series functionality that they expect (which includes rounding, since it's in pandas/numpy), I'm happy. I completely defer to you and @kkraus14 as to whether this functionality should exist (or not exist) in libcudf.
Yeah, there are lots of technical reasons to not round. It's good to remind ourselves that our users aren't technical, and don't care about those reasons :)
I wonder how often users round their data and then complain that the results are wrong. However, I think it's unfair to call users of tools like Python and Pandas non-technical...
I believe this is now ready, based on discussion with @codereport in https://github.com/rapidsai/cudf/pull/6976
@ChrisJar would you be able to explore plumbing cudf::round up to Python and removing the old numba.cuda round code?
Thanks for digging up this old issue, @beckernick, I forgot about it! Retitled this and relabeled it to reflect that it is a Python feature request now, since the libcudf feature is implemented.
Note that this allows us to remove some Numba code. See https://github.com/rapidsai/cudf/pull/6976#issuecomment-742825673
Yeah, would love to work on this
Thanks!