Cudf: [FEA] Series or DataFrame support for equivalent of numpy.isclose

Created on 6 May 2020 · 4 Comments · Source: rapidsai/cudf

Is your feature request related to a problem? Please describe.

It's useful for testing to be able to perform the equivalent of numpy.isclose on instances of cudf.DataFrame and cudf.Series.

Describe the solution you'd like

This would be a nice thing to have:

>>> import cudf
>>> s1 = cudf.Series([1.9876543,   2.9876654,   3.9876543])
>>> s2 = cudf.Series([1.987654321, 2.987654321, 3.987654321])
>>> rel_tol=1e-5
>>> abs_tol=0.0
>>> s2.isclose(s1, rel_tol, abs_tol)
0    True
1    True
2    True
dtype: bool

Describe alternatives you've considered

Here's my current hand-rolled solution:

>>> import cudf
>>> s1 = cudf.Series([1.9876543,   2.9876654,   3.9876543])
>>> s2 = cudf.Series([1.987654321, 2.987654321, 3.987654321])
>>> rel_tol=1e-5
>>> abs_tol=0.0
>>> s2.abs().mul(rel_tol).add(abs_tol).sub(s1.sub(s2).abs()).gt(0)
0    True
1    True
2    True
dtype: bool

There's nothing wrong with using this approach pervasively. I figured it'd be more convenient to have isclose as a built-in method.
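For reference, the chained expression above computes `|s2| * rel_tol + abs_tol - |s1 - s2| > 0`, which is close to, but not exactly, the rule that numpy.isclose documents for finite values: `|a - b| <= atol + rtol * |b|`. The pure-Python sketch below compares the two on plain lists. The function names are hypothetical and exist only for illustration. The one difference it shows is the strict comparison, which matters when both values are exactly equal and the tolerance evaluates to zero.

```python
def isclose_numpy_rule(a, b, rel_tol=1e-5, abs_tol=1e-8):
    """Elementwise |a - b| <= abs_tol + rel_tol * |b| (b is the reference)."""
    return [abs(x - y) <= abs_tol + rel_tol * abs(y) for x, y in zip(a, b)]


def isclose_hand_rolled(a, b, rel_tol, abs_tol):
    """Same arithmetic as the chained Series expression: strict '> 0' test."""
    return [abs(y) * rel_tol + abs_tol - abs(x - y) > 0 for x, y in zip(a, b)]


s1 = [1.9876543, 2.9876654, 3.9876543]
s2 = [1.987654321, 2.987654321, 3.987654321]

# Both rules agree on the data from the issue.
same_numpy = isclose_numpy_rule(s1, s2, rel_tol=1e-5, abs_tol=0.0)   # [True, True, True]
same_hand = isclose_hand_rolled(s1, s2, rel_tol=1e-5, abs_tol=0.0)   # [True, True, True]

# They differ when the values are equal and the tolerance is exactly zero.
edge_numpy = isclose_numpy_rule([0.0], [0.0], rel_tol=1e-5, abs_tol=0.0)   # [True]
edge_hand = isclose_hand_rolled([0.0], [0.0], rel_tol=1e-5, abs_tol=0.0)   # [False]
```

Swapping `.gt(0)` for `.ge(0)` in the chained expression would presumably remove that edge case.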

I'm happy to help in whatever way I can.

cuDF (Python) feature request

All 4 comments

Could you use cupy.isclose for this? I believe with pandas you would still use np.isclose.

@paul-tqh-nguyen this would go from cudf to cupy with zero copy, so it wouldn't cause any performance degradation. Could you give that a shot?
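A minimal sketch of that cudf-to-cupy route might look like the following. The helper name `series_isclose` and its default tolerances are hypothetical, not part of cuDF. It assumes both Series are numeric, contain no nulls, and have the same length and ordering, and it needs a CUDA-capable GPU with cudf and cupy installed.

```python
def series_isclose(left, right, rel_tol=1e-5, abs_tol=1e-8):
    """Hypothetical helper: elementwise closeness of two null-free cudf.Series."""
    # Imported lazily so that defining the helper does not itself need a GPU.
    import cudf
    import cupy as cp

    # Series.values exposes the column data as a cupy array on the device.
    # As with numpy.isclose, the second argument is the reference value for
    # the relative tolerance.
    mask = cp.isclose(left.values, right.values, rtol=rel_tol, atol=abs_tol)

    # Wrap the boolean cupy array back into a Series on the caller's index.
    return cudf.Series(mask, index=left.index)


def demo():
    """Rebuild the Series from the issue and compare them (needs a GPU)."""
    import cudf

    s1 = cudf.Series([1.9876543, 2.9876654, 3.9876543])
    s2 = cudf.Series([1.987654321, 2.987654321, 3.987654321])
    return series_isclose(s1, s2, rel_tol=1e-5, abs_tol=0.0)
```

Calling `demo()` on a machine with a GPU should return a boolean Series comparable to the one produced by the hand-rolled expression in the issue.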

Using cupy.isclose solves all of my problems!

My apologies; I should've looked there earlier.

Thanks for the quick and helpful responses!

One note: this won't handle null values, so I'm going to reopen this issue to provide some syntactic sugar in cuDF around the cupy function in the future.
