Cudf: [BUG] test_reductions.py test case failures

Created on 30 Apr 2019 · 15Comments · Source: rapidsai/cudf

Describe the bug
test_reductions.py test case failures in cuDF Build with commit level: 2cf8a535831dc555020f9668f3b8fd1dc8fb4dcb.

Steps/Code to reproduce bug

site-packages/cudf/tests$ pytest -v test_reductions.py

FAILED tests: 
test_reductions.py::test_sum[int8-2] FAILED
test_reductions.py::test_sum[int8-200] FAILED                                                                                                         [ 18%]
test_reductions.py::test_sum[int8-10000] FAILED                                                                                                       [ 19%]

test_reductions.py::test_min[int8-2] FAILED                                                                                                           [ 74%]
test_reductions.py::test_min[int8-3] FAILED                                                                                                           [ 74%]
test_reductions.py::test_min[int8-127] FAILED                                                                                                         [ 75%]
test_reductions.py::test_min[int8-128] FAILED                                                                                                         [ 75%]
test_reductions.py::test_min[int8-129] FAILED                                                                                                         [ 76%]
test_reductions.py::test_min[int8-200] FAILED                                                                                                         [ 76%]
test_reductions.py::test_min[int8-10000] FAILED

Expected behavior
test_reductions.py tests cases should work fine.

Environment details (please complete the following information):
cuDF build on commit:2cf8a535831dc555020f9668f3b8fd1dc8fb4dcb. using GCC 7.3.0 conda tool chain on Ubuntu 18.04 (ppc64le)

bug cuDF (Python)

Source

pradghos

All 15 comments

Snippet of Detailed log :

dtype = <class 'numpy.int8'>, nelem = 2

    @pytest.mark.parametrize('dtype,nelem', params)
    def test_sum(dtype, nelem):
        data = gen_rand(dtype, nelem)
        sr = Series(data)

        got = sr.sum()
        expect = dtype(data.sum())

        significant = 4 if dtype == np.float32 else 6
>       np.testing.assert_approx_equal(expect, got, significant=significant)
E       AssertionError:
E       Items are not equal to 6 significant digits:
E        ACTUAL: -32.0
E        DESIRED: 224.0

test_reductions.py:36: AssertionError

____________________________________________________________________ test_sum[int8-200] _____________________________________________________________________

dtype = <class 'numpy.int8'>, nelem = 200

    @pytest.mark.parametrize('dtype,nelem', params)
    def test_sum(dtype, nelem):
        data = gen_rand(dtype, nelem)
        sr = Series(data)

        got = sr.sum()
        expect = dtype(data.sum())

        significant = 4 if dtype == np.float32 else 6
>       np.testing.assert_approx_equal(expect, got, significant=significant)
E       AssertionError:
E       Items are not equal to 6 significant digits:
E        ACTUAL: -121.0
E        DESIRED: 135.0

test_reductions.py:36: AssertionError


>       assert expect == got
E       assert -11 == 245
E         --11
E         +245

test_reductions.py:90: AssertionError

>       assert expect == got
E       assert -7 == 249
E         --7
E         +249

test_reductions.py:90: AssertionError

dtype = <class 'numpy.int8'>, nelem = 128

    @pytest.mark.parametrize('dtype,nelem', params)
    def test_min(dtype, nelem):
        data = gen_rand(dtype, nelem)
        sr = Series(data)

        got = sr.min()
        expect = dtype(data.min())

>       assert expect == got
E       assert -32 == 224
E         --32
E         +224

test_reductions.py:90: AssertionError
___________________________________________

pradghos on 30 Apr 2019

is there any recommended numpy version to be used for cuDF 0.7.0* ? Thanks!

pradghos on 2 May 2019

@kovaltan can you have a look?

harrism on 13 May 2019

I'm looking the issue.

kovaltan on 13 May 2019

👍1

Hi @pradghos
Could you give more environment detail as below? since I could not reproduce the issue yet.

Environment details (please complete the following information):

Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
Method of cuDF install: [conda, Docker, or from source]
- If method of install is [Docker], provide docker pull & docker run commands used
Please run and attach the output of the cudf/print_env.sh script to gather relevant environment details

kovaltan on 13 May 2019

Sorry for the delayed update -
Environment location: docker container ;
Method of cuDF install: Build cuDF conda package using gcc 7.3 conda tool chain. then install them locally. However, I am going to try the test soon on latest cudF 0.7.1 and share the result.

pradghos on 15 May 2019

@pradghos are you still on a Power CPU?

harrism on 17 May 2019

@pradghos are you still on a Power CPU?

Yes @harrism - I am using Power CPU.

pradghos on 17 May 2019

I see that all the failures are on the numpy.int8 type. That sounds suspiciously similar to the narrowing warnings we saw in https://github.com/rapidsai/cudf/issues/1544 caused by the difference in how x86 vs POWER interpret char.

I'd double check what gdf_dtype is being passed into libcudf for the numpy.int8 column.

E       assert -32 == 224
E         --32
E         +224

224 == 11100000 (unsigned)
-32 == 11100000 (signed twos-complement)

This looks to me like the "expected" result is being interpreted as a _signed_ integer, and the "actual" result is being interpreted as _unsigned_, because they have the same binary representation.

jrhemstad on 17 May 2019

👍1

@jrhemstad : any further suggestion for this issue? Thanks!

pradghos on 24 May 2019

I could able to recreate with cuDF 0.7.2 , with conda tool chain gcc 7.3 on Power Platform (ppc64le).

pradghos on 24 May 2019

@jrhemstad : any further suggestion for this issue? Thanks!

This looks to me like an issue on the Python side.

@kkraus14 any idea why it seems like expect is being treated as signed, but got as unsigned?

Note this only happens on POWER where char is interpreted as unsigned char vs. x86 where char is signed char.

jrhemstad on 24 May 2019

I think I am able to find the fix for the problem -

file - reduce.pyx in function apply_reduce is basically handling all the reduce operations-

    with nogil:
        c_result = reduction(
            <gdf_column*>c_col,
            c_op,
            c_out_dtype
            )

    free(c_col)
    result = get_scalar_value(c_result)
 ```

`get_scalar_value()` is returning `GDF_INT8` scalar data  which is nothing but `char` in  `ctypedef union gdf_data`.

cdef get_scalar_value(gdf_scalar scalar):
"""
Returns typed value from a gdf_scalar
0-dim array is retuned if dtype is date32/64, timestamp
"""
print("get_scalar_value called")
return {
GDF_FLOAT64: scalar.data.fp64,
GDF_FLOAT32: scalar.data.fp32,
GDF_INT64: scalar.data.si64,
GDF_INT32: scalar.data.si32,
GDF_INT16: scalar.data.si16,
GDF_INT8: scalar.data.si08, ===============>>> its char
GDF_BOOL8: np.array(scalar.data.b08).astype(np.bool_),
GDF_DATE32: np.array(scalar.data.dt32).astype('datetime64[D]'),
GDF_DATE64: np.array(scalar.data.dt64).astype('datetime64[ms]'),
GDF_TIMESTAMP: np.array(scalar.data.tmst).astype('datetime64[ns]'),
}[scalar.dtype]


In POWER where char is interpreted as unsigned char. below fixes worked fine -

diff --git a/python/cudf/bindings/cudf_cpp.pxd b/python/cudf/bindings/cudf_cpp.pxd
index a5c4970..a4631d4 100644
--- a/python/cudf/bindings/cudf_cpp.pxd
+++ b/python/cudf/bindings/cudf_cpp.pxd
@@ -172,7 +172,7 @@ cdef extern from "cudf.h" nogil:
GDF_WINDOW_VA

 ctypedef union gdf_data:

- char si08
+ signed char si08
short si16
int si32
long si64

I saw similar problem in libcudf also -

diff --git a/cpp/include/cudf/types.h b/cpp/include/cudf/types.h
index 22f118b..c1a2df0 100644
--- a/cpp/include/cudf/types.h
+++ b/cpp/include/cudf/types.h
@@ -107,7 +107,7 @@ typedef struct {
/
// TODO: #1119 Use traits to set gdf_data elements
typedef union {
- char si08; /< GDF_INT8 */
+ signed char si08; /< GDF_INT8 */
short si16; /< GDF_INT16 */
int si32; /< GDF_INT32 */
long si64; /*< GDF_INT64 */
```
@jrhemstad @harrism @kkraus14 : Pls let me know if I can create PR for the same.

pradghos on 28 May 2019

👍1

Aha! Good detective work @pradghos. Feel free to open a PR with your fix and link this issue.

jrhemstad on 28 May 2019

👍1

Aha! Good detective work @pradghos. Feel free to open a PR with your fix and link this issue.

Thanks for looking into the issue and reviewing the fix @jrhemstad @harrism @kkraus14 ! I have opened a PR for the same.

pradghos on 28 May 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

[BUG] DataFrame.empty doesn't perform as advertised in the documentation

yasmina-altair · 3Comments

[BUG] cudf.read_csv: KeyError: 8

randerzander · 3Comments

[BUG] datetime dtype not being inferred by cudf avro reader

galipremsagar · 3Comments

[FEA] Remove `experimental` namespace from libcudf

jrhemstad · 3Comments

[QST] Appending Dataframe Rows

saifrahmed · 3Comments