Cudf: [BUG] from_dlpack returns incorrect results

Created on 21 May 2019 · 8 Comments · Source: rapidsai/cudf

Describe the bug
from_dlpack does not appear to be creating the correct dataframe from 2d cupy arrays, though I may be doing something wrong. I expect that calling from_dlpack on a cupy array created with column-major order will return the correct dataframe.

It looks like we skip the from_dlpack tests in CI because they require CuPy.

import cupy
import cudf

arr = cupy.array([
    [0,1,2.],
    [4,5,6,],
    [7,8,9]
])
print(arr)
print(cudf.from_dlpack(arr.toDlpack()))
print(cudf.from_dlpack(arr.T.toDlpack()))
print(cupy.fromDlpack(arr.toDlpack()))
[[0. 1. 2.]
 [4. 5. 6.]
 [7. 8. 9.]]
     0    1    2
0  1.0  1.0  1.0
1  2.0  2.0  2.0
2  4.0  4.0  4.0
     0    1    2
0  4.0  4.0  4.0
1  5.0  5.0  5.0
2  6.0  6.0  6.0
[[0. 1. 2.]
 [4. 5. 6.]
 [7. 8. 9.]]
import cupy
import cudf

arr = cupy.array(
    [
        [0,1,2.],
        [4,5,6,],
        [7,8,9]
    ],
    order='F' # column major
)
print(arr)
print(cudf.from_dlpack(arr.toDlpack()))
print(cudf.from_dlpack(arr.T.toDlpack()))
print(cupy.fromDlpack(arr.toDlpack()))

[[0. 1. 2.]
 [4. 5. 6.]
 [7. 8. 9.]]
     0    1    2
0  1.0  1.0  1.0
1  5.0  5.0  5.0
2  8.0  8.0  8.0
     0    1    2
0  4.0  4.0  4.0
1  7.0  7.0  7.0
2  1.0  1.0  1.0
[[0. 1. 2.]
 [4. 5. 6.]
 [7. 8. 9.]]

cuDF commit
commit 24ab9736d53c7859e6364a9d33861c2858d7f752 (HEAD -> branch-0.8, origin/branch-0.8, origin/HEAD)
Merge: feec0c5e 1357a57a
Author: Jake Hemstad <jhemstad@nvidia.com>
Date: Fri May 17 08:12:00 2019 -0500

Merge pull request #1746 from jrhemstad/fea-ext-removed-dead-code

[REVIEW] Removed unused, untested, and dead code

CuPy installed via pip for cuda 9.2

bug libcudf

All 8 comments

Investigating the list of devicearrays (res) shows that every devicearray is the same, which makes me think this may be caused by something in the C++/Cython code.

https://github.com/rapidsai/cudf/blob/d6b7bd1b736e8bd498b2c9f18f00474744f6b2b6/python/cudf/io/dlpack.py#L36

Are the 2d cupy arrays column or row major? The from/to_dlpack in libcudf only supports column major.

CC @harrism
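To make the layout question concrete, here is a minimal pure-Python sketch (no CuPy or cuDF involved) of how the 3x3 matrix from the report flattens in each order, and what a consumer that assumes column-major data would read from each buffer. The helper name split_into_columns is hypothetical and is not part of libcudf. This only illustrates the row-major versus column-major distinction; it does not reproduce the identical-columns symptom reported above.

```python
# Pure-Python illustration; no GPU libraries are needed.
matrix = [
    [0.0, 1.0, 2.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
n_rows, n_cols = len(matrix), len(matrix[0])

# Row-major (C order): the elements of each row are adjacent in memory.
row_major = [matrix[r][c] for r in range(n_rows) for c in range(n_cols)]

# Column-major (Fortran order): the elements of each column are adjacent.
col_major = [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


def split_into_columns(flat, n_rows, n_cols):
    """Hypothetical helper: treat `flat` as column-major, slice one chunk per column."""
    return [flat[c * n_rows:(c + 1) * n_rows] for c in range(n_cols)]


cols_from_f = split_into_columns(col_major, n_rows, n_cols)
cols_from_c = split_into_columns(row_major, n_rows, n_cols)

print(cols_from_f)  # the true columns of the matrix
print(cols_from_c)  # the rows of the matrix, i.e. a transposed result
```

Under these assumptions, a row-major input handed to a column-major-only reader would come back transposed, not with every column identical, which suggests the layout alone does not explain the output in the report.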

@jrhemstad I tried both. order='F' during the array construction creates the column major cupy array.

import cupy
import cudf

arr = cupy.array([
    [0,1,2.],
    [4,5,6,],
    [7,8,9]
])
print(cupy.isfortran(arr))


arr = cupy.array([
    [0,1,2.],
    [4,5,6,],
    [7,8,9]
],
    order='F')
print(cupy.isfortran(arr))
False
True
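For reference, the two orders differ only in their byte strides. The sketch below computes the strides of a contiguous array in plain Python, assuming 8-byte float64 elements; strides_for is a hypothetical helper written for illustration and is not a CuPy or cuDF function.

```python
def strides_for(shape, itemsize, order="C"):
    """Byte strides of a contiguous array with the given shape and order."""
    strides = [0] * len(shape)
    step = itemsize
    # C order: the last axis varies fastest. F order: the first axis varies fastest.
    axes = range(len(shape) - 1, -1, -1) if order == "C" else range(len(shape))
    for axis in axes:
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)


# A 3x3 array of float64 (8 bytes per element), as in the examples above.
c_strides = strides_for((3, 3), 8, "C")
f_strides = strides_for((3, 3), 8, "F")

print(c_strides)  # (24, 8): moving down one row skips a full row of 3 elements
print(f_strides)  # (8, 24): moving down one row moves to the adjacent element
```

In the column-major case each column occupies one contiguous run of 3 elements, which is the layout the libcudf DLPack functions are said to expect.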

Here's an example without using CuPy:

import cudf

df = cudf.DataFrame({'a':[1,2,3], 'b':[4,5,6], 'c':[7,8,9]})
df_new = cudf.from_dlpack(df.to_dlpack())
print(df)
print(df_new)
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
   0  1  2
0  4  4  4
1  5  5  5
2  6  6  6

It would be great if someone can triage whether this is a cuDF or libcudf issue.

@harrism In the Cython code, the numba devicearray created by this section of the code:

https://github.com/rapidsai/cudf/blob/2f36e57df9a67d09b645b9a4bedf4ea138d1d245/python/cudf/bindings/dlpack.pyx#L55-L64

always has the same values, despite the idx and data_ptr being different as the loop progresses for each column. This makes me think this is a libcudf issue, since result_cols is created from the libcudf function.

A specific example:

Given this input dataframe: df = cudf.DataFrame({'a': [0, 4, 2.0, 39], 'b':[1, 1, 3.0, 50]})

idx = 0
data_ptr = 140305823171072
[ 1.  1.  3. 50.] # the created devicearray copied to host
idx = 1
data_ptr = 140305823172608
[ 1.  1.  3. 50.] # the created devicearray copied to host
      0     1
0   1.0   1.0
1   1.0   1.0
2   3.0   3.0
3  50.0  50.0

Thanks. Just need help setting priority. Can this wait until 0.9?

I think that's fine. CuPy users can still convert via Numba's array interface.

Something like: cudf.DataFrame.from_gpu_matrix(numba.cuda.as_cuda_array(cupy_arr))
