Cudf: [BUG] Series built from ephemeral CuPy arrays change due to CuPy's memory reuse

Created on 11 Sep 2019 · 3 Comments · Source: rapidsai/cudf

Converting CuPy arrays to Series appears to change the values of an existing Series when other arrays are created afterwards. Could this be a pointer issue?

import cudf
import cupy as cp

x = cudf.Series(cp.zeros(5))
print(x)
y = cudf.Series(cp.ones(5))
print(y)
print(x)
0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
dtype: float64
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64
import cudf
import cupy as cp

df = cudf.DataFrame()
df['a'] = cp.zeros(5)
print(df)
df['b'] = cp.ones(5)
print(df)
df['c'] = cp.array([1,2,3,4,5])
print(df)
     a
0  0.0
1  0.0
2  0.0
3  0.0
4  0.0
     a    b
0  1.0  1.0
1  1.0  1.0
2  1.0  1.0
3  1.0  1.0
4  1.0  1.0
               a              b  c
0  4.940656e-324  4.940656e-324  1
1  9.881313e-324  9.881313e-324  2
2  1.482197e-323  1.482197e-323  3
3  1.976263e-323  1.976263e-323  4
4  2.470328e-323  2.470328e-323  5

Built from source using cupy-cuda100.

bug cuDF (Python)

All 3 comments

@shwina Looks like the zero copy is biting us here 😄

@shwina @kkraus14, it looks like CuPy (and NumPy) arrays that are not bound to a name all get the same data pointer?


a = cp.zeros(5)
b = cp.ones(5)
c = cp.random.normal(5,5,5)
print(a.__cuda_array_interface__)
print(b.__cuda_array_interface__)
print(c.__cuda_array_interface__)
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923162624, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923162112, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
print(cp.zeros(5).__cuda_array_interface__)
print(cp.ones(5).__cuda_array_interface__)
print(cp.random.normal(5,5,5).__cuda_array_interface__)
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}

NumPy

import numpy as np

a = np.zeros(5)
b = np.ones(5)
c = np.random.normal(5,5,5)
print(a.__array_interface__)
print(b.__array_interface__)
print(c.__array_interface__)
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209825391664, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209820973168, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
print(np.random.normal(5,5,5).__array_interface__)
print(np.zeros(5).__array_interface__)
print(np.ones(5).__array_interface__)
print(np.random.normal(5,5,5).__array_interface__)
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}

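@beckernick Discussing offline with @shwina, but basically what's happening is that you're creating an ephemeral CuPy array, which is freed immediately. CuPy's caching allocator keeps that memory valid to access and hands it out again for the next allocation, so a Series built zero-copy from the ephemeral array ends up sharing memory with whatever array is created next.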
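To make the mechanism above concrete, here is a minimal pure-Python sketch of a caching allocator. It is a toy model only: `ToyPool`, `zero_copy_view`, and `copying_view` are hypothetical names invented for illustration and are not CuPy or cuDF APIs, and a Python list stands in for a block of device memory. The sketch assumes, as described in the comment above, that a freed block is cached and returned unchanged by the next allocation of the same size.

```python
class ToyPool:
    """Toy caching allocator: freed blocks are cached and handed out again."""

    def __init__(self):
        self._free = {}  # block size -> list of cached (freed) blocks

    def alloc(self, size):
        blocks = self._free.get(size)
        if blocks:
            return blocks.pop()  # reuse a cached block of the same size
        return [0.0] * size      # nothing cached: create a fresh block

    def free(self, block):
        self._free.setdefault(len(block), []).append(block)


def zero_copy_view(pool, size, value):
    """Mimic building a Series zero-copy from a temporary array."""
    block = pool.alloc(size)
    block[:] = [value] * size  # fill in place, like zeros() / ones()
    view = block               # zero copy: the view shares the storage
    pool.free(block)           # the temporary owner is dropped right away
    return view


def copying_view(pool, size, value):
    """Same as above, but the consumer takes its own copy of the data."""
    block = pool.alloc(size)
    block[:] = [value] * size
    view = list(block)         # independent copy, unaffected by later reuse
    pool.free(block)
    return view


pool = ToyPool()
x = zero_copy_view(pool, 5, 0.0)
y = zero_copy_view(pool, 5, 1.0)  # reuses the block that x still points at
print(x)       # x now shows y's values
print(x is y)  # both views alias one block

safe_pool = ToyPool()
a = copying_view(safe_pool, 5, 0.0)
b = copying_view(safe_pool, 5, 1.0)
print(a, b)    # each keeps its own values
```

In this model the aliasing goes away either when the consumer copies the data or when the producer array stays referenced for as long as the view is in use, which matches the distinct pointers reported for the named arrays `a`, `b`, and `c` earlier in the thread.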
