Cudf: [BUG] Series built from ephemeral CuPy arrays change due to CuPy's memory reuse

Created on 11 Sep 2019 · 3 Comments · Source: rapidsai/cudf

Converting CuPy arrays to cuDF Series appears to change the values of previously created Series whenever another array is converted. Could this be a pointer issue?

import cudf
import cupy as cp

x = cudf.Series(cp.zeros(5))
print(x)
y = cudf.Series(cp.ones(5))
print(y)
print(x)

0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
dtype: float64
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64
0    1.0
1    1.0
2    1.0
3    1.0
4    1.0
dtype: float64

import cudf
import cupy as cp

df = cudf.DataFrame()
df['a'] = cp.zeros(5)
print(df)
df['b'] = cp.ones(5)
print(df)
df['c'] = cp.array([1, 2, 3, 4, 5])
print(df)

     a
0  0.0
1  0.0
2  0.0
3  0.0
4  0.0
     a    b
0  1.0  1.0
1  1.0  1.0
2  1.0  1.0
3  1.0  1.0
4  1.0  1.0
               a              b  c
0  4.940656e-324  4.940656e-324  1
1  9.881313e-324  9.881313e-324  2
2  1.482197e-323  1.482197e-323  3
3  1.976263e-323  1.976263e-323  4
4  2.470328e-323  2.470328e-323  5

Built from source using cupy-cuda100.

bug cuDF (Python)

beckernick

All 3 comments

@shwina Looks like the zero copy is biting us here 😄

kkraus14 on 11 Sep 2019

@shwina @kkraus14, it looks like CuPy (and NumPy) arrays that are not bound to a name all end up with the same data pointer?

CuPy

a = cp.zeros(5)
b = cp.ones(5)
c = cp.random.normal(5,5,5)
print(a.__cuda_array_interface__)
print(b.__cuda_array_interface__)
print(c.__cuda_array_interface__)
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923162624, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923162112, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}

print(cp.zeros(5).__cuda_array_interface__)
print(cp.ones(5).__cuda_array_interface__)
print(cp.random.normal(5,5,5).__cuda_array_interface__)
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}

NumPy

import numpy as np

a = np.zeros(5)
b = np.ones(5)
c = np.random.normal(5, 5, 5)
print(a.__array_interface__)
print(b.__array_interface__)
print(c.__array_interface__)
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209825391664, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209820973168, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}

print(np.random.normal(5,5,5).__array_interface__)
print(np.zeros(5).__array_interface__)
print(np.ones(5).__array_interface__)
print(np.random.normal(5,5,5).__array_interface__)
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}

beckernick on 11 Sep 2019
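
The comparison above (arrays bound to names versus temporaries) can be reproduced without a GPU or NumPy. The following is a minimal sketch using only the standard library's `array` module; `buffer_address` is a hypothetical helper introduced here for illustration and is not part of CuPy, NumPy, or cuDF.

```python
from array import array


def buffer_address(arr):
    # buffer_info() returns (address, length) for the array's current buffer.
    return arr.buffer_info()[0]


# Arrays bound to names stay alive, so their buffers cannot overlap.
a = array("d", [0.0] * 5)
b = array("d", [1.0] * 5)
named = [buffer_address(a), buffer_address(b)]

# Temporaries are destroyed as soon as each expression finishes, so the
# allocator is free to hand out the same address again. It often does,
# but nothing guarantees it.
temps = [buffer_address(array("d", [v] * 5)) for v in (0.0, 1.0, 2.0)]

print("named:      ", named)
print("temporaries:", temps)
```

The named arrays always report different addresses. The temporaries may or may not report the same one, depending on the allocator, which matches the pattern in the CuPy and NumPy outputs above.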

@beckernick Discussing offline with @shwina, but basically what's happening is that you're creating an ephemeral CuPy array, which is freed immediately. Because CuPy uses a caching allocator, that memory stays valid to access and is handed out again for the next allocation, so a zero-copy Series keeps pointing at memory that later arrays overwrite.

kkraus14 on 11 Sep 2019
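
To make the explanation above concrete, here is a toy model in plain Python. It is only a sketch: `ToyPool`, `ToyArray`, and `ZeroCopyView` are hypothetical names invented for this illustration and do not reflect how CuPy's allocator or cuDF's buffers are actually implemented. The `keep_alive` flag shows one possible mitigation (the consumer holding a reference to the source array), stated here as an assumption rather than as the fix cuDF adopted. The sketch relies on CPython's reference counting to release temporaries immediately.

```python
class ToyPool:
    """Toy caching allocator: freed blocks are kept and handed out again."""

    def __init__(self):
        self._free = []    # addresses of freed blocks available for reuse
        self._next = 0     # next never-used address
        self.memory = {}   # address -> contents (still readable after free)

    def alloc(self, values):
        if self._free:
            addr = self._free.pop()
        else:
            addr = self._next
            self._next += 1
        self.memory[addr] = list(values)
        return addr

    def free(self, addr):
        # The block is only marked reusable; its contents are not cleared.
        self._free.append(addr)


class ToyArray:
    """Owns one block and returns it to the pool when it is destroyed."""

    def __init__(self, pool, values):
        self.pool = pool
        self.addr = pool.alloc(values)

    def __del__(self):
        self.pool.free(self.addr)


class ZeroCopyView:
    """Records only the address of the source array (no copy of the data)."""

    def __init__(self, arr, keep_alive=False):
        self.pool = arr.pool
        self.addr = arr.addr
        # Optionally hold a reference so the source array is never freed.
        self._owner = arr if keep_alive else None

    def values(self):
        return self.pool.memory[self.addr]


# Views built from temporaries: the second allocation reuses the first block.
pool = ToyPool()
x = ZeroCopyView(ToyArray(pool, [0.0] * 5))
before = x.values()
y = ZeroCopyView(ToyArray(pool, [1.0] * 5))
print(before, x.values(), x.addr == y.addr)

# Views that keep their source alive: each one gets its own block.
pool2 = ToyPool()
xs = ZeroCopyView(ToyArray(pool2, [0.0] * 5), keep_alive=True)
ys = ZeroCopyView(ToyArray(pool2, [1.0] * 5), keep_alive=True)
print(xs.values(), ys.values(), xs.addr == ys.addr)
```

In the first half, `x` reads ones after `y` is created, mirroring the Series output in the report. In the second half, the views hold on to their source arrays, so the blocks are never returned to the pool and the values stay distinct.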
