Converting CuPy arrays to Series appears to change the values of an existing Series when other arrays are created later. Could this be a pointer issue?
```python
import cudf
import cupy as cp

x = cudf.Series(cp.zeros(5))
print(x)
y = cudf.Series(cp.ones(5))
print(y)
print(x)  # x now prints 1.0 even though it was built from zeros
```
```
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
dtype: float64
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
dtype: float64
```
```python
import cudf
import cupy as cp

df = cudf.DataFrame()
df['a'] = cp.zeros(5)
print(df)
df['b'] = cp.ones(5)
print(df)
df['c'] = cp.array([1,2,3,4,5])
print(df)
```
```
a
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
a b
0 1.0 1.0
1 1.0 1.0
2 1.0 1.0
3 1.0 1.0
4 1.0 1.0
a b c
0 4.940656e-324 4.940656e-324 1
1 9.881313e-324 9.881313e-324 2
2 1.482197e-323 1.482197e-323 3
3 1.976263e-323 1.976263e-323 4
4 2.470328e-323 2.470328e-323 5
```
Built from source using cupy-cuda100.
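The tiny values in columns `a` and `b` are consistent with those columns reading the integers 1 to 5 written by `cp.array([1,2,3,4,5])` (int64 on Linux) as float64: the 64-bit pattern of a small integer k is the k-th positive subnormal double, k × 2^-1074. A stdlib-only sketch of that reinterpretation (on the CPU, `np.arange(1, 6, dtype=np.int64).view(np.float64)` does the same thing):

```python
import struct

# The integers that column c wrote (int64 on Linux).
ints = (1, 2, 3, 4, 5)

# Pack them as five little-endian int64 values, then read the same
# 40 bytes back as five little-endian float64 values.
raw = struct.pack("<5q", *ints)
as_float64 = struct.unpack("<5d", raw)

# Format them with 6 decimal digits, like the DataFrame printout above.
for value in as_float64:
    print(f"{value:e}")
# 4.940656e-324
# 9.881313e-324
# 1.482197e-323
# 1.976263e-323
# 2.470328e-323
```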
@shwina Looks like the zero copy is biting us here 😄
@shwina @kkraus14, it looks like CuPy (and NumPy) arrays that aren't bound to a name are assigned the same pointer?
```python
import cupy as cp

a = cp.zeros(5)
b = cp.ones(5)
c = cp.random.normal(5,5,5)
print(a.__cuda_array_interface__)
print(b.__cuda_array_interface__)
print(c.__cuda_array_interface__)
```
```
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923162624, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923162112, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
```
```python
print(cp.zeros(5).__cuda_array_interface__)
print(cp.ones(5).__cuda_array_interface__)
print(cp.random.normal(5,5,5).__cuda_array_interface__)
```
```
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
{'shape': (5,), 'typestr': '<f8', 'descr': [('', '<f8')], 'data': (139953923163136, False), 'version': 0}
```
NumPy

```python
import numpy as np

a = np.zeros(5)
b = np.ones(5)
c = np.random.normal(5,5,5)
print(a.__array_interface__)
print(b.__array_interface__)
print(c.__array_interface__)
```
```
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209825391664, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209820973168, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
```
```python
print(np.zeros(5).__array_interface__)
print(np.ones(5).__array_interface__)
print(np.random.normal(5,5,5).__array_interface__)
```
```
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
{'data': (94209826245616, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (5,), 'version': 3}
```
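@beckernick Discussed offline with @shwina. Basically, what's happening is that you're creating an ephemeral CuPy array that is freed right away, because the Series only keeps its pointer and nothing else holds a reference to it. CuPy's caching allocator keeps that block around, so the memory is still valid to access, but it also hands the same block to the next array you allocate, which then overwrites the data the Series is pointing at.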