Hi there,
I applied some NDArray (mx.nd) operations to my data, but the returned array intermittently contains a NaN element at a seemingly random index.
Built from source
----------Python Info----------
('Version :', '2.7.12')
('Compiler :', 'GCC 5.4.0 20160609')
('Build :', ('default', 'Nov 19 2016 06:48:10'))
('Arch :', ('64bit', 'ELF'))
----------MXNet Info-----------
('Version :', '1.2.0')
('Directory :', '/home/workspace/incubator-mxnet/python/mxnet')
----------System Info----------
('Platform :', 'Linux-4.4.0-96-generic-x86_64-with-Ubuntu-16.04-xenial')
('system :', 'Linux')
('release :', '4.4.0-96-generic')
Package used (Python/R/Scala/Julia):
Python
MXNet commit hash:
97570916f844bcb4515d972c75fb0a75da345d97
# `Ctrl+F` to find the place where `nan` occurs
# $ python batch_loss_bug.py
[[ 0.54881352 0.59284461]
[ 0.71518934 0.84426576]
[ 0.60276335 0.85794562]
[ 0.54488319 0.84725171]
[ 0.42365479 0.62356371]]
[[ 0.79172504 0.81216872]
[ 0.5288949 0.47997716]
[ 0.56804454 0.3927848 ]
[ 0.92559665 0.83607876]
[ 0.07103606 0.33739617]]
--------------------
[[ 0.24291152 0.21932411]
[ 0.24291152 0.21932411]
[ 0.24291152 0.21932411]
[ 0.24291152 0.21932411]
[ 0.24291152 0.21932411]
[-0.18629444 -0.3642886 ]
[-0.18629444 -0.3642886 ]
[-0.18629444 -0.3642886 ]
[-0.18629444 -0.3642886 ]
[-0.18629444 -0.3642886 ]
[-0.03471881 -0.46516082]
[-0.03471881 -0.46516082]
[-0.03471881 -0.46516082]
[-0.03471881 -0.46516082]
[-0.03471881 -0.46516082]
[ 0.38071346 -0.01117295]
[ 0.38071346 -0.01117295]
[ 0.38071346 -0.01117295]
[ nan -0.01117295] # NaN occurs here!
[ 0.38071346 -0.01117295]
[-0.35261875 -0.28616753]
[-0.35261875 -0.28616753]
[-0.35261875 -0.28616753]
[-0.35261875 -0.28616753]
[-0.35261875 -0.28616753]]
Run the following code several times (fewer than 10 runs were enough in my experiments) and there is a good chance of observing the faulty output.
import mxnet as mx
import numpy as np

def bench_mark(D_slice, P_slice):
    n = D_slice.shape[0]
    e3 = mx.nd.empty((n*n, 2))
    for i in xrange(n):
        e3[n*i:n*(i+1)] = (e3[n*i:n*(i+1)]*0+1)*(P_slice[i] - D_slice[i])
    return e3

x = mx.nd.random.uniform(shape=(2,5,2))
y = mx.nd.random.uniform(shape=(2,5,2))
e3_bm = bench_mark(x[0], y[0])
print x[0].asnumpy()
print y[0].asnumpy()
print '-'*20
print (e3_bm).asnumpy()
The bench_mark function could probably be written another way, but I still think this behavior is worth reporting.
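One possible explanation, offered as an assumption rather than a confirmed diagnosis: mx.nd.empty returns an array without initializing its entries, so e3[...]*0+1 operates on whatever values happen to be in that buffer. Under IEEE 754 arithmetic, nan*0 and inf*0 are both nan, so a single garbage value is enough to leave a NaN in the result. The pure-Python sketch below imitates this with a hand-made "uninitialized" buffer; fill_block_unsafe and fill_block_safe are hypothetical names and not part of MXNet.

```python
import math

def fill_block_unsafe(buf, diff):
    """Mimic (buf*0+1)*diff: only correct if every value in buf is finite."""
    return [(v * 0 + 1) * diff for v in buf]

def fill_block_safe(buf, diff):
    """Ignore the old contents entirely and just write diff."""
    return [diff for _ in buf]

# Stand-in for uninitialized memory: ordinary floats mixed with garbage.
garbage = [0.0, 3.5, float("nan"), float("inf"), -2.0]
diff = 0.38071346

unsafe = fill_block_unsafe(garbage, diff)
safe = fill_block_safe(garbage, diff)

print(unsafe)  # the entries computed from nan and inf are nan
print(safe)    # every entry equals diff
```

If this is indeed the cause, allocating the array with mx.nd.ones, so that the *0+1 step is unnecessary, would avoid reading uninitialized values.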
@CXMA479 Thanks for reporting this! It seems to be a bug. I've revised the code to repeat the process multiple times:
import mxnet as mx
import numpy as np

def bench_mark(D_slice, P_slice):
    n = D_slice.shape[0]
    e3 = mx.nd.empty((n*n, 2))
    for i in range(n):
        e3[n*i:n*(i+1)] = (e3[n*i:n*(i+1)]*0+1)*(P_slice[i] - D_slice[i])
    return e3

for i in range(100):
    x = mx.nd.random.uniform(shape=(2,5,2))
    y = mx.nd.random.uniform(shape=(2,5,2))
    e3_bm = bench_mark(x[0], y[0])
    print(x[0].asnumpy())
    print(y[0].asnumpy())
    print('-'*20)
    # print((e3_bm).asnumpy())
    e3_bm_npy = e3_bm.asnumpy()
    assert(np.isnan(e3_bm_npy).any() == False), str(e3_bm_npy)
Result:
17 # print((e3_bm).asnumpy())
18 e3_bm_npy = e3_bm.asnumpy()
---> 19 assert(np.isnan(e3_bm_npy).any() == False), str(e3_bm_npy)
20
AssertionError: [[ 0.5124911 0.45078474]
[ 0.5124911 0.45078474]
[ 0.5124911 0.45078474]
[ 0.5124911 0.45078474]
[ 0.5124911 0.45078474]
[ nan nan]
[-0.57626355 0.13732177]
[-0.57626355 0.13732177]
[-0.57626355 0.13732177]
[-0.57626355 0.13732177]
[ 0.2129727 -0.06519118]
[ 0.2129727 -0.06519118]
[ 0.2129727 -0.06519118]
[ 0.2129727 -0.06519118]
[ 0.2129727 -0.06519118]
[ 0.17914966 -0.17914736]
[ 0.17914966 -0.17914736]
[ 0.17914966 -0.17914736]
[ 0.17914966 -0.17914736]
[ 0.17914966 -0.17914736]
[-0.6420211 0.68399173]
[-0.6420211 0.68399173]
[-0.6420211 0.68399173]
[-0.6420211 0.68399173]
[-0.6420211 0.68399173]]
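To avoid scanning the dump by eye, the NaN positions can be reported directly. This is a minimal sketch, assuming the array has first been converted to nested lists (for example via e3_bm.asnumpy().tolist()); nan_positions is a hypothetical helper, not an MXNet or NumPy function.

```python
import math

def nan_positions(rows):
    """Return (row, column) index pairs whose value is NaN."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
        if math.isnan(value)
    ]

# A few rows shaped like the output above, including a NaN row.
sample = [
    [0.5124911, 0.45078474],
    [float("nan"), float("nan")],
    [-0.57626355, 0.13732177],
]

print(nan_positions(sample))  # [(1, 0), (1, 1)]
```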
@mxnet-label-bot add [NDArray]