Incubator-mxnet: Unpredictable nan in Array

Created on 20 Mar 2018 · 2 comments · Source: apache/incubator-mxnet

Description

Hi there,
I applied some `nd` operations to my data, but the returned array randomly contains a NaN element at some index.

Environment info (Required)

Built from source

----------Python Info----------
('Version      :', '2.7.12')
('Compiler     :', 'GCC 5.4.0 20160609')
('Build        :', ('default', 'Nov 19 2016 06:48:10'))
('Arch         :', ('64bit', 'ELF'))
----------MXNet Info-----------
('Version      :', '1.2.0')
('Directory    :', '/home/workspace/incubator-mxnet/python/mxnet')
----------System Info----------
('Platform     :', 'Linux-4.4.0-96-generic-x86_64-with-Ubuntu-16.04-xenial')
('system       :', 'Linux')
('release      :', '4.4.0-96-generic')

Package used (Python/R/Scala/Julia):
Python

MXNet commit hash:
97570916f844bcb4515d972c75fb0a75da345d97

Error Message:

# `Ctrl+F` to find the place where `nan` occurs
# $ python batch_loss_bug.py 
[[ 0.54881352  0.59284461]
 [ 0.71518934  0.84426576]
 [ 0.60276335  0.85794562]
 [ 0.54488319  0.84725171]
 [ 0.42365479  0.62356371]]
[[ 0.79172504  0.81216872]
 [ 0.5288949   0.47997716]
 [ 0.56804454  0.3927848 ]
 [ 0.92559665  0.83607876]
 [ 0.07103606  0.33739617]]
--------------------
[[ 0.24291152  0.21932411]
 [ 0.24291152  0.21932411]
 [ 0.24291152  0.21932411]
 [ 0.24291152  0.21932411]
 [ 0.24291152  0.21932411]
 [-0.18629444 -0.3642886 ]
 [-0.18629444 -0.3642886 ]
 [-0.18629444 -0.3642886 ]
 [-0.18629444 -0.3642886 ]
 [-0.18629444 -0.3642886 ]
 [-0.03471881 -0.46516082]
 [-0.03471881 -0.46516082]
 [-0.03471881 -0.46516082]
 [-0.03471881 -0.46516082]
 [-0.03471881 -0.46516082]
 [ 0.38071346 -0.01117295]
 [ 0.38071346 -0.01117295]
 [ 0.38071346 -0.01117295]
 [        nan -0.01117295]     #     NaN occurs here!
 [ 0.38071346 -0.01117295]
 [-0.35261875 -0.28616753]
 [-0.35261875 -0.28616753]
 [-0.35261875 -0.28616753]
 [-0.35261875 -0.28616753]
 [-0.35261875 -0.28616753]]
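The affected rows can also be found programmatically instead of searching the printed output for `nan` with Ctrl+F. Below is a minimal NumPy sketch, which assumes the result has already been copied to host memory with `.asnumpy()`. The helper name `nan_rows` is hypothetical. Applied to the output above, it would flag row 18 (zero-based).

```python
import numpy as np

def nan_rows(arr):
    """Return the indices of rows in a 2-D array that contain at least one NaN."""
    # np.isnan gives an elementwise boolean mask; any(axis=1) collapses it per row
    return np.flatnonzero(np.isnan(arr).any(axis=1))

# Small stand-in for e3_bm.asnumpy(): rows 1 and 3 contain NaN
sample = np.array([[0.38, -0.01],
                   [np.nan, -0.01],
                   [0.38, -0.01],
                   [np.nan, np.nan]])
print(nan_rows(sample))  # → [1 3]
```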

Minimum reproducible example

Run the following code several times (in my experiments, fewer than 10 runs were always enough) and there is a chance of seeing the faulty output.

import mxnet as mx
import numpy as np

def bench_mark(D_slice, P_slice):
    n = D_slice.shape[0]
    # note: mx.nd.empty allocates the buffer without initializing its contents
    e3 = mx.nd.empty((n*n, 2))
    for i in range(n):
        # fill rows n*i .. n*(i+1)-1 with P_slice[i] - D_slice[i];
        # (e3[...]*0+1) is intended to produce a block of ones
        e3[n*i:n*(i+1)] = (e3[n*i:n*(i+1)]*0+1)*(P_slice[i] - D_slice[i])
    return e3

x = mx.nd.random.uniform(shape=(2,5,2))
y = mx.nd.random.uniform(shape=(2,5,2))
e3_bm = bench_mark(x[0], y[0])
print(x[0].asnumpy())
print(y[0].asnumpy())
print('-'*20)
print(e3_bm.asnumpy())
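One hypothesis worth ruling out is that the NaN comes from uninitialized memory. This is an assumption and is not confirmed in this thread. `mx.nd.empty` is documented as returning an array without initializing its entries, so `e3[...]*0+1` reads whatever bits happen to be in that buffer. The `*0+1` idiom yields 1 only when the leftover value is finite. Under IEEE 754 arithmetic, both `nan * 0` and `inf * 0` produce NaN. The following pure-Python illustration shows that arithmetic and does not involve MXNet. The helper `mask_to_one` is hypothetical.

```python
import math

def mask_to_one(x):
    """Mimic the (e3 * 0 + 1) idiom from bench_mark on a single float."""
    return x * 0 + 1

for leftover in (0.123, -7.5e300, float("inf"), float("nan")):
    # finite leftovers become 1.0; inf and nan leftovers become nan
    print(leftover, "->", mask_to_one(leftover), math.isnan(mask_to_one(leftover)))
```

If this is the cause, it would also explain why the failure is intermittent: the result would depend on what the allocator happens to hand back. In that case, allocating `e3` with `mx.nd.zeros` or `mx.nd.ones` instead of `mx.nd.empty` would avoid the problem.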

What have you tried to solve it?

Perhaps bench_mark could be written a different way, but I still think the problem is genuine and worth reporting.
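For comparison, the same result can be built without reading any uninitialized buffer, because each difference row `P_slice[i] - D_slice[i]` is simply repeated `n` times. The NumPy sketch below illustrates this. The name `bench_mark_repeat` is hypothetical. In MXNet, the analogous call would presumably be `mx.nd.repeat(P_slice - D_slice, repeats=n, axis=0)`, but this has not been checked against the 1.2.0 build in this report.

```python
import numpy as np

def bench_mark_repeat(D_slice, P_slice):
    """Hypothetical rewrite of bench_mark: row i of (P - D) repeated n times.

    Produces an (n*n, 2) array in which rows n*i .. n*(i+1)-1 all equal
    P_slice[i] - D_slice[i], which is what bench_mark intends to compute.
    """
    n = D_slice.shape[0]
    diff = P_slice - D_slice              # shape (n, 2)
    return np.repeat(diff, n, axis=0)     # shape (n*n, 2), no uninitialized memory read

rng = np.random.default_rng(0)
x0 = rng.uniform(size=(5, 2))
y0 = rng.uniform(size=(5, 2))
e3 = bench_mark_repeat(x0, y0)
print(e3.shape)  # → (25, 2)
```

`np.repeat(..., axis=0)` repeats each row consecutively. This is the same block layout that the loop in bench_mark writes.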

Bug NDArray

Source

CXMA479

Most helpful comment

@CXMA479 Thanks for reporting this! It seems to be a bug. I've revised the code to repeat the process multiple times:

import mxnet as mx
import numpy as np
def bench_mark(D_slice, P_slice):
    n= D_slice.shape[0]
    e3 = mx.nd.empty((n*n,2))
    for i in range(n):
        e3[n*i:n*(i+1)]  = (e3[n*i:n*(i+1)]*0+1)*( P_slice[i] - D_slice[i])
    return e3

for i in range(100):
    x = mx.nd.random.uniform(shape=(2,5,2))
    y = mx.nd.random.uniform(shape=(2,5,2))
    e3_bm = bench_mark(x[0], y[0])
    print(x[0].asnumpy())
    print(y[0].asnumpy())
    print('-'*20)
    # print((e3_bm).asnumpy())
    e3_bm_npy = e3_bm.asnumpy()
    assert(np.isnan(e3_bm_npy).any() == False), str(e3_bm_npy)
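The assertion above only catches NaN. A slightly stricter check compares every row against the value it should hold and reports the rows that differ. This NumPy sketch assumes the inputs and the result have already been converted with `.asnumpy()`. The helper `bad_rows` is hypothetical.

```python
import numpy as np

def bad_rows(e3, D_slice, P_slice):
    """Return indices of rows in e3 that differ from the expected P[i] - D[i].

    Row r of e3 is expected to equal P_slice[r // n] - D_slice[r // n].
    NaN never compares close to anything, so NaN rows are always reported.
    """
    n = D_slice.shape[0]
    expected = np.repeat(P_slice - D_slice, n, axis=0)
    ok = np.isclose(e3, expected).all(axis=1)
    return np.flatnonzero(~ok)

D = np.array([[0.1, 0.2], [0.3, 0.4]])
P = np.array([[0.5, 0.5], [0.5, 0.5]])
good = np.repeat(P - D, 2, axis=0)
corrupt = good.copy()
corrupt[1, 0] = np.nan          # simulate the symptom from the report
print(bad_rows(good, D, P))     # → []
print(bad_rows(corrupt, D, P))  # → [1]
```

Inside the loop, it could be called as `bad_rows(e3_bm.asnumpy(), x[0].asnumpy(), y[0].asnumpy())`.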

Result:

     17     # print((e3_bm).asnumpy())
     18     e3_bm_npy = e3_bm.asnumpy()
---> 19     assert(np.isnan(e3_bm_npy).any() == False), str(e3_bm_npy)
     20

AssertionError: [[ 0.5124911   0.45078474]
 [ 0.5124911   0.45078474]
 [ 0.5124911   0.45078474]
 [ 0.5124911   0.45078474]
 [ 0.5124911   0.45078474]
 [        nan         nan]
 [-0.57626355  0.13732177]
 [-0.57626355  0.13732177]
 [-0.57626355  0.13732177]
 [-0.57626355  0.13732177]
 [ 0.2129727  -0.06519118]
 [ 0.2129727  -0.06519118]
 [ 0.2129727  -0.06519118]
 [ 0.2129727  -0.06519118]
 [ 0.2129727  -0.06519118]
 [ 0.17914966 -0.17914736]
 [ 0.17914966 -0.17914736]
 [ 0.17914966 -0.17914736]
 [ 0.17914966 -0.17914736]
 [ 0.17914966 -0.17914736]
 [-0.6420211   0.68399173]
 [-0.6420211   0.68399173]
 [-0.6420211   0.68399173]
 [-0.6420211   0.68399173]
 [-0.6420211   0.68399173]]

sxjscience on 22 Mar 2018


@mxnet-label-bot add [NDArray]