@ykim362
Operating System: OSX, pip install --pre mxnet-mkl
MXNet commit hash (git rev-parse HEAD): b6b8da0ac2b1ef8b84089458c757ce8b19aab0d7
Python version and distribution: Python 2.7.13
from __future__ import print_function
import mxnet as mx
import skimage.io as io
import numpy as np
from skimage.color import rgba2rgb  # used in the RGBA branch below but never imported

def print_activation(net, url):
    I = io.imread(url)
    # Drop the alpha channel if the image is RGBA
    if I.shape[2] == 4:
        I = rgba2rgb(I)
    image = mx.nd.array(I).astype(np.uint8)
    image = mx.image.resize_short(image, 256)
    image, _ = mx.image.center_crop(image, (224, 224))
    image = mx.image.color_normalize(image.astype(np.float32)/255,
                                     mean=mx.nd.array([0.485, 0.456, 0.406]),
                                     std=mx.nd.array([0.229, 0.224, 0.225]))
    image = mx.nd.transpose(image.astype('float32'), (2,1,0))
    image = mx.nd.expand_dims(image, axis=0)
    out = net(image)
    # Returns: sum of exp(out - max), the log-softmax of out, and the raw output
    return (out - out.max()).exp().sum().asscalar(), mx.nd.log_softmax(out), out

act1, log_softmax_out1, out1 = print_activation(mx.gluon.model_zoo.vision.squeezenet1_1(pretrained=True), 'https://github.com/zackchase/mxnet-the-straight-dope/raw/master/img/real_hotdog.jpg')
act2, log_softmax_out2, out2 = print_activation(mx.gluon.model_zoo.vision.squeezenet1_1(pretrained=True), 'https://github.com/zackchase/mxnet-the-straight-dope/raw/master/img/real_hotdog.jpg')
print(act1, act2)
print((out1-out2).max(), (out1-out2).min())
Expected outputs should be
1.16364 1.16364
[ 0. ]
<NDArray 1 @cpu(0)>
[ 0. ]
<NDArray 1 @cpu(0)>
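
For context, the first two printed values are the result of (out - out.max()).exp().sum(), i.e. the softmax denominator after shifting by the largest logit. A correct result must be finite and lie between 1 and the number of logits, because the largest logit contributes exp(0) = 1 and every other term is at most 1. The sketch below illustrates that bound in plain Python. The helper names (shifted_exp_sum, looks_valid) and the sample logits are made up for illustration and are not part of MXNet; the figure of 1000 assumes the 1000-class ImageNet output of the model zoo network.

```python
import math

def shifted_exp_sum(logits):
    """Return sum(exp(x - max(logits))), the quantity the script prints."""
    m = max(logits)
    return sum(math.exp(x - m) for x in logits)

def looks_valid(value, n):
    """Necessary (not sufficient) check: finite and within [1, n]."""
    return math.isfinite(value) and 1.0 <= value <= n

# Hypothetical logits, only to exercise the helpers
logits = [2.0, 0.5, -1.0, 2.0]
s = shifted_exp_sum(logits)
print(s, looks_valid(s, len(logits)))

# A NaN can never be a legitimate value of this sum
print(looks_valid(float("nan"), 1000))
```

The check only rules out impossible values such as nan. A number that falls inside the range can still be wrong, so it complements a comparison against the expected output rather than replacing it.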
Instead I got
MKL Build:20170720
3.0 nan
[ 0.]
<NDArray 1 @cpu(0)>
[ 0.]
<NDArray 1 @cpu(0)>
And the first two values are different every time I run it.
% python bug.py
MKL Build:20170720
1.0 nan
[ 0.]
<NDArray 1 @cpu(0)>
[ 0.]
<NDArray 1 @cpu(0)>
% python bug.py
MKL Build:20170720
nan 718.584
[ 0.]
<NDArray 1 @cpu(0)>
[ 0.]
<NDArray 1 @cpu(0)>
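
The run-to-run variation shown above can also be checked inside a single process. The following is a minimal sketch, not part of the original report: check_repeatable is a hypothetical helper that calls a function several times and reports whether any result is NaN or differs from the first. In the script above, the callable would presumably be something like lambda: print_activation(net, url)[0]; here stand-in callables are used so the example runs without MXNet.

```python
import math

def check_repeatable(fn, runs=5, rel_tol=1e-6):
    """Call fn() `runs` times; return (stable, results).

    stable is False if any result is NaN or differs from the first
    result by more than rel_tol (relative tolerance).
    """
    results = [fn() for _ in range(runs)]
    has_nan = any(math.isnan(r) for r in results)
    first = results[0]
    stable = (not has_nan) and all(
        math.isclose(r, first, rel_tol=rel_tol) for r in results
    )
    return stable, results

# Stand-in for a deterministic forward pass
stable, values = check_repeatable(lambda: 1.16364)
print(stable, values)

# Stand-in that replays values like those reported above
flaky = iter([3.0, float("nan"), 718.584])
stable, values = check_repeatable(lambda: next(flaky), runs=3)
print(stable, values)
```

A check like this only detects instability within one process. Comparing separate invocations of python bug.py, as done above, still requires capturing the output of each run.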
Steps to reproduce:
1. pip install --pre mxnet-mkl
2. pip install scikit-image
3. Save the script above as bug.py
4. python bug.py

Calling mx.nd.log_softmax(out) before the term (out - out.max()).exp().sum().asscalar() in the print_activation function seems to make the results right.

Is there an update on this issue? We are keen to include MKL in an upcoming project.
@ykim362 stated that he started working on this a while back, but there's no update from him yet.
Help on finding the root cause is welcome.
My MKLDNN code seems to solve the problem. The output is very deterministic. It's always
1.1635 1.1635
[ 0.]
<NDArray 1 @cpu(0)>
[ 0.]
<NDArray 1 @cpu(0)>
@zheng-da Thanks! I have also been assuming that the NDArray refactoring solves this problem. Does the NDArray refactoring also enable Gluon APIs that are not currently available on CPU (in-place operations)?