Setting the environment variable MXNET_MKLDNN_DEBUG=1 produces the following error in the tests. This happens across all configurations and seeds, so I do not think it is a flaky test failure.
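For reference, a minimal reproduction sketch, assuming an MXNet build with MKL-DNN enabled; the model choice and input shape are my assumptions based on the failing test:

# Set the debug flag before importing mxnet so the MKL-DNN result verification runs.
import os
os.environ['MXNET_MKLDNN_DEBUG'] = '1'

import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.resnet18_v1(pretrained=False)
net.initialize()
out = net(mx.nd.random.uniform(shape=(2, 3, 224, 224)))
out.wait_to_read()  # raises MXNetError: "Check failed: similar", as shown below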
======================================================================
ERROR: test_gluon_model_zoo.test_models
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
orig_test(*args, **kwargs)
File "/work/mxnet/tests/python/unittest/test_gluon_model_zoo.py", line 50, in test_models
model(mx.nd.random.uniform(shape=data_shape)).wait_to_read()
File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 1650, in wait_to_read
check_call(_LIB.MXNDArrayWaitToRead(self.handle))
File "/work/mxnet/python/mxnet/base.py", line 149, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [17:10:12] src/operator/nn/mkldnn/mkldnn_base.cc:395: Check failed: similar
Stack trace returned 10 entries:
[bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f06ccf3745b]
[bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f06ccf38478]
[bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::OpCheck::Run(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)>, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x3ca8) [0x7f06ccf54198]
[bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2a910d9) [0x7f06cf55a0d9]
[bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFComputeEx(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x7c) [0x7f06cf77608c]
[bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3148fdb) [0x7f06cfc11fdb]
[bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xcb5) [0x7f06cfc0b1a5]
[bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0xd9) [0x7f06cfc1d309]
[bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x4a) [0x7f06cfc1c43a]
[bt] (9) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f06d7ca4c80]
-------------------- >> begin captured stdout << ---------------------
ResNetV1(
(features): HybridSequential(
(0): Conv2D(None -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(1, 1), ceil_mode=False)
(4): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(5): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
(downsample): HybridSequential(
(0): Conv2D(64 -> 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(6): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
(downsample): HybridSequential(
(0): Conv2D(128 -> 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(7): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
(downsample): HybridSequential(
(0): Conv2D(256 -> 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(8): GlobalAvgPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True)
)
(output): Dense(512 -> 1000, linear)
)
ResNetV1(
(features): HybridSequential(
(0): Conv2D(None -> 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): MaxPool2D(size=(3, 3), stride=(2, 2), padding=(1, 1), ceil_mode=False)
(4): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(2): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(5): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(64 -> 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
(downsample): HybridSequential(
(0): Conv2D(64 -> 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(2): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(3): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(6): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(128 -> 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
(downsample): HybridSequential(
(0): Conv2D(128 -> 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(2): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(3): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(4): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(5): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(7): HybridSequential(
(0): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(256 -> 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
(downsample): HybridSequential(
(0): Conv2D(256 -> 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(1): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
(2): BasicBlockV1(
(body): HybridSequential(
(0): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
(2): Activation(relu)
(3): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm(fix_gamma=False, use_global_stats=False, eps=1e-05, momentum=0.9, axis=1, in_channels=None)
)
)
)
(8): GlobalAvgPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True)
)
(output): Dense(512 -> 1000, linear)
)
--------------------- >> end captured stdout << ----------------------
-------------------- >> begin captured logging << --------------------
common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1825457337 to reproduce.
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1579343143 to reproduce.
--------------------- >> end captured logging << ---------------------
@marcoabreu @cjolivier01 This is a very useful feature for verifying the correctness of the MKL-DNN operators.
After debugging, I found two types of failures in DEBUG mode.
1. Numerical precision for convolution @cjolivier01
Currently both rtol and atol in SimilarArray are 1e-3.
In the failing case the two values are 0.675079 and 0.677384. Because the values are small, the rtol term contributes little, so the required absolute tolerance ends up larger than 1e-3 (the difference is 0.002305 against a threshold of 0.00167738). I think this difference is acceptable.
I suggest changing the atol to 1e-2; see the quick check after the log lines below.
[15:28:36] src/operator/nn/mkldnn/mkldnn_base.cc:346: data1[i]: 0.675079 data2[i]: 0.677384
[15:28:36] src/operator/nn/mkldnn/mkldnn_base.cc:347: atol + rtol * std::abs(data2[i]) = 0.00167738, std::abs(data1[i] - data2[i]) = 0.002305
if (std::abs(data1[i] - data2[i]) > atol + rtol * std::abs(data2[i])) return false
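To make the numbers concrete, a quick Python sketch of the same comparison using the values from the log above:

data1, data2 = 0.675079, 0.677384
rtol = 1e-3
diff = abs(data1 - data2)                # 0.002305
print(diff > 1e-3 + rtol * abs(data2))   # True  -> fails with the current atol = 1e-3
print(diff > 1e-2 + rtol * abs(data2))   # False -> passes with the proposed atol = 1e-2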
2. The Flatten block in gluon should call the symbolic API rather than implementing it with reshape @piiswrong
https://github.com/apache/incubator-mxnet/blob/46e47cbc6183d2812a2e405851f0b209383e72ad/python/mxnet/gluon/nn/basic_layers.py#L408
I suggest changing this to F.Flatten(x) so that the MKL-DNN flatten implementation is used; a quick equivalence check is sketched below.
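A sanity-check sketch that F.Flatten and reshape((0, -1)) produce the same result on a 4-D input (the shape here is arbitrary), so the change should not alter model outputs:

import mxnet as mx
x = mx.nd.random.uniform(shape=(2, 3, 4, 5))
a = x.reshape((0, -1))    # keep dim 0, flatten the rest -> (2, 60)
b = mx.nd.Flatten(x)      # the Flatten operator         -> (2, 60)
assert a.shape == b.shape == (2, 60)
assert mx.nd.abs(a - b).sum().asscalar() == 0.0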
After making these two changes, all cases passed.
[patric@mlt-skx080 master]$ git diff
diff --git a/3rdparty/mkldnn b/3rdparty/mkldnn
--- a/3rdparty/mkldnn
+++ b/3rdparty/mkldnn
@@ -1 +1 @@
-Subproject commit f5218ff4fd2d16d13aada2e632afd18f2514fee3
+Subproject commit f5218ff4fd2d16d13aada2e632afd18f2514fee3-dirty
diff --git a/python/mxnet/gluon/nn/basic_layers.py b/python/mxnet/gluon/nn/basic_layers.py
index 3801c84..b7bba8a 100644
--- a/python/mxnet/gluon/nn/basic_layers.py
+++ b/python/mxnet/gluon/nn/basic_layers.py
@@ -405,7 +405,7 @@ class Flatten(HybridBlock):
super(Flatten, self).__init__(**kwargs)
def hybrid_forward(self, F, x):
- return x.reshape((0, -1))
+ return F.Flatten(x)
def __repr__(self):
return self.__class__.__name__
diff --git a/src/operator/nn/mkldnn/mkldnn_base.cc b/src/operator/nn/mkldnn/mkldnn_base.cc
index 820cca1..c9582cd 100644
--- a/src/operator/nn/mkldnn/mkldnn_base.cc
+++ b/src/operator/nn/mkldnn/mkldnn_base.cc
@@ -388,7 +388,7 @@ void OpCheck::Run(mxnet::FCompute fn, const nnvm::NodeAttrs &attrs,
if (req[i] == kNullOp)
continue;
MSHADOW_TYPE_SWITCH(outputs[i].dtype(), DType, {
- bool similar = SimilarArray<DType>(outputs[i], outputs_[i], 1e-3, 1e-3);
+ bool similar = SimilarArray<DType>(outputs[i], outputs_[i], 1e-3, 1e-2);
if (!similar) {
LOG(ERROR) << attrs.op->name << " fails";
}
3. A bug in x.reshape((0, -1)) @zheng-da
In theory, this implementation should also work with MKL-DNN arrays, but it doesn't.
I think there is a bug; I am still debugging it (a hypothetical minimal repro is sketched below).
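For isolating this, a hypothetical minimal repro, assuming an MKL-DNN build with MXNET_MKLDNN_DEBUG=1: a convolution whose output is in MKL-DNN format, followed by gluon's Flatten, which currently does x.reshape((0, -1)). The layer sizes and input shape are arbitrary.

import os
os.environ['MXNET_MKLDNN_DEBUG'] = '1'
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Conv2D(8, kernel_size=3),  # output should be in MKL-DNN format
        nn.Flatten(),                 # reshape((0, -1)) on the MKL-DNN view
        nn.Dense(10))
net.initialize()
net.hybridize()
# Expected to hit the OpCheck failure if the reshape path is indeed the culprit.
net(mx.nd.random.uniform(shape=(1, 3, 32, 32))).wait_to_read()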
@marcoabreu We found the root cause of 3) in the comments above. It is not an MKL-DNN implementation issue; we just need to improve the checking method used under MXNET_MKLDNN_DEBUG.
@cjolivier01 Thanks for the nice functionality for checking the results of MKL-DNN.
We found a special situation that needs to be handled, described as 3) in my comments above.
The reshape only changes the view of the NDArray; the underlying data is not changed yet.
In debug mode, OpCheck tries to get the MKL-DNN memory without checking whether the array is a view (https://github.com/apache/incubator-mxnet/blob/bd9b9c8b76d68b2b7cd957dc0bd07fb4fbc29c4c/src/operator/nn/mkldnn/mkldnn_base.cc#L357), so the case fails.
So a possible fix for OpCheck::Init is sketched below; @zheng-da please help review.
+  // auto mem = inputs_[i].GetMKLDNNData();
+  NDArray data = inputs_[i];
+  const TShape& ishape = inputs_[i].shape();
+  // If the input is an MKL-DNN view, reshape it to a 2-D shape (all leading
+  // dims collapsed) before fetching the MKL-DNN memory.
+  if (data.IsMKLDNNData() && data.IsView())
+    data = data.MKLDNNDataReshape(Shape2(ishape.ProdShape(0, ishape.ndim() - 1),
+                                         ishape[ishape.ndim() - 1]));
+  auto mem = data.GetMKLDNNData();
When I wrote the test, I followed the Python tests: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L470
When assert_almost_equal is called (https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/test_utils.py#L1313), it uses 1e-3 for both rtol and atol. I don't know why the test fails.
As for modifying OpCheck::Init, you can do:
if (data.IsMKLDNNData() && data.IsView())
  data = in_data[fullc::kData].Reorder2Default();
Please see the example here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/mkldnn/mkldnn_fully_connected.cc#L95
This is how we should deal with a reshaped MKL-DNN NDArray. Unfortunately, we can't do in-place layout conversion in the NDArray, as that caused a race condition.
@zheng-da I have looked into Reorder2Default, but it also converts back to the original shape rather than the new shape from the reshape.
That's another point we want to improve later.
@pengzhao-intel I believe that when you call CopyFrom, it converts the input memory into the same shape as the target. So if you call Reorder2Default and then CopyFrom, the MKL-DNN memory will have the new shape:
NDArray data = inputs_[i];
inputs.emplace_back(data.shape(), ctx, false, data.dtype());
// If the input is a reshaped MKL-DNN view, fall back to the default layout first.
if (data.IsMKLDNNData() && data.IsView())
  data = data.Reorder2Default();
// CopyFrom converts the source memory into the shape of the target, so
// inputs[i] ends up with the new (reshaped) shape.
auto mem = data.GetMKLDNNData();
inputs[i].CopyFrom(*mem);
The PR above addresses this issue. @marcoabreu, can you close it?