I am running the following code:
import mxnet as mx
import mxnet.gluon as gl
class Foo(gl.nn.HybridBlock):
def __init__(self):
super(Foo,self).__init__()
self.conv = gl.nn.HybridSequential()
filters = [(2,16), (4,32)]
for k, c in filters:
self.conv.add(
gl.nn.Conv2D(c, (1,k), padding=(0,k//2), activation="relu", layout='NCHW')
)
def hybrid_forward(self, F, x, valid_lengths):
mask = F.ones_like(x.slice_axis(axis=1, begin=0, end=1).reshape(shape=(0,-1)))
mask = F.SequenceMask(mask, sequence_length=valid_lengths, use_sequence_length=True, axis=1)
mask = mask.reshape(shape=(0,1,1,-1))
x = F.broadcast_mul(x, mask)
y = self.conv(x)
y = y.reshape(shape=(0,0,-1)).transpose(axes=(0, 2, 1))
masked_y = F.SequenceMask(y, sequence_length=valid_lengths+1, use_sequence_length=True, axis=1,value=-1e7)
output = masked_y.max(axis=1).reshape(shape=(0,0,1))
print(output)
return output
foo = Foo()
foo.collect_params().initialize()
#foo.hybridize()
x = mx.nd.random.randn(2,10,1,20)
valid_lengths = mx.nd.array([20,0])
output = foo(x, valid_lengths)
print(output)
It crashes with the following error message upon printing out an array for debug purposes:
Unsupported MKLDNN dimensions: 3
Inside the code there is a place where I take the output of convolution, of shape N1WC and reshape it to get rid of the extra dimension -> NWC. If I remove that reshape from the code and keep the underlying array 4 dimensions, then the crash goes away.
Here is full stack trace:
Traceback (most recent call last):
File "/Users/sterawls/mxnet-bugs/mkldnn-dim.py", line 37, in <module>
output = foo(x, valid_lengths)
File "/usr/local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 542, in __call__
out = self.forward(*args)
File "/usr/local/lib/python3.7/site-packages/mxnet/gluon/block.py", line 919, in forward
return self.hybrid_forward(ndarray, x, *args, **params)
File "/Users/sterawls/mxnet-bugs/mkldnn-dim.py", line 27, in hybrid_forward
print(output)
File "/usr/local/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 189, in __repr__
return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
File "/usr/local/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 1972, in asnumpy
ctypes.c_size_t(data.size)))
File "/usr/local/lib/python3.7/site-packages/mxnet/base.py", line 251, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [20:02:28] src/operator/nn/mkldnn/mkldnn_base.cc:290: Unsupported MKLDNN dimensions: 3
Stack trace returned 10 entries:
[bt] (0) 0 libmxnet.so 0x000000010c14e740 libmxnet.so + 26432
[bt] (1) 1 libmxnet.so 0x000000010c14e4ef libmxnet.so + 25839
[bt] (2) 2 libmxnet.so 0x000000010c14bc79 libmxnet.so + 15481
[bt] (3) 3 libmxnet.so 0x000000010c152a93 libmxnet.so + 43667
[bt] (4) 4 libmxnet.so 0x000000010d7c65de MXNDListFree + 1428094
[bt] (5) 5 libmxnet.so 0x000000010c692779 libmxnet.so + 5547897
[bt] (6) 6 libmxnet.so 0x000000010c1540a2 libmxnet.so + 49314
[bt] (7) 7 libmxnet.so 0x000000010d6f1ee2 MXNDListFree + 557954
[bt] (8) 8 libmxnet.so 0x000000010d679b44 MXNDListFree + 65508
[bt] (9) 9 libmxnet.so 0x000000010d67c218 MXNDListFree + 75448
Here is my environment setting (Using a version of mxnet I just installed today:
----------Python Info----------
Version : 3.7.2
Compiler : Clang 9.0.0 (clang-900.0.39.2)
Build : ('default', 'Dec 27 2018 07:35:45')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 18.1
Directory : /usr/local/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.3.1
Directory : /usr/local/lib/python3.7/site-packages/mxnet
Commit Hash : 19c501680183237d52a862e6ae1dc4ddc296305b
----------System Info----------
Platform : Darwin-16.7.0-x86_64-i386-64bit
system : Darwin
node : dca90487206c.ant.amazon.com
release : 16.7.0
version : Darwin Kernel Version 16.7.0: Thu Dec 20 21:53:35 PST 2018; root:xnu-3789.73.31~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0039 sec, LOAD: 0.6373 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0352 sec, LOAD: 0.3969 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0348 sec, LOAD: 0.3390 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0340 sec, LOAD: 0.2251 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0321 sec, LOAD: 0.7873 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0430 sec, LOAD: 0.1886 sec.
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug
@mxnet-label-bot add [Bug]
Not sure how this works. Seeing if I can add a label.
Thanks for reporting this. the error message Unsupported MKLDNN dimensions: 3 is from
mkldnn_memory_format_t GetDefaultFormat(int num_dims) {
switch (num_dims) {
case 1: return mkldnn_x;
case 2: return mkldnn_nc;
case 3: return mkldnn_ncw;
case 4: return mkldnn_nchw;
case 5: return mkldnn_goihw;
default:
LOG(FATAL) << "Unsupported MKLDNN dimensions: " << num_dims;
return mkldnn_format_undef;
}
}
If num_dims is 3, mkldnn_ncw should be returned. This is already fixed in master about a month ago. Can you have a try on the latest nightly build? @xinyu-intel Does this fix merge into 1.3.1?
@stephenrawls Can you try the latest MXNet? #13530 has enabled Conv1d and 3D data layout.
@ZhennanQin No, only in master branch.
Is it in 1.4 branch? I actually changed this code specifically to use 2d convolution because 1d convolution in mxnet 1.3 is super slow on CPUs, so great to hear that a fix is coming.
I'm in a setup where I do need to use a current release of mxnet, but will try building from master and confirming if that fixes the problem. Thanks for the quick response!
@stephenrawls Not in 1.4 branch yet. You can try to build from master and your feedback is welcome:)
Thanks the quick answer from @ZhennanQin @xinyu-intel
@stephenrawls thanks to trying the MKLDNN backend and report the potential issues.
Looking forward to your results with the new conv1d interface (ps, please try to run in the Skylake machine since there're known performance issue on old architecture like broadwell).
Feel free to ping us if anything we can help:)
@mxnet-label-bot add [MKLDNN]
Resolved by the latest master, so I am closing this issue.
Feel free to re-open if there is any other open issue.
Most helpful comment
Is it in 1.4 branch? I actually changed this code specifically to use 2d convolution because 1d convolution in mxnet 1.3 is super slow on CPUs, so great to hear that a fix is coming.
I'm in a setup where I do need to use a current release of mxnet, but will try building from master and confirming if that fixes the problem. Thanks for the quick response!