Incubator-mxnet: MKLDNN-1.0 doesn't support slice operator for Large Tensor

Created on 5 Nov 2019 · 15 comments · Source: apache/incubator-mxnet

Description

When MXNet is built for CPU with MKL and MKL-DNN enabled, the slice operator doesn't work on large tensors.

Error Message

could not initialize a sub-memory

To Reproduce

Use an MXNet CPU build from master with MKL and MKLDNN enabled.

Steps to reproduce


  1. Run command MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice

Environment

Ubuntu 16.04 DeepLearning AMI

We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python


----------Python Info----------
Version      : 3.6.4
Compiler     : GCC 7.2.0
Build        : ('default', 'Jan 16 2018 18:10:19')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.0
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Version      : 1.6.0
Directory    : /home/ubuntu/incubator-mxnet/python/mxnet
Num GPUs     : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-1095-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-82-110
release      : 4.4.0-1095-aws
version      : #106-Ubuntu SMP Wed Sep 18 13:33:48 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2700.882
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.08
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0127 sec, LOAD: 0.4722 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0003 sec, LOAD: 0.3578 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0976 sec, LOAD: 0.0698 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0259 sec, LOAD: 0.1256 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.1084 sec, LOAD: 0.1252 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0288 sec, LOAD: 0.4309 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0025 sec, LOAD: 0.0944 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0015 sec, LOAD: 0.0324 sec.
Labels: Bug, MKLDNN, R1.6.0

All 15 comments

@pengzhao-intel @TaoLv

@mxnet-label-bot add [MKLDNN]

@rongzha1 @wuxun-zhang could you take a look ASAP?

@access2rohit is this a necessary part of r1.6, or can we fix it in master?

@pengzhao-intel I am looking into this.

Can't reproduce this case on our Skylake machine. Will keep debugging.
(mxnet) [rongzha1@mlt-skx141 rong_git_mxnet]$ MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice
test_large_array.test_slice ... mkldnn_verbose,info,Intel(R) MKL-DNN v1.0.0 (Git Hash 553c23fc020dfda19f8145e92e57b0e40ecdff56),Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,create,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.00610352
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.092041
ok

@access2rohit could you add some backtrace (bt) info? Thanks.

@rongzha1 Have you ever tried to build MXNet with USE_INT64_TENSOR_SIZE=1?

@TaoLv Yes, we have enabled the int64 flag. The build command is make USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INT64_TENSOR_SIZE=1 USE_INTEL_PATH=/opt/intel -j.

I also cannot reproduce this issue. However, I found another bug about the offset assignment in slice and will file a PR soon.
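For context on why a slice on a large tensor could trip up a sub-memory initialization (this is my interpretation of the symptom, not something confirmed in the thread): if any offset along the way is held in a 32-bit integer, a slice starting beyond byte 2^31 of the buffer truncates to garbage. A minimal, MXNet-free sketch, using a shape on the order of what tests/nightly/test_large_array.py exercises:

```python
import ctypes

def slice_byte_offset(start_elem: int, elem_size: int = 4) -> int:
    """Byte offset of a slice start, computed as an exact Python int."""
    return start_elem * elem_size

# A float32 tensor of roughly 100_000_000 x 50 elements holds ~5e9 elements,
# so a slice starting near the end has a byte offset far above INT32_MAX.
start = 99_999_000 * 50            # flat element index of the slice start
offset = slice_byte_offset(start)  # 19_999_800_000 bytes

# What a 32-bit signed offset field would actually hold after truncation:
int32_view = ctypes.c_int32(offset).value
print(offset, int32_view)          # the truncated value is negative garbage
assert offset > 2**31 - 1 and int32_view != offset
```

This is why the USE_INT64_TENSOR_SIZE=1 build flag matters for reproducing the issue: without 64-bit indexing end to end, offsets like the one above cannot be represented.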

Also tested with AWS EC2 m5.8 instance, and found no error (master commit 3c404a512829d2894ffe3612dc3cb29a12a36597).

ubuntu@ip-172-29-133-38:~/incubator-mxnet/tests/nightly$ MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s test_large_array.py:test_slice
test_large_array.test_slice ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.045166
ok

----------------------------------------------------------------------
Ran 1 test in 1.625s

OK

Env Configuration:

----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.2.0
Build        : ('default', 'Jun 28 2018 17:14:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/ubuntu/incubator-mxnet/python/mxnet
Num GPUs     : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-1096-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-29-133-38
release      : 4.4.0-1096-aws
version      : #107-Ubuntu SMP Thu Oct 3 01:51:58 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping:              4
CPU MHz:               2499.998
BogoMIPS:              4999.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              33792K
NUMA node0 CPU(s):     0-31
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0013 sec, LOAD: 0.4353 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0002 sec, LOAD: 0.5455 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1184 sec, LOAD: 0.0858 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0190 sec, LOAD: 0.2150 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0027 sec, LOAD: 0.1508 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0250 sec, LOAD: 0.4320 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0011 sec, LOAD: 0.1035 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0017 sec, LOAD: 0.0272 sec.

@pengzhao-intel I tried with branch v1.6.x. @rongzha1, can you try with that branch too?
Also, I never got this message in my run:

mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.045166

It seems this was run on an instance that supports AVX512. Can you try on an instance that has only AVX2?

@access2rohit is this a necessary part of r1.6, or can we fix it in master?

@pengzhao-intel
Yes, it is very important for Deep Graph Library (DGL) support, which relies heavily on the slice operator.

You can use export MKLDNN_VERBOSE=1 to get these logs.
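For reference, a sketch of enabling that verbose output for a single run (the nosetests invocation is the one from the reproduction steps; nothing else is assumed):

```shell
# Enable MKL-DNN primitive logging for subsequent commands in this shell;
# each primitive then prints "mkldnn_verbose,exec,cpu,reorder,..." lines.
export MKLDNN_VERBOSE=1
echo "MKLDNN_VERBOSE=$MKLDNN_VERBOSE"
# MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice
```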

Also, I just filed a PR related to the slice op, but I'm not sure whether it will resolve this issue. Could you help double-check?

Glad it works.
