When MXNet is built for CPU with MKL, the slice operator doesn't work; it fails with:
could not initialize a sub-memory
Use an MXNet CPU build with MKL and MKLDNN enabled, built from master.
MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice

Environment: Ubuntu 16.04 Deep Learning AMI
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
----------Python Info----------
Version : 3.6.4
Compiler : GCC 7.2.0
Build : ('default', 'Jan 16 2018 18:10:19')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 18.0
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Version : 1.6.0
Directory : /home/ubuntu/incubator-mxnet/python/mxnet
Num GPUs : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.4.0-1095-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-82-110
release : 4.4.0-1095-aws
version : #106-Ubuntu SMP Wed Sep 18 13:33:48 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2700.882
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.08
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0127 sec, LOAD: 0.4722 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0003 sec, LOAD: 0.3578 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0976 sec, LOAD: 0.0698 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0259 sec, LOAD: 0.1256 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.1084 sec, LOAD: 0.1252 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0288 sec, LOAD: 0.4309 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0025 sec, LOAD: 0.0944 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0015 sec, LOAD: 0.0324 sec.
@pengzhao-intel @TaoLv
@mxnet-label-bot add [MKLDNN]
@rongzha1 @wuxun-zhang could you take a look ASAP?
@access2rohit is this necessary for r1.6, or can we fix it in master?
@pengzhao-intel I am looking into this.
Can't reproduce this case on our Skylake machine. Will keep debugging.
(mxnet) [rongzha1@mlt-skx141 rong_git_mxnet]$ MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice
test_large_array.test_slice ... mkldnn_verbose,info,Intel(R) MKL-DNN v1.0.0 (Git Hash 553c23fc020dfda19f8145e92e57b0e40ecdff56),Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,create,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.00610352
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.092041
ok
@access2rohit could you add some backtrace info? Thanks.
@rongzha1 Have you ever tried to build MXNet with USE_INT64_TENSOR_SIZE=1?
@TaoLv Yes, we have enabled the int64 flag; the build command is make USE_OPENCV=1 USE_MKLDNN=1 USE_BLAS=mkl USE_INT64_TENSOR_SIZE=1 USE_INTEL_PATH=/opt/intel -j.
I also cannot reproduce this issue. However, I found another bug in the offset assignment in slice and will file a PR soon.
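To illustrate why offset assignment is fragile at this tensor size (a plain-Python sketch with made-up shape/begin values, not code from MXNet): in row-major layout, the flat offset of a slice's begin coordinate can exceed the signed 32-bit range, which is why builds without USE_INT64_TENSOR_SIZE=1 can miscompute sub-memory offsets.

```python
def flat_offset(shape, begin):
    """Row-major flat offset of coordinate `begin` in a tensor of `shape`."""
    offset, stride = 0, 1
    # Walk dimensions from innermost to outermost, accumulating strides.
    for dim, idx in zip(reversed(shape), reversed(begin)):
        offset += idx * stride
        stride *= dim
    return offset

INT32_MAX = 2**31 - 1

# Illustrative large-tensor shape and a slice beginning near its end.
off = flat_offset((100_000_000, 50), (99_999_000, 1))
print(off)                # 4999950001
print(off > INT32_MAX)    # True: does not fit in a signed 32-bit offset
```

If such an offset is stored in a 32-bit integer it wraps around, producing an invalid sub-memory descriptor consistent with the "could not initialize a sub-memory" error above.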
Also tested on an AWS EC2 m5.8 instance and found no error (master commit 3c404a512829d2894ffe3612dc3cb29a12a36597).
ubuntu@ip-172-29-133-38:~/incubator-mxnet/tests/nightly$ MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s test_large_array.py:test_slice
test_large_array.test_slice ... mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.045166
ok
----------------------------------------------------------------------
Ran 1 test in 1.625s
OK
Env Configuration:
----------Python Info----------
Version : 3.6.6
Compiler : GCC 7.2.0
Build : ('default', 'Jun 28 2018 17:14:51')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.3.1
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/ubuntu/incubator-mxnet/python/mxnet
Num GPUs : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.4.0-1096-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-29-133-38
release : 4.4.0-1096-aws
version : #107-Ubuntu SMP Thu Oct 3 01:51:58 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz
Stepping: 4
CPU MHz: 2499.998
BogoMIPS: 4999.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 33792K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0013 sec, LOAD: 0.4353 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0002 sec, LOAD: 0.5455 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1184 sec, LOAD: 0.0858 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0190 sec, LOAD: 0.2150 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0027 sec, LOAD: 0.1508 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0250 sec, LOAD: 0.4320 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0011 sec, LOAD: 0.1035 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0017 sec, LOAD: 0.0272 sec.
@pengzhao-intel I tried with branch v1.6.x. @rongzha1, can you try with that branch too?
Also, I never got this message in my run:
mkldnn_verbose,info,Intel MKL-DNN v1.0.4 (commit a0a87d662edeef38d01db4ac5dd25f59a1f0881f)
mkldnn_verbose,info,Detected ISA is Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions
mkldnn_verbose,exec,cpu,reorder,simple:any,undef,src_f32:0:blocked:ab:f0 dst_f32::blocked:ab:f0,num:1,1000x49,0.045166
It seems this was run on an instance that supports AVX-512. Can you try on an instance that has just AVX2?
@access2rohit is this necessary for r1.6, or can we fix it in master?
@pengzhao-intel Yes, it is very important for Deep Graph Library (DGL) support, which relies heavily on the slice operator.
You can set export MKLDNN_VERBOSE=1 to get these logs.
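For reference, the environment setup combined with the same test command used earlier in this thread (so each MKL-DNN primitive prints a create/exec line with its format, shape, and timing):

```shell
# Enable MKL-DNN primitive-level logging for this shell session,
# then rerun the failing test from the repository root.
export MKLDNN_VERBOSE=1
MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/nightly/test_large_array.py:test_slice
```

This is environment configuration only; it requires an MXNet build with MKLDNN enabled, as described above.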
Also, I just filed a PR related to the slice op, but I'm not sure whether it will resolve this issue. Could you help double-check?
PR: https://github.com/apache/incubator-mxnet/pull/16737
The PR fixes the issue.
Glad it works.