While testing individual ops for large tensor (dimension >= 2^32) input functionality, I found an error in MKLDNN. Within 3rdparty/mkldnn/src/cpu/gemm/gemm.cpp on line 43 there is a function which takes several parameters, including M (the variable that receives the first dimension of the input data). M is declared as an int, so when 2^32 is passed in as the first dimension of the input data, the M > 0 assertion on the next line fails: narrowing 2^32 to a 32-bit int keeps only the low 32 bits, which are all zero, so M arrives as 0.
Note that this error occurs whenever MKLDNN is enabled - whether the BLAS engine is MKL, OpenBLAS, or none. When MKLDNN is disabled, the error does not occur.
All tests were run on the latest master, building from source.
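To illustrate the truncation described above without building MXNet, here is a minimal sketch that uses `ctypes.c_int32` to mimic storing a large dimension in a 32-bit C int. The helper names `as_c_int32` and `passes_m_check` are hypothetical and only model the `M > 0` part of the assertion; this is not the MKLDNN code itself.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_c_int32(value):
    """Return `value` as it looks after being stored in a 32-bit C int."""
    # ctypes does no overflow checking: only the low 32 bits are kept.
    return ctypes.c_int32(value).value


def passes_m_check(m):
    """Model only the `M > 0` part of the assertion after narrowing M."""
    return as_c_int32(m) > 0


for m in (1, INT32_MAX, 2**31, 2**32, 2**32 + 5):
    print(m, "->", as_c_int32(m), "| passes M > 0:", passes_m_check(m))
```

Under this model 2^32 becomes 0 and fails the check, which matches the assertion failure reported here. A value such as 2^32 + 5 would narrow to 5 and pass the check with the wrong size, so the assertion alone would not catch every oversized dimension.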
----------Python Info----------
Version : 3.6.6
Compiler : GCC 7.2.0
Build : ('default', 'Jun 28 2018 17:14:51')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.3.1
Directory : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /home/ubuntu/mxnet/python/mxnet
Num GPUs : 0
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-4.4.0-1098-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-47-40
release : 4.4.0-1098-aws
version : #109-Ubuntu SMP Fri Nov 8 09:30:18 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping: 7
CPU MHz: 2500.000
BogoMIPS: 5000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
Create a Python script with the following content:
import mxnet as mx
print(mx.nd.FullyConnected(data=mx.nd.random_normal(shape=(2**32,1)), weight=mx.nd.random_normal(shape=(1,1)), bias=mx.nd.random_normal(shape=(1,)), flatten=False, num_hidden=1))
and run it with Python3.
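Note that reproducing this needs a machine with a lot of memory. As a rough estimate, assuming the default float32 dtype (4 bytes per element; the dtype is not stated in the script), the footprint of the input can be computed as below. `tensor_bytes` is a hypothetical helper written for this estimate, not an MXNet function.

```python
def tensor_bytes(shape, itemsize=4):
    """Bytes needed for a dense tensor; itemsize=4 assumes float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


GIB = 2**30
data_shape = (2**32, 1)  # shape of `data` in the script above

print("data:", tensor_bytes(data_shape) / GIB, "GiB")
```

With num_hidden=1 the output has the same 4294967296x1 shape, so under the same assumption it needs the same amount again.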
CUDA, CUDNN, NCCL, CUDA_RTC, TENSORRT, CPU_SSE, CPU_SSE2, CPU_SSE3, CPU_SSE4_1, CPU_SSE4_2, CPU_SSE4A, CPU_AVX, CPU_AVX2, OPENMP, SSE, F16C, JEMALLOC, BLAS_OPEN, BLAS_ATLAS, BLAS_MKL, BLAS_APPLE, LAPACK, MKLDNN, OPENCV, CAFFE, PROFILER, DIST_KVSTORE, CXX14, INT64_TENSOR_SIZE, SIGNAL_HANDLER, DEBUG, TVM_OP
python3: /home/ubuntu/mxnet/3rdparty/mkldnn/src/cpu/gemm/gemm.cpp:43: void dnnl::impl::cpu::msan_unpoison_matrix(void*, int, int, int, size_t): Assertion `C != nullptr && M > 0 && N > 0 && LDC >= M && typesize' failed.
Aborted (core dumped)
CUDA, CUDNN, NCCL, CUDA_RTC, TENSORRT, CPU_SSE, CPU_SSE2, CPU_SSE3, CPU_SSE4_1, CPU_SSE4_2, CPU_SSE4A, CPU_AVX, CPU_AVX2, OPENMP, SSE, F16C, JEMALLOC, BLAS_OPEN, BLAS_ATLAS, BLAS_MKL, BLAS_APPLE, LAPACK, MKLDNN, OPENCV, CAFFE, PROFILER, DIST_KVSTORE, CXX14, INT64_TENSOR_SIZE, SIGNAL_HANDLER, DEBUG, TVM_OP
[[1.1367434]
[1.1367434]
[1.1367434]
...
[1.1367434]
[1.1367434]
[1.1367434]]
<NDArray 4294967296x1 @cpu(0)>
@mxnet-label-bot add [MKLDNN]
@PatricZhao Could your team please take a look at this? Thanks!
@connorgoggins thanks for bringing this up
@PatricZhao @TaoLv It looks like BLAS=MKL/OpenBLAS/none (native MXNet) with MKLDNN=OFF supports GEMM on int64, but with MKLDNN enabled it does not. If this is not a known issue with MKLDNN, could you please take a look?
Thank you for reporting the issue. I will take a look at this. My initial thought is that MKL-DNN itself has supported int64 shapes since the upgrade to v1.0, while I don't think the current MKL/OpenBLAS integration supports int64 GEMM.
I can reproduce the crash.
@TaoLv thanks for taking a look!
@access2rohit @connorgoggins This was confirmed to be a bug in the DNNL library. We need to wait for the next release of the library to get the fix.
This issue is resolved in oneDNN v1.4.
@vpirogov Thanks for the update. We will upgrade the 3rdparty dependency soon.
@vpirogov great news! Once this is added to MXNet we can resolve this issue :)