I've trained a NAS-searched, ShuffleNet-style model that contains some uncommon operators such as Channel Shuffle, hard Swish, and hard Sigmoid. It runs fine on both the GPU and the plain CPU backend, but fails (val_acc = 0.0) on the MKL CPU backend.
----------Python Info----------
Version : 3.7.3
Compiler : Clang 10.0.1 (clang-1001.0.46.3)
Build : ('default', 'Mar 27 2019 09:23:15')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.0.3
Directory : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet
Commit Hash : 1d0d1e687fdf436896f8ca106c4915adfd29c8cb
Library : ['/Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
CUDA
CUDNN
NCCL
CUDA_RTC
TENSORRT
CPU_SSE
CPU_SSE2
CPU_SSE3
CPU_SSE4_1
CPU_SSE4_2
CPU_SSE4A
CPU_AVX
CPU_AVX2
OPENMP
SSE
F16C
JEMALLOC
BLAS_OPEN
BLAS_ATLAS
BLAS_MKL
BLAS_APPLE
LAPACK
MKLDNN
OPENCV
CAFFE
PROFILER
DIST_KVSTORE
CXX14
INT64_TENSOR_SIZE
SIGNAL_HANDLER
DEBUG
TVM_OP
----------System Info----------
Platform : Darwin-18.2.0-x86_64-i386-64bit
system : Darwin
node : yaoxis-MacBook-Pro.local
release : 18.2.0
version : Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0358 sec, LOAD: 0.6333 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0269 sec, LOAD: 0.1421 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0581 sec, LOAD: 0.2112 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0193 sec, LOAD: 0.1315 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0483 sec, LOAD: 0.6375 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0008 sec, LOAD: 0.2240 sec.
----------Environment----------
This ShuffleNet-style model is built from the following ops:

| Layers                  | Ops                            |
| :---------------------- | :----------------------------: |
| Common ops              | conv, BN, Activation('relu')   |
| Concat                  | concat                         |
| Shuffle Channel & Slice | reshape-swapaxes-reshape-slice |
| Hard swish              | plusscalar-clip-divscalar-mul  |
| Hard sigmoid            | plusscalar-clip-divscalar      |
| Global Average Pooling  | pool                           |
As for how the Shuffle Channel is implemented:

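The original snippet is not reproduced here, but based on the op decomposition in the table above (reshape-swapaxes-reshape for the shuffle, plusscalar-clip-divscalar with an optional mul for the hard activations), a minimal Gluon sketch could look like the following. The block names and the `groups` argument are illustrative, not taken from the model's actual code:

```python
import mxnet as mx
from mxnet.gluon import nn

class ChannelShuffle(nn.HybridBlock):
    """Shuffle channels across `groups` using reshape-swapaxes-reshape."""
    def __init__(self, groups=2, **kwargs):
        super(ChannelShuffle, self).__init__(**kwargs)
        self.groups = groups

    def hybrid_forward(self, F, x):
        # (N, C, H, W) -> (N, groups, C/groups, H, W)
        x = F.reshape(x, shape=(0, -4, self.groups, -1, -2))
        # swap the group and channel axes, then flatten back to (N, C, H, W)
        x = F.swapaxes(x, 1, 2)
        return F.reshape(x, shape=(0, -3, -2))

class HardSigmoid(nn.HybridBlock):
    """hard_sigmoid(x) = clip(x + 3, 0, 6) / 6, i.e. plusscalar-clip-divscalar."""
    def hybrid_forward(self, F, x):
        return F.clip(x + 3.0, 0.0, 6.0) / 6.0

class HardSwish(nn.HybridBlock):
    """hard_swish(x) = x * hard_sigmoid(x), i.e. the table's ...-mul variant."""
    def hybrid_forward(self, F, x):
        return x * (F.clip(x + 3.0, 0.0, 6.0) / 6.0)
```

When the network is hybridized and exported, these lower to the reshape/swapaxes/reshape and _plus_scalar/clip/_div_scalar/elemwise_mul nodes listed in the table.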
The plan was to quantize the model with the MXNet 1.6.0 (master) quantization tool. The error occurs when using the MKL backend to run both the raw fp32 model before quantization and the quantized model.
# non-MKL backend (GPU / plain CPU), fp32 model:
INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)
# MKL backend, fp32 model:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)
# MKL backend, quantized model:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)
git clone https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS.git
pip install mxnet --pre
cd MXNet-Single-Path-One-Shot-NAS/quantization/
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
...
# output would be like:
INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)
# remove the previous non-MKL MXNet build
pip uninstall mxnet
pip install mxnet-mkl --pre
cd MXNet-Single-Path-One-Shot-NAS/quantization/
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
...
# output would be like:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)
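As a sanity check (not part of the original repro), the runtime feature API can confirm which wheel is active before running the script; this is standard MXNet 1.5+ functionality:

```python
import mxnet as mx
from mxnet.runtime import Features

# True for the mxnet-mkl wheel, False for the plain mxnet wheel
print('MKLDNN enabled:', Features().is_enabled('MKLDNN'))
print('MXNet version :', mx.__version__)
```

If I remember correctly, setting the environment variable MXNET_MKLDNN_ENABLED=0 also falls back to the native CPU operators at runtime, which could serve as a temporary workaround at the cost of the MKL-DNN speedup.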
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Bug
Thanks for reporting the issue. @ZhennanQin will take a look :)
This bug can be reproduced locally, and I have found the root cause. An internal patch is ready; it needs more time for verification.
Hi @ZhennanQin, thanks for the prompt response. I'm participating in a competition and desperately need this tool for quantization. Could you please let me know whether there is a quick fix / workaround?
@CanyonWind it's great to know that you're using the MXNet quantization solution in your competition.
Could you cite MXNet and the related quantization solution in your future publication?
Sure, I'm happy to do that. Could you please let me know which quantization method (paper) is used in the MKL backend so that I can cite it? Thanks.
@CanyonWind https://github.com/apache/incubator-mxnet/pull/16455 was created to address the fp32 accuracy issue. With this patch, your command produces the accuracy results below:
INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)
For quantization, I've tried it, but still got 0 accuracy due to an MKL-DNN correctness issue. We need to wait for the MKL-DNN team to fix that. This may take a few days or weeks. Sorry about that.
Reopening to wait for the quantization fix.
@pengzhao-intel Got it. Thanks
@ZhennanQin Thanks for the help! I will give it a try. A quick question: is this problem caused by reshaping? If I can find some way to avoid using it in the model, should quantization work?
For quantization, it's not about reshape. Some int8 convolutions produce wrong results, driving the accuracy to 0. Currently, the only workaround is to skip quantizing those troublesome convolutions; I can help identify those layers in your model when I have time.
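A minimal sketch of that workaround, assuming the offending convolutions have already been identified. It relies on the `excluded_sym_names` argument of MXNet's `quantize_model`; the two layer names below are placeholders, not the model's real node names:

```python
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

# load the fp32 checkpoint used in the commands above
sym, arg_params, aux_params = mx.model.load_checkpoint('model/ShuffleNas_fixArch', 0)

# hypothetical layer names -- replace with the convolutions that produce wrong int8 results
excluded = ['shufflenas_convolution12', 'shufflenas_convolution23']

qsym, qarg_params, aux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),
    excluded_sym_names=excluded,
    calib_mode='none',           # or 'naive' / 'entropy' together with calib_data
    quantized_dtype='auto')

# save in the layout imagenet_inference.py expects
qsym.save('model/ShuffleNas_fixArch-quantized-symbol.json')
save_dict = {('arg:%s' % k): v for k, v in qarg_params.items()}
save_dict.update({('aux:%s' % k): v for k, v in aux_params.items()})
mx.nd.save('model/ShuffleNas_fixArch-quantized-0000.params', save_dict)
```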
Thank you very much for taking the time to help. I will try it too and let you know if there is any update.
@ZhennanQin does the MKLDNN 1.0 upgrade fix this problem?
With https://github.com/apache/incubator-mxnet/pull/16734 merged, the computation error is fixed. You can now give the Intel int8 solution a try.
Hi @ZhennanQin, thanks a lot for your effort!
I tried to verify the quantized model's performance on the Mac with the nightly release (mxnet-mkl-1.6.0b20191107; the merge was completed on Nov 6, so I assume this release already contains the fix) to get a quick result.
git clone https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet.git
cd Single-Path-One-Shot-NAS-MXNet
python3 -m venv env
source env/bin/activate
pip install mxnet-mkl --pre
cd quantization
I've tried the following:
With calib-mode: none
# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=none
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# accuracy: 0.009375
With calib-mode: naive
# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=naive
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-5batches-naive-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# accuracy: 0.003125
With calib-mode: entropy
# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=entropy
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-5batches-entropy-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# an error was thrown during inference
Could you please share how you verified the quantization accuracy, or try any of the above quantization procedures (each takes less than 10 minutes to finish) at your convenience?
Thanks again for your generous help, I do appreciate it a lot!
@CanyonWind The PR below was created as a demo for CPU quantization.
https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/pull/12
I've tried the updates from the pull request, and they work like a charm. It seems the degraded quantization accuracy comes from the model's precision itself and the large-kernel depthwise convolutions.
Though I still have some questions about why and how these factors influence quantization accuracy, they are no longer related to this issue. Hi @ZhennanQin, I might post some educational questions on the PR thread after more experiments. If you have some time in the future, any quick answer would be appreciated and would save me a good amount of time.
Thanks a lot for the help again @pengzhao-intel @ZhennanQin!
Sure. You can send mail to me ([email protected]) and I will add Zhennan to the loop.
That would be great, thanks!