Incubator-mxnet: [Channel Shuffle / Hard Swish / Hard Sigmoid] running in MKL CPU backend failed

Created on 10 Oct 2019  ·  19 comments  ·  Source: apache/incubator-mxnet

Description

I've trained a NAS-searched ShuffleNet-style model that contains some uncommon operators such as Channel Shuffle, hard Swish, and hard Sigmoid. It runs fine on both the GPU and the plain CPU backend, but fails (val_acc = 0.0) on the MKL CPU backend.

Environment info (Required)

----------Python Info----------
Version      : 3.7.3
Compiler     : Clang 10.0.1 (clang-1001.0.46.3)
Build        : ('default', 'Mar 27 2019 09:23:15')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.0.3
Directory    : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet
Commit Hash   : 1d0d1e687fdf436896f8ca106c4915adfd29c8cb
Library      : ['/Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✔ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform     : Darwin-18.2.0-x86_64-i386-64bit
system       : Darwin
node         : yaoxis-MacBook-Pro.local
release      : 18.2.0
version      : Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0358 sec, LOAD: 0.6333 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0269 sec, LOAD: 0.1421 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0581 sec, LOAD: 0.2112 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0193 sec, LOAD: 0.1315 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0483 sec, LOAD: 0.6375 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0008 sec, LOAD: 0.2240 sec.
----------Environment----------

Background

This ShuffleNet related model has been built by:

| Layers | Ops |
| :--- | :--- |
| Common ops | conv, BN, Activation('relu') |
| Concat | concat |
| Shuffle Channel & Slice | reshape-swapaxes-reshape-slice |
| Hard swish | plusscalar-clip-divscalar-mul |
| Hard sigmoid | plusscalar-clip-divscalar |
| Global Average Pooling | pool |
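The hard swish and hard sigmoid rows above are elementwise compositions, so they can be sketched directly. Below is a NumPy sketch assuming the usual constants (+3, clip to [0, 6], /6); the actual scalars baked into the model's graph may differ:

```python
import numpy as np

def hard_sigmoid(x):
    # plusscalar -> clip -> divscalar, as in the table above
    return np.clip(x + 3.0, 0.0, 6.0) / 6.0

def hard_swish(x):
    # hard_sigmoid followed by an elementwise multiply with the input
    return x * hard_sigmoid(x)

print(hard_swish(np.array([-4.0, 0.0, 3.0])))  # -> [ 0.  0.  3.]
```

Because these decompose into plain scalar ops and `clip`, they are not single fused operators, which is presumably why the MKL-DNN graph passes see them as "rare" patterns.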

How the Shuffle Channel is implemented:

(figure: Shuffle Channel implemented as reshape, swapaxes, reshape, then slice)
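The reshape-swapaxes-reshape trick in the figure can be sketched in NumPy; this is the general pattern, not the model's exact graph:

```python
import numpy as np

def channel_shuffle(x, groups):
    # x: (N, C, H, W). Split channels into groups, transpose the group
    # axis with the per-group channel axis, then flatten back.
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.swapaxes(1, 2)  # interleave channels across groups
    return x.reshape(n, c, h, w)

x = np.arange(8).reshape(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten())  # channels 0..7 -> [0 4 1 5 2 6 3 7]
```

The same three ops exist in MXNet (`reshape`, `swapaxes`, `reshape`), which is how the table above describes the model's implementation.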

Error Message

The plan was to quantize this model with the MXNet 1.6.0 (master) quantization tool. The "error" occurs when the MKL backend runs either the raw model before quantization or the quantized model.

  • Raw model, CPU-only MXNet (no MKL):
INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)
  • Raw model: using the same model and code but with MXNet-mkl, it failed:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)
  • Quantized model: interestingly, with MXNet-mkl the quantization process runs smoothly and produces a quantized model. But verifying that model with imagenet_inference.py fails again, just like the raw model before quantization:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)

Steps to reproduce:

  1. Clone the repo; the model is included:
git clone https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS.git
  2. Reproduce with CPU-only MXNet (no MKL):
pip install mxnet --pre
cd MXNet-Single-Path-One-Shot-NAS/quantization/
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu

...
# output would be like:
INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)
  3. Reproduce the failing validation accuracy with MXNet-mkl:
# remove the previous, non-MKL MXNet
pip uninstall mxnet
pip install mxnet-mkl --pre
cd MXNet-Single-Path-One-Shot-NAS/quantization/
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu

...
# output would be like:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)
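As a quicker way to isolate the MKL-DNN operator path without reinstalling packages, MXNet 1.x builds expose the `MXNET_MKLDNN_ENABLED` environment variable to disable MKL-DNN operators at runtime (assuming your mxnet-mkl build honors it; check the environment-variable docs for your version). If accuracy recovers with it set to 0, the MKL-DNN kernels are the culprit:

```shell
# Run the same mxnet-mkl install with MKL-DNN operators disabled at runtime.
cd MXNet-Single-Path-One-Shot-NAS/quantization/
MXNET_MKLDNN_ENABLED=0 python3 imagenet_inference.py \
    --symbol-file model/ShuffleNas_fixArch-symbol.json \
    --param-file model/ShuffleNas_fixArch-0000.params \
    --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 \
    --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 \
    --dataset=./data/val_256_q90.rec --ctx=cpu
```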
MKLDNN Quantization

All 19 comments

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Bug

Thanks for reporting the issue. @ZhennanQin will take a look :)

I can reproduce this bug locally and have found the root cause. An internal patch is ready; it needs more time for verification.


Hi @ZhennanQin, thanks for the prompt response. I'm participating in a competition and desperately need this tool for quantization. Could you please let me know whether there is a quick fix / workaround?

@CanyonWind it's great to know you're using the MXNet quantization solution in your competition.
Could you cite MXNet and the related quantization solution in your future publication?

Sure, I'm happy to do that. Could you please let me know which quantization method (paper) you are using in the MKL backend so that I can cite it? Thanks.

@CanyonWind https://github.com/apache/incubator-mxnet/pull/16455 is created to address the fp32 accuracy issue. With this patch, your command produces the accuracy result below:

INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)

For quantization, I've tried, but I still get 0 accuracy due to an mkldnn correctness issue. We need to wait for the mkldnn team to fix it. This may take a few days or weeks. Sorry about that.

Reopening to wait for the quantization fix.

@pengzhao-intel Got it. Thanks

@ZhennanQin Thanks for the help! I will give it a try. A quick question: is this problem caused by the reshaping? If I can find some way to avoid using it in the model, should quantization work?

For quantization, it's not about reshape. Some int8 convolutions produce wrong results, driving the accuracy to 0. Currently the only workaround is to skip quantizing those troublesome convolutions. I can help identify those layers in your model when I have time.
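For reference, the upstream quantization API does expose a hook for skipping layers: the `excluded_sym_names` argument of `mxnet.contrib.quantization.quantize_model`. A minimal sketch follows; the node names passed here are hypothetical placeholders, not the actual troublesome convolutions (those would come from inspecting the symbol file):

```python
# Sketch only: skip quantizing selected convolutions via excluded_sym_names.
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

# Load the fp32 checkpoint (path as used in the repro commands above).
sym, arg_params, aux_params = mx.model.load_checkpoint(
    'model/ShuffleNas_fixArch', 0)

qsym, qarg_params, qaux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),
    excluded_sym_names=['conv_xx', 'conv_yy'],  # hypothetical layer names
    calib_mode='none')
```

With the problematic layers excluded they stay in fp32, so accuracy should recover at the cost of some speedup on those layers.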

Thank you very much for taking the time to help. I will try it too and let you know if there is any update.

@ZhennanQin does MKLDNN 1.0 upgrade fix this problem?

With https://github.com/apache/incubator-mxnet/pull/16734 merged, the computation error is fixed. You can now give the Intel int8 solution a try.

Hi @ZhennanQin, thanks a lot for your effort!
I tried to verify the quantized model's performance with the nightly release (mxnet-mkl-1.6.0b20191107; the merge commit landed on 11/06, so I assume this release already contains the updated code) on my Mac to get a quick result.

git clone https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet.git
cd Single-Path-One-Shot-NAS-MXNet
python3 -m venv env
source env/bin/activate
pip install mxnet-mkl --pre
cd quantization

I've tried the following:
With calib-mode: none

# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=none
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# accuracy: 0.009375

With calib-mode: naive

# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=naive
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-5batches-naive-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# accuracy: 0.003125

With calib-mode: entropy

# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=entropy
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-5batches-entropy-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# error was thrown when doing inference

Could you please tell me how you verified the quantization accuracy, or try any of the above quantization procedures (each takes less than 10 minutes to finish) at your convenience?

Thanks again for your generous help, I do appreciate it a lot!

@CanyonWind Below PR is created as a demo for CPU quantization.
https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/pull/12

I've tried the updates from the pull request, and it works like a charm. It seems the degraded quantization performance comes from the model's precision itself and the large-kernel depthwise convolutions.

Though I still have some questions about why and how these factors influence quantization performance, they're no longer related to this issue. Hi @ZhennanQin, I might post some educational questions on the PR thread after more experiments. If you have some time in the future, any quick answer would be appreciated and would save me a good amount of time.

Thanks a lot for the help again @pengzhao-intel @ZhennanQin!


Sure. You can send mail to me ([email protected]) and I will add Zhennan to the loop.

That would be great, thanks!

