Incubator-mxnet: [Channel Shuffle / Hard Swish / Hard Sigmoid] running in MKL CPU backend failed

Created on 10 Oct 2019  ·  19 comments  ·  Source: apache/incubator-mxnet

Description

I've trained a NAS-searched ShuffleNet-style model that contains some uncommon operators such as Channel Shuffle, hard Swish, and hard Sigmoid. It runs fine on both the GPU and the plain CPU backend, but fails (val_acc = 0.0) on the MKL CPU backend.

Environment info (Required)

----------Python Info----------
Version      : 3.7.3
Compiler     : Clang 10.0.1 (clang-1001.0.46.3)
Build        : ('default', 'Mar 27 2019 09:23:15')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.0.3
Directory    : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet
Commit Hash   : 1d0d1e687fdf436896f8ca106c4915adfd29c8cb
Library      : ['/Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✖ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✔ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform     : Darwin-18.2.0-x86_64-i386-64bit
system       : Darwin
node         : yaoxis-MacBook-Pro.local
release      : 18.2.0
version      : Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0358 sec, LOAD: 0.6333 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0269 sec, LOAD: 0.1421 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0581 sec, LOAD: 0.2112 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0193 sec, LOAD: 0.1315 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0483 sec, LOAD: 0.6375 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0008 sec, LOAD: 0.2240 sec.
----------Environment----------

Background

This ShuffleNet related model has been built by:

| Layers | Ops |
| :--- | :--- |
| Common ops | conv, BN, Activation('relu') |
| Concat | concat |
| Shuffle Channel & Slice | reshape-swapaxes-reshape-slice |
| Hard swish | plusscalar-clip-divscalar-mul |
| Hard sigmoid | plusscalar-clip-divscalar |
| Global Average Pooling | pool |
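The hard swish and hard sigmoid rows above are elementwise compositions, so they can be sketched directly. Below is a NumPy sketch assuming the usual constants (+3, clip to [0, 6], /6); the actual scalars baked into the model's graph may differ:

```python
import numpy as np

def hard_sigmoid(x):
    # plusscalar -> clip -> divscalar, as in the table above
    return np.clip(x + 3.0, 0.0, 6.0) / 6.0

def hard_swish(x):
    # hard_sigmoid followed by an elementwise multiply with the input
    return x * hard_sigmoid(x)

print(hard_swish(np.array([-4.0, 0.0, 3.0])))  # -> [ 0.  0.  3.]
```

Because these decompose into plain scalar ops and `clip`, they are not single fused operators, which is presumably why the MKL-DNN graph passes see them as "rare" patterns.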

How the Shuffle Channel is implemented:

(figure: Shuffle Channel implemented as reshape, swapaxes, reshape, then slice)
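The reshape-swapaxes-reshape trick in the figure can be sketched in NumPy; this is the general pattern, not the model's exact graph:

```python
import numpy as np

def channel_shuffle(x, groups):
    # x: (N, C, H, W). Split channels into groups, transpose the group
    # axis with the per-group channel axis, then flatten back.
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.swapaxes(1, 2)  # interleave channels across groups
    return x.reshape(n, c, h, w)

x = np.arange(8).reshape(1, 8, 1, 1)
print(channel_shuffle(x, 2).flatten())  # channels 0..7 -> [0 4 1 5 2 6 3 7]
```

The same three ops exist in MXNet (`reshape`, `swapaxes`, `reshape`), which is how the table above describes the model's implementation.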

Error Message

The plan was to quantize this model with the MXNet 1.6.0 (master) quantization tool. The "error" occurs when the MKL backend runs either the raw model before quantization or the quantized model.

  • Raw model, CPU-only MXNet (no MKL):
INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)
  • Raw model: using the same model and code but with MXNet-mkl, it failed:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)
  • Quantized model: interestingly, with MXNet-mkl the quantization process runs smoothly and produces a quantized model. But verifying that model with imagenet_inference.py fails again, just like the raw model before quantization:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)

Steps to reproduce:

  1. Clone the repo; the model is included:
git clone https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS.git
  2. Reproduce with CPU-only MXNet (no MKL):
pip install mxnet --pre
cd MXNet-Single-Path-One-Shot-NAS/quantization/
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu

...
# output would be like:
INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)
  3. Reproduce the failing validation accuracy with MXNet-mkl:
# remove the previous, non-MKL MXNet
pip uninstall mxnet
pip install mxnet-mkl --pre
cd MXNet-Single-Path-One-Shot-NAS/quantization/
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu

...
# output would be like:
INFO:logger:('accuracy', 0.0)
INFO:logger:('top_k_accuracy_5', 0.003125)
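As a quicker way to isolate the MKL-DNN operator path without reinstalling packages, MXNet 1.x builds expose the `MXNET_MKLDNN_ENABLED` environment variable to disable MKL-DNN operators at runtime (assuming your mxnet-mkl build honors it; check the environment-variable docs for your version). If accuracy recovers with it set to 0, the MKL-DNN kernels are the culprit:

```shell
# Run the same mxnet-mkl install with MKL-DNN operators disabled at runtime.
cd MXNet-Single-Path-One-Shot-NAS/quantization/
MXNET_MKLDNN_ENABLED=0 python3 imagenet_inference.py \
    --symbol-file model/ShuffleNas_fixArch-symbol.json \
    --param-file model/ShuffleNas_fixArch-0000.params \
    --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 \
    --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 \
    --dataset=./data/val_256_q90.rec --ctx=cpu
```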
MKLDNN Quantization

All 19 comments

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Bug

Thanks for reporting the issue. @ZhennanQin will take a look :)

I can reproduce this bug locally and have found the root cause. An internal patch is ready; it needs more time for verification.


Hi @ZhennanQin, thanks for the prompt response. I'm participating in a competition and desperately need this tool for quantization. Could you please let me know whether there is a quick fix / workaround?

@CanyonWind it's great to know you're using the MXNet quantization solution in your competition.
Could you cite MXNet and the related quantization solution in your future publication?

Sure, I'm happy to do that. Could you please let me know which quantization method (paper) you are using in the MKL backend so that I can cite it? Thanks.

@CanyonWind https://github.com/apache/incubator-mxnet/pull/16455 is created to address the fp32 accuracy issue. With this patch, your command produces the accuracy result below:

INFO:logger:('accuracy', 0.771875)
INFO:logger:('top_k_accuracy_5', 0.909375)

For quantization, I've tried, but I still get 0 accuracy due to an mkldnn correctness issue. We need to wait for the mkldnn team to fix it. This may take a few days or weeks. Sorry about that.

Reopening to wait for the quantization fix.

@pengzhao-intel Got it. Thanks

@ZhennanQin Thanks for the help! I will give it a try. A quick question: is this problem caused by the reshaping? If I can find some way to avoid using it in the model, should quantization work?

For quantization, it's not about reshape. Some int8 convolutions produce wrong results, driving the accuracy to 0. Currently the only workaround is to skip quantizing those troublesome convolutions. I can help identify those layers in your model when I have time.
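For reference, the upstream quantization API does expose a hook for skipping layers: the `excluded_sym_names` argument of `mxnet.contrib.quantization.quantize_model`. A minimal sketch follows; the node names passed here are hypothetical placeholders, not the actual troublesome convolutions (those would come from inspecting the symbol file):

```python
# Sketch only: skip quantizing selected convolutions via excluded_sym_names.
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

# Load the fp32 checkpoint (path as used in the repro commands above).
sym, arg_params, aux_params = mx.model.load_checkpoint(
    'model/ShuffleNas_fixArch', 0)

qsym, qarg_params, qaux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(),
    excluded_sym_names=['conv_xx', 'conv_yy'],  # hypothetical layer names
    calib_mode='none')
```

With the problematic layers excluded they stay in fp32, so accuracy should recover at the cost of some speedup on those layers.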

Thank you very much for taking the time to help. I will try it too and let you know if there is any update.

@ZhennanQin does MKLDNN 1.0 upgrade fix this problem?

With https://github.com/apache/incubator-mxnet/pull/16734 merged, the computation error is fixed. You can now give the Intel int8 solution a try.

Hi @ZhennanQin, thanks a lot for your effort!
I tried to verify the quantized model's performance with the nightly release (mxnet-mkl-1.6.0b20191107; the merge commit landed on 11/06, so I assume this release already contains the updated code) on my Mac to get a quick result.

git clone https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet.git
cd Single-Path-One-Shot-NAS-MXNet
python3 -m venv env
source env/bin/activate
pip install mxnet-mkl --pre
cd quantization

I've tried the following:
With calib-mode: none

# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=none
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# accuracy: 0.009375

With calib-mode: naive

# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=naive
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-5batches-naive-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# accuracy: 0.003125

With calib-mode: entropy

# quantize model
python3 quantize_mkldnn.py --model=ShuffleNas_fixArch --num-calib-batches=5 --calib-mode=entropy
# verify performance
python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-quantized-5batches-entropy-symbol.json --param-file model/ShuffleNas_fixArch-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
# error was thrown when doing inference

Could you please tell me how you verified the quantization accuracy, or try any of the above quantization procedures (each takes less than 10 minutes to finish) at your convenience?

Thanks again for your generous help, I do appreciate it a lot!

@CanyonWind Below PR is created as a demo for CPU quantization.
https://github.com/CanyonWind/Single-Path-One-Shot-NAS-MXNet/pull/12

I've tried the updates from the pull request, and it works like a charm. It seems the degraded quantization performance comes from the model's precision itself and the large-kernel depthwise convolutions.

Though I still have some questions about why and how these factors influence quantization performance, they're no longer related to this issue. Hi @ZhennanQin, I might post some educational questions on the PR thread after more experiments. If you have some time in the future, any quick answer would be appreciated and would save me a good amount of time.

Thanks a lot for the help again @pengzhao-intel @ZhennanQin!


Sure. You can send mail to me ([email protected]) and I will add Zhennan to the loop.

That would be great, thanks!

