I tried to use the newly added graph optimization in the MKLDNN backend (Graph optimization and Quantization (experimental)), but it raises an error. The same code runs fine without setting export MXNET_SUBGRAPH_BACKEND=MKLDNN.
----------Python Info----------
('Version :', '2.7.15')
('Compiler :', 'GCC 7.2.0')
('Build :', ('default', 'May 1 2018 23:32:55'))
('Arch :', ('64bit', ''))
------------Pip Info-----------
('Version :', '10.0.1')
('Directory :', '/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
('Version :', '1.5.0')
('Directory :', '/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet')
('Commit Hash :', '655f1c6f7a0706dd622f73db9af2e6df895ca213')
----------System Info----------
('Platform :', 'Linux-4.4.0-1072-aws-x86_64-with-debian-stretch-sid')
('system :', 'Linux')
('node :', 'ip-172-31-10-142')
('release :', '4.4.0-1072-aws')
('version :', '#82-Ubuntu SMP Fri Nov 2 15:00:21 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping: 3
CPU MHz: 3000.000
BogoMIPS: 6000.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 25344K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0014 sec, LOAD: 0.5628 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0022 sec, LOAD: 0.1005 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0281 sec, LOAD: 0.1317 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0025 sec, LOAD: 0.0809 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0474 sec, LOAD: 0.2117 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1398 sec, LOAD: 0.5508 sec.
[07:06:46] src/operator/subgraph/mkldnn/mkldnn_conv_property.cc:138: Start to execute MKLDNN Convolution optimization pass.
Traceback (most recent call last):
File "ecgdetector_rnn.py", line 14, in <module>
rnn_result , shape_main, shape_cfg, time_infos, args = rnn_process(parser)
File "/data/cardio_workspace/cardio_deploy/rnn_process.py", line 64, in rnn_process
mxnet = MxnetModel(args.configfile, args.archfile, edf_category, is_hrnn, buckets)
File "/data/cardio_workspace/cardio_deploy/ecg_script_frequency.py", line 282, in __init__
for_training=True)
File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/module/bucketing_module.py", line 343, in bind
force_rebind=False, shared_module=None, grad_req=grad_req)
File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/module/module.py", line 429, in bind
state_names=self._state_names)
File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 279, in __init__
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 375, in bind_exec
shared_group))
File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 662, in _bind_ith_exec
shared_buffer=shared_data_arrays, **input_shapes)
File "/home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/symbol/symbol.py", line 1529, in simple_bind
raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
category: (900,)
data: (900, 137, 9)
label: (900, 2)
[07:06:46] src/pass/gradient.cc:192: Operator _sg_mkldnn_conv is non-differentiable because it didn't register FGradient attribute.
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x21f5a4) [0x7fd1060045a4]
[bt] (1) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x21f981) [0x7fd106004981]
[bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x37553df) [0x7fd10953a3df]
[bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2d7ce11) [0x7fd108b61e11]
[bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x373d6ed) [0x7fd1095226ed]
[bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2ac0197) [0x7fd1088a5197]
[bt] (6) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2b41d16) [0x7fd108926d16]
[bt] (7) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2b42c8e) [0x7fd108927c8e]
[bt] (8) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2b4372f) [0x7fd10892872f]
[bt] (9) /home/ubuntu/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map
You need to set grad_req='null' in simple_bind so that only the forward graph is constructed.
Thank you for the fast feedback. I'll try it, thank you.
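For readers hitting the same error, here is a minimal sketch of the suggestion above. The helper name bind_forward_only is hypothetical, not part of MXNet; it only forwards to Symbol.simple_bind with grad_req="null" so that no backward graph is requested. The shapes in the usage comment are the ones reported in the traceback.

```python
def bind_forward_only(symbol, ctx, **input_shapes):
    """Bind `symbol` for inference only.

    Hypothetical convenience wrapper (not an MXNet API). Passing
    grad_req="null" asks simple_bind not to allocate gradients, so the
    gradient pass that rejects _sg_mkldnn_conv is never run.
    """
    return symbol.simple_bind(ctx=ctx, grad_req="null", **input_shapes)


# Usage sketch, assuming `net` is an mxnet Symbol and mxnet is imported as mx:
#   exe = bind_forward_only(net, mx.cpu(),
#                           data=(900, 137, 9), category=(900,), label=(900, 2))
#   exe.forward(is_train=False)
```

The wrapper takes the context as an argument rather than creating one, so it makes no assumption about which device is used.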
This feature is for inference only.
https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md#6
Maybe we need to improve the error message to avoid the confusion :)
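Until the library message improves, a user-side preflight check can surface the problem earlier. The sketch below is an assumption about how one might do that; check_subgraph_backend is a hypothetical helper, not part of MXNet, and it relies only on the environment variable named in this issue.

```python
import os


def check_subgraph_backend(for_training, env=None):
    """Fail early, with a readable message, on an inference-only setup.

    Hypothetical user-side helper (not an MXNet API). `env` defaults to
    os.environ and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    backend = env.get("MXNET_SUBGRAPH_BACKEND", "")
    if backend == "MKLDNN" and for_training:
        raise RuntimeError(
            "MXNET_SUBGRAPH_BACKEND=MKLDNN is for inference only: fused "
            "operators such as _sg_mkldnn_conv have no gradient. Bind with "
            "for_training=False (or grad_req='null'), or unset the variable."
        )
    return backend
```

Calling check_subgraph_backend(for_training=True) right before binding would have replaced the long simple_bind traceback above with a single explanatory line.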
@ZhennanQin @xinyu-intel please help take a look and see if we can improve the error message. Thanks.
@reminisce Thank you, that was the right answer. Somewhere in my code I bound the module with for_training=True and then ran prediction with is_train=False, which caused the problem.
And I agree with @pengzhao-intel: the error message needs improvement.
Thank you for the quick and kind feedback, @reminisce and @pengzhao-intel!
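At the Module level, the corresponding fix is to bind with for_training=False. A minimal sketch follows; bind_module_for_inference is a hypothetical helper, and it assumes (to my understanding of the Module API) that binding with for_training=False requests a forward-only graph, which is what the subgraph backend needs.

```python
def bind_module_for_inference(module, data_shapes, label_shapes=None):
    """Bind a Module or BucketingModule without a backward graph.

    Hypothetical helper (not an MXNet API). It only calls the module's
    own bind() with for_training=False instead of the default True.
    """
    module.bind(data_shapes=data_shapes,
                label_shapes=label_shapes,
                for_training=False)
    return module


# Usage sketch, assuming `mod` is an mxnet.mod.BucketingModule:
#   bind_module_for_inference(
#       mod,
#       data_shapes=[("data", (900, 137, 9)), ("category", (900,))],
#       label_shapes=[("label", (900, 2))])
```

Which inputs count as data and which as labels depends on the model, so the shapes in the comment are illustrative only.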
@Soonhwan-Kwon thanks for trying the new MKLDNN features.
Feel free to ping me or drop me an email with any questions or requests 👍
@pengzhao-intel Thanks for the great feature. It seems the task time dropped from 117 seconds to 112 seconds, on a task that was already using MKL.
@Soonhwan-Kwon I deleted my previous comment; I think I had misunderstood.
The current flow is CNN-friendly, and most fusion patterns in CNNs are covered, such as conv + bn and conv + relu. Other fusion types for non-CNN networks, such as FC + activation, are still under development.
If you can share the basic block of your network, we'd like to take a look :)
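To illustrate what a conv + bn fusion means at inference time, here is a small pure-Python sketch of the standard batch-norm folding identity. It is a generic illustration under the assumption that the batch-norm statistics are frozen, not the MKLDNN implementation; all function names are hypothetical, and a single dot product stands in for one output position of a convolution.

```python
import math


def conv_at_one_position(x, weight, bias):
    """One output value per channel: dot(weight[c], x) + bias[c]."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b_c
            for w_c, b_c in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm with fixed per-channel statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold batch norm into the preceding conv's weights and bias.

    For each channel: scale = gamma / sqrt(var + eps),
    w' = w * scale, b' = (b - mean) * scale + beta.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b
```

Because the folded weights depend on fixed mean and variance, the rewrite is only valid when those statistics no longer change, which is consistent with the feature being inference-only.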
@pengzhao-intel Thank you for the in-depth explanation. Our team at Samsung SDS is also interested in quantization, and we will soon try out int8 performance on our model, so your previous answer was quite useful. I'll report back as soon as we complete the test. Thank you again for your kind comments.