Trying to import the GPU version of CNTK 2.4 in Python 3.6 results in the following error:
>>> import cntk
terminate called after throwing an instance of 'Xbyak::Error'
what(): internal error
Aborted (core dumped)
CNTK 2.3.1 works fine on the same system.
System information:
The error seems to be from MKL-DNN. @KeDengMS any idea?
I tried CNTK 2.4 in the tensorflow/tensorflow:latest-gpu Docker image and didn't hit this issue. The Python version there is 2.7. What's your repro?
The same problem occurs on my system, and I'm using the older version 2.3.1 as a temporary measure.
System information:
The error occurs when I just start the Python interpreter and run import cntk:
$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cntk
terminate called after throwing an instance of 'Xbyak::Error'
what(): internal error
Aborted (core dumped)
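For anyone who wants to check for the crash without losing their interpreter session, here is a minimal sketch that runs the import in a child process. The helper names probe_import and describe are made up for illustration and are not part of CNTK. On POSIX, a negative return code means the child was killed by a signal, so an abort like the one above should show up as signal 6 (SIGABRT).

```python
import subprocess
import sys


def probe_import(module_name):
    """Import module_name in a child interpreter and return (returncode, stderr).

    probe_import is a hypothetical helper for this thread, not a CNTK API.
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable verdict."""
    if returncode == 0:
        return "import succeeded"
    if returncode < 0:
        # POSIX only: -N means the child was terminated by signal N.
        return "killed by signal %d" % -returncode
    return "import failed with exit code %d" % returncode


if __name__ == "__main__":
    code, err = probe_import("cntk")
    print(describe(code))
    print(err.strip())
```

Running the probe in a separate process keeps the parent alive even when the native library aborts, which makes it easier to script the same check across several machines or CNTK versions.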
More system details:
I am facing the same issue here. Could it be caused by the CPU, specifically the Ryzen platform? I installed 2.4 on a cloud instance (Intel CPU) and did not hit the issue. Both my PC and the cloud instance were running Ubuntu 16.04, to rule out any OS-related differences.
System Details (on my PC):
CPU: Ryzen 1600
GPU: GTX 1060 6GB
OS: Ubuntu 16.04
Python: Miniconda with a conda env of python 3.5
Wheel installed: CPU-Only/cntk-2.4-cp35-cp35m-linux_x86_64.whl
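To compare machines, it may help to record the CPU vendor string alongside the details above. This is a minimal sketch, assuming Linux, where /proc/cpuinfo has a vendor_id line (typically AuthenticAMD on Ryzen and GenuineIntel on Intel CPUs). The helper names parse_vendor_id and read_vendor_id are hypothetical.

```python
def parse_vendor_id(cpuinfo_text):
    """Return the first vendor_id value in /proc/cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def read_vendor_id(path="/proc/cpuinfo"):
    """Read the vendor string on a Linux system; None if the file is unavailable."""
    try:
        with open(path) as handle:
            return parse_vendor_id(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_vendor_id())
```

Posting this value together with the CNTK version and the wheel name should make it easier to see whether the failures line up with one CPU vendor.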
CNTK 2.4 updated its MKL-DNN dependency to mkl-dnn 0.12. It seems some checks in MKL-DNN fail and throw Xbyak::Error when running on Ryzen. CNTK's build/test environment does not have Ryzen CPUs, so this was not caught before release.
As a workaround, you can compile CNTK from source after removing -DUSE_MKLDNN from https://github.com/Microsoft/CNTK/blob/master/Makefile#L174
OK, so I tried building mkl-dnn as instructed in its README inside an Ubuntu 16.04 Docker container. The build completed with no errors.
[ 96%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/conv_aux.cpp.o
[ 97%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/ref_conv.cpp.o
[ 98%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/bench_conv.cpp.o
[ 98%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/perf_report.cpp.o
[ 99%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/cfg.cpp.o
[ 99%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/common.cpp.o
[100%] Linking CXX executable benchdnn
[100%] Built target benchdnn
However, on running make test, all 32 test cases failed.
The following tests FAILED:
1 - simple-net-c (OTHER_FAULT)
2 - simple-net-cpp (OTHER_FAULT)
3 - simple-training-net-c (OTHER_FAULT)
4 - simple-training-net-cpp (OTHER_FAULT)
5 - api-c (OTHER_FAULT)
6 - test_c_symbols-c (OTHER_FAULT)
7 - test_iface_pd_iter (OTHER_FAULT)
8 - test_iface_attr (OTHER_FAULT)
9 - test_sum (OTHER_FAULT)
10 - test_reorder (OTHER_FAULT)
11 - test_concat (OTHER_FAULT)
12 - test_softmax_forward (OTHER_FAULT)
13 - test_eltwise (OTHER_FAULT)
14 - test_relu (OTHER_FAULT)
15 - test_lrn_forward (OTHER_FAULT)
16 - test_lrn_backward (OTHER_FAULT)
17 - test_pooling_forward (OTHER_FAULT)
18 - test_pooling_backward (OTHER_FAULT)
19 - test_batch_normalization (OTHER_FAULT)
20 - test_inner_product_forward (OTHER_FAULT)
21 - test_inner_product_backward_data (OTHER_FAULT)
22 - test_inner_product_backward_weights (OTHER_FAULT)
23 - test_convolution_format_any (OTHER_FAULT)
24 - test_convolution_forward_f32 (OTHER_FAULT)
25 - test_convolution_forward_s16s16s32 (OTHER_FAULT)
26 - test_convolution_forward_u8s8s32 (OTHER_FAULT)
27 - test_convolution_forward_u8s8fp (OTHER_FAULT)
28 - test_convolution_relu_forward_f32 (OTHER_FAULT)
29 - test_convolution_relu_forward_s16s16s32 (OTHER_FAULT)
30 - test_convolution_backward_data_f32 (OTHER_FAULT)
31 - test_convolution_backward_data_s16s16s32 (OTHER_FAULT)
32 - test_convolution_backward_weights (OTHER_FAULT)
Errors while running CTest
Makefile:127: recipe for target 'test' failed
make: *** [test] Error 8
Seems like a compatibility issue in MKL-DNN. Thanks for letting us know. Can you try building CNTK without -DUSE_MKLDNN and see if CNTK works on Ryzen?
Built CNTK from the release/2.4 branch without the flag and it does indeed work!
I didn't download and build the mkl-dnn dependency, only mklml.
Besides removing the -DUSE_MKLDNN flag, I also had to remove mkldnn from the LIBS_LIST.
Thanks for the confirmation. We'll contact MKL-DNN.
MKL-DNN supports IA-compatible processors, including AMD Ryzen. We identified an issue with the recently introduced cache detection functionality. The patch is in validation and will be available in MKL-DNN master shortly.
Hi folks, a fix for this issue is now available in MKL-DNN master. Please update MKL-DNN to the new version (commit c5e2ac6de62a0e4f9054e06fbf9def3c2f86e406).
@KeDengMS, will you be able to pick up the latest MKL-DNN master, or do you expect a patch release?
@vpirogov Thanks for the quick fix! May I know when the next MKL-DNN release is planned?
On our side, we are working on publishing nightly builds from the latest CNTK master. Since MKL-DNN is optional in 2.4, we'll disable it in the 2.4+ nightly builds for now. The 2.5 release (in about 6 weeks) will have more MKL-DNN functions, so we'd like to pick up an MKL-DNN release with the fix and then include it by default.
If no MKL-DNN release is planned before that, we'll add instructions for building MKL-DNN from the GitHub commit of the fix, and build our release binaries to include the patch.
Does this plan sound good to you?
@KeDengMS Sounds good.
We'll publish the MKL-DNN v0.13 release, which will contain this fix, at the end of next week, so you'll need to pick up that version for CNTK 2.5.
Thanks everyone. The temporary fix to disable MKLDNN has been merged into master, and that should unblock AMD Ryzen users if they build from source.