Cntk: Error importing CNTK 2.4 GPU: Throws Xbyak::Error

Created on 1 Feb 2018 · 16 Comments · Source: microsoft/CNTK

Trying to import the GPU version of CNTK 2.4 in Python 3.6 results in the following error:

>>> import cntk
terminate called after throwing an instance of 'Xbyak::Error'
  what():  internal error
Aborted (core dumped)

CNTK 2.3.1 works fine on the same system.

System information:

Ubuntu 16.04
Python 3.6
Working CUDA 9.0 / cuDNN 7 installation (verified with tensorflow)
Wheel installed: cntk-2.4-cp36-cp36m-linux_x86_64.whl (GPU)
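
Because the failure aborts the whole interpreter, it can help to attempt the import in a child process and inspect how that process exited. The following is a minimal sketch, not part of CNTK; the helper name `try_import` is made up for illustration, and the signal check assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def try_import(module_name):
    """Import module_name in a child interpreter.

    Returns (ok, aborted, stderr_text). `try_import` is a hypothetical
    helper for this write-up, not a CNTK API.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # On POSIX, a return code of -N means the child died from signal N.
    aborted = result.returncode == -signal.SIGABRT
    return result.returncode == 0, aborted, result.stderr.strip()


if __name__ == "__main__":
    ok, aborted, stderr_text = try_import("cntk")
    print("import ok:", ok)
    print("killed by SIGABRT:", aborted)
    if stderr_text:
        print(stderr_text)
```

On an affected machine the `aborted` flag would be expected to be set and the captured stderr to contain the `Xbyak::Error` message, while the calling interpreter keeps running.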

jsundh

All 16 comments

The error seems to be from MKL-DNN. @KeDengMS any idea?

manikjindal on 6 Feb 2018

I tried CNTK 2.4 in tensorflow/tensorflow:latest-gpu, and didn't hit this issue. The python version is 2.7 there. What's your repro?

KeDengMS on 7 Feb 2018

The same problem occurs on my system, and I'm using the older version 2.3.1 as a temporary measure.

System information:

Ubuntu 16.04, 4.13.0-32-generic
Python 3.5.2
CUDA 8.0 / cuDNN 6.0
Wheel installed: cntk-2.4-cp36-cp36m-linux_x86_64.whl (GPU)
CPU: AMD Ryzen Threadripper 1950X

jongho-park on 7 Feb 2018

👍1

The error occurs when I just start the Python interpreter and run import cntk:

$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cntk
terminate called after throwing an instance of 'Xbyak::Error'
  what():  internal error
Aborted (core dumped)

More system details:

CPU: AMD Ryzen 7 1700X
GPU: Nvidia GeForce GTX 1080Ti - driver version 390.25
Ubuntu 16.04.3 LTS with HWE kernel (4.13.0-32-generic)
Miniconda 3 environment with Python 3.6.3
Wheel installed: cntk-2.4-cp36-cp36m-linux_x86_64.whl (GPU)

jsundh on 7 Feb 2018

👍2

I am facing the same issue here. Could this be caused by the CPU, specifically the Ryzen platform? I installed 2.4 on a cloud instance (Intel CPU) and did not hit the issue. Both my PC and the cloud instance were running Ubuntu 16.04, which should rule out OS-related differences.

System Details (on my PC):
CPU: Ryzen 1600
GPU: GTX 1060 6GB
OS: Ubuntu 16.04
Python: Miniconda with a conda env of python 3.5
Wheel installed: CPU-Only/cntk-2.4-cp35-cp35m-linux_x86_64.whl
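
To compare affected and unaffected machines, the CPU vendor can be read from `/proc/cpuinfo` on Linux. This is a small sketch under that assumption; `cpu_vendor` is an illustrative helper name, and the parser simply returns the first `vendor_id` field it finds (typically `AuthenticAMD` on Ryzen and `GenuineIntel` on Intel CPUs).

```python
def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value in /proc/cpuinfo-style text, or None.

    `cpu_vendor` is a hypothetical helper written for this thread.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            vendor = cpu_vendor(f.read())
    except OSError:
        # /proc/cpuinfo is Linux-specific; other systems need another source.
        vendor = None
    print("CPU vendor:", vendor)
```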

Arkoprabho on 8 Feb 2018

CNTK 2.4 updated its MKL dependency to mkl-dnn 0.12. It seems MKL-DNN has some checks that fail and throw Xbyak::Error when running on Ryzen. CNTK's build/test environment does not have Ryzen CPUs, so this was not caught before release.

You may compile CNTK from source and remove -DUSE_MKLDNN from https://github.com/Microsoft/CNTK/blob/master/Makefile#L174

KeDengMS on 8 Feb 2018

OK, so I tried installing mkl-dnn as instructed in their README inside an Ubuntu 16.04 Docker container. The build completed with no errors.

[ 96%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/conv_aux.cpp.o
[ 97%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/ref_conv.cpp.o
[ 98%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/bench_conv.cpp.o
[ 98%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/perf_report.cpp.o
[ 99%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/conv/cfg.cpp.o
[ 99%] Building CXX object tests/benchdnn/CMakeFiles/benchdnn.dir/common.cpp.o
[100%] Linking CXX executable benchdnn
[100%] Built target benchdnn

However, on running make test, all 32 test cases failed.

The following tests FAILED:
      1 - simple-net-c (OTHER_FAULT)
      2 - simple-net-cpp (OTHER_FAULT)
      3 - simple-training-net-c (OTHER_FAULT)
      4 - simple-training-net-cpp (OTHER_FAULT)
      5 - api-c (OTHER_FAULT)
      6 - test_c_symbols-c (OTHER_FAULT)
      7 - test_iface_pd_iter (OTHER_FAULT)
      8 - test_iface_attr (OTHER_FAULT)
      9 - test_sum (OTHER_FAULT)
     10 - test_reorder (OTHER_FAULT)
     11 - test_concat (OTHER_FAULT)
     12 - test_softmax_forward (OTHER_FAULT)
     13 - test_eltwise (OTHER_FAULT)
     14 - test_relu (OTHER_FAULT)
     15 - test_lrn_forward (OTHER_FAULT)
     16 - test_lrn_backward (OTHER_FAULT)
     17 - test_pooling_forward (OTHER_FAULT)
     18 - test_pooling_backward (OTHER_FAULT)
     19 - test_batch_normalization (OTHER_FAULT)
     20 - test_inner_product_forward (OTHER_FAULT)
     21 - test_inner_product_backward_data (OTHER_FAULT)
     22 - test_inner_product_backward_weights (OTHER_FAULT)
     23 - test_convolution_format_any (OTHER_FAULT)
     24 - test_convolution_forward_f32 (OTHER_FAULT)
     25 - test_convolution_forward_s16s16s32 (OTHER_FAULT)
     26 - test_convolution_forward_u8s8s32 (OTHER_FAULT)
     27 - test_convolution_forward_u8s8fp (OTHER_FAULT)
     28 - test_convolution_relu_forward_f32 (OTHER_FAULT)
     29 - test_convolution_relu_forward_s16s16s32 (OTHER_FAULT)
     30 - test_convolution_backward_data_f32 (OTHER_FAULT)
     31 - test_convolution_backward_data_s16s16s32 (OTHER_FAULT)
     32 - test_convolution_backward_weights (OTHER_FAULT)
Errors while running CTest
Makefile:127: recipe for target 'test' failed
make: *** [test] Error 8

Arkoprabho on 8 Feb 2018

👍1

Seems like a compatibility issue in MKL-DNN. Thanks for keeping us informed. Can you try building CNTK without -DUSE_MKLDNN and see if CNTK works on Ryzen?

KeDengMS on 8 Feb 2018

Built CNTK from the release/2.4 branch without the flag and it does indeed work!

I didn't download and build the mkl-dnn dependency, only mklml.
Besides the -DUSE_MKLDNN flag, I also had to remove mkldnn from the LIBS_LIST.
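
The two Makefile changes described above can be sketched as a text transformation. This is only an illustration: the helper name `disable_mkldnn` is made up, and it assumes the flag appears as a whitespace-separated `-DUSE_MKLDNN` token and that the library is listed as a bare `mkldnn` word on lines mentioning `LIBS_LIST`. The real Makefile on the release/2.4 branch may be laid out differently, so the result should be reviewed by hand.

```python
import re


def disable_mkldnn(makefile_text):
    """Return makefile_text with MKL-DNN switched off.

    Hypothetical helper; assumes the layout described in the lead-in.
    """
    out = []
    for line in makefile_text.splitlines(keepends=True):
        # Remove the preprocessor define (and the spaces before it).
        line = re.sub(r"[ \t]*-DUSE_MKLDNN\b", "", line)
        # Remove the library name, but only on LIBS_LIST lines.
        if "LIBS_LIST" in line:
            line = re.sub(r"[ \t]+mkldnn\b", "", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Made-up sample lines for demonstration, not copied from CNTK.
    sample = (
        "COMMON_FLAGS += -DUSE_MKL -DUSE_MKLDNN\n"
        "LIBS_LIST += m mklml_intel mkldnn\n"
    )
    print(disable_mkldnn(sample), end="")
```

Operating on the text rather than editing the file in place keeps the sketch easy to check before anything is written back to disk.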

jsundh on 9 Feb 2018

Thanks for the confirmation. We'll contact MKL-DNN.

KeDengMS on 9 Feb 2018

MKL-DNN supports IA-compatible processors, including AMD Ryzen. We identified an issue with the recently introduced cache detection functionality. The patch is in validation and will be available in MKL-DNN master shortly.

vpirogov on 14 Feb 2018

👍1

Hi folks, a fix for this issue is already available in MKL-DNN master. Please update MKL-DNN to the new version (commit c5e2ac6de62a0e4f9054e06fbf9def3c2f86e406).

tprimak on 14 Feb 2018

👍2

@KeDengMS, will you be able to pick up the latest MKL-DNN master, or do you expect a patch release?

vpirogov on 15 Feb 2018

@vpirogov Thanks for the quick fix! May I know when the next MKL-DNN release is planned?

From our side, we are working on publishing nightly builds from the latest CNTK master. Since MKL-DNN is optional in 2.4, we'll disable it in the 2.4+ nightly builds for now. The 2.5 release (in about 6 weeks) will have more MKL-DNN functions, and we'd like to pick up an MKL-DNN release with the fix and then include it by default.

If there's no MKL-DNN release planned before that, we'll add instructions to build MKL-DNN from the GitHub commit of the fix, and create our release binaries to include the patch.

Does this plan sound good to you?

KeDengMS on 16 Feb 2018

@KeDengMS Sounds good.
We'll publish the MKL-DNN v0.13 release, which will contain this fix, at the end of next week, so you'll need to pick up that version for CNTK 2.5.

tprimak on 16 Feb 2018

Thanks everyone. The temporary fix to disable MKLDNN has been merged into master, and that should unblock AMD Ryzen users if they build from source.

KeDengMS on 21 Feb 2018
