Incubator-mxnet: OpenMP Error

Created on 20 Feb 2020 · 65 Comments · Source: apache/incubator-mxnet

Description

MXNet compiled from source ends up linked to two OpenMP libraries: libomp and libiomp.

Error Message


OMP: Error #15: Initializing libiomp5.so, but found libomp.so already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked
into the program. That is dangerous, since it can degrade performance or cause
incorrect results. The best thing to do is to ensure that only a single OpenMP
runtime is linked into the process, e.g. by avoiding static linking of the
OpenMP runtime in any library. As an unsafe, unsupported, undocumented
workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to
allow the program to continue to execute, but that may cause crashes or silently
produce incorrect results. For more information, please see
http://www.intel.com/software/products/support/.

To Reproduce

I have both the Intel MKL and MKL-DNN libraries installed on Ubuntu 18.04. Compiling MXNet with the following configuration leads to the error shown above:

cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja -v

What have you tried to solve it?

After I deleted 3rdparty/openmp and recompiled MXNet, the error no longer occurs.

Environment

Ubuntu 18.04 with the Intel MKL and MKL-DNN libraries installed.

Labels: Bug, CMake

All 65 comments

That is because git clone --recursive pulls in OpenMP, and the CMake script doesn't check whether the system OpenMP should be used instead:

https://github.com/apache/incubator-mxnet/blob/9dcf71d8fe33f77ed316a95fcffaf1f7f883ff70/CMakeLists.txt#L390-L430

Unfortunately, OpenMP doesn't ship a pkg-config file or CMake config files for integrating with CMake.

It seems we need our own CMake module to search for it.

The same applies to intel-dnnl.

Some distros already provide an intel-dnnl package, but MXNet forces downloading the sources again.

@cjolivier01 you previously vetoed changing the omp configuration in cmake build, due to a race condition that had not been fixed. As that has been fixed, are you OK with proceeding to prefer system OMP for the CMake build by default? Or what is your recommendation?

Static build should still statically build omp.

@sl1pkn07 given the rapid development of intel-dnnl, MXNet expects a fixed version of intel-dnnl. It's quite unlikely that the system provides that particular version, but patches to improve detection are welcome. Do you want to contribute a PR? But let's track this in a separate issue.

1) What is pulling in libiomp5.so?
2) Since when is libomp being linked in statically? I am not aware of this ever being the case.

Btw, the CMake files set the minimum CMake version to 3.13, but the default CMake on Ubuntu 18.04 is 3.10. Does anyone know why 3.13 is required? Ubuntu 18.04 is a pretty widely used release...
I changed it back to 3.10 and it seems to build OK.

> @cjolivier01 you previously vetoed changing the omp configuration in cmake build, due to a race condition that had not been fixed. As that has been fixed, are you OK with proceeding to prefer system OMP for the CMake build by default? Or what is your recommendation?
>
> Static build should still statically build omp.

Not actually. It was because there was no legitimate reason to remove it.

> What is pulling in libiomp5.so?

MKL

> btw, cmake files have min cmake at 3.13, but default 18.04 cmake install is cmake 3.10. Does anyone know what the deal is with 3.13? Ubuntu 18.04 is a pretty widely-used release...
> I changed back to 3.10 and it seems to build ok.

Just pip install cmake as per the doc https://mxnet.apache.org/get_started/ubuntu_setup. It would be harder to explain when users require 3.13 and when 3.X or 3.Y than to uniformly require a recent version. Various bugs fixed in 3.13 affect MXNet use cases (e.g. CUDA, and https://cmake.org/cmake/help/latest/policy/CMP0077.html for the LLVM OpenMP subproject).
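A minimal sketch of checking an installed CMake against the 3.13 minimum before configuring. The helper names parse_cmake_version and meets_minimum are hypothetical, and the pattern assumes the usual "cmake version X.Y.Z" line printed by cmake --version.

```python
import re


def parse_cmake_version(output):
    """Extract (major, minor, patch) from `cmake --version` output.

    Assumes a line such as "cmake version 3.16.4" (hypothetical helper).
    """
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised `cmake --version` output: %r" % output)
    return tuple(int(part) for part in match.groups())


def meets_minimum(output, minimum=(3, 13)):
    """Return True if the reported CMake version is at least `minimum`."""
    # Tuples compare element-wise, so (3, 10) < (3, 13) <= (3, 16).
    return parse_cmake_version(output)[: len(minimum)] >= minimum
```

For example, the text passed in could come from subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout (Python 3.7 or newer).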

> Not actually. Due to no legitimate reason to remove it.

To speed up the developer build: there is no need to build LLVM OpenMP if a system OpenMP is present.

Building OpenMP only takes about 4-5 seconds.

On my desktop machine it takes under 3 seconds:
real 0m2.940s
user 0m42.446s
sys 0m5.442s

I installed MKL, but the build does not appear to pick it up. Is there a way to force it?

Actually, I don't see this behavior when it does pull in MKL and, with it, the other OpenMP runtime (this is Ubuntu 18.04):

[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd libmxnet.so 
        linux-vdso.so.1 (0x00007ffcbdf3b000)
        libmkl_rt.so => /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00007fb399dd8000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb399bd0000)
        libomp.so => /home/chriso/src/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007fb3998ea000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fb3996e6000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb3994c7000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb39913e000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb398da0000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb398b88000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb398797000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb3a078c000)

I don't see libmkl_rt.so pulling in libiomp5:

[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd /opt/intel/mkl/lib/intel64/libmkl_rt.so
        linux-vdso.so.1 (0x00007ffd6c5cc000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc85058d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc85019c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fc850e71000)

I think libmkl_rt may dlopen libiomp as per https://github.com/intel/mkl-dnn/issues/230#issuecomment-451082066, but I haven't looked into this further yet
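Because a library that libmkl_rt.so loads with dlopen() never shows up in ldd or NEEDED output, one way to see which OpenMP runtimes actually end up in a process is to inspect its memory mappings. A minimal sketch, assuming Linux and the /proc/<pid>/maps text format; the helper loaded_omp_runtimes and the list of runtime names (libomp, libiomp5, libgomp) are illustrative choices, not part of MXNet.

```python
import os
import re

# LLVM libomp, Intel libiomp5 and GNU libgomp, with optional version suffixes
# (e.g. libomp.so.5, libgomp.so.1). libomptarget.so is deliberately excluded.
_OMP_RUNTIME = re.compile(r"^lib(omp|iomp5|gomp)\.so(\.\d+)*$")


def loaded_omp_runtimes(maps_text):
    """Return the sorted, de-duplicated paths of OpenMP runtimes found in
    text that uses the /proc/<pid>/maps format (hypothetical helper)."""
    found = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        path = fields[5].strip()
        if _OMP_RUNTIME.match(os.path.basename(path)):
            found.add(path)
    return sorted(found)
```

Since the failing build aborts as soon as libiomp5 initializes, this is mainly useful for confirming a fix: after import mxnet and running a reproducer in a build that no longer aborts, loaded_omp_runtimes(open("/proc/self/maps").read()) would be expected to return a single entry.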

Linking in any version of omp statically would probably be a bad idea, since startup order would be important.

Clearly it does not; the ldd output for libmkl_rt.so above lists no OpenMP runtime.

Yes, that's why I think it uses dlopen.

And what does readelf -a <lib> | grep NEEDED show?
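A small sketch for extracting the NEEDED entries from readelf output and picking out the OpenMP runtimes among them. It assumes the "(NEEDED) Shared library: [name]" line format that readelf prints; the helper names are hypothetical. It only sees link-time dependencies, so a runtime that MKL loads later with dlopen() will not appear here.

```python
import re

# Lines look like:
#  0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")
# LLVM libomp, Intel libiomp5 and GNU libgomp, with optional version suffixes.
_OMP_RUNTIME = re.compile(r"^lib(omp|iomp5|gomp)\.so(\.\d+)*$")


def needed_libraries(readelf_text):
    """Return the NEEDED entries in first-seen order, dropping repeats."""
    names = []
    for name in _NEEDED.findall(readelf_text):
        if name not in names:
            names.append(name)
    return names


def needed_omp_runtimes(readelf_text):
    """Return only the NEEDED entries that name an OpenMP runtime."""
    return [name for name in needed_libraries(readelf_text)
            if _OMP_RUNTIME.match(name)]
```

The input could be the stdout of readelf -d build/libmxnet.so; more than one entry in needed_omp_runtimes would mean two runtimes are already requested at link time.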

Can you supply a script to reproduce this error? I am not able to reproduce it.

I'm using the system OpenMP and no mkl-dnn. Sorry, did you mean @icemelon9?

@sl1pkn07 please open a separate issue for your problem. This issue is about MKL.

@icemelon9 please provide a reproducer that triggers the error message.

Sorry for the late response. Here's a script that reproduces the error message:

import numpy as np
import mxnet as mx

a = mx.nd.array(np.random.uniform(size=(1024, 128)).astype('float32'))
b = mx.nd.array(np.random.uniform(size=(128, 1024)).astype('float32'))
c = mx.nd.dot(a, b)
c.wait_to_read()

The following shows the shared libraries used by libmxnet on my machine:

mxnet git:(master) ldd build/libmxnet.so
        linux-vdso.so.1 (0x00007ffd8b467000)
        libmkl_rt.so => /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00007f1abac81000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1abaa79000)
        libopencv_imgcodecs.so.3.2 => /usr/lib/x86_64-linux-gnu/libopencv_imgcodecs.so.3.2 (0x00007f1aba840000)
        libopencv_imgproc.so.3.2 => /usr/lib/x86_64-linux-gnu/libopencv_imgproc.so.3.2 (0x00007f1aba2ef000)
        libopencv_core.so.3.2 => /usr/lib/x86_64-linux-gnu/libopencv_core.so.3.2 (0x00007f1ab9eb4000)
        libomp.so => /home/ubuntu/repo/mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f1ab9bce000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1ab99ca000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1ab97ab000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1ab9422000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1ab9084000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1ab8e6c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1ab8a7b000)
        ...
mxnet git:(master) readelf -a build/libmxnet.so| grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libmkl_rt.so]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgcodecs.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
(pytorch) [chriso@chriso-ripper:~/src/mxnet (master)]python
Python 3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 21:14:29) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import mxnet as mx
>>> 
>>> a = mx.nd.array(np.random.uniform(size=(1024, 128)).astype('float32'))
>>> b = mx.nd.array(np.random.uniform(size=(128, 1024)).astype('float32'))
>>> c = mx.nd.dot(a, b)
>>> c.wait_to_read()
>>> exit()

Still can't reproduce.
Can you send entire cmake config log?

CI can also reproduce this issue. I switched CI to testing CMake builds instead of the Makefile build in https://github.com/apache/incubator-mxnet/pull/17645, and the Python MKLDNN + MKL pipeline fails with this issue (see the log of the test failure, the raw log of the test failure, and the raw log of the build).

@cjolivier01 the build log contains the output of cmake configuration.

That pipeline relies on the following build:

build_ubuntu_cpu_mkldnn_mkl() {
    set -ex
    cd /work/build
    cmake \
        -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
        -DUSE_MKL_IF_AVAILABLE=ON \
        -DBLAS="MKL" \
        -DUSE_TVM_OP=ON \
        -DUSE_CUDA=OFF \
        -DUSE_CPP_PACKAGE=ON \
        -G Ninja /work/mxnet
    ninja
}

Here is the cmake log.

build git:(master) ✗ cmake -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -GNinja ..
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_PROCESSOR x86_64
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.16.4' using generator 'Ninja'
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Performing Test SUPPORT_CXX0X
-- Performing Test SUPPORT_CXX0X - Success
-- MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `OFF`
-- MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `OFF`
-- MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value `OFF`
-- MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC`
-- MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value ``
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- GPU support is disabled
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Found Git: /usr/bin/git (found version "2.17.1")
-- Intel(R) VTune(TM) Amplifier JIT profiling disabled
-- Found MKL: /opt/intel/mkl/include
-- Found MKL (include: /opt/intel/mkl/include, lib: /opt/intel/mkl/lib/intel64/libmkl_rt.so
-- Found OpenCV: /usr (found version "3.2.0") found components: core highgui imgproc imgcodecs
-- OpenCV 3.2.0 found (/usr/share/OpenCV)
--  OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
-- Performing Test OPENMP_HAVE_WERROR_FLAG
-- Performing Test OPENMP_HAVE_WERROR_FLAG - Success
-- Performing Test OPENMP_HAVE_STD_GNUPP11_FLAG
-- Performing Test OPENMP_HAVE_STD_GNUPP11_FLAG - Success
-- Performing Test OPENMP_HAVE_STD_CPP11_FLAG
-- Performing Test OPENMP_HAVE_STD_CPP11_FLAG - Success
-- Found PythonInterp: /home/ubuntu/anaconda3/envs/tvm/bin/python (found version "3.6.9")
-- Cannot find llvm-lit.
-- Please put llvm-lit in your PATH, set OPENMP_LLVM_LIT_EXECUTABLE to its full path, or point OPENMP_LLVM_TOOLS_DIR to its directory.
CMake Warning at 3rdparty/openmp/cmake/OpenMPTesting.cmake:22 (message):
  The check targets will not be available!
Call Stack (most recent call first):
  3rdparty/openmp/cmake/OpenMPTesting.cmake:40 (find_standalone_test_dependencies)
  3rdparty/openmp/CMakeLists.txt:49 (include)


-- Performing Test LIBOMP_HAVE_FNO_EXCEPTIONS_FLAG
-- Performing Test LIBOMP_HAVE_FNO_EXCEPTIONS_FLAG - Success
-- Performing Test LIBOMP_HAVE_FNO_RTTI_FLAG
-- Performing Test LIBOMP_HAVE_FNO_RTTI_FLAG - Success
-- Performing Test LIBOMP_HAVE_X_CPP_FLAG
-- Performing Test LIBOMP_HAVE_X_CPP_FLAG - Success
-- Performing Test LIBOMP_HAVE_WCAST_QUAL_FLAG
-- Performing Test LIBOMP_HAVE_WCAST_QUAL_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_FUNCTION_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_FUNCTION_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VALUE_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VALUE_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VARIABLE_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNUSED_VARIABLE_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_SWITCH_FLAG
-- Performing Test LIBOMP_HAVE_WNO_SWITCH_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG
-- Performing Test LIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG
-- Performing Test LIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_SIGN_COMPARE_FLAG
-- Performing Test LIBOMP_HAVE_WNO_SIGN_COMPARE_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG
-- Performing Test LIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_UNKNOWN_PRAGMAS_FLAG
-- Performing Test LIBOMP_HAVE_WNO_UNKNOWN_PRAGMAS_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_MISSING_FIELD_INITIALIZERS_FLAG
-- Performing Test LIBOMP_HAVE_WNO_MISSING_FIELD_INITIALIZERS_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_MISSING_BRACES_FLAG
-- Performing Test LIBOMP_HAVE_WNO_MISSING_BRACES_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_COMMENT_FLAG
-- Performing Test LIBOMP_HAVE_WNO_COMMENT_FLAG - Success
-- Performing Test LIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG
-- Performing Test LIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG
-- Performing Test LIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG
-- Performing Test LIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG - Failed
-- Performing Test LIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG
-- Performing Test LIBOMP_HAVE_WSTRINGOP_OVERFLOW_FLAG - Success
-- Performing Test LIBOMP_HAVE_MSSE2_FLAG
-- Performing Test LIBOMP_HAVE_MSSE2_FLAG - Success
-- Performing Test LIBOMP_HAVE_FTLS_MODEL_FLAG
-- Performing Test LIBOMP_HAVE_FTLS_MODEL_FLAG - Success
-- Performing Test LIBOMP_HAVE_MMIC_FLAG
-- Performing Test LIBOMP_HAVE_MMIC_FLAG - Failed
-- Performing Test LIBOMP_HAVE_M32_FLAG
-- Performing Test LIBOMP_HAVE_M32_FLAG - Failed
-- Performing Test LIBOMP_HAVE_X_FLAG
-- Performing Test LIBOMP_HAVE_X_FLAG - Success
-- Performing Test LIBOMP_HAVE_WARN_SHARED_TEXTREL_FLAG
-- Performing Test LIBOMP_HAVE_WARN_SHARED_TEXTREL_FLAG - Success
-- Performing Test LIBOMP_HAVE_AS_NEEDED_FLAG
-- Performing Test LIBOMP_HAVE_AS_NEEDED_FLAG - Success
-- Performing Test LIBOMP_HAVE_VERSION_SCRIPT_FLAG
-- Performing Test LIBOMP_HAVE_VERSION_SCRIPT_FLAG - Success
-- Performing Test LIBOMP_HAVE_STATIC_LIBGCC_FLAG
-- Performing Test LIBOMP_HAVE_STATIC_LIBGCC_FLAG - Success
-- Performing Test LIBOMP_HAVE_Z_NOEXECSTACK_FLAG
-- Performing Test LIBOMP_HAVE_Z_NOEXECSTACK_FLAG - Success
-- Performing Test LIBOMP_HAVE_FINI_FLAG
-- Performing Test LIBOMP_HAVE_FINI_FLAG - Success
-- Found Perl: /usr/bin/perl (found version "5.26.1")
-- Performing Test LIBOMP_HAVE_VERSION_SYMBOLS
-- Performing Test LIBOMP_HAVE_VERSION_SYMBOLS - Success
-- Performing Test LIBOMP_HAVE___BUILTIN_FRAME_ADDRESS
-- Performing Test LIBOMP_HAVE___BUILTIN_FRAME_ADDRESS - Success
-- Performing Test LIBOMP_HAVE_WEAK_ATTRIBUTE
-- Performing Test LIBOMP_HAVE_WEAK_ATTRIBUTE - Success
-- Looking for include files windows.h, psapi.h
-- Looking for include files windows.h, psapi.h - not found
-- Looking for EnumProcessModules in psapi
-- Looking for EnumProcessModules in psapi - not found
-- LIBOMP: Operating System     -- Linux
-- LIBOMP: Target Architecture  -- x86_64
-- LIBOMP: Build Type           -- Release
-- LIBOMP: Library Kind         -- SHARED
-- LIBOMP: Library Type         -- normal
-- LIBOMP: Fortran Modules      -- FALSE
-- LIBOMP: Build                -- 20140926
-- LIBOMP: Use Stats-gathering  -- FALSE
-- LIBOMP: Use Debugger-support -- FALSE
-- LIBOMP: Use ITT notify       -- TRUE
-- LIBOMP: Use OMPT-support     -- TRUE
-- LIBOMP: Use OMPT-optional  -- TRUE
-- LIBOMP: Use Adaptive locks   -- TRUE
-- LIBOMP: Use quad precision   -- TRUE
-- LIBOMP: Use TSAN-support     -- FALSE
-- LIBOMP: Use Hwloc library    -- FALSE
-- Looking for sqrt in m
-- Looking for sqrt in m - found
-- Looking for __atomic_load_1
-- Looking for __atomic_load_1 - not found
-- Looking for __atomic_load_1 in atomic
-- Looking for __atomic_load_1 in atomic - found
-- check-libomp does nothing.
-- check-ompt does nothing.
-- check-openmp does nothing.
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
  VERSION keyword not followed by a value or was followed by a value that
  expanded to nothing.


-- Found GTest: gtest
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- /home/ubuntu/repo/mxnet/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Success
-- Found OpenMP_C: -fopenmp
-- Found OpenMP_CXX: -fopenmp
-- Found OpenMP: TRUE
-- Performing Test SUPPORT_MSSE3
-- Performing Test SUPPORT_MSSE3 - Success
-- Determining F16C support
-- Performing Test COMPILER_SUPPORT_MF16C
-- Performing Test COMPILER_SUPPORT_MF16C - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/repo/mxnet/build

Is this with the latest 2020 version of MKL?

Yes, I installed Intel MKL 2020.0-166

This behavior seems suspicious, because the claim would suggest that the problem also exists for all similar build cases (i.e. my machine now), as well as for clang, which pulls in libomp by default rather than libiomp5.

Does it occur with OpenCV turned off?

I still get the error after turning off OpenCV:

mxnet git:(master) ✗ readelf -a build/libmxnet.so| grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libmkl_rt.so]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]

I can reproduce now. However, even if I remove the openmp build in CMakeLists.txt and build with clang, I get that warning, since it pulls in libomp from clang (I am using clang8):

[chriso@chriso-ripper:~/src/mxnet (master)]PYTHONPATH=$(pwd)/python python3 test.py 
OMP: Error #15: Initializing libiomp5.so, but found libomp.so already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Aborted (core dumped)
[chriso@chriso-ripper:~/src/mxnet/build (master)]ldd libmxnet.so 
        linux-vdso.so.1 (0x00007ffd55ab4000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f084c8cd000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f084c6ae000)
        libmkl_rt.so => /opt/intel/mkl/lib/intel64/libmkl_rt.so (0x00007f084bfce000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f084bdc6000)
        liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x00007f084b540000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f084b1b7000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f084ae19000)
        libomp.so => /usr/local/lib/libomp.so (0x00007f084ab56000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f084a93e000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f084a54d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0855365000)
        libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007f08482a7000)
        libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007f0847ec8000)
        libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f0847c88000)

So this seems like a systemic problem with MKL to me.

> However, even if I remove the openmp build in CMakeLists.txt and build with clang, I get that warning, since it pulls in libomp from clang (I am using clang8):

When building with MKL, are we supposed to build with libiomp as well @cjolivier01 @pengzhao-intel?
If so, we need to change our build accordingly.
If not, this seems to be a bug in MKL.

Another thing to note is that LLVM OpenMP also installs libgomp.so as a symlink to libomp.so, so there's a good chance that libomp.so will be loaded no matter what, depending on whether a system has clang/OpenMP installed at all and where it sits in the link order. So unless I am missing some clever logic, what MKL is doing with its dynamic loading is a cause for concern.

Also of note, clang seems to also put a symlink to libiomp5 (in addition to libgomp):

[chriso@chriso-ripper:~/src/mxnet (master)]ls -l /usr/local/lib/lib*omp*.so*
lrwxrwxrwx 1 root root      9 Feb 20 14:37 /usr/local/lib/libgomp.so -> libomp.so
lrwxrwxrwx 1 root root      9 Feb 20 14:37 /usr/local/lib/libiomp5.so -> libomp.so
-rw-r--r-- 1 root root 953376 Feb 20 14:36 /usr/local/lib/libomp.so
-rw-r--r-- 1 root root  66072 Feb 20 14:36 /usr/local/lib/libomptarget.so

So it seems link order is important?
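Whether libgomp.so or libiomp5.so in a given directory is a separate runtime or just another name for libomp.so can be checked by resolving symlinks. A minimal sketch with a hypothetical helper omp_aliases that groups the OpenMP runtime names in one directory by the real file they resolve to; the name pattern (libomp, libiomp5, libgomp) is an illustrative assumption.

```python
import os
import re
from collections import defaultdict

# LLVM libomp, Intel libiomp5 and GNU libgomp, with optional version suffixes.
_OMP_RUNTIME = re.compile(r"^lib(omp|iomp5|gomp)\.so(\.\d+)*$")


def omp_aliases(directory):
    """Map each real file behind an OpenMP runtime name in `directory` to the
    sorted list of names (regular files or symlinks) that resolve to it."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        if _OMP_RUNTIME.match(name):
            real = os.path.realpath(os.path.join(directory, name))
            groups[real].append(name)
    return dict(groups)
```

On a layout like the /usr/local/lib listing above, one would expect a single key with libgomp.so, libiomp5.so and libomp.so grouped together; more than one key would indicate genuinely different runtime files.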

@TaoLv @pengzhao-intel can you advise about the mkl dynamic loading of iomp? What's your recommendation to fix this issue.

@leezu I need take time to go through this discussion. But do you think changing MKL to static link will solve your question? Please see link adviser here: https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor

@leezu you may also want to take a look at the discussion around here: https://github.com/apache/incubator-mxnet/issues/16891#issuecomment-567051540

> So it seems link order is important?

I think so. I remember @icemelon9 also ran into the problem that importing PyTorch before MXNet in a single Python script slows down MXNet's execution.

As it's rather difficult to stop clang from linking LLVM OpenMP, we can disable linking Intel OpenMP until a solution is found that links only against Intel OpenMP (which may require compiling with icc), or until mkl_rt.so is fixed to prefer an already present libomp.so over dynamically loading libiomp5.so and running into the conflict.

To stop linking with Intel OpenMP, remove or disable

https://github.com/apache/incubator-mxnet/blob/31144c763bfd0fe199b7fe0f23a20555c9731e7a/cmake/Modules/FindMKL.cmake#L126-L145

and set -DMKL_USE_SINGLE_DYNAMIC_LIBRARY=OFF.

Then we get

 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libmkl_intel_lp64.so]
 0x0000000000000001 (NEEDED)             Shared library: [libmkl_intel_thread.so]
 0x0000000000000001 (NEEDED)             Shared library: [libmkl_core.so]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_highgui.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgcodecs.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.3.2]
 0x0000000000000001 (NEEDED)             Shared library: [liblapack.so.3]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so.5]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]

With that, libmkl shouldn't attempt to load iomp dynamically.
This works because LLVM OpenMP is compatible with iomp.

@icemelon9 I can't reproduce this problem. Please comment out the lines pointed out above and compile with cmake -DUSE_CUDA=0 -DUSE_MKLDNN=1 -DMKL_USE_SINGLE_DYNAMIC_LIBRARY=OFF -GNinja ..; ninja

@cjolivier01 any suggestion how to disable linking llvm openmp for clang if iomp5 is present?

@leezu How about also set -DMKL_USE_STATIC_LIBS=ON? Then there is no need to worry about loading iomp dynamically.

On my machine, I always have an additional libgomp, as shown here:

[lvtao@mlt2-clx016 build]$ ldd libmxnet.so | grep omp
        libomp.so => /home/lvtao/Workspace/mxnet-official/build/3rdparty/openmp/runtime/src/libomp.so (0x00007fea4b117000)
        libgomp.so.1 => /lib64/libgomp.so.1 (0x00007fea4a4c8000)
        libXcomposite.so.1 => /lib64/libXcomposite.so.1 (0x00007fea43a9e000)
[lvtao@mlt2-clx016 build]$ ldd libmxnet.so | grep mkl
[lvtao@mlt2-clx016 build]$
[lvtao@mlt2-clx016 build]$ readelf -a libmxnet.so| grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_imgproc.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_highgui.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libopencv_core.so.2.4]
 0x0000000000000001 (NEEDED)             Shared library: [libomp.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgomp.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]

I built with

cmake -DUSE_CUDA=0 -DUSE_MKLDNN=1 -DMKL_USE_SINGLE_DYNAMIC_LIBRARY=OFF -DMKL_USE_STATIC_LIBS=ON -DUSE_LAPACK=0 ..
make -j20

In Arch Linux (my case), libgomp and libomp are provided by different packages:

~
└───╼ pacman -Ql openmp |grep .so
openmp /usr/lib/libomp.so
openmp /usr/lib/libomptarget.rtl.x86_64.so
openmp /usr/lib/libomptarget.so
└───╼ pacman -Ql gcc-libs |grep mp.so
gcc-libs /usr/lib/libgomp.so
gcc-libs /usr/lib/libgomp.so.1
gcc-libs /usr/lib/libgomp.so.1.0.0
~

The openmp package is built with this flag to avoid installing the libomp alias symlinks:

~
-DLIBOMP_INSTALL_ALIASES=OFF
~

(https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/openmp#n35)

I'm not sure if this helps.

Greetings

@TaoLv what version of GCC?

@TaoLv -DMKL_USE_STATIC_LIBS=ON is currently broken on my system and in particular doesn't link iomp statically.

I'll create a PR to statically link iomp by default if MKL is found. As all OMP related symbols will then be resolved at compile time, it doesn't matter that gcc declares libgomp.so and clang libomp.so respectively as NEEDED. Further I'll disable building llvm openmp when statically linking to libiomp.so. @cjolivier01 does that sound alright to you?

@TaoLv
what version of gcc?

@cjolivier01 this will happen with any version of gcc

Linking to OpenMP statically is dangerous because the atfork handler registration (and thus execution) order is not defined, as is the general global initialization order, although I am not sure whether that is problematic, since libiomp is all C.

@TaoLv
what version of gcc?

@cjolivier01 this will happen with any version of gcc

It doesn't happen on my machine, but I am not linking to OpenCV because having it installed breaks my PyTorch build. Maybe OpenCV pulls in libgomp?

Maybe statically linking is OK; I am not sure.

-DMKL_USE_STATIC_LIBS=ON

How broken? When I built with clang, it seemed to run OK.

Never mind, scratch that: it wasn't finding MKL.

You can use this patch as a fix:

diff --git a/cmake/Modules/FindMKL.cmake b/cmake/Modules/FindMKL.cmake
index 51eff8fe0..444a57687 100644
--- a/cmake/Modules/FindMKL.cmake
+++ b/cmake/Modules/FindMKL.cmake
@@ -19,8 +19,6 @@
 #
 # Options:
 #
-#   USE_MKLDNN                    : Search for MKL:ML library variant
-#
 #   MKL_USE_SINGLE_DYNAMIC_LIBRARY  : use single dynamic library interface
 #   MKL_USE_STATIC_LIBS             : use static libraries
 #   MKL_MULTI_THREADED              : use multi-threading
@@ -45,8 +43,10 @@ set(INTEL_ROOT "/opt/intel" CACHE PATH "Folder contains intel libs")


   # ---[ Options
-  option(MKL_USE_SINGLE_DYNAMIC_LIBRARY "Use single dynamic library interface" ON)
-  cmake_dependent_option(MKL_USE_STATIC_LIBS "Use static libraries" OFF "NOT MKL_USE_SINGLE_DYNAMIC_LIBRARY" OFF)
+  # Single dynamic library interface leads to conflicts between intel omp and llvm omp
+  # https://github.com/apache/incubator-mxnet/issues/17641
+  option(MKL_USE_SINGLE_DYNAMIC_LIBRARY "Use single dynamic library interface" OFF)
+  cmake_dependent_option(MKL_USE_STATIC_LIBS "Use static libraries" ON "NOT MKL_USE_SINGLE_DYNAMIC_LIBRARY" OFF)
   cmake_dependent_option(MKL_MULTI_THREADED  "Use multi-threading"  ON "NOT MKL_USE_SINGLE_DYNAMIC_LIBRARY" OFF)
   option(MKL_USE_ILP64 "Use ilp64 data model" OFF)
   cmake_dependent_option(MKL_USE_CLUSTER "Use cluster functions" OFF "CMAKE_SIZEOF_VOID_P EQUAL 4" OFF)
@@ -122,10 +122,9 @@ set(INTEL_ROOT "/opt/intel" CACHE PATH "Folder contains intel libs")
     list(APPEND MKL_LIBRARIES ${${__mkl_lib_upper}_LIBRARY})
   endforeach()

-
   if(NOT MKL_USE_SINGLE_DYNAMIC_LIBRARY)
     if (MKL_USE_STATIC_LIBS)
-      set(__iomp5_libs iomp5 libiomp5mt.lib)
+      set(__iomp5_libs libiomp5.a libiomp5mt.lib)
     else()
       set(__iomp5_libs iomp5 libiomp5md.lib)
     endif()
@@ -135,15 +134,18 @@ set(INTEL_ROOT "/opt/intel" CACHE PATH "Folder contains intel libs")
       list(APPEND __looked_for INTEL_INCLUDE_DIR)
     endif()

-    find_library(MKL_RTL_LIBRARY ${__iomp5_libs}
+    find_library(IOMP_LIBRARY ${__iomp5_libs}
       PATHS ${INTEL_RTL_ROOT} ${INTEL_ROOT}/compiler ${MKL_ROOT}/.. ${MKL_ROOT}/../compiler
       PATH_SUFFIXES ${__path_suffixes}
       DOC "Path to Path to OpenMP runtime library")

-    list(APPEND __looked_for MKL_RTL_LIBRARY)
-    list(APPEND MKL_LIBRARIES ${MKL_RTL_LIBRARY})
+    list(APPEND __looked_for IOMP_LIBRARY)
+    list(APPEND MKL_LIBRARIES ${IOMP_LIBRARY})
   endif()

+  if(MKL_USE_STATIC_LIBS)
+    set(MKL_LIBRARIES -Wl,--start-group "${MKL_LIBRARIES}" -Wl,--end-group)
+  endif()


 include(FindPackageHandleStandardArgs)
@@ -154,4 +156,3 @@ if(MKL_FOUND)
 endif()

 mxnet_clear_vars(__looked_for __mkl_libs __path_suffixes __lib_suffix __iomp5_libs)
-

@TaoLv which version of gcc?

Even if I remove the OpenMP build from CMakeLists.txt and build with clang, I get that warning, since clang pulls in its own libomp.

@cjolivier01 How can we stop clang from pulling in libomp? It has no effect when linking statically, as the symbols are already resolved, but it would be better not to pull libomp in in the first place.

Stopping clang/others from linking to omp seems like the tail wagging the dog. I think we should consider other options, such as making the static mkl build work, or somehow stopping mkl from being so "clever".

@TaoLv which version of gcc?

It's 4.8.5 on centos. Do you want me to try a higher version or exclude opencv from the build?

-DMKL_USE_STATIC_LIBS=ON is currently broken on my system and in particular doesn't link iomp statically.

@leezu, with this flag on, I would expect only the MKL libraries to be statically linked while the OpenMP runtime is dynamically linked to mxnet.so. That's how we handle the OpenMP linkage of DNNL.

@leezu what's the error of statically linking MKL libraries?

@leezu, previously we thought Intel did not distribute a static iomp library: https://github.com/apache/incubator-mxnet/issues/8532#issuecomment-341834234. But judging from the linked issue, even if we fix the OpenMP runtime conflict inside MXNet, we may still encounter conflicts in downstream projects.

@leezu what's the error of statically linking MKL libraries?

For me it was a link error in some secondary target, like the C++ unit tests. libmxnet.so built successfully and the test script passed. Probably not too hard to fix.

Thank you @cjolivier01 . That's exactly what I just observed.

@TaoLv @cjolivier01 it's not hard to fix. If you look above, I posted a patch that fixes it 12 hours ago.

An improved version of that patch is in https://github.com/apache/incubator-mxnet/pull/17645

Thanks @leezu! It seems that we have reached a consensus on how to address this issue?

There's a lot of stuff in that PR; I would prefer a more targeted PR. Also, I think the best fix is to link statically to MKL as we discussed, which as far as I can tell is not addressed in this PR, although I didn't read every line. I will review some more when I get to my day job, but it would be better to have a small, targeted PR.

I will post a PR in the next day or two that addresses this, as well as the clang issue and transitive OpenMP dependencies, which may also cause the error because MKL behaves foolishly.

@cjolivier01 the PR consists of two commits. Only the second commit is related to OpenMP and implements the conclusion from the discussion in this issue.

I have removed the second commit and disabled testing the MKL cmake builds. I look forward to your improved fix, thanks for contributing that.

@cjolivier01 thanks for volunteering to contribute the PR! Do you have any status update?

@cjolivier01 please prioritize the PR, as this affects other users. For example https://github.com/apache/incubator-mxnet/issues/17733

Let me know if I may resubmit the MKL static linkage commit earlier included in #17645.

Yeah, just submit the static linkage.

On Tue, Mar 3, 2020 at 10:18 AM Leonard Lausen notifications@github.com
wrote:

@cjolivier01 https://github.com/cjolivier01 please prioritize the PR,
as this affects other users. For example #17733
https://github.com/apache/incubator-mxnet/issues/17733

Let me know if I may resubmit the MKL static linkage commit earlier
included in #17645 https://github.com/apache/incubator-mxnet/pull/17645.
