Onnxruntime: [Linux] CApiTest.dim_param fails with mutex error

Created on 13 Jan 2020  路  16Comments  路  Source: microsoft/onnxruntime

Describe the bug
trying to bump package version for nixpkgs to 1.1.0 when I ran into this error when running the ctest suite:

[ RUN      ] CApiTest.dim_param
onnxruntime_shared_lib_test: ../nptl/pthread_mutex_lock.c:142: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

7/7 Test #2: onnxruntime_test_all ................................................   Passed   16.45 sec

86% tests passed, 1 tests failed out of 7

Total Test time (real) =  16.46 sec

The following tests FAILED:
          7 - onnxruntime_shared_lib_test (Child aborted)
Errors while running CTest

likely to be incorrect usage of a mutex lock.

Urgency

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): NixOS
  • ONNX Runtime installed from (source or binary): source
  • ONNX Runtime version: 1.1.0
  • Python version: Python 3.7.6
  • Visual Studio version (if applicable): N/A
  • GCC/Compiler version (if compiling from source): gcc (GCC) 9.2.0
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Cmake configure step details

cmake flags: -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_SKIP_BUILD_RPATH=ON -DCMAKE_INSTALL_LOCALEDIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/share/doc/ -DCMAKE_INSTALL_INFODIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/vw6ndlplgw6xira45wzggahzmq25kdkf-onnxruntime-1.1.0-dev/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/vw6ndlplgw6xira45wzggahzmq25kdkf-onnxruntime-1.1.0-dev/include -DCMAKE_INSTALL_SBINDIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/sbin -DCMAKE_INSTALL_BINDIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/bin -DCMAKE_INSTALL_NAME_DIR=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0/lib -DCMAKE_POLICY_DEFAULT_CMP0025=NEW -DCMAKE_OSX_DEPLOYMENT_TARGET= -DCMAKE_OSX_SYSROOT= -DCMAKE_FIND_FRAMEWORK=last -DCMAKE_STRIP=/nix/store/5nxyf0yn836j0acm82cz3g50d4glk5df-binutils-2.31.1/bin/strip -DCMAKE_RANLIB=/nix/store/5nxyf0yn836j0acm82cz3g50d4glk5df-binutils-2.31.1/bin/ranlib -DCMAKE_AR=/nix/store/5nxyf0yn836j0acm82cz3g50d4glk5df-binutils-2.31.1/bin/ar -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_INSTALL_PREFIX=/nix/store/1m7mmscxhr9cfvfb2qvkgv9masxpm273-onnxruntime-1.1.0 -Donnxruntime_USE_OPENMP=ON -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_ENABLE_LTO=ON
-- The C compiler identification is GNU 9.2.0
-- The CXX compiler identification is GNU 9.2.0
-- Check for working C compiler: /nix/store/a9hbzlkdfvpr4r8mjwcxcda9smjp5v1i-gcc-wrapper-9.2.0/bin/gcc
-- Check for working C compiler: /nix/store/a9hbzlkdfvpr4r8mjwcxcda9smjp5v1i-gcc-wrapper-9.2.0/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /nix/store/a9hbzlkdfvpr4r8mjwcxcda9smjp5v1i-gcc-wrapper-9.2.0/bin/g++
-- Check for working CXX compiler: /nix/store/a9hbzlkdfvpr4r8mjwcxcda9smjp5v1i-gcc-wrapper-9.2.0/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
-- Could NOT find PNG (missing: PNG_LIBRARY PNG_PNG_INCLUDE_DIR)
-- Found PythonInterp: /nix/store/bzvpdbzdy3a1qba8f9ynrzyz811c10pp-python3-3.7.6/bin/python3 (found suitable version "3.7.6", minimum required is "3.4")
-- Found PythonLibs: /nix/store/bzvpdbzdy3a1qba8f9ynrzyz811c10pp-python3-3.7.6/lib/libpython3.7m.so (found suitable version "3.7.6", minimum required is "3.4")
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
# date: USE_SYSTEM_TZ_DB ON
# date: USE_TZ_DB_IN_DOT OFF
# date: BUILD_SHARED_LIBS OFF
# date: ENABLE_DATE_TESTING OFF
-- Performing Test HAS_UNUSED_BUT_SET_VARIABLE
-- Performing Test HAS_UNUSED_BUT_SET_VARIABLE - Success
-- Performing Test HAS_UNUSED_PARAMETER
-- Performing Test HAS_UNUSED_PARAMETER - Success
-- Performing Test HAS_CAST_FUNCTION_TYPE
-- Performing Test HAS_CAST_FUNCTION_TYPE - Success
-- Performing Test HAS_PARENTHESES
-- Performing Test HAS_PARENTHESES - Success
-- Performing Test HAS_USELESS_CAST
-- Performing Test HAS_USELESS_CAST - Success
-- Performing Test HAS_NONNULL_COMPARE
-- Performing Test HAS_NONNULL_COMPARE - Success
-- Performing Test HAS_TAUTOLOGICAL_POINTER_COMPARE
-- Performing Test HAS_TAUTOLOGICAL_POINTER_COMPARE - Failed
-- Performing Test HAS_CATCH_VALUE
-- Performing Test HAS_CATCH_VALUE - Success
-- Performing Test HAS_MISSING_BRACES
-- Performing Test HAS_MISSING_BRACES - Success
-- Performing Test HAS_IGNORED_ATTRIBUTES
-- Performing Test HAS_IGNORED_ATTRIBUTES - Success
-- Performing Test HAS_DEPRECATED_COPY
-- Performing Test HAS_DEPRECATED_COPY - Success
-- Performing Test HAS_CLASS_MEMACCESS
-- Performing Test HAS_CLASS_MEMACCESS - Success
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
enabling optimizations for gemmlowp
-- The ASM compiler identification is GNU
-- Found assembler: /nix/store/a9hbzlkdfvpr4r8mjwcxcda9smjp5v1i-gcc-wrapper-9.2.0/bin/gcc
-- Performing Test HAS_AVX512F
-- Performing Test HAS_AVX512F - Success
-- Performing Test AVX512F_COMPILES
-- Performing Test AVX512F_COMPILES - Success
-- Performing Test HAS_AVX512BW
-- Performing Test HAS_AVX512BW - Success
-- Performing Test AVX512BW_COMPILES
-- Performing Test AVX512BW_COMPILES - Success
-- Found PythonInterp: /nix/store/bzvpdbzdy3a1qba8f9ynrzyz811c10pp-python3-3.7.6/bin/python3 (found version "3.7.6")
-- Configuring done
-- Generating done

To Reproduce

Either: get a linux system with gcc==9.2.0, python3.7.6, and cmake. Then use the cmake flags listed in the system info step

OR:

install nix

git clone [email protected]:jonringer/nixpkgs.git
cd nixpkgs
git checkout bump-onnxruntime
nix-build -A onnxruntime

Expected behavior
builds correctly

Screenshots

Additional context

bug

All 16 comments

I can reproduce the bug without using nix.

As for now, I suggest you turn off onnxruntime_ENABLE_LTO. I believe it will solve the issue.
I'll fix the bug (if it's possible), but as you know, 1.1.0 is already released, I can't change it. I can only fix it bug in master and wait for the next release.

That does seem to fix the build, although it's less than ideal ;(

I'll leave the issue open until a PR closes it :)

Still failing on 1.1.1:

[ RUN      ] CApiTest.dim_param
onnxruntime_shared_lib_test: ../nptl/pthread_mutex_lock.c:144: __pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.

7/7 Test #2: onnxruntime_test_all ................................................   Passed   17.94 sec

86% tests passed, 1 tests failed out of 7

Total Test time (real) =  17.94 sec

The following tests FAILED:
          7 - onnxruntime_shared_lib_test (Child aborted)
Errors while running CTest
make: *** [Makefile:130: test] Error 8
builder for '/nix/store/idij9j0lady0qhil9fpr8wcsqcgiml1h-onnxruntime-1.1.1.drv' failed with exit code 2

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

I'm taking vacation now. Will update this when I'm back. (about 10 days later).

Hi @jonringer , I just tested it again, it seems fine.

diff --git a/pkgs/development/libraries/onnxruntime/default.nix b/pkgs/development/libraries/onnxruntime/default.nix
index 1794d55daaa..bcf4e2221f8 100644
--- a/pkgs/development/libraries/onnxruntime/default.nix
+++ b/pkgs/development/libraries/onnxruntime/default.nix
@@ -42,7 +42,7 @@ stdenv.mkDerivation rec {
     "-Donnxruntime_USE_OPENMP=ON"
     "-Donnxruntime_BUILD_SHARED_LIB=ON"
     # flip back to ON next release
-    "-Donnxruntime_ENABLE_LTO=OFF" # https://github.com/microsoft/onnxruntime/issues/2828
+    "-Donnxruntime_ENABLE_LTO=ON" # https://github.com/microsoft/onnxruntime/issues/2828
   ];

onnxruntime version 1.3.1
nixpkg version: dc1c3f3203b455f1abb62880ca925f1ccd1e9e2c

Thanks for even testing with nix :)

BTW, how the sha256 checksum was generated?

BTW, how the sha256 checksum was generated?

nix has a two-step build process, it creates a "derivation" which can be thought of all inputs that go into a package (include platform and architecture), and then uses that to actual build the package. When generating the derivation is when the sha256 is determined, it's assumed that the build outputs will be reproducible (in which most cases it's byte reproducible).

$ nix show-derivation $(nix-instantiate -A onnxruntime)
{
  "/nix/store/2fn5ff46kjiqh8x32lr7cbv1rm09j92y-onnxruntime-1.3.1.drv": {
    "outputs": {
      "dev": {
        "path": "/nix/store/cdb3y2dv6b7xnahj6k7xi3gx8f5kgzfk-onnxruntime-1.3.1-dev"
      },
      "out": {
        "path": "/nix/store/cw01i1cf5j1wkyk4wdjv3g15cn3q9i3f-onnxruntime-1.3.1"
      }
    },
    "inputSrcs": [
      "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
    ],
    "inputDrvs": {
      "/nix/store/0i02lv650zkj5vipi2a53r5rn1rpcab8-source.drv": [
        "out"
      ],
      "/nix/store/3cac569dbmvrwn2w2w27s1g0fzlk8aw2-libpng-apng-1.6.37.drv": [
        "dev"
      ],
      "/nix/store/840fpgnd7szrjl7m1dj80gz1bwzih0kd-cmake-3.17.3.drv": [
        "out"
      ],
      "/nix/store/hpr02w51r1kqyvvzmpk38iq5d4f58j40-python3-3.8.3.drv": [
        "out"
      ],
      "/nix/store/lbz40vy8q4a85xj6i8x3a9pzbn6lxnpl-glibc-locales-2.31.drv": [
        "out"
      ],
      "/nix/store/q1b3xsrc01nx9jlsy8ydzvfg26h5zf2c-zlib-1.2.11.drv": [
        "dev"
      ],
      "/nix/store/ylvnhnd5nrbqn40kyrvvs9mrsq9m2xmr-bash-4.4-p23.drv": [
        "out"
      ],
      "/nix/store/yq69lshldnwb2pcscbn069xm3v6s27hk-stdenv-linux.drv": [
        "out"
      ]
    },
    "platform": "x86_64-linux",
    "builder": "/nix/store/c4wxsn4jays9j31y5z9f83nr2cp7l4pa-bash-4.4-p23/bin/bash",
    "args": [
      "-e",
      "/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh"
    ],
    "env": {
      "buildInputs": "/nix/store/73ni62ps2sws6nvxgxmmlxdajzwy1cfa-libpng-apng-1.6.37-dev /nix/store/fn0kwfwaf53638xm37wl94la8dxlbjaq-zlib-1.2.11-dev",
      "builder": "/nix/store/c4wxsn4jays9j31y5z9f83nr2cp7l4pa-bash-4.4-p23/bin/bash",
      "cmakeDir": "../cmake",
      "cmakeFlags": "-Donnxruntime_USE_OPENMP=ON -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_ENABLE_LTO=ON",
      "configureFlags": "",
      "depsBuildBuild": "",
      "depsBuildBuildPropagated": "",
      "depsBuildTarget": "",
      "depsBuildTargetPropagated": "",
      "depsHostHost": "",
      "depsHostHostPropagated": "",
      "depsTargetTarget": "",
      "depsTargetTargetPropagated": "",
      "dev": "/nix/store/cdb3y2dv6b7xnahj6k7xi3gx8f5kgzfk-onnxruntime-1.3.1-dev",
      "doCheck": "1",
      "doInstallCheck": "",
      "enableParallelBuilding": "1",
      "enableParallelChecking": "1",
      "name": "onnxruntime-1.3.1",
      "nativeBuildInputs": "/nix/store/mfz8nm7dwrk1744hv1saak28alsi08xc-cmake-3.17.3 /nix/store/7q4crsd8kvmma32ff2xw4w9ydj5k26cy-python3-3.8.3",
      "out": "/nix/store/cw01i1cf5j1wkyk4wdjv3g15cn3q9i3f-onnxruntime-1.3.1",
      "outputs": "out dev",
      "patches": "",
      "pname": "onnxruntime",
      "postInstall": "rm -r $out/bin   # ctest runner\n",
      "preCheck": "export LOCALE_ARCHIVE=\"/nix/store/4p9w65np73bynmmq98nn2z1an0py0jvn-glibc-locales-2.31/lib/locale/locale-archive\"\n",
      "propagatedBuildInputs": "",
      "propagatedNativeBuildInputs": "",
      "src": "/nix/store/1ijccjqv5rkspng6k2dqrp05yp984dk7-source",
      "stdenv": "/nix/store/b21rxw41g7fi90bzz7fd063wy115x1yq-stdenv-linux",
      "strictDeps": "",
      "system": "x86_64-linux",
      "version": "1.3.1"
    }
  }
}

Since the sha encapsulates all inputs and configurations for every input. A new sha will be generated if one of the inputs change. For example, if I were to use clang to build onnxruntime:

[14:53:56] jon@jon-desktop /home/jon/projects/nixpkgs (onnxruntime-enable-lto)
$ nix-instantiate -E 'with import ./. {}; onnxruntime'
/nix/store/2fn5ff46kjiqh8x32lr7cbv1rm09j92y-onnxruntime-1.3.1.drv
[14:54:04] jon@jon-desktop /home/jon/projects/nixpkgs (onnxruntime-enable-lto)
$ nix-instantiate -E 'with import ./. {}; onnxruntime.override { stdenv = clangStdenv; }'
/nix/store/q3vs6071j95ss847qp5h4x2w8rfaz4cr-onnxruntime-1.3.1.drv

This also makes the build outputs content addressable, by default nix tools will ask the cdn if a particular package exists and is free to download it, as it should be the same as building it locally. (there's some signing in this case to ensure the build output wasn't tampered)

What if, for example, I want to submit a change to update onnxruntime to 1.4.0, how can I get the sha256 value of the new version?
Now 1.4 isn't released yet, but we have created the release branch. How can we test it with nixpkg before the final tag is cut?

Thanks.

How can we test it with nixpkg before the final tag is cut?

if you have a local checkout, you can do something like this, (ignore version, the channel hasn't updated):

nix-build -E 'with import <nixpkgs> {}; onnxruntime.overrideAttrs(oldAttrs: { src = ./.; })'
these derivations will be built:
  /nix/store/jaswk0yjx3s6q2h3bz2irhvv8khpdiqg-onnxruntime-1.2.0.drv
building '/nix/store/jaswk0yjx3s6q2h3bz2irhvv8khpdiqg-onnxruntime-1.2.0.drv'...

and this will build with the code located at $PWD as the src

What if, for example, I want to submit a change to update onnxruntime to 1.4.0, how can I get the sha256 value of the new version?

For normal *.tar.gz urls, there's a nix-prefetch-url command which is able to do this. For onnxruntime, I'm fetching submodules, and for reproducibility, nix will throw away the .git*; so I'm not aware of a good way to determine this. For packaging, I just invalidate the sha256, and nix will tell me that it expected after it pulled down the new source, however, it's not an automated process. I would use the src = ./.; code above.

Taking a step back, it looks like I will have to touch-up the inputs, as flake8 is now required

CMake Warning at flake8.cmake:19 (message):
  Could not find 'flake8' to check python scripts.  Please install flake8
  using pip.

flake8 is only used for detecting code style issue, if you don't do development on onnxruntime, you don't need it.

sorry, didn't run $ git submodule update --init, so it was actually failing on missing submodules

also, looks likes there's a community tool for doing the github prefetch:

$ nix-shell -p nix-prefetch-github --run "nix-prefetch-github --fetch-submodules --rev v1.3.1 microsoft onnxruntime"
{
    "owner": "microsoft",
    "repo": "onnxruntime",
    "rev": "530117cfdb230228c3429ab39d1b7cf1f68c0567",
    "sha256": "03c5h5pp67a51blc26kw08piax81116jp7l2illzrsnx0580m6ri",
    "fetchSubmodules": true
}
Was this page helpful?
0 / 5 - 0 ratings

Related issues

fdwr picture fdwr  路  4Comments

Hramchenko picture Hramchenko  路  4Comments

JammyZhou picture JammyZhou  路  3Comments

walbermr picture walbermr  路  3Comments

vlad3996 picture vlad3996  路  5Comments