Describe the bug
The ONNX Runtime tests that run after installation have an implicit dependency on scipy. Passing --skip_onnx_tests did not skip all of the tests.
2019-11-19 18:18:59,488 Build [DEBUG] - Running subprocess in '/tmp/onnxruntime/build/Linux/Debug'
['/usr/bin/python3', '/tmp/onnxruntime/onnxruntime/test/onnx/gen_test_models.py', '--output_dir', 'test_models']
Traceback (most recent call last):
File "/tmp/onnxruntime/onnxruntime/test/onnx/gen_test_models.py", line 12, in <module>
from scipy.spatial import distance
ModuleNotFoundError: No module named 'scipy'
Traceback (most recent call last):
File "/tmp/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
sys.exit(main())
File "/tmp/onnxruntime/tools/ci_build/build.py", line 987, in main
args.use_dnnlibrary)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 610, in run_onnxruntime_tests
run_subprocess([sys.executable, os.path.join(source_dir,'onnxruntime','test','onnx','gen_test_models.py'),'--output_dir','test_models'], cwd=cwd)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/onnxruntime/onnxruntime/test/onnx/gen_test_models.py', '--output_dir', 'test_models']' returned non-zero exit status 1.
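The failure above comes from a Python package that the test script imports but that is not installed. A minimal sketch of a pre-flight check, to be run with the same interpreter the build uses (here /usr/bin/python3), might look like the following. The helper name missing_modules and the package list are assumptions for illustration, taken only from the errors reported in this thread; they are not part of build.py.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list, based only on the packages mentioned in this thread.
REQUIRED = ["scipy", "wheel"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Running this before ./build.sh would show missing packages up front instead of partway through the test phase.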
Urgency
System information
To Reproduce
Describe steps/code to reproduce the behavior:
$ git clone https://github.com/microsoft/onnxruntime.git
$ cd onnxruntime/
# Select a branch for stable release
$ git checkout rel-1.0.0
# Update Git submodules
$ git submodule update --init --recursive
$ ./build.sh --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --parallel --use_full_protobuf --build_wheel
$ cd build/Linux/Debug/
$ python ../../../setup.py install
Expected behavior
Dependencies should be handled by the installation package rather than by the user. The user should not see such errors during installation.
Screenshots
Additional context
Even with scipy installed, more missing dependencies were reported. How can I build the wheel correctly? Thanks.
2019-11-19 18:42:27,098 Build [DEBUG] - Running subprocess in '/tmp/onnxruntime/build/Linux/Debug'
['/usr/bin/python3', '/tmp/onnxruntime/onnxruntime/test/onnx/gen_test_models.py', '--output_dir', 'test_models']
2019-11-19 18:42:31,371 Build [DEBUG] - Running subprocess in '/tmp/onnxruntime/build/Linux/Debug'
['/tmp/onnxruntime/build/Linux/Debug/onnx_test_runner', 'test_models']
2019-11-19 18:42:31.483881866 [E:onnxruntime:Default, runner.cc:171 ParallelRunTests] Running tests in parallel: at most 35 models at any time
2019-11-19 18:42:57.154138575 [E:onnxruntime:Default, runner.cc:190 ParallelRunTests] Running tests finished. Generating report
result:
Models: 35
Total test cases: 35
Succeeded: 35
Not implemented: 0
Failed: 0
Stats by Operator type:
Not implemented(0):
Failed:
Failed Test Cases:
/tmp/onnxruntime/tools/ci_build/build.py:620: UserWarning: onnxmltools and keras are not installed. Following test cannot be run.
warnings.warn("onnxmltools and keras are not installed. Following test cannot be run.")
2019-11-19 18:42:57,207 Build [DEBUG] - Running subprocess in '/tmp/onnxruntime/build/Linux/Debug'
['/usr/bin/python3', '/tmp/onnxruntime/setup.py', 'bdist_wheel']
Traceback (most recent call last):
File "/tmp/onnxruntime/setup.py", line 220, in <module>
'Programming Language :: Python :: 3.7'],
File "/usr/local/lib/python3.6/dist-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.6/distutils/core.py", line 134, in setup
ok = dist.parse_command_line()
File "/usr/local/lib/python3.6/dist-packages/setuptools/dist.py", line 706, in parse_command_line
result = _Distribution.parse_command_line(self)
File "/usr/lib/python3.6/distutils/dist.py", line 472, in parse_command_line
args = self._parse_command_opts(parser, args)
File "/usr/local/lib/python3.6/dist-packages/setuptools/dist.py", line 1021, in _parse_command_opts
nargs = _Distribution._parse_command_opts(self, parser, args)
File "/usr/lib/python3.6/distutils/dist.py", line 534, in _parse_command_opts
if not issubclass(cmd_class, Command):
TypeError: issubclass() arg 1 must be a class
Traceback (most recent call last):
File "/tmp/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
sys.exit(main())
File "/tmp/onnxruntime/tools/ci_build/build.py", line 1034, in main
build_python_wheel(source_dir, build_dir, configs, args.use_cuda, args.use_ngraph, args.use_tensorrt, args.use_openvino, args.use_nuphar, nightly_build)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 795, in build_python_wheel
run_subprocess(args, cwd=cwd)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/onnxruntime/setup.py', 'bdist_wheel']' returned non-zero exit status 1.
It seems that I could still install the Python package even though the tests failed. Such misleading messages and tests should really be avoided.
The --skip_onnx_tests bug should be fixed, and the tests should run in parallel as well. Some people cannot afford to wait.
Adding --use_cuda resulted in the following error in the tests after installation.
4: [ OK ] OpaqueApiTest.RunModelWithOpaqueInputOutput (66 ms)
4: [----------] 1 test from OpaqueApiTest (66 ms total)
4:
4: [----------] Global test environment tear-down
4: [==========] 1 test from 1 test case ran. (66 ms total)
4: [ PASSED ] 1 test.
4/4 Test #4: opaque_api_test .................. Passed 0.08 sec
75% tests passed, 1 tests failed out of 4
Total Test time (real) = 421.45 sec
The following tests FAILED:
1 - onnxruntime_test_all (Failed)
Errors while running CTest
Traceback (most recent call last):
File "/tmp/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
sys.exit(main())
File "/tmp/onnxruntime/tools/ci_build/build.py", line 987, in main
args.use_dnnlibrary)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 593, in run_onnxruntime_tests
cwd=cwd, dll_path=dll_path)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/local/bin/ctest', '--build-config', 'Debug', '--verbose']' returned non-zero exit status 8.
--config Release sort of makes the tests run in parallel.
There are 3 phases: update, build and test. Default is to do all three. Running the onnx tests is just a part of all the potential tests.
Add '--update --build' to skip all the tests.
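The phase selection described above can be sketched roughly as follows. This is an illustration written under assumptions, not the actual code in build.py: the function name parse_phases is hypothetical, and only the three phase flags named in this comment are modeled.

```python
import argparse


def parse_phases(argv):
    """Parse the phase flags; if none is given, enable all three phases."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--update", action="store_true")
    parser.add_argument("--build", action="store_true")
    parser.add_argument("--test", action="store_true")
    # Other build options (e.g. --parallel) are ignored in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    if not (args.update or args.build or args.test):
        # No phase requested explicitly: default to update, build and test.
        args.update = args.build = args.test = True
    return args


if __name__ == "__main__":
    default = parse_phases([])
    print(default.update, default.build, default.test)
    no_tests = parse_phases(["--update", "--build"])
    print(no_tests.update, no_tests.build, no_tests.test)
```

Under this model, naming any phase explicitly turns off the ones that are not named, which is why '--update --build' skips the whole test phase.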
I had the same question in #1938. Using --update --build to skip tests is counter-intuitive when there is a --skip_onnx_tests option.
Adding '--update --build' to skip all the tests works. However, I still wish the test bugs could be fixed, since the tests are also important for the software.
It is weird. Yesterday I installed ONNX Runtime using the following commands in a CUDA-cuDNN container on an RTX 2080 Ti computer at home, and it worked fine without any error. However, today I used the exact same container on a V100 computer, and it produced an error.
git clone --recursive https://github.com/Microsoft/onnxruntime
cd onnxruntime
git checkout rel-1.0.0
./build.sh --config Release --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_cuda --parallel --use_full_protobuf --update --build --build_wheel
cd build/Linux/Release/
make install
python ../../../setup.py install
The error message is:
[100%] Linking CUDA device code CMakeFiles/onnxruntime_pybind11_state.dir/cmake_device_link.o
[100%] Linking CXX shared module onnxruntime_pybind11_state.so
[100%] Built target onnxruntime_pybind11_state
2019-11-20 20:06:24,557 Build [DEBUG] - Running subprocess in '/tmp/onnxruntime/build/Linux/Release'
['/usr/bin/python3', '/tmp/onnxruntime/setup.py', 'bdist_wheel', '--use_cuda']
Traceback (most recent call last):
File "/tmp/onnxruntime/setup.py", line 224, in <module>
'Programming Language :: Python :: 3.7'],
File "/usr/local/lib/python3.6/dist-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.6/distutils/core.py", line 134, in setup
ok = dist.parse_command_line()
File "/usr/local/lib/python3.6/dist-packages/setuptools/dist.py", line 706, in parse_command_line
result = _Distribution.parse_command_line(self)
File "/usr/lib/python3.6/distutils/dist.py", line 472, in parse_command_line
args = self._parse_command_opts(parser, args)
File "/usr/local/lib/python3.6/dist-packages/setuptools/dist.py", line 1021, in _parse_command_opts
nargs = _Distribution._parse_command_opts(self, parser, args)
File "/usr/lib/python3.6/distutils/dist.py", line 534, in _parse_command_opts
if not issubclass(cmd_class, Command):
TypeError: issubclass() arg 1 must be a class
Traceback (most recent call last):
File "/tmp/onnxruntime/tools/ci_build/build.py", line 1041, in <module>
sys.exit(main())
File "/tmp/onnxruntime/tools/ci_build/build.py", line 1032, in main
build_python_wheel(source_dir, build_dir, configs, args.use_cuda, args.use_ngraph, args.use_tensorrt, args.use_openvino, args.use_nuphar, nightly_build)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 793, in build_python_wheel
run_subprocess(args, cwd=cwd)
File "/tmp/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/onnxruntime/setup.py', 'bdist_wheel', '--use_cuda']' returned non-zero exit status 1.
On another computer of mine with a Titan V, the same container had the same problem.
[ 98%] Linking CUDA device code CMakeFiles/onnxruntime_pybind11_state.dir/cmake_device_link.o
[100%] Linking CXX shared module onnxruntime_pybind11_state.so
[100%] Built target onnxruntime_pybind11_state
2019-11-20 20:46:57,017 Build [DEBUG] - Running subprocess in '/workspace/onnxruntime/build/Linux/Release'
['/usr/bin/python3', '/workspace/onnxruntime/setup.py', 'bdist_wheel', '--use_cuda']
Traceback (most recent call last):
File "/workspace/onnxruntime/setup.py", line 220, in <module>
'Programming Language :: Python :: 3.7'],
File "/usr/local/lib/python3.6/dist-packages/setuptools/__init__.py", line 145, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.6/distutils/core.py", line 134, in setup
ok = dist.parse_command_line()
File "/usr/local/lib/python3.6/dist-packages/setuptools/dist.py", line 706, in parse_command_line
result = _Distribution.parse_command_line(self)
File "/usr/lib/python3.6/distutils/dist.py", line 472, in parse_command_line
args = self._parse_command_opts(parser, args)
File "/usr/local/lib/python3.6/dist-packages/setuptools/dist.py", line 1021, in _parse_command_opts
nargs = _Distribution._parse_command_opts(self, parser, args)
File "/usr/lib/python3.6/distutils/dist.py", line 534, in _parse_command_opts
if not issubclass(cmd_class, Command):
TypeError: issubclass() arg 1 must be a class
Traceback (most recent call last):
File "/workspace/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
sys.exit(main())
File "/workspace/onnxruntime/tools/ci_build/build.py", line 1034, in main
build_python_wheel(source_dir, build_dir, configs, args.use_cuda, args.use_ngraph, args.use_tensorrt, args.use_openvino, args.use_nuphar, nightly_build)
File "/workspace/onnxruntime/tools/ci_build/build.py", line 795, in build_python_wheel
run_subprocess(args, cwd=cwd)
File "/workspace/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/workspace/onnxruntime/setup.py', 'bdist_wheel', '--use_cuda']' returned non-zero exit status 1.
Removing --build_wheel allows the build to succeed. What are the potential problems here?
./build.sh --config Release --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_cuda --parallel --use_full_protobuf --update --build
This is extremely weird. I uninstalled Python 3.6 in the container, installed Python 3.7 and rebuilt. The error message was gone and the build was successful. However, I don't want to use Python 3.7. Is there a solution that lets me keep Python 3.6? Thanks.
I saw that in the "official" container you guys are building against Python 3.7.
Any follow up on this?
It should work with Python 3.6. However, setup.py swallows any import errors, which might be the cause of your issue. It seems pretty similar to this: https://github.com/intel-isl/Open3D/pull/1012/files
Do you have the 'wheel' package installed for Python 3.6?
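To illustrate how a swallowed import error could lead to the TypeError in the tracebacks above, here is a small self-contained sketch. It is an assumption about the failure mode, not a copy of the real setup.py: the module name hypothetical_missing_wheel_pkg is made up so that the import fails, and the Command class is a stand-in for the distutils base class.

```python
def load_optional_command():
    """Mimic a setup script that hides a failed import behind a None fallback."""
    try:
        # Hypothetical module name, chosen so that the import fails here.
        from hypothetical_missing_wheel_pkg import bdist_wheel
    except ImportError:
        bdist_wheel = None  # the original ImportError is lost at this point
    return bdist_wheel


class Command:
    """Stand-in for the distutils Command base class."""


def check_command(cmd_class):
    # Same kind of check as the last frame of the traceback in this thread.
    return issubclass(cmd_class, Command)


try:
    check_command(load_optional_command())
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

If this is what happens, the visible error says nothing about the missing package, so checking that 'wheel' can be imported by the Python 3.6 interpreter used for the build would be a reasonable first step.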
Regarding this:
It seems that I could still install the Python package even though the tests failed. Such misleading messages and tests should really be avoided.
Failed Test Cases:
/tmp/onnxruntime/tools/ci_build/build.py:620: UserWarning: onnxmltools and keras are not installed. Following test cannot be run.
warnings.warn("onnxmltools and keras are not installed. Following test cannot be run.")
The test isn't actually failing. The first line shows that there were no failed test cases. The following lines are a warning message from the next step in the testing process. I'll see if a message can be inserted in between so it's clearer that the script that prints 'Failed Test Cases:' has completed.
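As a rough idea of what a clearer report could look like, here is a hypothetical sketch. The real report is produced by onnx_test_runner, so the function name format_failed_section, the '(none)' placeholder and the end marker are all assumptions for illustration only.

```python
def format_failed_section(failed_cases):
    """Build the 'Failed Test Cases' section with an explicit end marker."""
    lines = ["Failed Test Cases:"]
    if failed_cases:
        lines.extend("  " + name for name in failed_cases)
    else:
        # Say so explicitly instead of leaving the list empty.
        lines.append("  (none)")
    # Hypothetical marker so later output is not read as part of this list.
    lines.append("--- end of test report ---")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_failed_section([]))
```

With an explicit '(none)' entry and an end marker, a warning printed by the next step would no longer look like a failed test case.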
Finally, would it be helpful to have a '--skip_tests' parameter that skipped all tests?