Hello,
I have downloaded the latest image of wav2letter-cuda
(Dockerhub DIGEST:sha256:228eb912d5a61a151de4abf9cd9c25a4aa3a9a912ed2d538c7a3a574d948b11a)
When running the make test within the docker the tests fails.
This has been reproduced on 2 separate computers, with cuda 10.1, and cuda 10.2. Same test fails.
Test result:
Running tests...
Test project /root/wav2letter/build
Start 1: W2lCommonTest
1/31 Test #1: W2lCommonTest ....................***Failed 1.00 sec
Start 2: DictionaryTest
2/31 Test #2: DictionaryTest ................... Passed 0.03 sec
Start 3: ProducerConsumerQueueTest
3/31 Test #3: ProducerConsumerQueueTest ........ Passed 0.03 sec
Start 4: CriterionTest
4/31 Test #4: CriterionTest ....................***Failed 0.07 sec
Start 5: Seq2SeqTest
5/31 Test #5: Seq2SeqTest ......................***Failed 0.04 sec
Start 6: AttentionTest
6/31 Test #6: AttentionTest ....................***Failed 0.07 sec
Start 7: WindowTest
7/31 Test #7: WindowTest .......................***Failed 0.04 sec
Start 8: DataTest
8/31 Test #8: DataTest .........................***Failed 0.11 sec
Start 9: ListFileDatasetTest
9/31 Test #9: ListFileDatasetTest ..............***Failed 0.03 sec
Start 10: SoundTest
10/31 Test #10: SoundTest ........................ Passed 0.09 sec
Start 11: DecoderTest
11/31 Test #11: DecoderTest ...................... Passed 1.56 sec
Start 12: CeplifterTest
12/31 Test #12: CeplifterTest .................... Passed 0.03 sec
Start 13: DctTest
13/31 Test #13: DctTest .......................... Passed 0.08 sec
Start 14: DerivativesTest
14/31 Test #14: DerivativesTest .................. Passed 0.03 sec
Start 15: DitherTest
15/31 Test #15: DitherTest ....................... Passed 8.03 sec
Start 16: MfccTest
16/31 Test #16: MfccTest ......................... Passed 0.54 sec
Start 17: PreEmphasisTest
17/31 Test #17: PreEmphasisTest .................. Passed 0.03 sec
Start 18: SpeechUtilsTest
18/31 Test #18: SpeechUtilsTest ..................***Failed 0.03 sec
Start 19: TriFilterbankTest
19/31 Test #19: TriFilterbankTest ................ Passed 0.03 sec
Start 20: WindowingTest
20/31 Test #20: WindowingTest .................... Passed 0.03 sec
Start 21: W2lModuleTest
21/31 Test #21: W2lModuleTest ....................***Failed 0.04 sec
Start 22: RuntimeTest
22/31 Test #22: RuntimeTest ......................***Failed 0.03 sec
Start 23: inference_Conv1dTest
23/31 Test #23: inference_Conv1dTest ............. Passed 0.01 sec
Start 24: inference_IdentityTest
24/31 Test #24: inference_IdentityTest ........... Passed 0.01 sec
Start 25: inference_LayerNormTest
25/31 Test #25: inference_LayerNormTest .......... Passed 0.02 sec
Start 26: inference_LinearTest
26/31 Test #26: inference_LinearTest ............. Passed 0.01 sec
Start 27: inference_LogMelFeatureTest
27/31 Test #27: inference_LogMelFeatureTest ...... Passed 0.06 sec
Start 28: inference_MemoryManagerTest
28/31 Test #28: inference_MemoryManagerTest ...... Passed 0.01 sec
Start 29: inference_ReluTest
29/31 Test #29: inference_ReluTest ............... Passed 0.01 sec
Start 30: inference_ResidualTest
30/31 Test #30: inference_ResidualTest ........... Passed 0.01 sec
Start 31: inference_TDSBlockTest
31/31 Test #31: inference_TDSBlockTest ........... Passed 0.01 sec
68% tests passed, 10 tests failed out of 31
Total Test time (real) = 12.14 sec
The following tests FAILED:
1 - W2lCommonTest (Failed)
4 - CriterionTest (Failed)
5 - Seq2SeqTest (Failed)
6 - AttentionTest (Failed)
7 - WindowTest (Failed)
8 - DataTest (Failed)
9 - ListFileDatasetTest (Failed)
18 - SpeechUtilsTest (Failed)
21 - W2lModuleTest (Failed)
22 - RuntimeTest (Failed)
Errors while running CTest
Makefile:118: recipe for target 'test' failed
make: *** [test] Error 8
When running the W2lCommonTest for example the output is:
[==========] Running 19 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 19 tests from W2lCommonTest
[ RUN ] W2lCommonTest.StringTrim
[ OK ] W2lCommonTest.StringTrim (0 ms)
[ RUN ] W2lCommonTest.ReplaceAll
[ OK ] W2lCommonTest.ReplaceAll (0 ms)
[ RUN ] W2lCommonTest.StringSplit
[ OK ] W2lCommonTest.StringSplit (0 ms)
[ RUN ] W2lCommonTest.StringJoin
[ OK ] W2lCommonTest.StringJoin (0 ms)
[ RUN ] W2lCommonTest.StringFormat
[ OK ] W2lCommonTest.StringFormat (0 ms)
[ RUN ] W2lCommonTest.PathsConcat
[ OK ] W2lCommonTest.PathsConcat (0 ms)
[ RUN ] W2lCommonTest.RetryWithBackoff
[ OK ] W2lCommonTest.RetryWithBackoff (753 ms)
[ RUN ] W2lCommonTest.PackReplabels
[ OK ] W2lCommonTest.PackReplabels (0 ms)
[ RUN ] W2lCommonTest.Dictionary
[ OK ] W2lCommonTest.Dictionary (0 ms)
[ RUN ] W2lCommonTest.UnpackReplabels
[ OK ] W2lCommonTest.UnpackReplabels (0 ms)
[ RUN ] W2lCommonTest.UnpackReplabelsIgnoresInvalid
[ OK ] W2lCommonTest.UnpackReplabelsIgnoresInvalid (0 ms)
[ RUN ] W2lCommonTest.Uniq
[ OK ] W2lCommonTest.Uniq (0 ms)
[ RUN ] W2lCommonTest.Normalize
unknown file: Failure
C++ exception with description "ArrayFire Exception (Runtime error :103):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/device_manager.cpp:531
Error initializing CUDA runtime. Check your CUDA device is visible to the OS and you have installed the correct driver. Try running the nvidia-smi utility to debug any driver issues.
0# 0x00007F04A992D810 in /usr/local/lib/libafcuda.so.3
1# 0x00007F04A992DE7D in /usr/local/lib/libafcuda.so.3
2# 0x00007F04A9CC73FA in /usr/local/lib/libafcuda.so.3
3# 0x00007F04A9CC3CEC in /usr/local/lib/libafcuda.so.3
4# 0x00007F04A973762D in /usr/local/lib/libafcuda.so.3
5# 0x00007F04A973771F in /usr/local/lib/libafcuda.so.3
6# 0x00007F04AB759FAD in /usr/local/lib/libafcuda.so.3
7# 0x00007F04AA50BB28 in /usr/local/lib/libafcuda.so.3
8# af_randu in /usr/local/lib/libafcuda.so.3
9# af::randu(af::dim4 const&, af_dtype) in /usr/local/lib/libafcuda.so.3
10# af::randu(long long, long long, long long, af_dtype) in /usr/local/lib/libafcuda.so.3
11# 0x000055C0B07B0417 in ./src/tests/W2lC" thrown in the test body.
[ FAILED ] W2lCommonTest.Normalize (7 ms)
[ RUN ] W2lCommonTest.Transpose
unknown file: Failure
C++ exception with description "ArrayFire Exception (Runtime error :103):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/device_manager.cpp:531
Error initializing CUDA runtime. Check your CUDA device is visible to the OS and you have installed the correct driver. Try running the nvidia-smi utility to debug any driver issues.
0# 0x00007F04A992D810 in /usr/local/lib/libafcuda.so.3
1# 0x00007F04A992DE7D in /usr/local/lib/libafcuda.so.3
2# 0x00007F04A9CC73FA in /usr/local/lib/libafcuda.so.3
3# 0x00007F04A9CC3CEC in /usr/local/lib/libafcuda.so.3
4# 0x00007F04A973762D in /usr/local/lib/libafcuda.so.3
5# 0x00007F04A973771F in /usr/local/lib/libafcuda.so.3
6# 0x00007F04AB759FAD in /usr/local/lib/libafcuda.so.3
7# 0x00007F04AA50BB28 in /usr/local/lib/libafcuda.so.3
8# af_randu in /usr/local/lib/libafcuda.so.3
9# af::randu(af::dim4 const&, af_dtype) in /usr/local/lib/libafcuda.so.3
10# af::randu(long long, long long, long long, long long, af_dtype) in /usr/local/lib/libafcuda.so.3
11# 0x000055C0B07AFEDA in ./src" thrown in the test body.
[ FAILED ] W2lCommonTest.Transpose (2 ms)
[ RUN ] W2lCommonTest.localNormalize
unknown file: Failure
C++ exception with description "ArrayFire Exception (Runtime error :103):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/device_manager.cpp:531
Error initializing CUDA runtime. Check your CUDA device is visible to the OS and you have installed the correct driver. Try running the nvidia-smi utility to debug any driver issues.
0# 0x00007F04A992D810 in /usr/local/lib/libafcuda.so.3
1# 0x00007F04A992DE7D in /usr/local/lib/libafcuda.so.3
2# 0x00007F04A9CC73FA in /usr/local/lib/libafcuda.so.3
3# 0x00007F04A9CC3CEC in /usr/local/lib/libafcuda.so.3
4# 0x00007F04A973762D in /usr/local/lib/libafcuda.so.3
5# 0x00007F04A973771F in /usr/local/lib/libafcuda.so.3
6# 0x00007F04AB759FAD in /usr/local/lib/libafcuda.so.3
7# 0x00007F04AA50BB28 in /usr/local/lib/libafcuda.so.3
8# af_randu in /usr/local/lib/libafcuda.so.3
9# af::randu(af::dim4 const&, af_dtype) in /usr/local/lib/libafcuda.so.3
10# af::randu(long long, long long, long long, long long, af_dtype) in /usr/local/lib/libafcuda.so.3
11# 0x000055C0B07B2694 in ./src" thrown in the test body.
[ FAILED ] W2lCommonTest.localNormalize (2 ms)
[ RUN ] W2lCommonTest.AfMatrixToStrings
unknown file: Failure
C++ exception with description "ArrayFire Exception (Runtime error :103):
In function cuda::DeviceManager::DeviceManager()
In file src/backend/cuda/device_manager.cpp:531
Error initializing CUDA runtime. Check your CUDA device is visible to the OS and you have installed the correct driver. Try running the nvidia-smi utility to debug any driver issues.
0# 0x00007F04A992D810 in /usr/local/lib/libafcuda.so.3
1# 0x00007F04A992DE7D in /usr/local/lib/libafcuda.so.3
2# 0x00007F04A9CC73FA in /usr/local/lib/libafcuda.so.3
3# 0x00007F04A9CC41EC in /usr/local/lib/libafcuda.so.3
4# 0x00007F04A976196A in /usr/local/lib/libafcuda.so.3
5# 0x00007F04A976263C in /usr/local/lib/libafcuda.so.3
6# af_create_array in /usr/local/lib/libafcuda.so.3
7# 0x00007F04AA63942A in /usr/local/lib/libafcuda.so.3
8# af::array::array<int>(long long, long long, int const*, af_source) in /usr/local/lib/libafcuda.so.3
9# 0x000055C0B07B6052 in ./src/tests/W2lCommonTest
10# 0x000055C0B09B567A in ./src/tests/W2lCommonTest
11# 0x000055C0B09AB023 in ./src/tests/W2lCommon" thrown in the test body.
[ FAILED ] W2lCommonTest.AfMatrixToStrings (2 ms)
[ RUN ] W2lCommonTest.WrdToTarget
Falling back to using letters as targets for the unknown word: 111
Falling back to using letters as targets for the unknown word: 199
Skipping unknown token '9' when falling back to letter target for the unknown word: 199
Skipping unknown token '9' when falling back to letter target for the unknown word: 199
Skipping unknown word '111' when generating target
Skipping unknown word '199' when generating target
[ OK ] W2lCommonTest.WrdToTarget (0 ms)
[ RUN ] W2lCommonTest.TargetToSingleLtr
[ OK ] W2lCommonTest.TargetToSingleLtr (0 ms)
[ RUN ] W2lCommonTest.UT8Split
[ OK ] W2lCommonTest.UT8Split (0 ms)
[----------] 19 tests from W2lCommonTest (766 ms total)
[----------] Global test environment tear-down
[==========] 19 tests from 1 test suite ran. (767 ms total)
[ PASSED ] 15 tests.
[ FAILED ] 4 tests, listed below:
[ FAILED ] W2lCommonTest.Normalize
[ FAILED ] W2lCommonTest.Transpose
[ FAILED ] W2lCommonTest.localNormalize
[ FAILED ] W2lCommonTest.AfMatrixToStrings
4 FAILED TESTS
Any ideas on what's the issue here?
docker pull wav2letter/wav2letter:cuda-latestdocker run --gpus all --rm -itd --ipc=host --name w2l wav2letter/wav2letter:cuda-latest/root/wav2letter/build and run make testUbuntu 18.04
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 On | 00000000:01:00.0 On | N/A |
| N/A 58C P0 27W / N/A | 1406MiB / 6078MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
I did the same command as you pointed, everything works for me.
The problem here could be that driver is not appropriate for the CUDA version we use inside docker. Doesn't matter which CUDA you installed on your machine, only CUDA driver matters. The error you have is pointing exactly to this Error initializing CUDA runtime. Check your CUDA device is visible to the OS and you have installed the correct driver. Try running the nvidia-smi utility to debug any driver issues.
Could you confirm that for other tools/frameworks you don't have problems of CUDA driver with your CUDA 10.1 and 10.2 installed?
Mine machine shows Driver Version: 418.116.00 CUDA Version: 10.1. Probably your driver is too new for the version we build. One possible solution for you (in case of not downgrade your driver) is to rebuild base docker image for the flashlight and then rebuild base docker for the wav2letter.
Thanks for your reply @tlikhomanenko
I tried to rebuild the docker, and the issue still persisted.
Than I tried on a different environment. It passed with no issue.
I just had to expose the GPU using export CUDA_VISIBLE_DEVICES=1
I will re-install the driver on my first environment, and try again.
Thanks for the help!
Hello,
I have the same problem with the Docker image, following the wiki :
sudo docker run --gpus all --rm -itd --ipc=host --name w2l wav2letter/wav2letter:cuda-latest
(by the way runtime=nvidia does not work with the new nvidia docker -> wiki correction needed)
I installed docker, Nvidia docker and driver as required.
Thanks for any help!
@iggygeek
what cuda driver version do you have? Also could you try to build image on your machine and test if it works?
My apologies, it was just that all GPUs where busy on this machine ...
@baruchih feel free to reopen if the problem still persists.