On an Amazon EC2 machine with NVIDIA driver version 361.42, and with nvidia-docker and nvidia-docker-plugin installed and running, the latest DIGITS (4.0) shows the following in its log:
cudaRuntimeGetVersion() failed with error #35
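For context, error 35 is cudaErrorInsufficientDriver: the CUDA runtime inside the container is newer than what the host driver supports. A rough sketch of the comparison (the version numbers below are placeholders, not values from this machine):

```shell
# Sketch: compare the CUDA version the host driver supports against the
# runtime version the container was built for. Placeholder values only.
driver_cuda="7.5"    # e.g. the max CUDA version the installed driver supports
runtime_cuda="8.0"   # e.g. the CUDA runtime inside the container

# sort -V orders version strings numerically; if the driver version is not
# the highest of the pair, the runtime is newer than the driver -> error 35.
newest=$(printf '%s\n%s\n' "$driver_cuda" "$runtime_cuda" | sort -V | tail -n1)
if [ "$newest" != "$driver_cuda" ]; then
    echo "driver too old for runtime (this is what error 35 means)"
fi
```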
nvidia-docker volume ls on my machine shows
nvidia-docker nvidia_driver_361.42
There are no CUDA binaries (e.g. deviceQuery or nvidia-smi) that I could find in the DIGITS container, but running
nvidia-docker run --rm nvidia/cuda nvidia-smi
results in
| NVIDIA-SMI 361.42 Driver Version: 361.42 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GRID K520 Off | 0000:00:03.0 Off | N/A |
| N/A 35C P8 17W / 125W | 11MiB / 4095MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Trying to nvidia-docker build a Dockerfile based on nvidia/cuda:7.0-cudnn4-devel-ubuntu14.04, which clones the master branch of Caffe and compiles it with cuDNN enabled, fails at the beginning of testing with the following error:
Cuda number of devices: 0
Setting to use device 0
Current device id: 0
Current device name:
Note: Randomizing tests' orders with a seed of 21847 .
[==========] Running 2081 tests from 277 test cases.
[----------] Global test environment set-up.
[----------] 50 tests from NeuronLayerTest/3, where TypeParam = caffe::GPUDevice<double>
[ RUN ] NeuronLayerTest/3.TestSigmoidGradient
E0905 10:18:15.161348 263 common.cpp:113] Cannot create Cublas handle. Cublas won't be available.
E0905 10:18:15.162796 263 common.cpp:120] Cannot create Curand generator. Curand won't be available.
F0905 10:18:15.162914 263 syncedmem.hpp:18] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
But oddly enough, beniz/deepdetect_gpu does seem to work properly with the GPU...
Any ideas?
Looks like your driver wasn't installed properly. How did you install it?
It's Ubuntu 15.10 (GNU/Linux 4.2.0-42-generic x86_64); this is what I did from the beginning:
$ sudo apt-get update
$ sudo apt-get install --no-install-recommends -y gcc make libc-dev
$ wget -P /tmp http://us.download.nvidia.com/XFree86/Linux-x86_64/361.42/NVIDIA-Linux-x86_64-361.42.run
$ sudo sh /tmp/NVIDIA-Linux-x86_64-361.42.run --silent
$ wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
$ sudo dpkg -i /tmp/nvidia-docker_*.deb && rm /tmp/nvidia-docker_*.deb
$ sudo apt-get install dkms build-essential linux-headers-generic
$ sudo nano /etc/modprobe.d/blacklist-nouveau.conf
adding the following lines:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
save and quit
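As a side note, the same file can be written non-interactively instead of editing it in nano, e.g.:

```shell
# Non-interactive equivalent of the nano edit above: write the nouveau
# blacklist entries to the same path in one shot.
sudo tee /etc/modprobe.d/blacklist-nouveau.conf > /dev/null <<'EOF'
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
EOF
```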
$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
$ sudo update-initramfs -u
I may have had to re-run the NVIDIA installer at this stage (exactly the same two lines as before).
And finally
$ sudo usermod -aG docker ubuntu
$ sudo service nvidia-docker start
Made sure both docker and nvidia-docker-plugin services are up:
$ service nvidia-docker status
$ service docker status
And as mentioned above, the nvidia/cuda container is able to run nvidia-smi, the GPU and driver versions show as expected, and beniz/deepdetect_gpu does seem to work properly with the GPU.
What's the output of ldconfig -p | grep libcuda and sudo ls -lR /var/lib/nvidia-docker | grep libcuda?
$ ldconfig -p | grep libcuda
libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so
$ sudo ls -lR /var/lib/nvidia-docker | grep libcuda
lrwxrwxrwx 1 nvidia-docker nvidia-docker 17 Sep 1 09:36 libcuda.so -> libcuda.so.361.42
lrwxrwxrwx 1 nvidia-docker nvidia-docker 17 Sep 1 09:36 libcuda.so.1 -> libcuda.so.361.42
-rwxr-xr-x 2 root root 16881416 Aug 31 22:54 libcuda.so.361.42
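A quick sanity check here (a sketch assuming the paths and layout shown above) is to confirm that the libcuda version exposed by the nvidia-docker volume matches the one the host loader resolves:

```shell
# Sketch: compare the libcuda driver-library version on the host with the
# one in the nvidia-docker volume. Paths assume the layout shown above.
host_lib=$(ldconfig -p | awk '/libcuda\.so\.1/ {print $NF; exit}')
host_ver=$(readlink -f "$host_lib" | sed 's/.*libcuda\.so\.//')
vol_ver=$(sudo find /var/lib/nvidia-docker -name 'libcuda.so.[0-9]*' \
          | sed 's/.*libcuda\.so\.//' | head -n1)
echo "host: $host_ver  volume: $vol_ver"
[ "$host_ver" = "$vol_ver" ] && echo "versions match"
```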
Hmm. I just got a similar unexpected error while playing with a Torch-based docker image.
THCudaCheck FAIL file=/torch/extra/cutorch/lib/THC/THCGeneral.c line=20 error=35 : CUDA driver version is insufficient for CUDA runtime version
/torch/install/bin/luajit: /torch/install/share/lua/5.1/trepl/init.lua:384: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /torch/extra/cutorch/lib/THC/THCGeneral.c:20
stack traceback:
[C]: in function 'error'
/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
neural_style.lua:51: in function 'main'
neural_style.lua:515: in main chunk
[C]: in function 'dofile'
/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
My setup: Ubuntu 16.04, driver 367.48, nvidia-docker 1.0.0~rc.3-1, Docker 1.12.1-0~xenial, image nvidia/cuda:7.5-cudnn5-devel-ubuntu14.04... but a DIGITS image and an NVcaffe image work fine? Not sure what's happening here.
@3XX0 helped me figure out my problem. I was trying to use CUDA while building the image, but the driver isn't available yet at build time. When I changed the last step in my Dockerfile from a RUN to a CMD, everything worked fine. Nevermind!
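That distinction can be sketched in a Dockerfile (the base image tag matches one used earlier in this thread; my_gpu_test is a hypothetical GPU-dependent step):

```dockerfile
FROM nvidia/cuda:7.5-cudnn5-devel-ubuntu14.04

# Build-time steps must not touch the GPU: the driver volume that
# nvidia-docker provides is only mounted at `nvidia-docker run` time.
RUN apt-get update && apt-get install -y --no-install-recommends git

# BAD (fails with error 35 during `nvidia-docker build`):
#   RUN ./my_gpu_test
# GOOD (runs when the container starts, with the driver mounted):
CMD ["./my_gpu_test"]
```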
@Motherboard what does this command do for you?
nvidia-docker run --rm --entrypoint 'digits/device_query.py' nvidia/digits
nvidia-docker doesn't like it when I don't give it all the volumes declared in the Dockerfile, so
$ nvidia-docker run --rm --entrypoint 'digits/device_query.py' nvidia/digits
gives
docker: Error response from daemon: create f64b902e8ee8344f2a45a9e0420aa63b2d70349473229877a65cb9ac47152029: bad volume format: f64b902e8ee8344f2a45a9e0420aa63b2d70349473229877a65cb9ac47152029.
But
$ nvidia-docker run --rm -v /home/ubuntu/notebook:/data -v /home/ubuntu/jobs:/jobs --entrypoint 'digits/device_query.py' nvidia/digits
gives
Device #0:
>>> CUDA attributes:
name GRID K520
totalGlobalMem 4294770688
clockRate 797000
major 3
minor 0
>>> NVML attributes:
Total memory 4095 MB
Used memory 48 MB
Memory utilization 0%
GPU utilization 0%
Temperature 36 C
I don't know what was previously wrong, but I've tried running DIGITS again, and it seems to be fine...
Can't reproduce the error...
I struggled a lot trying to use my GTX 860M in a Lenovo Y70 machine with an i7 and an Intel integrated graphics card, and one of the errors was quite similar to the ones you are getting. I discovered something about how the NVIDIA GPU has to be activated before anything tries to access it through the drivers. Just to give you ideas and open a possible solution path:
When I run NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery, I get:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
But if I try with $ optirun NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery,
the result is the one we want:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 860M"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 4044 MBytes (4240965632 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1020 MHz (1.02 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 860M
Result = PASS
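For reference, on Optimus laptops like this the discrete GPU is powered down until a program is routed through it, which is roughly what optirun does via Bumblebee; that's why the bare deviceQuery fails with error 35. A minimal sketch of a wrapper (the function name is my own invention) that uses optirun when it's available:

```shell
# Sketch for Optimus/Bumblebee systems: wrap GPU-dependent commands so they
# go through optirun when it is installed, and run directly otherwise.
run_on_nvidia() {
    if command -v optirun > /dev/null; then
        optirun "$@"
    else
        "$@"
    fi
}

run_on_nvidia NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release/deviceQuery
```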
That makes me think all my problems are related to the way I invoke programs. Now I'm investigating how to make it work with Torch for recurrent neural networks, but with the GPU...