nvidia-docker on RHEL 7: "Failed to initialize NVML: Unknown Error"

Created on 16 Jan 2020 · 2 comments · Source: NVIDIA/nvidia-docker


1. Issue or feature description

```
# docker run --rm nvidia/cuda:latest nvidia-smi
Failed to initialize NVML: Unknown Error
```

2. Steps to reproduce the issue

1. Disable nouveau and install NVIDIA driver 440.44.
2. After reboot, `nvidia-smi` works on the host.
3. Follow the README and install nvidia-container-toolkit.
4. Disable SELinux and the firewall.
5. Pull docker.io/nvidia/cuda:latest.
6. Run `docker run --rm nvidia/cuda:latest nvidia-smi`.
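On RHEL 7, the toolkit installation in step 3 typically amounts to the following (a sketch based on the nvidia-docker README of that period; the repository URL and package name are assumptions, so check the README for your setup):

```shell
# Add the nvidia-docker yum repository for this distribution
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo \
  | sudo tee /etc/yum.repos.d/nvidia-docker.repo

# Install the toolkit and restart the Docker daemon so it picks up the runtime hook
sudo yum install -y nvidia-container-toolkit
sudo systemctl restart docker
```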

3. Information to attach (optional if deemed irrelevant)

  • [x] Some nvidia-container information: `nvidia-container-cli -k -d /dev/tty info`

-- WARNING, the following logs are for debugging purposes only --

I0116 06:15:34.425511 5674 nvc.c:281] initializing library context (version=1.0.5, build=13b836390888f7b7c7dca115d16d7e28ab15a836)
I0116 06:15:34.425562 5674 nvc.c:255] using root /
I0116 06:15:34.425566 5674 nvc.c:256] using ldcache /etc/ld.so.cache
I0116 06:15:34.425569 5674 nvc.c:257] using unprivileged user 65534:65534
I0116 06:15:34.426374 5675 nvc.c:191] loading kernel module nvidia
I0116 06:15:34.426755 5675 nvc.c:203] loading kernel module nvidia_uvm
I0116 06:15:34.426846 5675 nvc.c:211] loading kernel module nvidia_modeset
I0116 06:15:34.427187 5676 driver.c:133] starting driver service
I0116 06:15:34.445930 5674 nvc_info.c:437] requesting driver information with ''
I0116 06:15:34.446106 5674 nvc_info.c:151] selecting /usr/lib64/vdpau/libvdpau_nvidia.so.440.44
I0116 06:15:34.446231 5674 nvc_info.c:151] selecting /usr/lib64/libnvoptix.so.440.44
I0116 06:15:34.446268 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-tls.so.440.44
I0116 06:15:34.446290 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-rtcore.so.440.44
I0116 06:15:34.446312 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.440.44
I0116 06:15:34.446344 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-opticalflow.so.440.44
I0116 06:15:34.446373 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-opencl.so.440.44
I0116 06:15:34.446393 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-ml.so.440.44
I0116 06:15:34.446421 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-ifr.so.440.44
I0116 06:15:34.446448 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-glvkspirv.so.440.44
I0116 06:15:34.446468 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-glsi.so.440.44
I0116 06:15:34.446487 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-glcore.so.440.44
I0116 06:15:34.446507 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-fbc.so.440.44
I0116 06:15:34.446533 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-fatbinaryloader.so.440.44
I0116 06:15:34.446550 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-encode.so.440.44
I0116 06:15:34.446577 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-eglcore.so.440.44
I0116 06:15:34.446597 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-compiler.so.440.44
I0116 06:15:34.446617 5674 nvc_info.c:151] selecting /usr/lib64/libnvidia-cfg.so.440.44
I0116 06:15:34.446645 5674 nvc_info.c:151] selecting /usr/lib64/libnvcuvid.so.440.44
I0116 06:15:34.446862 5674 nvc_info.c:151] selecting /usr/lib64/libcuda.so.440.44
I0116 06:15:34.446961 5674 nvc_info.c:151] selecting /usr/lib64/libGLX_nvidia.so.440.44
I0116 06:15:34.446981 5674 nvc_info.c:151] selecting /usr/lib64/libGLESv2_nvidia.so.440.44
I0116 06:15:34.447000 5674 nvc_info.c:151] selecting /usr/lib64/libGLESv1_CM_nvidia.so.440.44
I0116 06:15:34.447021 5674 nvc_info.c:151] selecting /usr/lib64/libEGL_nvidia.so.440.44
I0116 06:15:34.447045 5674 nvc_info.c:151] selecting /usr/lib/vdpau/libvdpau_nvidia.so.440.44
I0116 06:15:34.447068 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-tls.so.440.44
I0116 06:15:34.447089 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-ptxjitcompiler.so.440.44
I0116 06:15:34.447119 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-opticalflow.so.440.44
I0116 06:15:34.447147 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-opencl.so.440.44
I0116 06:15:34.447167 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-ml.so.440.44
I0116 06:15:34.447193 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-ifr.so.440.44
I0116 06:15:34.447218 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-glvkspirv.so.440.44
I0116 06:15:34.447237 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-glsi.so.440.44
I0116 06:15:34.447255 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-glcore.so.440.44
I0116 06:15:34.447274 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-fbc.so.440.44
I0116 06:15:34.447300 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-fatbinaryloader.so.440.44
I0116 06:15:34.447317 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-encode.so.440.44
I0116 06:15:34.447342 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-eglcore.so.440.44
I0116 06:15:34.447361 5674 nvc_info.c:151] selecting /usr/lib/libnvidia-compiler.so.440.44
I0116 06:15:34.447383 5674 nvc_info.c:151] selecting /usr/lib/libnvcuvid.so.440.44
I0116 06:15:34.447412 5674 nvc_info.c:151] selecting /usr/lib/libcuda.so.440.44
I0116 06:15:34.447442 5674 nvc_info.c:151] selecting /usr/lib/libGLX_nvidia.so.440.44
I0116 06:15:34.447461 5674 nvc_info.c:151] selecting /usr/lib/libGLESv2_nvidia.so.440.44
I0116 06:15:34.447479 5674 nvc_info.c:151] selecting /usr/lib/libGLESv1_CM_nvidia.so.440.44
I0116 06:15:34.447498 5674 nvc_info.c:151] selecting /usr/lib/libEGL_nvidia.so.440.44
W0116 06:15:34.447508 5674 nvc_info.c:306] missing compat32 library libnvidia-cfg.so
W0116 06:15:34.447511 5674 nvc_info.c:306] missing compat32 library libnvidia-rtcore.so
W0116 06:15:34.447514 5674 nvc_info.c:306] missing compat32 library libnvoptix.so
I0116 06:15:34.447604 5674 nvc_info.c:232] selecting /usr/bin/nvidia-smi
I0116 06:15:34.447615 5674 nvc_info.c:232] selecting /usr/bin/nvidia-debugdump
I0116 06:15:34.447626 5674 nvc_info.c:232] selecting /usr/bin/nvidia-persistenced
I0116 06:15:34.447637 5674 nvc_info.c:232] selecting /usr/bin/nvidia-cuda-mps-control
I0116 06:15:34.447648 5674 nvc_info.c:232] selecting /usr/bin/nvidia-cuda-mps-server
I0116 06:15:34.447662 5674 nvc_info.c:369] listing device /dev/nvidiactl
I0116 06:15:34.447665 5674 nvc_info.c:369] listing device /dev/nvidia-uvm
I0116 06:15:34.447668 5674 nvc_info.c:369] listing device /dev/nvidia-uvm-tools
I0116 06:15:34.447671 5674 nvc_info.c:369] listing device /dev/nvidia-modeset
W0116 06:15:34.447686 5674 nvc_info.c:277] missing ipc /var/run/nvidia-persistenced/socket
W0116 06:15:34.447696 5674 nvc_info.c:277] missing ipc /tmp/nvidia-mps
I0116 06:15:34.447699 5674 nvc_info.c:493] requesting device information with ''
I0116 06:15:34.453348 5674 nvc_info.c:523] listing device /dev/nvidia0 (GPU-7d68e01b-2281-6e7a-3b9c-a3514f3b53fe at 00000000:0b:00.0)
NVRM version: 440.44
CUDA version: 10.2

Device Index: 0
Device Minor: 0
Model: TITAN Xp
Brand: GeForce
GPU UUID: GPU-7d68e01b-2281-6e7a-3b9c-a3514f3b53fe
Bus Location: 00000000:0b:00.0
Architecture: 6.1
I0116 06:15:34.453378 5674 nvc.c:318] shutting down library context
I0116 06:15:34.453690 5676 driver.c:192] terminating driver service
I0116 06:15:34.462164 5674 driver.c:233] driver service terminated successfully

  • [x] Kernel version from `uname -a`
    Linux RHEL 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
  • [ ] Any relevant kernel output lines from `dmesg`
  • [x] Driver information from `nvidia-smi -a`
    ==============NVSMI LOG==============

Timestamp : Thu Jan 16 14:17:10 2020
Driver Version : 440.44
CUDA Version : 10.2

Attached GPUs : 1
GPU 00000000:0B:00.0
Product Name : TITAN Xp
Product Brand : GeForce
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Disabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0323817030033
GPU UUID : GPU-7d68e01b-2281-6e7a-3b9c-a3514f3b53fe
Minor Number : 0
VBIOS Version : 86.02.3D.00.01
MultiGPU Board : No
Board ID : 0xb00
GPU Part Number : 900-1G611-2530-000
Inforom Version
Image Version : G001.0000.01.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x0B
Device : 0x00
Domain : 0x0000
Device Id : 0x1B0210DE
Bus Id : 00000000:0B:00.0
Sub System Id : 0x11DF10DE
GPU Link Info
PCIe Generation
Max : 3
Current : 2
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 1000 KB/s
Rx Throughput : 1000 KB/s
Fan Speed : 23 %
Performance State : P5
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 12192 MiB
Used : 147 MiB
Free : 12045 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 1 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Temperature
GPU Current Temp : 25 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 96 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 24.05 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 125.00 W
Max Power Limit : 300.00 W
Clocks
Graphics : 1404 MHz
SM : 1404 MHz
Memory : 810 MHz
Video : 1202 MHz
Applications Clocks
Graphics : 1404 MHz
Memory : 5705 MHz
Default Applications Clocks
Graphics : 1404 MHz
Memory : 5705 MHz
Max Clocks
Graphics : 1911 MHz
SM : 1911 MHz
Memory : 5705 MHz
Video : 1620 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 2634
Type : G
Name : /usr/bin/X
Used GPU Memory : 84 MiB
Process ID : 3864
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 59 MiB

  • [x] Docker version from `docker version`
    Client:
    Version: 1.13.1
    API version: 1.26
    Package version: docker-1.13.1-108.git4ef4b30.el7.x86_64
    Go version: go1.10.3
    Git commit: 4ef4b30/1.13.1
    Built: Fri Dec 13 01:48:25 2019
    OS/Arch: linux/amd64

Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Package version: docker-1.13.1-108.git4ef4b30.el7.x86_64
Go version: go1.10.3
Git commit: 4ef4b30/1.13.1
Built: Fri Dec 13 01:48:25 2019
OS/Arch: linux/amd64
Experimental: false

  • [x] NVIDIA packages version from `dpkg -l '*nvidia*'` _or_ `rpm -qa '*nvidia*'`
    libnvidia-container-tools-1.0.5-1.x86_64
    nvidia-container-toolkit-1.0.5-2.x86_64
    libnvidia-container1-1.0.5-1.x86_64
  • [x] NVIDIA container library version from `nvidia-container-cli -V`
    version: 1.0.5
    build date: 2019-09-06T16:59+0000
    build revision: 13b836390888f7b7c7dca115d16d7e28ab15a836
    build compiler: gcc 4.8.5 20150623 (Red Hat 4.8.5-36)
    build platform: x86_64
    build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
  • [ ] NVIDIA container library logs (see troubleshooting)
  • [ ] Docker command, image and tag used


All 2 comments

Solved

```
# docker run --rm --privileged nvidia/cuda nvidia-smi
```
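`--privileged` works because it gives the container access to all host device nodes and disables SELinux labeling, which sidesteps whatever was blocking NVML here. Two narrower workarounds commonly reported for this error on RHEL 7 (assumptions, not confirmed in this thread) are disabling only the SELinux label for the container, or passing the NVIDIA device nodes explicitly:

```shell
# Option 1: keep device isolation, skip SELinux relabeling for this container
docker run --rm --security-opt label=disable nvidia/cuda:latest nvidia-smi

# Option 2: expose only the NVIDIA device nodes instead of all of /dev
docker run --rm \
  --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 \
  nvidia/cuda:latest nvidia-smi
```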

```
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -privileged -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
	-fullscreen       (run n-body simulation in fullscreen mode)
	-fp64             (use double precision floating point values for simulation)
	-hostmem          (stores simulation data in host memory)
	-benchmark        (run benchmark to measure performance)
	-numbodies=<N>    (number of bodies (>= 1) to run in simulation)
	-device=<d>       (where d=0,1,2.... for the CUDA device to use)
	-numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
	-compare          (compares simulation results running once on the default GPU and once on the CPU)
	-cpu              (run n-body simulation on the CPU)
	-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Error: only 0 Devices available, 1 requested. Exiting.
```
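Note that in the command above, `-privileged` appears after the image name, so Docker passes it to the `nbody` binary rather than applying it to the container; Docker's own flag is spelled `--privileged` and must precede the image. A corrected invocation (a sketch, using the image tag from the comment above) would be:

```shell
# Docker options (--gpus, --privileged) go before the image name;
# nbody's own options (-gpu, -benchmark) go after it.
docker run --rm --gpus all --privileged \
  nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```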
