Apollo: Fail to Install Nvidia Driver inside docker

Created on 7 Feb 2018 · 12 Comments · Source: ApolloAuto/apollo

Hi guys,
I was trying to run the perception module following the instructions at https://github.com/ApolloAuto/apollo/blob/master/docs/howto/how_to_run_perception_module_on_your_local_computer.md, but I cannot install the NVIDIA driver successfully. I have a TITAN X and have installed the 384.111 driver and CUDA 8.0 outside Docker.

I encountered the following error. Can anyone help me solve it? Thanks.

ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.
This may be because it is in use (for example, by an X server, a CUDA program, or the
NVIDIA Persistence Daemon), but this may also happen if your kernel was configured
without support for module unloading. Please be sure to exit any programs that may
be using the GPU(s) before attempting to upgrade your driver. If no GPU-based
programs are running, you know that your kernel supports module unloading, and you
still receive this message, then an error may have occured that has corrupted an
NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your
computer.

ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for
details. You may find suggestions on fixing installation problems in the README
available on the Linux driver download page at www.nvidia.com.

waiting for response Docker Question


cxia30

👍1

All 12 comments

Have you tried installing CUDA 8.0 inside Docker? It can install the NVIDIA driver automatically.

lianglia-apollo on 7 Feb 2018

@lianglia-apollo Thank you for your quick reply. I installed CUDA via the deb package inside Docker, but I still cannot install the NVIDIA driver. Normally I should find nvidia-smi in /usr/bin inside Docker, but there isn't such a file after I installed CUDA.

cxia30 on 8 Feb 2018

👍1

Basically, you need to disable the X server when you install NVIDIA drivers. It does not always work smoothly on my computer.

You can refer to this link for some details: https://devtalk.nvidia.com/default/topic/1018078/how-to-disable-x-server-when-installing-cuda8-0-/

lianglia-apollo on 8 Feb 2018

@lianglia-apollo I tried disabling the X server, but it still failed. Have you ever tried to install the NVIDIA driver on Ubuntu 16.04? Thanks!

cxia30 on 12 Feb 2018

@cxia30 Hi cxia30, I've run into the same problem. Did you fix it?

ytzhao on 26 Feb 2018

Try replacing these three lines
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/375.39/NVIDIA-Linux-x86_64-375.39.run
chmod +x ./NVIDIA-Linux-x86_64-375.39.run
./NVIDIA-Linux-x86_64-375.39.run --no-opengl-files -a -s

with
sudo apt-get install nvidia-384

BTW, as "How to Run Perception Module on Your Local Computer" says:

We have already installed the CUDA and Caffe libraries in the released docker. However, the Nvidia GPU driver is not installed in the released dev docker image.

There is no need to install CUDA yourself unless you want to switch versions.
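
The package suggested above (nvidia-384) has the same major version as the 384.111 driver reported on the host in this thread. A container shares the host's kernel, so the user-space driver files inside it presumably need to match the kernel module already loaded on the host. A minimal sketch of deriving the package name from a driver version string follows. The helper name nvidia_pkg_for is hypothetical, and the "nvidia-<major>" naming is assumed from the package used in this thread, so check that such a package exists for your version before installing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apollo): map a driver version such as
# "384.111" to the package naming used in this thread ("nvidia-384").
nvidia_pkg_for() {
    version="$1"
    major="${version%%.*}"   # keep everything before the first dot
    printf 'nvidia-%s\n' "$major"
}

# Example: the host in this thread runs driver 384.111.
nvidia_pkg_for "384.111"
```

Inside the container this could be used as: sudo apt-get install "$(nvidia_pkg_for 384.111)"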

kkyiss on 7 Mar 2018

Nvidia driver installation fails for me. nvidia_drm is already loaded when I first enter the container and cannot be unloaded with 'sudo rmmod nvidia_drm' even after stopping lightdm.
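
One way to see why rmmod refuses is to look at the use count that lsmod reports for each NVIDIA module: a module with a non-zero count is still held by something (on a shared kernel, possibly a process on the host rather than in the container). The sketch below is only illustrative. The function name busy_nvidia_modules is hypothetical, and it assumes the usual three-column lsmod layout (module, size, use count).

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style text on stdin and print each module
# whose name starts with "nvidia" and whose use count (column 3) is non-zero.
busy_nvidia_modules() {
    awk '$1 ~ /^nvidia/ && $3 > 0 { print $1, $3 }'
}

# Query the running kernel only when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | busy_nvidia_modules
fi
```

If nvidia_drm shows up here even after lightdm is stopped inside the container, the module is most likely in use outside the container, which would fit the behaviour described above.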

danmartinez78 on 16 Mar 2018

👍1

Can the NVIDIA environment (CUDA + cuDNN) on the host change the environment inside the Apollo Docker container? Strangely, my own committed Docker image failed to run after a while. During that time, I was configuring the CUDA environment for TensorFlow...

Durant35 on 16 Apr 2018

@kkyiss

sudo apt-get install nvidia-384

works for me, including directly from the apollo_dev image, and it isn't strange anymore.

Host

$ nvidia-smi 
Mon Apr 16 22:25:55 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P0    N/A /  N/A |    445MiB /  2002MiB |     31%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1152      G   /usr/lib/xorg/Xorg                           262MiB |
|    0      2604      G   compiz                                        92MiB |
|    0      3675      G   ...-token=F714A07570C6D271EB760354A86BC2C1    84MiB |
|    0      7904      G   WeChatWeb.exe                                  3MiB |
+-----------------------------------------------------------------------------+

Docker

$ nvidia-smi 
Mon Apr 16 22:21:19 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   61C    P0    N/A /  N/A |    443MiB /  2002MiB |     36%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

But it results in https://github.com/ApolloAuto/apollo/issues/2641
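
The two nvidia-smi outputs above show the same driver version (384.111) on the host and in the container. A small sketch for pulling that value out of the banner, so the two sides can be compared without reading the table by eye, is given below. The function name driver_version_from_banner is hypothetical, and the pattern assumes the "Driver Version: X.Y" banner format shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi banner text on stdin and print the
# value that follows "Driver Version:" (first match only).
driver_version_from_banner() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Run this on the host and inside the container, then compare the two values.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | driver_version_from_banner
fi
```

Running nvidia-smi --query-gpu=driver_version --format=csv,noheader should give the same value directly, if your nvidia-smi build supports the query options.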

Durant35 on 16 Apr 2018

👍1

Is this issue still occurring? Please try again using the latest documentation.