Hi guys,
I was trying to run the perception module following the instructions at https://github.com/ApolloAuto/apollo/blob/master/docs/howto/how_to_run_perception_module_on_your_local_computer.md, but I cannot install the NVIDIA driver successfully. I have a TITAN X and have installed the 384.111 driver and CUDA 8.0 outside Docker (on the host).
I encountered the following error. Can anyone help me solve it? Thanks.
ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.
This may be because it is in use (for example, by an X server, a CUDA program, or the
NVIDIA Persistence Daemon), but this may also happen if your kernel was configured
without support for module unloading. Please be sure to exit any programs that may
be using the GPU(s) before attempting to upgrade your driver. If no GPU-based
programs are running, you know that your kernel supports module unloading, and you
still receive this message, then an error may have occured that has corrupted an
NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your
computer.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for
details. You may find suggestions on fixing installation problems in the README
available on the Linux driver download page at www.nvidia.com.
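The installer's complaint hinges on the usage count of an already-loaded NVIDIA kernel module. Because a Docker container shares the host kernel, the modules it sees are the host's. The following is a minimal Python sketch, assuming a Linux system where `/proc/modules` (the file `lsmod` formats) is readable; the helper name `loaded_nvidia_modules` is hypothetical and not part of Apollo or the NVIDIA installer. It lists each loaded `nvidia*` module, its use count, and the modules that depend on it.

```python
"""List loaded NVIDIA kernel modules and their use counts (hypothetical helper).

Each /proc/modules line looks like:
    name size refcount used_by state offset
where used_by is "-" or a comma-terminated list of dependent modules.
"""
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return {name: (use_count, [dependent modules])} for nvidia* modules.

    use_count is None when the kernel prints "-", which happens when the
    kernel was built without module-unloading support.
    """
    result = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("nvidia"):
            continue
        name, _size, refcount, used_by = fields[:4]
        count = None if refcount == "-" else int(refcount)
        users = [] if used_by == "-" else [m for m in used_by.split(",") if m]
        result[name] = (count, users)
    return result


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        for name, (count, users) in loaded_nvidia_modules(proc.read_text()).items():
            # A nonzero use count means rmmod will refuse to unload the module.
            shown = "n/a" if count is None else count
            print(f"{name}: use count {shown}, used by {', '.join(users) or 'nothing'}")
```

A nonzero count on `nvidia_drm` or `nvidia_modeset` would be consistent with the later report in this thread that `nvidia_drm` is already loaded on entering the container and cannot be removed.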
Have you tried installing CUDA 8.0 inside the Docker container? That may install the NVIDIA driver automatically.
@lianglia-apollo Thank you for your quick reply. I installed CUDA from the .deb package inside the container, but I still cannot install the NVIDIA driver. Normally I would expect to find nvidia-smi in /usr/bin inside the container, but there is no such file after installing CUDA.
Basically, you need to disable the X server when you install NVIDIA drivers. Even so, it doesn't always go smoothly on my computer.
You can refer to this link for details: https://devtalk.nvidia.com/default/topic/1018078/how-to-disable-x-server-when-installing-cuda8-0-/
@lianglia-apollo I tried disabling the X server, but it still failed. Have you ever tried installing the NVIDIA driver on Ubuntu 16.04? Thanks!
@cxia30 Hi cxia30, I've run into the same problem. Did you fix it?
Try replacing these three lines:
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/375.39/NVIDIA-Linux-x86_64-375.39.run
chmod +x ./NVIDIA-Linux-x86_64-375.39.run
./NVIDIA-Linux-x86_64-375.39.run --no-opengl-files -a -s
with:
sudo apt-get install nvidia-384
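One way to sanity-check this swap: the driver libraries installed inside the container are generally expected to match the version of the NVIDIA kernel module the host has loaded, since the container runs on the host kernel. Below is a minimal Python sketch under that assumption; the helper names are hypothetical, and the Debian-style version string (for example as shown by `dpkg-query -W nvidia-384`) is an illustrative input, not output captured in this thread.

```python
"""Compare the host NVIDIA kernel module version with an in-container package version.

Assumption: user-space driver libraries should match the loaded kernel module,
which /proc/driver/nvidia/version reports. Helper names are hypothetical.
"""
import re
from pathlib import Path

# Matches e.g. "... Kernel Module  384.111  Tue Dec 19 ..."
_KMOD_RE = re.compile(r"Kernel Module\s+(\d+\.\d+(?:\.\d+)?)")


def host_driver_version(proc_version_text):
    """Extract '384.111' from the contents of /proc/driver/nvidia/version, or None."""
    match = _KMOD_RE.search(proc_version_text)
    return match.group(1) if match else None


def upstream_version(deb_version):
    """Drop any epoch and Debian revision: '384.111-0ubuntu0.16.04.1' -> '384.111'."""
    without_epoch = deb_version.split(":", 1)[-1]
    return without_epoch.rsplit("-", 1)[0]


def versions_match(host_version, deb_version):
    """True only if the host module and the package carry the same upstream version."""
    return host_version is not None and host_version == upstream_version(deb_version)


if __name__ == "__main__":
    proc = Path("/proc/driver/nvidia/version")
    if proc.exists():
        print("host kernel module:", host_driver_version(proc.read_text()))
```

The host in this thread reports 384.111, which would be consistent with `nvidia-384` working where the 375.39 runfile did not; that reading is an inference, not something the posters confirmed.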
BTW, as the "How to Run Perception Module on Your Local Computer" guide says:
We have already installed the CUDA and Caffe libraries in the released docker. However, the Nvidia GPU driver is not installed in the released dev docker image.
So there is no need to install CUDA yourself unless you want to switch versions.
Nvidia driver installation fails for me. nvidia_drm is already loaded when I first enter the container and cannot be unloaded with 'sudo rmmod nvidia_drm' even after stopping lightdm.
Can the NVIDIA environment (CUDA + cuDNN) on the host change the environment inside the Apollo Docker container? Strangely, my own committed Docker image failed to run after a while; in the meantime, I had configured a CUDA environment for TensorFlow...
@kkyiss
sudo apt-get install nvidia-384
works for me, including directly in the apollo_dev image, and nothing strange happens anymore:
$ nvidia-smi
Mon Apr 16 22:25:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P0    N/A /  N/A |    445MiB /  2002MiB |     31%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1152      G   /usr/lib/xorg/Xorg                           262MiB |
|    0      2604      G   compiz                                        92MiB |
|    0      3675      G   ...-token=F714A07570C6D271EB760354A86BC2C1    84MiB |
|    0      7904      G   WeChatWeb.exe                                  3MiB |
+-----------------------------------------------------------------------------+
$ nvidia-smi
Mon Apr 16 22:21:19 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 940MX       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   61C    P0    N/A /  N/A |    443MiB /  2002MiB |     36%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
But it then leads to https://github.com/ApolloAuto/apollo/issues/2641
Is this issue still persistent? Please try the same using the latest documentation.
@natashadsouza Close it?
@Durant35 thank you! Closing this issue.