Hi all,
I have some issues while installing NVIDIA drivers inside the Apollo docker. The error is reported below.
I have tried installing NVIDIA driver versions 375, 390 and 396, but the result remains the same. I have also tried starting both from the NVIDIA docker provided on the NVIDIA web site and from the docker provided by Apollo. Any ideas?
Thanks for your support,
Alberto
av@in_dev_docker:/apollo$ nvidia-smi
Failed to initialize NVML: Function Not Found
Hi Alberto,
Did you follow the instructions from Apollo doc or Nvidia official doc? And what docker image version are you working with?
Hi,
yes, I have followed https://github.com/ApolloAuto/apollo/blob/master/docs/howto/how_to_run_perception_module_on_your_local_computer.md but I am trying to update to NVIDIA driver version 396.24 + CUDA Driver 9.2 with CUDA Runtime 9.1, as installed on my Ubuntu host system.
The problem is inside the Docker image (apolloauto/apollo:dev-x86_64-20180508_1647), in which I am not able to install CUDA toolkit 9.2 and the NVIDIA drivers properly.
I first installed the CUDA Driver 9.2 (cuda-linux.9.2.88-23920284.run) and then the NVIDIA Driver 396 inside the Docker (using "apt-get install nvidia-396"). However, deviceQuery returns "cudaGetDeviceCount returned 38 -> no CUDA-capable device is detected".
Moreover, nvidia-smi returns "Failed to initialize NVML: Function not found".
PS: I am using an NVIDIA GeForce GTX 1080, and the NVIDIA drivers on the host system are working properly.
Thanks,
Alberto
@albe81x Apollo docker aims to provide a uniform dev environment for all developers, and officially supports cuda 8.0 on ubuntu 14.04 as its standard dev environment. If you'd like to try different versions of cuda, you may have to consider a different nvidia docker base and build your own docker image for development.
Hi,
do you see any issues with having CUDA 8.0 working on Ubuntu 16.04 inside the container? I got the same results when trying to downgrade CUDA from 9.0 to 8.0...
If you have issues with cuda in docker, you can try the following:
Many thanks for your suggestions. They were helpful, and the "nvidia-smi" issue was solved by keeping exactly the same driver version on the host and in the docker. It is easier to install the NVIDIA driver on the host using "apt-get install nvidia-384" (the .run file can lead to problems with the nvidia-drm module load and the X server, as reported in other threads) and then download the .run file of the same version from the NVIDIA web site (384.130 in my case) to install in the docker. The command line for my docker was:
_sudo /NVIDIA-Linux-x86_64-384.130.run --no-kernel-module --no-x-check -a -s --no-opengl-files_
In fact, when installing the driver, the options above should be combined with the --no-opengl-files option to avoid messing up the GLX library, as reported in https://github.com/ApolloAuto/apollo/issues/2641.
The problem now is that the visualizer is still not working due to a segmentation fault:
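The version-matching step described above can be checked before running the installer. The following is only a minimal sketch: the function names are hypothetical, and it assumes the kernel module reports its version in /proc/driver/nvidia/version on a line of the form "NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.130  ...", and that the installer keeps its default NVIDIA-Linux-x86_64-<version>.run file name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): compare the loaded kernel-module driver
# version with the version encoded in a .run installer's file name.

# Print the driver version from the NVRM line of the version file.
# $1: path to the version file (assumed default: /proc/driver/nvidia/version)
kernel_driver_version() {
  local file="${1:-/proc/driver/nvidia/version}"
  sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' "$file"
}

# Print the version from a name like NVIDIA-Linux-x86_64-384.130.run.
# $1: path to the installer
installer_version() {
  local name
  name="$(basename "$1")"
  name="${name#NVIDIA-Linux-x86_64-}"
  printf '%s\n' "${name%.run}"
}

# Succeed only if both versions are identical strings.
# $1: installer path, $2: optional version file (for testing)
versions_match() {
  [ "$(kernel_driver_version "${2:-}")" = "$(installer_version "$1")" ]
}
```

For example, `versions_match /NVIDIA-Linux-x86_64-384.130.run || echo "driver version mismatch"` could be run inside the container before the install command, provided /proc/driver/nvidia/version is readable there.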
_[...]
time: 0 ms
E0525 11:59:39.763595 93 offline_lidar_visualizer_tool.cc:120] transformed cloud size is 107300
I0525 11:59:39.763603 93 frame_content.cc:37] initial pose 00-0.247067 0000.968706 0-0.0238254 00000587363
00-0.968991 00-0.246887 000.0102645 4.14135e+06
00.00406111 000.0256226 0000.999664 000-30.3401
00000000000 00000000000 00000000000 00000000001
I0525 11:59:39.763684 93 frame_content.cc:38] offset = -587363 -4.14135e+06 30.3401
Segmentation fault (core dumped)_
In my case glxgears reports a problem with the swrast library, which is present at "/usr/lib/x86_64-linux-gnu/dri/swrast_dri.so". The error log is below.
_av@in_dev_docker:/apollo$ glxgears
libGL error: failed to load driver: swrast
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 154 (GLX)
Minor opcode of failed request: 3 (X_GLXCreateContext)
Value in failed request: 0x0
Serial number of failed request: 35
Current serial number in output stream: 37_
@techoe can you please help?
Hi guys,
I can provide further details on this error. I have also tried to use Ubuntu 14.04 following the same procedure. Do you have any details on how to map the OpenGL drivers inside the Apollo docker?
Thanks for your support,
Alberto
_av@in_dev_docker:/apollo$ export LIBGL_DEBUG=verbose
av@in_dev_docker:/apollo$ glxinfo
name of display: :0
libGL: screen 0 does not appear to be DRI2 capable
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/tls/swrast_dri.so
libGL: OpenDriver: trying /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so
libGL: driver does not expose __driDriverGetExtensions_swrast(): /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so: undefined symbol: __driDriverGetExtensions_swrast
libGL: Can't open configuration file /home/av/.drirc: No such file or directory.
libGL: Can't open configuration file /home/av/.drirc: No such file or directory.
libGL error: failed to load driver: swrast
X Error of failed request: GLXBadContext
Major opcode of failed request: 154 (GLX)
Minor opcode of failed request: 6 (X_GLXIsDirect)
Serial number of failed request: 44
Current serial number in output stream: 43_
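One way to narrow down an error like the one above is to check which libGL.so.1 the dynamic linker resolves inside the container, since this thread notes that the driver installer can interfere with the GL libraries. This is only a diagnostic sketch, not a fix confirmed here: the helper name `libgl_paths` is hypothetical, and it assumes the usual `ldconfig -p` line format (`name (arch) => path`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldconfig -p` output on stdin and print the path
# of every libGL.so.1 entry in the linker cache, in cache order.
libgl_paths() {
  awk '$1 == "libGL.so.1" { print $NF }'
}
```

Running `ldconfig -p | libgl_paths` inside the container lists the candidates, and `readlink -f` on each path shows the actual file behind the symlink.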
Hi @techoe, can you help me with the above topic please? Many thanks.
Alberto,
glfw has an issue running on Ubuntu 16.
Please try to follow this document and let me know if it works.
apollo/docs/howto/how_to_run_apollo_2.5_with_ubuntu16.md
Thank you,
Tae Eun
Closing this issue as there has been no communication in a while. Feel free to reopen it if you are still seeing the error above.