Nvidia-docker: Newbie question about CUDA containers and ssh

Created on 18 Jan 2016 · 6 comments · Source: NVIDIA/nvidia-docker

Hello,
I have a machine with a proper CUDA and Docker installation. When I start an interactive container and, for example, run nvidia-smi -l, everything looks fine. However, when I add an SSH server so that in the future other users can also use CUDA (without knowing about Docker), the same container fails when I run nvidia-smi, although the binary is there.
I read about the nvidia-docker-plugin, but I think I need something like step-by-step instructions on how to use it.
Regards,
Stefan

question

All 6 comments

I'm not sure I understood your problem correctly.
Where is sshd living? In the host or in the container? Are you using NV_HOST?
Can you give us the list of commands you issued with their respective output? It would help us reproduce the error.

Hello,
I did the following:

Prerequisites:

Finally my problem:

It looks like this:

FROM cuda
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN echo 'root:screencast' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config

# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i    /etc/pam.d/sshd

ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile

EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

I changed the password, but that should not be an issue. Then I build the image with "docker build -t image_name_goes_here ." (note the trailing dot for the build context).

When I start the container interactively with "nvidia-docker run -it --name name_goes_here -p 10022:22 image_goes_here /bin/bash" I can use "nvidia-smi -q" to get the desired output.

BUT when I ssh into the same running container, even a "which nvidia-smi" fails, though the binary is in the right place.
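For reference, the sequence described above can be sketched as a shell session (the image and container names here are placeholders, not from the thread):

```shell
# Build the image from the Dockerfile (note the trailing dot for the build context)
docker build -t cuda-ssh .

# Interactive run: the nvidia-docker wrapper injects the driver devices and
# volumes, and the CUDA environment is present in this shell
nvidia-docker run -it --name cuda-ssh -p 10022:22 cuda-ssh /bin/bash

# From another terminal: an SSH login gets a fresh login shell that does NOT
# inherit that environment, so nvidia-smi is not found on PATH
ssh root@localhost -p 10022 'which nvidia-smi'
```

The key observation is that the interactive shell inherits the container's environment, while sshd spawns clean login shells that only read files like /etc/profile.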

Any ideas what I missed to get the desired behavior? I want the SSH-container solution because I do not want every user working on the host machine, though I know it does not completely fit the Docker philosophy.

Regards,

Stefan

Your issue comes from the fact that the CUDA environment is not passed to the SSH session.
You need to export it in your /etc/profile as shown in your example.
The following should do the trick:

RUN echo "export PATH=$PATH" >> /etc/profile && \
    echo "ldconfig" >> /etc/profile

Indeed that solved it. Thank you very much.

May I add another two questions then:

I hope my questions are not too abstract.

Regards,

Stefan

  1. The documentation of nvidia-docker and nvidia-docker-plugin explains it. The plugin is needed if you want to deploy NVIDIA Docker on a remote host (say AWS) or if you don't want to setup your volumes manually.
  2. You can; however, your GPU processes will have to share the GPU. You can use NVIDIA MPS for that purpose.
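A minimal sketch of bringing up MPS so that processes from several users share the GPU through one server (the directories are illustrative defaults; this is not from the thread):

```shell
# Start the MPS control daemon; CUDA processes launched afterwards in this
# environment connect to the MPS server and share the GPU cooperatively
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"
nvidia-cuda-mps-control -d
```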

Thanks a lot. I think I can go on with your information.

Regards,

Stefan
