Hello,
I have a machine with a proper CUDA and Docker installation. When I start an interactive container and, for example, run nvidia-smi -l, everything looks fine. However, when I add an SSH server so that in the future other users can also use CUDA (without knowing about Docker), the same container fails when I run nvidia-smi, although the binary is there.
I read about the nvidia-docker-plugin, but I think I need something like a step-by-step instruction on how to use it.
Regards,
Stefan
I'm not sure I understood your problem correctly.
Where is sshd living? In the host or in the container? Are you using NV_HOST?
Can you give us the list of commands you issued, with their respective output? It would help us reproduce the error.
Hello,
I did the following:
Prerequisites:
Finally my problem:
It looks like this:
FROM cuda
# Install and prepare the SSH server
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
# Set the root password (change this before deploying)
RUN echo 'root:screencast' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile
EXPOSE 22
# Run sshd in the foreground as the container's main process
CMD ["/usr/sbin/sshd", "-D"]
I changed the password, but that should not be an issue. Then I built the image with "docker build -t image_name_goes_here .".
When I start the container interactively with "nvidia-docker run -it --name name_goes_here -p 10022:22 image_goes_here /bin/bash" I can use "nvidia-smi -q" to get the desired output.
BUT when I ssh into the same running container, even a "which nvidia-smi" fails, though the binary is in the right place.
Any ideas what I missed to get the desired behavior? I want the ssh-container solution because I do not want every user to work on the host machine, though I know it does not completely fit the Docker philosophy.
Regards,
Stefan
Your issue comes from the fact that the CUDA environment is not passed to the SSH session.
You need to export it in your /etc/profile as shown in your example.
The following should do the trick:
RUN echo "export PATH=$PATH" >> /etc/profile && \
    echo "ldconfig" >> /etc/profile
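The underlying behavior is general shell semantics, not Docker-specific: a new SSH login starts a fresh shell that only picks up what the profile scripts export, not the environment of the process that launched the container. A minimal sketch of that effect, using a throwaway profile file instead of /etc/profile (the /tmp/demo_profile path and DEMO_VAR name are just illustrative):

```shell
# Stand-in for /etc/profile: a script that exports a variable.
echo 'export DEMO_VAR=from_profile' > /tmp/demo_profile

# Set the variable in the current (parent) shell, like `docker run -e` would.
export DEMO_VAR=from_parent

# `env -i` starts with an empty environment, like a fresh SSH login:
# the parent's variable is gone.
env -i sh -c 'echo "DEMO_VAR is: $DEMO_VAR"'
# prints: DEMO_VAR is:

# Sourcing the profile, as a login shell does, restores it.
env -i sh -c '. /tmp/demo_profile; echo "DEMO_VAR is: $DEMO_VAR"'
# prints: DEMO_VAR is: from_profile
```

This is why writing the PATH (and running ldconfig) into /etc/profile makes the CUDA binaries visible to SSH sessions.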
Indeed that solved it. Thank you very much.
May I add another two questions then:
I hope my questions are not too abstract.
Regards,
Stefan
Thanks a lot. I think I can go on with your information.
Regards,
Stefan