Hello,
I have a machine with a proper CUDA and Docker installation. When I start an interactive container and, for example, run nvidia-smi -l, everything looks fine. However, when I add an SSH server so that in the future other users can also use CUDA (without knowing about Docker), the same container fails when I run nvidia-smi, although the binary is there.
I read about the nvidia-docker-plugin, but I think I need something like a step-by-step instruction on how to use it.
Regards,
Stefan
I'm not sure I understood your problem correctly.
Where is sshd living? In the host or in the container? Are you using NV_HOST?
Can you give us the list of commands you issued, with their respective output? It would help us reproduce the error.
Hello,
I did the following:
Prerequisites:
Finally my problem:
It looks like this:
FROM cuda
# Install and prepare the SSH server
RUN apt-get update && apt-get install -y openssh-server
RUN mkdir /var/run/sshd
# Set the root password (change this before deploying)
RUN echo 'root:screencast' | chpasswd
RUN sed -i 's/PermitRootLogin without-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd
ENV NOTVISIBLE "in users profile"
RUN echo "export VISIBLE=now" >> /etc/profile
EXPOSE 22
# Run sshd in the foreground as the container's main process
CMD ["/usr/sbin/sshd", "-D"]
I changed the password, but that should not be an issue. Then I built the image with "docker build -t image_name_goes_here .".
When I start the container interactively with "nvidia-docker run -it --name name_goes_here -p 10022:22 image_goes_here /bin/bash" I can use "nvidia-smi -q" to get the desired output.
BUT when I ssh into the same running container, even a "which nvidia-smi" fails, though the binary is in the right place.
Any ideas what I missed to get the desired behavior? I want the ssh-container solution because I do not want every user to work on the host machine, though I know it does not completely fit the Docker philosophy.
Regards,
Stefan
Your issue comes from the fact that the CUDA environment is not passed to the SSH session.
You need to export it in your /etc/profile as shown in your example.
The following should do the trick:
RUN echo "export PATH=$PATH" >> /etc/profile && \
    echo "ldconfig" >> /etc/profile
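The underlying behavior is general shell semantics, not Docker-specific: a new SSH login starts a fresh shell that only picks up what the profile scripts export, not the environment of the process that launched the container. A minimal sketch of that effect, using a throwaway profile file instead of /etc/profile (the /tmp/demo_profile path and DEMO_VAR name are just illustrative):

```shell
# Stand-in for /etc/profile: a script that exports a variable.
echo 'export DEMO_VAR=from_profile' > /tmp/demo_profile

# Set the variable in the current (parent) shell, like `docker run -e` would.
export DEMO_VAR=from_parent

# `env -i` starts with an empty environment, like a fresh SSH login:
# the parent's variable is gone.
env -i sh -c 'echo "DEMO_VAR is: $DEMO_VAR"'
# prints: DEMO_VAR is:

# Sourcing the profile, as a login shell does, restores it.
env -i sh -c '. /tmp/demo_profile; echo "DEMO_VAR is: $DEMO_VAR"'
# prints: DEMO_VAR is: from_profile
```

This is why writing the PATH (and running ldconfig) into /etc/profile makes the CUDA binaries visible to SSH sessions.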
Indeed that solved it. Thank you very much.
May I add another two questions then:
I hope my questions are not too abstract.
Regards,
Stefan
Thanks a lot. I think I can go on with your information.
Regards,
Stefan