Singularity: --nv option broken with 384?

Created on 11 Sep 2017  ·  14 Comments  ·  Source: hpcng/singularity

Version of Singularity:

2.3.1-dist

Expected behavior

nvidia-smi output similar to host:

ubuntu@runc:~$ nvidia-smi
Mon Sep 11 03:58:22 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.66                 Driver Version: 384.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:00:06.0 Off |                    0 |
| N/A   27C    P0    25W / 250W |      0MiB / 16152MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Actual behavior

nvidia-smi not found.

Steps to reproduce behavior

Install singularity with the apt package as described in the docs. Run the following:

ubuntu@runc:~$ singularity --version
2.3.1-dist
ubuntu@runc:~$ singularity exec --nv docker://nvidia/cuda:8.0-runtime nvidia-smi
Docker image path: index.docker.io/nvidia/cuda:8.0-runtime
Cache folder set to /home/ubuntu/.singularity/docker
Creating container runtime...
/.singularity.d/actions/exec: 8: exec: nvidia-smi: not found

Similar to #785

All 14 comments

@GodloveD - Any luck with the nvidia-container-cli list option?

ubuntu@runc:~$ nvidia-container-cli list -cu
/usr/lib/nvidia-384/bin/nvidia-smi
/usr/lib/nvidia-384/bin/nvidia-debugdump
/usr/lib/nvidia-384/bin/nvidia-persistenced
/usr/lib/nvidia-384/bin/nvidia-cuda-mps-control
/usr/lib/nvidia-384/bin/nvidia-cuda-mps-server
/usr/lib/nvidia-384/libnvidia-ml.so.384.66
/usr/lib/nvidia-384/libnvidia-cfg.so.384.66
/usr/lib/x86_64-linux-gnu/libcuda.so.384.66
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.384.66
/usr/lib/nvidia-384/libnvidia-ptxjitcompiler.so.384.66
/usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.66
/usr/lib/nvidia-384/libnvidia-compiler.so.384.66
/run/nvidia-persistenced/socket

It appears you don't correctly find the bin path. The following is the current workaround.

ubuntu@runc:~$ export SINGULARITY_BINDPATH=/usr/lib/nvidia-384/bin:/usr/local/nvidia/bin
ubuntu@runc:~$ singularity exec --nv docker://nvidia/cuda:8.0-runtime nvidia-smi

Hi @ryanolson. Glad to see that you have a workaround. Singularity searches the $PATH for nvidia-smi at runtime. But the $PATH variable is also sanitized beforehand so that it does not include custom directories. The upshot is that Singularity cannot find nvidia-smi if it is installed in a non-standard location.
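To illustrate the effect described above, here is a minimal Python sketch of a $PATH sanitizer. The allow-list `STANDARD_DIRS` and the helpers `sanitize_path` and `is_searchable` are hypothetical names made up for this example, not Singularity's actual code, and the thread does not say whether Singularity filters $PATH or replaces it outright; filtering is assumed here.

```python
# Hypothetical allow-list; the directories Singularity actually keeps may differ.
STANDARD_DIRS = (
    "/usr/local/sbin",
    "/usr/local/bin",
    "/usr/sbin",
    "/usr/bin",
    "/sbin",
    "/bin",
)


def sanitize_path(path_value, allowed=STANDARD_DIRS):
    """Keep only allow-listed directories from a POSIX PATH string, in order."""
    entries = [entry for entry in path_value.split(":") if entry]
    return ":".join(entry for entry in entries if entry in allowed)


def is_searchable(directory, path_value):
    """Return True if `directory` is one of the entries of `path_value`."""
    return directory in path_value.split(":")


# The driver directory from this issue is on the user's PATH before sanitizing...
user_path = "/usr/lib/nvidia-384/bin:/usr/bin:/bin"
clean_path = sanitize_path(user_path)

# ...but not afterwards, so a later lookup of nvidia-smi cannot see it.
driver_dir_visible = is_searchable("/usr/lib/nvidia-384/bin", clean_path)
```

With this allow-list, `clean_path` no longer contains `/usr/lib/nvidia-384/bin`, which is why a search for nvidia-smi performed after sanitizing comes back empty.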

I've looked a bit at nvidia-container-cli but I've not had time to try to integrate it into the code yet. I've noticed that nvidia-container-cli is also unable to locate nvidia-smi if it is not on your $PATH, so if we placed it in the same location as our current call we would have the same problem.

It might be useful to try something like the following:
1) check to see if --nv option is provided before path is sanitized
2) check to see if nvidia-container-cli is installed and if it provides reasonable output (all necessary kernel modules are installed and loaded)
3) if it does, use its output as a list of things to bind into the container
4) if it doesn't, fall back to searching ld.so.cache and $PATH for known libs and binaries like we are doing now
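The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `list_with_container_cli` and `choose_bind_paths` are hypothetical names, "reasonable output" in step 2 is approximated as "exited successfully and printed at least one path", and the legacy ld.so.cache/$PATH search is passed in as a callable rather than implemented. The `list -cu` invocation is the one shown earlier in this thread.

```python
import subprocess


def list_with_container_cli():
    """Return the paths printed by `nvidia-container-cli list -cu`, or [] on failure."""
    try:
        result = subprocess.run(
            ["nvidia-container-cli", "list", "-cu"],
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # The tool is not installed, not executable, or exited non-zero.
        return []
    lines = result.stdout.decode("utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def choose_bind_paths(nv_requested, cli_lister, legacy_lister):
    """Prefer the CLI listing when --nv was given; otherwise fall back."""
    if not nv_requested:
        # Step 1: nothing to do unless --nv was passed.
        return []
    paths = cli_lister()
    if paths:
        # Steps 2-3: the CLI produced usable output, so bind what it lists.
        return paths
    # Step 4: fall back to the existing search (supplied by the caller).
    return legacy_lister()
```

A caller would pass `list_with_container_cli` as `cli_lister` and the existing search as `legacy_lister`; keeping both as parameters makes the selection logic easy to exercise without a GPU present.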

I'll need to check with @gmkurtzer about the security implications of this approach but I think it should be fine.

That seems reasonably straightforward. Especially since --nv is parsed and processed via the bash script at: https://github.com/singularityware/singularity/blob/master/libexec/cli/action_argparser.sh#L117

@GodloveD - do you provide a way to specify default cli arguments? On a full GPU cluster, it would be nice to set --nv as a default.

We don't currently provide a way to make a cli arg default. I could add it to the to-do list, but the list is getting rather long. 😜

aw poor @GodloveD, and here I am twiddling thumbs, lol.

Wow @vsoch! Everybody knows that you are much smarter than I am so it is easier for you to do things. lol

...or maybe I have secret additional thumbs!!!

The script below will replace the grep list at:
https://github.com/singularityware/singularity/blob/master/libexec/cli/action_argparser.sh#L119

#!/usr/bin/env python3
import os
import re
import subprocess

try:
    nvcr_list = subprocess.run("nvidia-container-cli list -cuv", shell=True, check=True, stdout=subprocess.PIPE)
    nvcr_list = nvcr_list.stdout.decode("utf-8").rstrip().split('\n')

    # find nvidia-libraries path
    nvidia_lib_paths = [item for item in nvcr_list if '.so.' in item]
    nvidia_lib_names = ["/" + os.path.basename(item).split('.so.')[0] for item in nvidia_lib_paths]

    print("|".join(nvidia_lib_names))
except:
    pass

The script below will replace the which nvidia-smi line at:
https://github.com/singularityware/singularity/blob/master/libexec/cli/action_argparser.sh#L134

#!/usr/bin/env python3
import os
import re
import subprocess

try:
    nvcr_list = subprocess.run("nvidia-container-cli list -cuv", shell=True, check=True, stdout=subprocess.PIPE)
    nvcr_list = nvcr_list.stdout.decode("utf-8").rstrip().split('\n')
    nvidia_smi = [item for item in nvcr_list if 'nvidia-smi' in item][0]
    print(os.path.split(nvidia_smi)[0])
except:
    pass

@vsoch - it seems like you own most of the python components. if we were to stick these scripts into the python tree to be called from the cli bash script, where would be the best place?

I would prefer to keep these flows relatively separate. We do use python for writing/reading json (which isn't easy with shell), but for a python script to be called in the middle of every argparse, -1 from me. We can't use python 3 for one, and having subprocess makes me uneasy, as does a try/except without specific catches. I also think @gmkurtzer is trying to minimize use of python (only for external api stuffs really), so I would advocate for a solution that is based in bash/sh.
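For reference, the parsing half of the two scripts can be separated from the subprocess call, which removes the need for a blanket try/except in that part. The sketch below is not what was merged; `library_pattern` and `smi_directory` are hypothetical helper names, the input is assumed to be the lines printed by `nvidia-container-cli list -cu`, and the code sticks to constructs available in both Python 2 and Python 3. It does not address the objection to having Python in the main program flow at all.

```python
import os


def library_pattern(paths):
    """Build the "/libname|/libname" alternation that the first script prints."""
    names = [
        "/" + os.path.basename(path).split(".so.")[0]
        for path in paths
        if ".so." in path
    ]
    return "|".join(names)


def smi_directory(paths):
    """Return the directory containing nvidia-smi, or None if it is not listed."""
    for path in paths:
        # Match the file name exactly rather than as a substring of the path.
        if os.path.basename(path) == "nvidia-smi":
            return os.path.dirname(path)
    return None


# A few lines taken from the listing earlier in this thread.
sample = [
    "/usr/lib/nvidia-384/bin/nvidia-smi",
    "/usr/lib/nvidia-384/libnvidia-ml.so.384.66",
    "/usr/lib/x86_64-linux-gnu/libcuda.so.384.66",
    "/run/nvidia-persistenced/socket",
]

pattern = library_pattern(sample)   # "/libnvidia-ml|/libcuda"
smi_dir = smi_directory(sample)     # "/usr/lib/nvidia-384/bin"
```

Unlike the original script, which indexes `[0]` into a possibly empty list and relies on the bare `except` to hide the failure, `smi_directory` returns `None` when nvidia-smi is absent, so the caller can decide what to do.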

As @vsoch mentioned, I have no issue with Python, but I do have a concern with it in the main program flow, because Singularity must be able to run on minimal installations that do not have Python present.

Hey @ryanolson have a look at https://github.com/singularityware/singularity/pull/1082 when you get a chance and let me know if it addresses the features that you requested. Thanks! 😺

Hey @ryanolson. #1082 just got merged. I'm going to close this for now, but if that PR doesn't fully address this issue feel free to re-open or open a new one. Thanks!
