I am running HPL over Infiniband (IBverbs) inside a Singularity container, but I found it is not easy to use. I didn't find any guide on how to use Infiniband with Singularity. Could you give any instructions or a guide? Thanks.
2.3.1
Easy to use Infiniband IBverbs with Singularity.
I tried two ways to use Infiniband:
-B /usr -B /lib -B /etc -B /sys
Installation finished successfully.
Preparing... ################################# [100%]
Updating / installing...
1:mlnx-fw-updater-3.3-1.0.0.0 ################################# [100%]
Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf
Attempting to perform Firmware update...
The firmware for this device is not distributed inside Mellanox driver: 06:00.0 (PSID: DEL2180110032)
To obtain firmware for this device, please contact your HW vendor.
Failed to update Firmware.
See /tmp/MLNX_OFED_LINUX-3.3-1.0.4.0.16950.logs/fw_update.log
To load the new driver, run:
/etc/init.d/openibd restart
Then I executed the last command, "/etc/init.d/openibd restart". One time it failed with the following error:
Unloading HCA driver: [ OK ]
Loading HCA driver and Access Layer: [ OK ]
sed: couldn't open temporary file /etc/modprobe.d/sedFYTy2E: Read-only file system
and other times it was just hanging forever.
Both the container and the host are RHEL 7.2.
I have a similar issue on our Crays, which run Cray MPICH, as it can't be installed in the container. I ended up writing a small utility library, https://github.com/olcf/dl-intercept , which uses the runtime loader's LD_AUDIT feature to substitute libraries at runtime (our substitutions currently look like this: https://github.com/olcf/SingularityTools/blob/master/Titan/rtld.sub).
The general workflow is that users bootstrap using the distro-provided MPICH as they normally would, and then at runtime the container-provided MPICH libraries are switched out for the Cray library equivalents (which are bind mounted in). This works with OpenMPI as well, although the OpenMPI ABI seems less stable, so you have to be careful with version compatibility. I like this solution because it can be controlled from the environment (no destructive changes to the container are needed) and works even if RPATHs have been set on the executables.
This is designed to be a center wide solution covering many use cases and so may be heavier weight than you need for a more focused purpose.
Hi @AdamSimpson, thanks for your comment. I think the difficult part of using Infiniband is the OFED driver and the IBverbs library. @vsoch and @gmkurtzer, any suggestions on how to use Infiniband? If we reuse the OFED driver and IBverbs library from the host, I doubt it will work when the container OS and host OS are different. If we install them inside the container, there is an error, at least in a container with RHEL 7.2. I didn't find any online documentation on how to use Infiniband. Any help is appreciated.
Just to be clear: on our Infiniband systems I bind mount in the appropriate libraries and use the linked dl-intercept utility to make sure the host system libmpi.so and libibverbs.so libraries are used by the container. Any actual driver needs to stay on the host, as it does with CUDA. I don't have any issues passing in the appropriate libraries from our RHEL system to Ubuntu containers; you just have to pay close attention to which libraries are used by the container.
Hi @AdamSimpson, I just tried to use an Ubuntu container on a RHEL host and bind the IB-related paths "-B /usr -B /etc", but I got the following errors:
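One way to check which libraries a binary actually resolves is to filter `ldd` output. The following is a minimal sketch, not part of the workflow described above; the helper name `resolved_lib` is hypothetical, and it only reads ldd-style lines from stdin.

```shell
# Hypothetical helper: read ldd-style output on stdin and print, for every
# library whose name starts with the given prefix, where it resolved to.
resolved_lib() {
    awk -v p="$1" 'index($1, p) == 1 && $2 == "=>" { print $1 " -> " $3 }'
}

# Possible usage inside the container (container.img and ./a.out are placeholders):
#   singularity exec container.img ldd ./a.out | resolved_lib libmpi
#   singularity exec container.img ldd ./a.out | resolved_lib libibverbs
```

If the printed path points at the container's own copy rather than the bind-mounted host copy, the search order (or an RPATH) is probably taking precedence.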
nvcc: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
mpirun: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
ompi_info: relocation error: /usr/lib64/libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
This is because libibverbs.so is in /usr/lib64, so I have to bind /usr. But the GCC library lib.so is also in /usr/lib64, and the GCC version is 5.3.1 in the Ubuntu container and 4.8.5 on the RHEL host, so there is a conflict here. It seems it's not easy to use Infiniband with Singularity in a portable way. Did you have these errors before?
You definitely don't want to just bind all of /usr and /etc into the container. In our case we use a somewhat specialized OpenMPI install, but it should be pretty similar to most installs, barring a few directory name differences. libmpi.so and the MCA components are spread across two directories:
/mpi_install/lib and /mpi_install/lib/openmpi.
I bind mount these into the container and prepend them both to the container LD_LIBRARY_PATH to make sure they get picked up before whatever is in the base container. I use dl-intercept to make sure even in the case of an RPATH executable the libmpi.so from the host is used.
On our system libibverbs.so is in /lib64 and has several dependency libraries that are needed but are generally not in the container. I bind mount /lib64 to /host_lib64 and append /host_lib64 to the container's LD_LIBRARY_PATH, since I only want those libraries used if they don't exist in the container. I do want libibverbs.so itself to come from the host, so I include it in my dl-intercept config.
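The search order described above (host MPI directories first, container libraries next, host /lib64 as a last resort) can be sketched as follows. This is an illustrative sketch only: `join_path` is a hypothetical helper, and how the resulting value gets into the container environment depends on the Singularity version, so that part is left as a comment.

```shell
# Hypothetical helper: join the non-empty arguments with ':'.
join_path() {
    result=""
    for part in "$@"; do
        [ -n "$part" ] || continue
        if [ -z "$result" ]; then
            result="$part"
        else
            result="$result:$part"
        fi
    done
    printf '%s\n' "$result"
}

# Host MPI dirs are prepended, the existing value stays in the middle,
# and the bind-mounted host /lib64 is appended as a fallback.
CONTAINER_LD_LIBRARY_PATH=$(join_path /mpi_install/lib /mpi_install/lib/openmpi "${LD_LIBRARY_PATH:-}" /host_lib64)

# Matching bind mounts (assumed invocation, container.img and ./a.out are placeholders):
#   singularity exec -B /mpi_install/lib -B /lib64:/host_lib64 container.img ./a.out
```

Appending rather than prepending /host_lib64 is what keeps the host's libc and friends from shadowing the container's own copies.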
Hi. Our site uses the following technique to use Infiniband with Singularity.
Add the following to "${singularity/install/path}/etc/singularity/init" to expose the IB-related libraries on the host.
for i in `ldconfig -p | grep -E "/libib|/libgpfs|/libnuma|/libmlx|/libnl"`; do
    if [ -f "$i" ]; then
        message 2 "Found a library: $i\n"
        if [ -z "${SINGULARITY_CONTAINLIBS:-}" ]; then
            SINGULARITY_CONTAINLIBS="$i"
        else
            SINGULARITY_CONTAINLIBS="$SINGULARITY_CONTAINLIBS,$i"
        fi
    fi
done
if [ -z "${SINGULARITY_CONTAINLIBS:-}" ]; then
    message WARN "Could not find any IB-related libraries on this host!\n";
else
    export SINGULARITY_CONTAINLIBS
fi
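Before editing the init script, it may help to preview which libraries that pattern would pick up on a given host. The sketch below is an assumption-laden stand-in, not the site's code: `ib_libs_csv` is a hypothetical name, it takes only the last field of each `ldconfig -p` line as the path, and it skips the `-f` existence check that the snippet above performs.

```shell
# Hypothetical helper: read `ldconfig -p` style output on stdin and print the
# matching library paths as one comma-separated line.
ib_libs_csv() {
    awk '$NF ~ /\/(libib|libgpfs|libnuma|libmlx|libnl)/ { print $NF }' | paste -sd, -
}

# Possible usage on the host:
#   ldconfig -p | ib_libs_csv
```

An empty result would correspond to the "Could not find any IB-related libraries" warning branch above.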
Install IB-related libraries such as libibverbs-dev into the container (using apt on Ubuntu 16.04).
Then build and install MPI with the IB-related options. With OpenMPI, we use the --with-verbs option.
./configure --prefix=/opt/openmpi/${OPENMPI_VERSION} \
    --enable-orterun-prefix-by-default \
    --enable-mpirun-prefix-by-default \
    --enable-static \
    --enable-shared \
    --with-verbs \
    --with-cuda && \
make install
Run the MPI program, compiled against the MPI installed above, inside the container.
We need to bind mount /etc/libibverbs.d. The -B entry can also be added to "${singularity/install/path}/etc/singularity/singularity.conf".
mpirun -np 4 singularity exec -B /etc/libibverbs.d container.img ./a.out
It seems to work at our site.
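If the same launch line has to run on hosts that may lack /etc/libibverbs.d, the bind option can be emitted conditionally. This is a small sketch under that assumption; `ib_bind_args` is a hypothetical helper and is not part of Singularity.

```shell
# Hypothetical helper: print "-B <dir>" only when the directory exists on the
# host; print nothing otherwise. Defaults to /etc/libibverbs.d.
ib_bind_args() {
    dir="${1:-/etc/libibverbs.d}"
    if [ -d "$dir" ]; then
        printf '%s %s\n' "-B" "$dir"
    fi
}

# Possible usage (left unquoted on purpose so the two words split):
#   mpirun -np 4 singularity exec $(ib_bind_args) container.img ./a.out
```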
Check out our approach at https://github.com/CHPC-UofU/Singularity-ubuntu-mpi
Essentially we install the Ubuntu Mellanox IB stack, which goes to /usr/lib/libibverbs in the container, and set LD_LIBRARY_PATH to it.
Then you can either build your own MPI distro in the container (MPICH or derivatives, or OpenMPI), pointing it to that IB stack,
or, more simply in our case, use the existing MPI builds from our host OS, which are fairly OS-oblivious (e.g. Intel MPI), in the container, and then use the same mpirun outside the container to run the container binary.
Things get trickier if the OS stock MPI is used, e.g. in https://github.com/CHPC-UofU/Singularity-meep-mpi, where we install the meep-mpich2 package, which depends on the OS-built mpich2. This mpich2 is only built with TCP, so we need to adjust LD_LIBRARY_PATH to load libmpich.so from an MVAPICH2 build. That happens to be one that we built on the host again, but it could be one that you build in your container as well.
The nice thing about MPICH based MPI distros is that they are ABI compatible, so you can e.g. run MPICH built binary with MVAPICH2 (or IntelMPI).
HTH
Thanks for putting this together, @mcuma. I was quite disappointed when I came to understand how non-trivial it is to get Singularity containers working with IB. The documentation talks about how PMIx is used to facilitate MPI communication outside the container, but makes no mention of the fact that you can't actually use MPI over IB without jumping through a bunch of hoops.
Singularity recognized this issue with GPU support and has made that very easy now. Hopefully (hint @gmkurtzer) they will tackle IB next.
On a related note: will your solution only work with Mellanox IB? We have some systems with Intel OmniPath.
@rgoldino, wrt. OmniPath - we don't have one here so I can't say for sure, but, in principle it should be similar to the Mellanox IB stack.
Based on https://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_OP_Fabric_Software_IG_H76467_v1.0.pdf, I'd install the prerequisite OS packages (this document just lists RHEL and SLES, so, some googling for equivalent Ubuntu packages may be necessary), then install the IntelOPA-Basic.DISTRO.VERSION.tgz. Looks like you could set this up to be unattended with appropriate options as well.
Great stuff! These approaches seem viable enough to at least be referenced in the Singularity Readme?
I had significant trouble working with IB and MPI on a Mellanox software stack. Here are my steps for getting it to work on Ubuntu 16.04:
dpkg -l | grep mlnx, which reads '4.0-1.0.1.0' (mlnx-fw-updater usually has the right version)
/etc/infiniband/info
ompi_info --parsable --all | grep config:cli
Unfortunately, this is not good for reproducible research, since containers on one HPC system won't be able to transfer to another easily. I also suggest a --nv like option for MPI and OFED, if possible.
I'm not very familiar with this but would it be possible to proxy MPI calls within the container to a process running outside the container that can use the host's libraries and drivers to speak Infiniband?
@gvallee
This looks like some good info to include with our MPI Doc, what do you think?
Going to close this - there is a roadmap item that's related #5832