Chapel: Poor default MPI mapping for multi-locale programs

Created on 7 Sep 2018 · 10 Comments · Source: chapel-lang/chapel

Summary of Problem

When running a multi-locale Chapel program on an InfiniBand cluster, the default process mapping through GASNet is poor: processes are placed sequentially on the available cores, so they pile up on the same node.
A better approach might be to use as many nodes as possible and balance the load across them.

The problem can be confirmed by inspecting the verbose output of the launcher.
Even when many nodes are available, all locales execute on the same node.

My current workaround is to set the environment variable MPIRUN_CMD to "mpirun --bind-to none --map-by ppr:1:node -np %N %P %A".

Steps to Reproduce

Source Code:
https://github.com/chapel-lang/chapel/blob/master/test/studies/prk/Transpose/transpose.chpl

Compile command:
chpl --fast transpose.chpl

Execution command:
./transpose -nl 4 --set iterations=10 --set order=2000 --set tileSize=64

Configuration Information

Chapel 1.17.1 (bae434820cd5811ed37c184faa9ba5bab6356b11) with the following configuration

export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv
Labels: Runtime, Bug, Performance, user issue

All 10 comments

> The problem can be confirmed by inspecting the verbose output of the launcher.

Can you paste a snapshot of that output into this issue?

Also, I assume there's nothing special about transpose here, i.e., that with https://github.com/chapel-lang/chapel/blob/master/test/release/examples/hello6-taskpar-dist.chpl, say, you'd see all the messages report the same node name?

Yeah, it's not related to transpose. I discovered the issue while benchmarking transpose.

There are 4 nodes

With the workaround, I used the following PBS script:

#!/bin/bash
#PBS -l mem=64GB
#PBS -q express
#PBS -l walltime=1:00
#PBS -l ncpus=64
# Workaround: have Open MPI start one process per node, without core binding
export MPIRUN_CMD="mpirun --bind-to none --map-by ppr:1:node -np %N %P %A"
export CHPL_RT_NUM_THREADS_PER_LOCALE=MAX_LOGICAL
export LD_LIBRARY_PATH=/opt/mellanox/fca/lib/:$LD_LIBRARY_PATH
export GASNET_PHYSMEM_MAX=1G
export GASNET_PHYSMEM_NOPROBE=1
module load openmpi/3.0.1
./hello6-taskpar-dist -nl 4

Output:

Hello, world! (from locale 0 of 4 named r44)
Hello, world! (from locale 1 of 4 named r45)
Hello, world! (from locale 2 of 4 named r49)
Hello, world! (from locale 3 of 4 named r55)

Without the workaround, I used the following PBS script:

#!/bin/bash
#PBS -l mem=64GB
#PBS -q express
#PBS -l walltime=1:00
#PBS -l ncpus=64
export CHPL_RT_NUM_THREADS_PER_LOCALE=MAX_LOGICAL
export LD_LIBRARY_PATH=/opt/mellanox/fca/lib/:$LD_LIBRARY_PATH
export GASNET_PHYSMEM_MAX=1G
export GASNET_PHYSMEM_NOPROBE=1
module load openmpi/3.0.1
./hello6-taskpar-dist -nl 4

Output:

Hello, world! (from locale 3 of 4 named r127)
Hello, world! (from locale 2 of 4 named r127)
Hello, world! (from locale 1 of 4 named r127)
Hello, world! (from locale 0 of 4 named r127)

Perfect, thank you.

I think the root cause is the same as in issue #11356.

Here is my understanding of the situation:

  • mpirun options are not standardized and can differ between Open MPI and MPICH. For example, --bind-to none is an Open MPI option.
  • mpirun defaults to creating many processes per node (presumably a reasonable default for MPI programs)
  • MPIRUN_CMD can be used to solve the problem but isn't documented in the Chapel docs
  • The Chapel docs likewise lack a "Networks: Using the MPI conduit" document

Some possibilities:

  • we could warn or error if the MPI launcher is used and MPIRUN_CMD is not set
  • we could adjust the launcher to detect Open MPI vs. MPICH and then set the mpirun arguments accordingly

Note that the mpi conduit was being used in the other issue, and the ibv conduit is being used in this one. GASNet still defaults to spawning with mpirun for ibv (and other configs) though, so wherever we make a change we need it to apply for all launchers/configs where mpirun might be used.

FYI, I can reproduce this on one of our Slurm clusters with Open MPI:

export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv
export GASNET_PHYSMEM_MAX=16G
export GASNET_PHYSMEM_NOPROBE=1

salloc --nodelist=chapcs10,chapcs11

chpl examples/hello6-taskpar-dist.chpl

./hello6-taskpar-dist -nl 2
> Hello, world! (from locale 0 of 2 named chapcs11)
> Hello, world! (from locale 1 of 2 named chapcs11)

export MPIRUN_CMD="mpirun --bind-to none --map-by ppr:1:node -np %N %P %A"
./hello6-taskpar-dist -nl 2
> Hello, world! (from locale 0 of 2 named chapcs11)
> Hello, world! (from locale 1 of 2 named chapcs10)

@caizixian I don't have access to a PBS system to try this on, but on Slurm, if I add --ntasks-per-node=1 to my salloc command, I get the correct mapping. Could you try adding #PBS -l place=scatter to your PBS script and see whether that eliminates the need to set MPIRUN_CMD?

Could you also send the output of qsub --version?

@ronawho unfortunately, it doesn't solve the problem.

$ qsub --version
pbs_version = 14.2.5.20180202014445

Bummer, thanks for checking so quickly.

I'm working on a patch on our end that should help, but I need to check with the GASNet team to see how portable it is. It basically adds -N <numLocales> to the gasnetrun_ibv command we call, which should specify how many nodes to run on. We already pass -n <numLocales>, which specifies how many processes to launch, so passing both should result in one process per node. However, -N is listed as not supported by all mpiruns, so I need to see how portable it is and what happens when it isn't supported. The patch is below if you have a chance to try it out:

[Edit] This patch no longer applies to master, but we should have a fix coming in the next day or two. https://github.com/chapel-lang/chapel/issues/11533 has more details.

diff --git a/runtime/src/launch/gasnetrun_ibv/launch-gasnetrun_ibv.c b/runtime/src/launch/gasnetrun_ibv/launch-gasnetrun_ibv.c
index 46f38440af..42ae75ae97 100644
--- a/runtime/src/launch/gasnetrun_ibv/launch-gasnetrun_ibv.c
+++ b/runtime/src/launch/gasnetrun_ibv/launch-gasnetrun_ibv.c
@@ -34,15 +34,17 @@ static char _nlbuf[16];
 static char** chpl_launch_create_argv(const char *launch_cmd,
                                       int argc, char* argv[],
                                       int32_t numLocales) {
-  const int largc = 5;
+  const int largc = 7;
   char *largv[largc];

   largv[0] = (char *) launch_cmd;
   largv[1] = (char *) "-n";
   sprintf(_nlbuf, "%d", numLocales);
   largv[2] = _nlbuf;
-  largv[3] = (char*) "-E";
-  largv[4] = chpl_get_enviro_keys(',');
+  largv[3] = (char *) "-N";
+  largv[4] = _nlbuf;
+  largv[5] = (char*) "-E";
+  largv[6] = chpl_get_enviro_keys(',');

   return chpl_bundle_exec_args(argc, argv, largc, largv);
 }

@caizixian this should be resolved on master as of https://github.com/chapel-lang/chapel/pull/11546. Sorry for the long turnaround on this.
