The Chapel example program modules/packages/mpi/spmd/hello-chapel/hello.chpl fails with MPI errors:
use MPI;

var rank = commRank(CHPL_COMM_WORLD),
    size = commSize(CHPL_COMM_WORLD);

// Each rank takes its turn printing; the barrier after each turn keeps
// the output ordered by rank.
for irank in 0..#size {
  if irank == rank then
    writef("Hello, Chapel! This is MPI rank=%i of size=%i, on locale.id=%i\n",
           rank, size, here.id);
  C_MPI.MPI_Barrier(CHPL_COMM_WORLD);
}
Compile command:
chpl -o hello+mpi --cc-warnings modules/packages/mpi/spmd/hello-chapel/hello.chpl
Execution command:
./hello+mpi --spmd 2
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 10.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Unable to get a high enough MPI thread support
Unable to get a high enough MPI thread support
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 1817 on
node ip-172-31-41-249 exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[ip-172-31-41-249:01816] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[ip-172-31-41-249:01816] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
This was run on a t2.xlarge EC2 instance built from a SLES 12.2 AMI.
PATH=$CHPL_HOME/bin/linux64:$CHPL_HOME/util:/usr/lib64/mpi/gcc/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/lib/mit/bin
LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64
CHPL_HOME=
CHPL_HOST_PLATFORM=linux64
CHPL_LAUNCHER=mpirun
CHPL_TARGET_ARCH=none
CHPL_TARGET_COMPILER=mpi-gnu
CHPL_TASKS=fifo
CHPL_TEST_NOMAIL=1
chpl --version:
chpl Version 1.16.0 pre-release (eacf087c5a)
Copyright (c) 2004-2017, Cray Inc. (See LICENSE file for more details)
$CHPL_HOME/util/printchplenv --anonymize:
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: mpi-gnu *
CHPL_TARGET_ARCH: none *
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: fifo *
CHPL_LAUNCHER: mpirun *
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_MAKE: gmake
CHPL_ATOMICS: intrinsics
CHPL_GMP: gmp
CHPL_HWLOC: none
CHPL_REGEXP: re2
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
Compilation command: chpl -o hello+mpi --cc-warnings modules/packages/mpi/spmd/hello-chapel/hello.chpl
Chapel compiler version: 1.16.0 pre-release (eacf087c5a)
Chapel environment:
CHPL_HOME:
CHPL_ATOMICS: intrinsics
CHPL_AUX_FILESYS: none
CHPL_COMM: none
CHPL_COMM_SUBSTRATE: none
CHPL_GASNET_SEGMENT: none
CHPL_GMP: gmp
CHPL_HOST_COMPILER: gnu
CHPL_HOST_PLATFORM: linux64
CHPL_HWLOC: none
CHPL_JEMALLOC: jemalloc
CHPL_LAUNCHER: mpirun
CHPL_LLVM: none
CHPL_LOCALE_MODEL: flat
CHPL_MAKE: gmake
CHPL_MEM: jemalloc
CHPL_NETWORK_ATOMICS: none
CHPL_REGEXP: re2
CHPL_TARGET_ARCH: none
CHPL_TARGET_COMPILER: mpi-gnu
CHPL_TARGET_PLATFORM: linux64
CHPL_TASKS: fifo
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_WIDE_POINTERS: struct
gcc --version or clang --version:
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
MPI packages installed on the system (zypper output, trimmed):
(1/5) Installing: mpi-selector-1.0.3-3.1.1.noarch ... [done]
(2/5) Installing: libstdc++47-devel-4.7.1_20120723-1.1.1.x86_64 ... [done]
(3/5) Installing: libstdc++-devel-4.7-2.1.1.x86_64 ... [done]
(4/5) Installing: openmpi-1.5.4-4.1.4.x86_64 ... [done]
(5/5) Installing: openmpi-devel-1.5.4-4.1.4.x86_64 ... [done]
I think your problem stems from using OpenMPI instead of MPICH. To date, we only test against MPICH. However, I don't think this is sufficiently documented in the MPI docs.
I'm not sure what the challenges are in supporting OpenMPI. Maybe we should add it to the list in #5722 though.
@npadmana - do you have any thoughts on this?
The underlying issue is that we currently require being able to make concurrent MPI calls in multiple threads/tasks. You do this by requesting MPI_THREAD_MULTIPLE at initialization time; the code aborts (as it did in this case) if the MPI implementation does not support this.
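To make that concrete, here is a minimal plain-C sketch of the kind of check involved. This is paraphrased, not the MPI module's actual code; the message text and error code 10 are taken from the log above.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
  int provided;
  /* Ask the MPI library for full multithreaded support; 'provided'
     reports the level the library can actually deliver. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE) {
    fprintf(stderr, "Unable to get a high enough MPI thread support\n");
    MPI_Abort(MPI_COMM_WORLD, 10);  /* errorcode 10, as in the log above */
  }
  MPI_Finalize();
  return 0;
}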
OpenMPI used to have issues with this mode, which is why I've been using MPICH even locally. I believe this is now fixed in OpenMPI v2.x.x, but I believe this support doesn't get built by default. Reading the docs, it seems like you need --enable-mpi-thread-multiple on configuration.
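For reference, building OpenMPI from source with that option would look roughly like the following (the install prefix here is just an example):

./configure --enable-mpi-thread-multiple --prefix=/usr/local
make all install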
We probably should test that this works, and if it does, update the documentation to mention all of this.
A related to-do item is to maybe relax the requirements and put the onus on the user not to make concurrent MPI calls. Except for gasnet+mpi (and possibly IB), that should be possible.
@awallace-cray -- just FYI, there is an "undocumented" feature in the MPI module where if you run with
--requireThreadedMPI=false
it will disable that requirement. It's been only lightly tested, but it would be great if you could see whether it works for you. Certainly, in SPMD mode, that should be fine.
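For example, reusing the execution command from above:

./hello+mpi --spmd 2 --requireThreadedMPI=false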
I don't recall how I got the idea that OpenMPI supported the Chapel MPI package. After Ben's comment I made an attempt to replace OpenMPI with MPICH.
MPICH was harder to install on that system, or else something was poisoned by the earlier OpenMPI install; Chapel could never find mpicc.
Yes, I've hit that issue before. I think OpenMPI leaves some wrappers around.....
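A quick, generic way to check which wrapper is being picked up (not Chapel-specific):

which mpicc
mpicc -show    # MPICH's wrapper accepts -show; Open MPI's uses --showme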
I do think we should check whether the newer versions of OpenMPI work or not..... @ben-albrecht -- do you want to add to our todo list?
FWIW I've successfully run Chapel 1.16 with OpenMPI 3.0.0 and the following settings:
CHPL_COMM=gasnet
CHPL_COMM_SUBSTRATE=ibv
CHPL_LAUNCHER=gasnetrun_ibv
GASNET_PHYSMEM_MAX=1G
GASNET_IBV_SPAWNER=mpi
As @awallace-cray found, the same system+settings didn't work with OpenMPI 1.6.3.
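Hypothetically, applying those settings end to end would look something like the following. Note that CHPL_COMM and friends are build-time settings, so the Chapel runtime must be rebuilt after changing them; the program name and node count here are just examples:

export CHPL_COMM=gasnet
export CHPL_COMM_SUBSTRATE=ibv
export CHPL_LAUNCHER=gasnetrun_ibv
export GASNET_PHYSMEM_MAX=1G
export GASNET_IBV_SPAWNER=mpi
chpl -o hello hello.chpl
./hello -nl 2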
Thanks for the info @milthorpe! For clarity, did you build with the --enable-mpi-thread-multiple flag as @npadmana suggested earlier, or is this not necessary for OpenMPI 3?
@awallace-cray - if we can confirm this and add it to nightly testing, we can update the docs to specify that we support the newer versions of OpenMPI (perhaps OpenMPI 2 works as well).
I didn't build OpenMPI, so I can't say whether it was necessary to explicitly pass --enable-mpi-thread-multiple; but it does report threading support:
bash-4.1$ mpirun -version
mpirun (Open MPI) 3.0.0
...
bash-4.1$ ompi_info | grep -i thread
Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
FT Checkpoint support: no (checkpoint thread: no)