Chapel: Chapel example program for MPI module fails with OpenMPI

Created on 22 Jun 2017 · 9 comments · Source: chapel-lang/chapel

Summary of Problem

Chapel example program, modules/packages/mpi/spmd/hello-chapel/hello.chpl fails with MPI errors.

Steps to Reproduce

  • set the environment variables for an MPI build
  • make
  • compile and run the Chapel example program with the --spmd option
  • see below for details
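
The first step above can be sketched as follows, with values taken from the configuration section below (a sketch; adjust the OpenMPI paths for your system):

```shell
# MPI-related Chapel environment settings, as reported in the
# configuration section of this issue.
export CHPL_TARGET_COMPILER=mpi-gnu   # compile via the MPI wrapper compiler
export CHPL_LAUNCHER=mpirun           # launch the binary with mpirun
export CHPL_TASKS=fifo                # fifo tasking layer
export PATH=/usr/lib64/mpi/gcc/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64
```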

Source

use MPI;

// Rank and size of this process in the MPI world communicator.
var rank = commRank(CHPL_COMM_WORLD),
    size = commSize(CHPL_COMM_WORLD);

// Print one rank at a time, using a barrier to keep output ordered.
for irank in 0..#size {
  if irank == rank then
    writef("Hello, Chapel! This is MPI rank=%i of size=%i, on locale.id=%i\n", rank, size, here.id);
  C_MPI.MPI_Barrier(CHPL_COMM_WORLD);
}

Compile command:
chpl -o hello+mpi --cc-warnings modules/packages/mpi/spmd/hello-chapel/hello.chpl

Execution command:
./hello+mpi --spmd 2

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 10.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Unable to get a high enough MPI thread support
Unable to get a high enough MPI thread support
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 1817 on
node ip-172-31-41-249 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[ip-172-31-41-249:01816] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[ip-172-31-41-249:01816] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Configuration Information

This was run on a t2.xlarge EC2 instance built from a SLES 12.2 AMI.

PATH=$CHPL_HOME/bin/linux64:$CHPL_HOME/util:/usr/lib64/mpi/gcc/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/usr/games:/usr/lib/mit/bin

LD_LIBRARY_PATH=/usr/lib64/mpi/gcc/openmpi/lib64

CHPL_HOME=
CHPL_HOST_PLATFORM=linux64
CHPL_LAUNCHER=mpirun
CHPL_TARGET_ARCH=none
CHPL_TARGET_COMPILER=mpi-gnu
CHPL_TASKS=fifo
CHPL_TEST_NOMAIL=1
  • Output of chpl --version:
chpl Version 1.16.0 pre-release (eacf087c5a)
Copyright (c) 2004-2017, Cray Inc.  (See LICENSE file for more details)
  • Output of $CHPL_HOME/util/printchplenv --anonymize:
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: mpi-gnu *
CHPL_TARGET_ARCH: none *
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: fifo *
CHPL_LAUNCHER: mpirun *
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_MAKE: gmake
CHPL_ATOMICS: intrinsics
CHPL_GMP: gmp
CHPL_HWLOC: none
CHPL_REGEXP: re2
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
  • ./hello+mpi -a
Compilation command: chpl -o hello+mpi --cc-warnings modules/packages/mpi/spmd/hello-chapel/hello.chpl 
Chapel compiler version: 1.16.0 pre-release (eacf087c5a)
Chapel environment:
  CHPL_HOME: 
  CHPL_ATOMICS: intrinsics
  CHPL_AUX_FILESYS: none
  CHPL_COMM: none
  CHPL_COMM_SUBSTRATE: none
  CHPL_GASNET_SEGMENT: none
  CHPL_GMP: gmp
  CHPL_HOST_COMPILER: gnu
  CHPL_HOST_PLATFORM: linux64
  CHPL_HWLOC: none
  CHPL_JEMALLOC: jemalloc
  CHPL_LAUNCHER: mpirun
  CHPL_LLVM: none
  CHPL_LOCALE_MODEL: flat
  CHPL_MAKE: gmake
  CHPL_MEM: jemalloc
  CHPL_NETWORK_ATOMICS: none
  CHPL_REGEXP: re2
  CHPL_TARGET_ARCH: none
  CHPL_TARGET_COMPILER: mpi-gnu
  CHPL_TARGET_PLATFORM: linux64
  CHPL_TASKS: fifo
  CHPL_TIMERS: generic
  CHPL_UNWIND: none
  CHPL_WIDE_POINTERS: struct
  • Back-end compiler and version, e.g. gcc --version or clang --version:

    • gcc --version

gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  • mpicc --version
gcc (SUSE Linux) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(1/5) Installing: mpi-selector-1.0.3-3.1.1.noarch ..............................................................................................[done]
(2/5) Installing: libstdc++47-devel-4.7.1_20120723-1.1.1.x86_64 ................................................................................[done]
(3/5) Installing: libstdc++-devel-4.7-2.1.1.x86_64 .............................................................................................[done]
(4/5) Installing: openmpi-1.5.4-4.1.4.x86_64 ...................................................................................................[done]
(5/5) Installing: openmpi-devel-1.5.4-4.1.4.x86_64 .............................................................................................[done]
Labels: Libraries / Modules, Unimplemented Feature


All 9 comments

I think your problem stems from using OpenMPI instead of MPICH. To date, we only test against MPICH. However, I don't think this is sufficiently documented in the MPI docs.

I'm not sure what the challenges are in supporting OpenMPI. Maybe we should add it to the list in #5722 though.

@npadmana - do you have any thoughts on this?

The underlying issue is that we currently require being able to make concurrent MPI calls in multiple threads/tasks. You do this by requesting MPI_THREAD_MULTIPLE at initialization time; the code aborts (as it did in this case) if the MPI implementation does not support this.

OpenMPI used to have issues with this mode, which is why I've been using MPICH even locally. I believe this is now fixed in OpenMPI v2.x.x, but that this support doesn't get built by default. Reading the docs, it seems you need to pass --enable-mpi-thread-multiple at configure time.
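
For someone building Open MPI from source, the configure step described above would look roughly like this (flag name as given in the Open MPI 2.x docs; an untested sketch, not a verified recipe, and the install prefix is hypothetical):

```shell
# Hypothetical Open MPI 2.x build with MPI_THREAD_MULTIPLE support;
# in Open MPI 1.x/2.x this support is off unless requested at configure time.
./configure --enable-mpi-thread-multiple --prefix=$HOME/opt/openmpi
make -j && make install
```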

We probably should test that this works, and if it does, update the documentation to mention all of this.

A related to-do item is to relax this requirement and put the onus on the user not to make concurrent MPI calls. Except for gasnet+mpi (and possibly IB), that should be possible.

@awallace-cray -- just FYI, there is an "undocumented" feature in the MPI module where if you run with
--requireThreadedMPI=false
it will disable that requirement. It's been lightly tested, but would be great if you could see if that works for you. Certainly, in SPMD mode, that should be fine.
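
Concretely, assuming the binary built earlier in this report, the run would look like this (untested sketch of the lightly tested flag described above):

```shell
# Run the SPMD example without requiring MPI_THREAD_MULTIPLE.
./hello+mpi --spmd 2 --requireThreadedMPI=false
```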

I don't recall how I got the idea OpenMPI supported the Chapel pkg. After Ben's comment I made an attempt to replace OpenMPI with MPICH.
MPICH was harder to install on that system, or else something was poisoned by the earlier OpenMPI install; Chapel could never find mpicc.

Yes, I've hit that issue before. I think OpenMPI leaves some wrappers around.....

I do think we should check whether the newer versions of OpenMPI work or not..... @ben-albrecht -- do you want to add to our todo list?

FWIW I've successfully run Chapel 1.16 with OpenMPI 3.0.0 and the following settings:

CHPL_COMM=gasnet
CHPL_COMM_SUBSTRATE=ibv
CHPL_LAUNCHER=gasnetrun_ibv
GASNET_PHYSMEM_MAX=1G
GASNET_IBV_SPAWNER=mpi
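
Applied as environment variables before rebuilding and running, the settings above would be:

```shell
# The reported working configuration, expressed as shell exports.
export CHPL_COMM=gasnet              # GASNet communication layer
export CHPL_COMM_SUBSTRATE=ibv       # InfiniBand substrate
export CHPL_LAUNCHER=gasnetrun_ibv
export GASNET_PHYSMEM_MAX=1G
export GASNET_IBV_SPAWNER=mpi        # spawn GASNet jobs via MPI
```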

As @awallace-cray found, the same system+settings didn't work with OpenMPI 1.6.3.

Thanks for the info @milthorpe! For clarity, did you build with the --enable-mpi-thread-multiple flag as @npadmana suggested earlier, or is this not necessary for OpenMPI 3?

@awallace-cray - if we can confirm this and add it to nightly testing, we can update the docs to specify that we support the newer versions of OpenMPI (perhaps OpenMPI 2 works as well).

I didn't build OpenMPI, so I can't say whether it was necessary to explicitly pass --enable-mpi-thread-multiple; but it does report threading support:

bash-4.1$ mpirun -version
mpirun (Open MPI) 3.0.0
...
bash-4.1$ ompi_info | grep -i thread
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
   FT Checkpoint support: no (checkpoint thread: no)