Trilinos: Teuchos::Comm has unexpected behavior on waterman

Created on 25 Aug 2018 · 4 Comments · Source: trilinos/Trilinos

@trilinos/teuchos @jwillenbring @bartlettroscoe

Next Action Status

PR #3356, merged on 9/1/2018, added stronger Teuchos::Comm unit tests to assert correct behavior from MPI. Next: Watch for new passing TeuchosComm tests on 'waterman' on CDash starting 9/2/2018 ...

Expectations


Teuchos::Comm::createSubcommunicator and Teuchos::Comm::duplicate should behave the same on waterman as on other platforms.

Current Behavior

Regarding Teuchos::Comm::createSubcommunicator:
In topic branch fix3331, the test teuchos/comm/test/Comm/teuchosSubcommTest.cpp
runs correctly on most platforms but times out on waterman.
The test is named TeuchosComm_teuchosSubcommTest.

Regarding Teuchos::Comm::duplicate:
In topic branch fix3331, the test teuchos/comm/test/Comm/waterman_teuchoscomm.cpp
runs correctly on most platforms but fails on waterman when Teuchos::Comm::duplicate is called.
This failing test is named TeuchosComm_teuchoscomm_with_comm_duplicate.
The same test passes when the call to Teuchos::Comm::duplicate is skipped.
That passing variant is named TeuchosComm_teuchoscomm_without_comm_duplicate.

An equivalent version of the test that calls MPI directly rather than using Teuchos::Comm is in
teuchos/comm/test/Comm/waterman_mpi.cpp.
This test passes without problem, whether or not the communicator is duplicated.

Motivation and Context


The test failures in #3331 are actually caused by these problems, not by errors in Zoltan2.

Definition of Done

Possible Solution

Steps to Reproduce


Use topic branch fix3331 to get the tests described above.
Build using the ATDM configuration on waterman as described in #3331, but build Teuchos instead of Zoltan2.

Your Environment

  • Relevant repo SHA1s:
  • Relevant configure flags or configure script:
  • Operating system and version:
  • Compiler and TPL versions:

Related Issues

  • Blocks #3331
  • Is blocked by
  • Follows
  • Precedes
  • Related to
  • Part of
  • Composed of

Additional Information

Framework ATDM Teuchos in review

All 4 comments

Please see my comment here for a suggested work-around. @kddevin, if you have a chance to try that change on waterman, could you please?

FYI: @nmhamster informed us today that OpenMPI 3.1.0 does not work correctly on 'waterman' and we should be using the build env with OpenMPI '2.1.2' instead. I will give that env a try. Perhaps it will fix this issue.

FYI: PR #3356 was just merged. Therefore, I am putting this Issue "in review" and we will wait for confirmation on CDash that these tests pass on 'waterman' (which they should, based on testing done as part of #3356).

As shown here, these tests are now passing on waterman. Closing this issue.
