Trilinos: TpetraCore tests failing in ATDM sems-rhel7+cuda+complex build

Created on 2 Apr 2019  ·  3 Comments  ·  Source: trilinos/Trilinos

CC: @trilinos/tpetra, @kddevin (Trilinos Data Services Product Lead), @bartlettroscoe, @fryeguy52






Next Action Status

Description

As shown in this query, the build:

  • Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug

has several failing TpetraCore tests. The following are failing every day (CDash):

  • TpetraCore_gemm_m_eq_1_MPI_1
  • TpetraCore_gemm_m_eq_2_MPI_1
  • TpetraCore_gemm_m_eq_5_MPI_1
  • TpetraCore_gemm_m_eq_13_MPI_1

These have failed randomly over the last couple of weeks (CDash):

  • TpetraCore_MultiVector_MicroBenchmark_MPI_1
  • TpetraCore_Map_Bug5822_2_MPI_2
  • TpetraCore_getEntryOnHost_MPI_1
  • TpetraCore_gemv_MPI_1 
  • TpetraCore_deep_copy_MultiVector_to_SerialDenseMatrix_MPI_1
  • TpetraCore_createMirrorView_MPI_1

Several more tests have failed over the last 2 weeks in the similar complex shared build shown here:

  • Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug

Those tests are:

  • TpetraCore_MatrixMatrix_UnitTests_MPI_4
  • TpetraCore_Issue_607_MPI_4
  • TpetraCore_Issue_114_MPI_4
  • TpetraCore_Issue601_MPI_4
  • TpetraCore_ImportExport2_UnitTests_MPI_4
  • TpetraCore_ImportBug5430_MPI_4
  • TpetraCore_Import_Union_MPI_4
  • TpetraCore_CrsMatrix_NonlocalSumInto_Ignore_MPI_4
  • TpetraCore_CrsMatrix_gaussSeidel_MPI_4
  • TpetraCore_CrsGraph_getNumDiags_MPI_4
  • TpetraCore_AddProfiling_UnitTests_MPI_4

Current Status on CDash

Failed TpetraCore tests for the current testing day

Steps to Reproduce

One should be able to reproduce this failure with a sems rhel6 environment as described in:

  • https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md

More specifically, the commands for a sems rhel6 environment are provided at:

  • https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md#sems-rhel6-environment

The exact commands to reproduce this issue should be:

$ cd <some_build_dir>/
$ source $TRILINOS_DIR/cmake/std/atdm/load-env.sh Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug
$ cmake \
 -GNinja \
 -DTrilinos_CONFIGURE_OPTIONS_FILE:STRING=cmake/std/atdm/ATDMDevEnv.cmake \
 -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_Tpetra=ON \
 $TRILINOS_DIR
$ make NP=16
$ ctest -j8
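
To iterate on a handful of the failing tests rather than the full TpetraCore suite, the final ctest step can be narrowed with a name filter. The following is a minimal sketch and not part of the original report: the helper `build_ctest_regex` is hypothetical, and it relies only on the standard ctest options `-R` (regular-expression filter on test names) and `--output-on-failure`. Outside a configured build directory it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: join test names into one anchored regex for `ctest -R`.
build_ctest_regex() {
  regex=""
  for name in "$@"; do
    if [ -z "$regex" ]; then
      regex="$name"
    else
      regex="$regex|$name"
    fi
  done
  printf '^(%s)$\n' "$regex"
}

# Two of the test names listed in this issue.
FAILING_REGEX=$(build_ctest_regex \
  TpetraCore_gemm_m_eq_1_MPI_1 \
  TpetraCore_gemv_MPI_1)

# Only invoke ctest from a configured build directory; otherwise do a dry run.
if command -v ctest >/dev/null 2>&1 && [ -f CTestTestfile.cmake ]; then
  ctest -R "$FAILING_REGEX" --output-on-failure
else
  echo "would run: ctest -R '$FAILING_REGEX' --output-on-failure"
fi
```

Anchoring the regex with `^` and `$` keeps the filter from matching other tests whose names merely contain one of the listed names.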

Labels: ATDM Env Issue Nonblocker Data Services ATDM Tpetra in review bug

All 3 comments

FYI: We switched from ctest -j10 to ctest -j4 on 'ascicgpu14' in #4865 merged to 'develop' on 4/10/2019 and it seems to have addressed the problem. Putting this in review and running for a few days to verify the problem is fixed.
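
For anyone reproducing this locally, the effect of the parallel level can be checked by running the same tests at both levels. A hedged sketch follows: the function `ctest_cmd_for_level` and the default of five repetitions are illustrative assumptions, while `-j`, `-R`, and `--repeat-until-fail` are standard ctest options. It does not show how #4865 itself changed the level.

```shell
#!/bin/sh
# Hypothetical helper: print the ctest command for a given parallel level.
# --repeat-until-fail reruns each test up to N times and stops at the first
# failure, which helps expose tests that only fail intermittently.
ctest_cmd_for_level() {
  level=$1
  repeats=${2:-5}   # illustrative default, not taken from the issue
  echo "ctest -j$level -R ^TpetraCore_ --repeat-until-fail $repeats"
}

# Compare the old (-j10) and new (-j4) levels mentioned in this comment.
for level in 10 4; do
  cmd=$(ctest_cmd_for_level "$level")
  if command -v ctest >/dev/null 2>&1 && [ -f CTestTestfile.cmake ]; then
    echo "== $cmd"
    $cmd
  else
    echo "would run: $cmd"
  fi
done
```

If the failures come from too many tests competing for the GPU at once, one would expect them to show up at the higher level and disappear at the lower one; that expectation is an assumption, not something this thread confirms.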

As shown in the table below from our CDash analysis tool (#2933), all of these tests have been passing for at least 9 consecutive days since the merge of PR #4865.

Therefore, we can close this issue.


Tests with issue trackers Passed: twip=21 (2019-04-19)

| Site | Build Name | Test Name | Status | Details | Consecutive Pass Days | Non-pass Last 30 Days | Pass Last 30 Days | Issue Tracker |
|---|---|---|---|---|---|---|---|---|
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_AddProfiling_UnitTests_MPI_4 | Passed | Completed | 17 | 3 | 26 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_CrsGraph_getNumDiags_MPI_4 | Passed | Completed | 25 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_CrsMatrix_NonlocalSumInto_Ignore_MPI_4 | Passed | Completed | 23 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_CrsMatrix_gaussSeidel_MPI_4 | Passed | Completed | 23 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_ImportBug5430_MPI_4 | Passed | Completed | 19 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_ImportExport2_UnitTests_MPI_4 | Passed | Completed | 15 | 4 | 25 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_Import_Union_MPI_4 | Passed | Completed | 19 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_Issue601_MPI_4 | Passed | Completed | 25 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_Issue_114_MPI_4 | Passed | Completed | 24 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_Issue_607_MPI_4 | Passed | Completed | 24 | 1 | 28 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_Map_Bug5822_2_MPI_2 | Passed | Completed (Completed) | 10 | 2 | 26 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-shared-release-debug | TpetraCore_MatrixMatrix_UnitTests_MPI_4 | Passed | Completed | 18 | 2 | 27 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_MultiVector_MicroBenchmark_MPI_1 | Passed | Completed | 15 | 4 | 24 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_createMirrorView_MPI_1 | Passed | Completed | 24 | 1 | 27 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_deep_copy_MultiVector_to_SerialDenseMatrix_MPI_1 | Passed | Completed | 15 | 2 | 19 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_gemm_m_eq_13_MPI_1 | Passed | Completed | 9 | 19 | 9 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_gemm_m_eq_1_MPI_1 | Passed | Completed | 9 | 19 | 9 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_gemm_m_eq_2_MPI_1 | Passed | Completed | 9 | 19 | 9 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_gemm_m_eq_5_MPI_1 | Passed | Completed | 9 | 19 | 9 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_gemv_MPI_1 | Passed | Completed | 15 | 4 | 24 | #4790 |
| sems-rhel7 | Trilinos-atdm-sems-rhel7-cuda-9.2-Volta70-complex-static-release-debug | TpetraCore_getEntryOnHost_MPI_1 | Passed | Completed | 24 | 1 | 27 | #4790 |
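
The "at least 9 consecutive days" figure can be checked mechanically against the table. The sketch below assumes the rows have been saved as whitespace-separated text with the same column order as the table, so that the consecutive-pass count is the fourth field from the end. The function name `min_consecutive_pass_days` is hypothetical, and the three-row sample is a subset copied from the table for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read table rows on stdin and print the smallest
# "Consecutive Pass Days" value, taken as the fourth field from the end.
min_consecutive_pass_days() {
  awk 'NF >= 4 {
         days = $(NF - 3) + 0
         if (!seen || days < min) { min = days; seen = 1 }
       }
       END { if (seen) print min }'
}

# Three rows copied from the table (site and build name columns omitted).
min_consecutive_pass_days <<'EOF'
TpetraCore_Issue601_MPI_4 Passed Completed 25 1 28 #4790
TpetraCore_gemv_MPI_1 Passed Completed 15 4 24 #4790
TpetraCore_gemm_m_eq_1_MPI_1 Passed Completed 9 19 9 #4790
EOF
```

Counting fields from the end rather than the start keeps the check working for rows whose Details column holds more than one word.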

Closing as complete as per above.
