Trilinos: Test TeuchosComm_TimeMonitor_UnitTests_MPI_3 randomly failing in ATDM builds of Trilinos

Created on 31 Mar 2018  路  4Comments  路  Source: trilinos/Trilinos

CC: @trilinos/teuchos

Next Action Status

Commit a481846 merged to 'develop' on 3/30/2018 and no timeouts seen failing since then.

Description

As shown in the query from 2018-02-01 to 2018-03-31:

the test TeuchosComm_TimeMonitor_UnitTests_MPI_3 appears to be failing randomly. It failed 17 times in the following ATDM four builds on 'white' and 'ride':

  • Trilinos-atdm-white-ride-cuda-debug
  • Trilinos-atdm-white-ride-cuda-opt
  • Trilinos-atdm-white-ride-gnu-debug-openmp
  • Trilinos-atdm-white-ride-gnu-opt-openmp



Table of failing tests (sorted by date) (click to expand)

| Site | Build Name | Status | Time | Details | Build Time |
| :-- | :-- | :-- | :-- | --: | :-- |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-30T06:20:33 UTC |
| ride | gnu-opt-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-30T06:11:48 UTC |
| ride | cuda-opt | Failed | 10m 20ms | Completed (Timeout) | 2018-03-27T06:50:29 UTC |
| white | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-27T06:16:00 UTC |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-22T06:17:49 UTC |
| ride | gnu-opt-openmp | Failed | 10m 60ms | Completed (Timeout) | 2018-03-20T06:18:19 UTC |
| ride | gnu-opt-openmp | Failed | 1s 650ms | Completed (Failed) | 2018-03-16T06:12:14 UTC |
| ride | cuda-opt | Failed | 2s 180ms | Completed (Failed) | 2018-03-15T06:50:23 UTC |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-15T06:16:34 UTC |
| ride | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-14T06:58:58 UTC |
| ride | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-14T06:32:20 UTC |
| white | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-14T06:16:13 UTC |
| ride | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-11T18:37:16 UTC |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-10T07:17:18 UTC |
| white | cuda-opt | Failed | 10m 20ms | Completed (Timeout) | 2018-03-07T08:13:51 UTC |
| white | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-07T05:07:23 UTC |
| white | gnu-opt-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-07T05:02:21 UTC |

NOTE: All of the above builds really start with Trilinos-atdm-white-ride- but that was removed to shorten the "Build Name" column.

  • Note taht all but two above the the above tests timed out. The two that faikled completed in under 3 sec.

When the test times out, it looks like it passes, for example as shown today at:

which shows:

***
*** Unit test suite ...
***


Sorting tests by group name then by the order they were added ... (time = 3.1e-06)

Running unit tests ...

0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00122 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.00648 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00905 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00778 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.00117 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.156 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.341 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (1.28 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)

Total Time: 2.8 sec

Summary: total = 9, run = 9, passed = 9, failed = 0

End Result: TEST PASSED
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------

So it looks like on of the other processes is failing for some reason and that causes the other processes to hang.

When it did fail, for example on 2018-03-16 shown at:

it showed:

--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:

  #cpus-per-proc:  8
  number of cpus:  7
  map-by:          BYSOCKET:OVERSUBSCRIBE

Please specify a mapping level that has more cpus, or else let us
define a default mapping that will allow multiple cpus-per-proc.
--------------------------------------------------------------------------

I think those failures were addressed by commit 114ca53 from "Fri Mar 16 10:17:14 2018" pushed that day. Therefore, I think we can ignore these failures. Since 2018-03-16, there have been only random timeouts so we will focus just on those timeouts.

Looking the the full history for this test for the build Trilinos-atdm-white-ride-cuda-debug in white (which is targeted for a Trilinos auto PR build in #2464) in the query:

You can see that other than timing out at 10 minutes today, the test timed out three other days which were 2018-03-22, 2018-03-15 and 2018-03-10. When the test does pass, it does so in less than 6 sec in all cases. When the test does pass, for example yesterday in the above query, it shows:

***
*** Unit test suite ...
***


Sorting tests by group name then by the order they were added ... (time = 5.96e-06)

Running unit tests ...

0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00168 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.00647 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00907 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00782 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.00117 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.156 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.34 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (1.28 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)

Total Time: 2.8 sec

Summary: total = 9, run = 9, passed = 9, failed = 0

End Result: TEST PASSED
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[51006,1],2]
  Exit code:    1
--------------------------------------------------------------------------

Wow, that is the same output for when the test times out!

Steps to Reproduce

Should be able to reproduce this by logging onto 'white' or 'ride' and then running any of the supported builds on this platform shown above and then enable and run the Teuchos tests as documented at:

Related Issues

  • Related to: #2464
Framework ATDM Teuchos bug

All 4 comments

I was able to reproduce the failing test by loging on to 'white', updating my Trilinos 'develop' branch and then doing:

$ cd ~/Trilinos.base/BUILD/WHITE/CHECKIN/

$ ./checkin-test-atdm.sh gnu-debug-openmp --enable-packages=Teuchos --local-do-all

$ cd gnu-debug-openmp/

$ source load-env.sh

$ cd packages/teuchos/comm/test/time/

$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
  "--teuchos-suppress-startup-banner"

NVIDIA: no NVIDIA devices found

***
*** Unit test suite ...
***


Sorting tests by group name then by the order they were added ... (time = 5.96e-06)

Running unit tests ...

0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00142 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.00686 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00956 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00819 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.00123 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.165 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.36 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (1.35 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)

Total Time: 2.91 sec

Summary: total = 9, run = 9, passed = 9, failed = 0

End Result: TEST PASSED
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[51789,1],2]
  Exit code:    1
--------------------------------------------------------------------------


$  mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe "--teuchos-suppress-startup-banner"

I then ran the test with verbose output on the other processes and the test actually fails on the other processes. For example, running:

$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
  "--teuchos-suppress-startup-banner" --output-show-proc-rank --output-to-root-rank-only=1

produces the output show below



detailed test output (click to expand)

NVIDIA: no NVIDIA devices found
p=1 | 
p=1 | ***
p=1 | *** Unit test suite ...
p=1 | ***
p=1 | 
p=1 | 
p=1 | Sorting tests by group name then by the order they were added ... (time = 6.2e-06)
p=1 | 
p=1 | Running unit tests ...
p=1 | 
p=1 | 0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:156
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  [FUNC_TIME_MONITOR1]
p=1 |  Total times:  {FUNC_TIME_MONITOR1: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 |  Call counts: {FUNC_TIME_MONITOR1: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |  
p=1 |  substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - FUNC_TIME_MONITOR1
p=1 |  Total times: 
p=1 |    FUNC_TIME_MONITOR1: 
p=1 |      MinOverProcs: 0
p=1 |      MeanOverProcs: 0
p=1 |      MaxOverProcs: 0
p=1 |      MeanOverCallCounts: 0
p=1 |  Call counts:
p=1 |    FUNC_TIME_MONITOR1: 
p=1 |      MinOverProcs: 1
p=1 |      MeanOverProcs: 1
p=1 |      MaxOverProcs: 1
p=1 |      MeanOverCallCounts: 1
p=1 |  
p=1 |  substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 |  [FAILED]  (0.00131 sec) TimeMonitor_FUNC_TIME_MONITOR_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:137
p=1 |  
p=1 | 1. TimeMonitor_enableTimer_UnitTest ... 
p=1 |  Testing TimeMonitor's disableTimer() and enableTimer() methods
p=1 |  Creating timers
p=1 |  Running all timers without disabling any
p=1 |  timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  Disabling one timer and trying again
p=1 |  Test that code {TimeMonitor::disableTimer ("Timer 0");} does not throw : passed
p=1 |  timers[0]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 |  timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 |  Reenabling the timer and trying again
p=1 |  Test that code {TimeMonitor::enableTimer ("Timer 0");} does not throw : passed
p=1 |  timers[0]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 |  timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 |  timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 |  Test that summarize() reports enabled and disabled timers
p=1 |  Test that code {TimeMonitor::disableTimer ("Timer 1");} does not throw : passed
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 |  [FAILED]  (0.00674 sec) TimeMonitor_enableTimer_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:202
p=1 |  
p=1 | 2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00958 sec)
p=1 | 3. TimeMonitor_TimerLabelFiltering_UnitTest ... 
p=1 |  
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  ["Foo: timer 1", "Foo: timer 2", "Foo: timer 3"]
p=1 |  Total times:  {"Foo: timer 1": {MinOverProcs: 0.00042367, MeanOverProcs: 0.000424862, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.000141621}, "Foo: timer 2": {MinOverProcs: 0.000425339, MeanOverProcs: 0.000425895, MaxOverProcs: 0.000426292, MeanOverCallCounts: 0.000141965}, "Foo: timer 3": {MinOverProcs: 0.000423908, MeanOverProcs: 0.0004251, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.0001417}}
p=1 |  Call counts: {"Foo: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 3": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |  
p=1 |  pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 226 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - "Foo: timer 1"
p=1 |    - "Foo: timer 2"
p=1 |    - "Foo: timer 3"
p=1 |  Total times: 
p=1 |    "Foo: timer 1": 
p=1 |      MinOverProcs: 0.00042367
p=1 |      MeanOverProcs: 0.000424862
p=1 |      MaxOverProcs: 0.000425816
p=1 |      MeanOverCallCounts: 0.000141621
p=1 |    "Foo: timer 2": 
p=1 |      MinOverProcs: 0.000425339
p=1 |      MeanOverProcs: 0.000425895
p=1 |      MaxOverProcs: 0.000426292
p=1 |      MeanOverCallCounts: 0.000141965
p=1 |    "Foo: timer 3": 
p=1 |      MinOverProcs: 0.000423908
p=1 |      MeanOverProcs: 0.0004251
p=1 |      MaxOverProcs: 0.000425816
p=1 |      MeanOverCallCounts: 0.0001417
p=1 |  Call counts:
p=1 |    "Foo: timer 1": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |    "Foo: timer 2": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |    "Foo: timer 3": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |  
p=1 |  pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 247 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  ["Bar: timer 1", "Bar: timer 2"]
p=1 |  Total times:  {"Bar: timer 1": {MinOverProcs: 0.000430107, MeanOverProcs: 0.000455062, MaxOverProcs: 0.000503778, MeanOverCallCounts: 0.000151687}, "Bar: timer 2": {MinOverProcs: 0.000426054, MeanOverProcs: 0.000426054, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000142018}}
p=1 |  Call counts: {"Bar: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Bar: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |  
p=1 |  pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - "Bar: timer 1"
p=1 |    - "Bar: timer 2"
p=1 |  Total times: 
p=1 |    "Bar: timer 1": 
p=1 |      MinOverProcs: 0.000430107
p=1 |      MeanOverProcs: 0.000455062
p=1 |      MaxOverProcs: 0.000503778
p=1 |      MeanOverCallCounts: 0.000151687
p=1 |    "Bar: timer 2": 
p=1 |      MinOverProcs: 0.000426054
p=1 |      MeanOverProcs: 0.000426054
p=1 |      MaxOverProcs: 0.000426054
p=1 |      MeanOverCallCounts: 0.000142018
p=1 |  Call counts:
p=1 |    "Bar: timer 1": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |    "Bar: timer 2": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |  
p=1 |  pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  ["Baz: timer 1"]
p=1 |  Total times:  {"Baz: timer 1": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425259, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.000141753}}
p=1 |  Call counts: {"Baz: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |  
p=1 |  pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - "Baz: timer 1"
p=1 |  Total times: 
p=1 |    "Baz: timer 1": 
p=1 |      MinOverProcs: 0.000424862
p=1 |      MeanOverProcs: 0.000425259
p=1 |      MaxOverProcs: 0.000425816
p=1 |      MeanOverCallCounts: 0.000141753
p=1 |  Call counts:
p=1 |    "Baz: timer 1": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |  
p=1 |  pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  [FAILED]  (0.00825 sec) TimeMonitor_TimerLabelFiltering_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:430
p=1 |  
p=1 | 4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:657
p=1 |  substr_inner_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:659
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  [FUNC_TIME_MONITOR2, FUNC_TIME_MONITOR2_inner]
p=1 |  Total times:  {FUNC_TIME_MONITOR2: {MinOverProcs: 4.05312e-06, MeanOverProcs: 4.6889e-06, MaxOverProcs: 5.00679e-06, MeanOverCallCounts: 4.6889e-06}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 |  Call counts: {FUNC_TIME_MONITOR2: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |  
p=1 |  substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 |  substr_inner_i = 213 != std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - FUNC_TIME_MONITOR2
p=1 |    - FUNC_TIME_MONITOR2_inner
p=1 |  Total times: 
p=1 |    FUNC_TIME_MONITOR2: 
p=1 |      MinOverProcs: 4.05312e-06
p=1 |      MeanOverProcs: 4.6889e-06
p=1 |      MaxOverProcs: 5.00679e-06
p=1 |      MeanOverCallCounts: 4.6889e-06
p=1 |    FUNC_TIME_MONITOR2_inner: 
p=1 |      MinOverProcs: 0
p=1 |      MeanOverProcs: 0
p=1 |      MaxOverProcs: 0
p=1 |      MeanOverCallCounts: 0
p=1 |  Call counts:
p=1 |    FUNC_TIME_MONITOR2: 
p=1 |      MinOverProcs: 1
p=1 |      MeanOverProcs: 1
p=1 |      MaxOverProcs: 1
p=1 |      MeanOverCallCounts: 1
p=1 |    FUNC_TIME_MONITOR2_inner: 
p=1 |      MinOverProcs: 1
p=1 |      MeanOverProcs: 1
p=1 |      MaxOverProcs: 1
p=1 |      MeanOverCallCounts: 1
p=1 |  
p=1 |  substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 |  substr_inner_i = 231 != std::string::npos = 18446744073709551615 : passed
p=1 |  [FAILED]  (0.0011 sec) TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:639
p=1 |  
p=1 | 5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... 
p=1 |  
p=1 |  Testing intersection of timers:
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:776
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  Testing union of timers:
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:796
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:798
p=1 |  [FAILED]  (0.165 sec) TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:711
p=1 |  
p=1 | 6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:852
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:854
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  [FAILED]  (0.36 sec) TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:808
p=1 |  
p=1 | 7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:936
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:938
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  [FAILED]  (1.35 sec) TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:869
p=1 |  
p=1 | 8. TimeMonitor_IgnoreMissingTimers_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1019
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1030
p=1 |  
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  [FAILED]  (1 sec) TimeMonitor_IgnoreMissingTimers_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:951
p=1 |  
p=1 | 
p=1 | The following tests FAILED:
p=1 |     0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... 
p=1 |     1. TimeMonitor_enableTimer_UnitTest ... 
p=1 |     3. TimeMonitor_TimerLabelFiltering_UnitTest ... 
p=1 |     4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... 
p=1 |     5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... 
p=1 |     6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... 
p=1 |     7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... 
p=1 |     8. TimeMonitor_IgnoreMissingTimers_UnitTest ... 
p=1 | 
p=1 | Total Time: 2.91 sec
p=1 | 
p=1 | Summary: total = 9, run = 9, passed = 1, failed = 8
p=1 | 
p=1 | End Result: TEST FAILED
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[51238,1],2]
  Exit code:    1
--------------------------------------------------------------------------

Looking at this output, I just realized that this test likely fails in other builds on other platforms but it is being ignored!

Looking at the all of the runs of this test today at:

and looking at some of the results, it is definitely also failing on other builds and platforms. For example, for the "Clean" build Linux-gcc-4.8.4-MPI_Release_gcc_4.8.4_openmpi_1.8.7_DEV today the output at:

shows:

***
*** Unit test suite ...
***


Sorting tests by group name then by the order they were added ... (time = 5.96e-06)

Running unit tests ...

0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00111 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.000257 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00166 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00165 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.000447 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.000245 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.000167 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (0.000155 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)

Total Time: 1.01 sec

Summary: total = 9, run = 9, passed = 9, failed = 0

End Result: TEST PASSED
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[31998,1],1]
  Exit code:    1
--------------------------------------------------------------------------

That is bad.

The problem is that this unit test is not using the parallel unit test driver. It is using the standard serial driver.

I just pushed the branch 2487-time-monitor-ride-white to my GitHub fork with the commit f395fe5:

commit f395fe51761658b8326b2473f075b304bef97a42
Author: Roscoe A. Bartlett <[email protected]>
Date:   Fri Mar 30 17:43:36 2018 -0600

    Use parallel unit test driver to show non-root rank failures (#2487)

    This test actally fails all of the unit tests on the non-root rank.  Changing
    to use the parallel unit test driver makes this clear.

M       packages/teuchos/comm/test/Time/CMakeLists.txt

Now when I run the test on 'white' using:

$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
  "--teuchos-suppress-startup-banner"

it fails a showns the output shown below.



detailed test output using parallel test driver (click to expand)

NVIDIA: no NVIDIA devices found

***
*** Unit test suite ...
***


Sorting tests by group name then by the order they were added ... (time = 5.01e-06)

Running unit tests ...

0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... 
 =============================================================================================

                             TimeMonitor results over 3 processors

 Timer Name            MinOverProcs    MeanOverProcs    MaxOverProcs    MeanOverCallCounts    
 ---------------------------------------------------------------------------------------------
 FUNC_TIME_MONITOR1    0 (1)           0 (1)            0 (1)           0 (1)                 
 =============================================================================================

 substr_i = 350 != std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: compact
 Number of processes: 3
 Time unit: s
 Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
 Timer names:  [FUNC_TIME_MONITOR1]
 Total times:  {FUNC_TIME_MONITOR1: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
 Call counts: {FUNC_TIME_MONITOR1: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}

 substr_i = 193 != std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: spacious
 Number of processes: 3
 Time unit: s
 Statistics collected: 
   - MinOverProcs
   - MeanOverProcs
   - MaxOverProcs
   - MeanOverCallCounts
 Timer names: 
   - FUNC_TIME_MONITOR1
 Total times: 
   FUNC_TIME_MONITOR1: 
     MinOverProcs: 0
     MeanOverProcs: 0
     MaxOverProcs: 0
     MeanOverCallCounts: 0
 Call counts:
   FUNC_TIME_MONITOR1: 
     MinOverProcs: 1
     MeanOverProcs: 1
     MaxOverProcs: 1
     MeanOverCallCounts: 1

 substr_i = 208 != std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (0.00142 sec) TimeMonitor_FUNC_TIME_MONITOR_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:137

1. TimeMonitor_enableTimer_UnitTest ... 
 Testing TimeMonitor's disableTimer() and enableTimer() methods
 Creating timers
 Running all timers without disabling any
 timers[i]->numCalls() = 5 == numTrials = 5 : passed
 timers[i]->numCalls() = 5 == numTrials = 5 : passed
 timers[i]->numCalls() = 5 == numTrials = 5 : passed
 Disabling one timer and trying again
 Test that code {TimeMonitor::disableTimer ("Timer 0");} does not throw : passed
 timers[0]->numCalls() = 5 == numTrials = 5 : passed
 timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
 timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
 Reenabling the timer and trying again
 Test that code {TimeMonitor::enableTimer ("Timer 0");} does not throw : passed
 timers[0]->numCalls() = 10 == 2*numTrials = 10 : passed
 timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
 timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
 Test that summarize() reports enabled and disabled timers
 Test that code {TimeMonitor::disableTimer ("Timer 1");} does not throw : passed
 =============================================================================================

                             TimeMonitor results over 3 processors

 Timer Name    MinOverProcs     MeanOverProcs    MaxOverProcs     MeanOverCallCounts    
 ---------------------------------------------------------------------------------------------
 Timer 0       0.001417 (10)    0.001418 (10)    0.001418 (10)    0.0001418 (10)        
 Timer 1       0.002124 (15)    0.002132 (15)    0.002146 (15)    0.0001421 (15)        
 Timer 2       0.002129 (15)    0.00213 (15)     0.002131 (15)    0.000142 (15)         
 =============================================================================================

 substr_i = 344 != std::string::npos = 18446744073709551615 : passed
 substr_i = 432 != std::string::npos = 18446744073709551615 : passed
 substr_i = 520 != std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (0.00718 sec) TimeMonitor_enableTimer_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:202

2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00963 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... 
 =============================================================================================

                             TimeMonitor results over 3 processors

 Timer Name      MinOverProcs     MeanOverProcs    MaxOverProcs     MeanOverCallCounts    
 ---------------------------------------------------------------------------------------------
 Foo: timer 1    0.0004249 (3)    0.0004254 (3)    0.0004261 (3)    0.0001418 (3)         
 Foo: timer 2    0.0004258 (3)    0.0004259 (3)    0.0004261 (3)    0.000142 (3)          
 Foo: timer 3    0.0004251 (3)    0.0004254 (3)    0.0004261 (3)    0.0001418 (3)         
 =============================================================================================

 pos = 346 != std::string::npos = 18446744073709551615 : passed
 pos = 436 != std::string::npos = 18446744073709551615 : passed
 pos = 526 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: compact
 Number of processes: 3
 Time unit: s
 Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
 Timer names:  ["Foo: timer 1", "Foo: timer 2", "Foo: timer 3"]
 Total times:  {"Foo: timer 1": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425418, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141806}, "Foo: timer 2": {MinOverProcs: 0.000425816, MeanOverProcs: 0.000425895, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141965}, "Foo: timer 3": {MinOverProcs: 0.0004251, MeanOverProcs: 0.000425418, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141806}}
 Call counts: {"Foo: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 3": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}

 pos = 194 != std::string::npos = 18446744073709551615 : passed
 pos = 210 != std::string::npos = 18446744073709551615 : passed
 pos = 226 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: spacious
 Number of processes: 3
 Time unit: s
 Statistics collected: 
   - MinOverProcs
   - MeanOverProcs
   - MaxOverProcs
   - MeanOverCallCounts
 Timer names: 
   - "Foo: timer 1"
   - "Foo: timer 2"
   - "Foo: timer 3"
 Total times: 
   "Foo: timer 1": 
     MinOverProcs: 0.000424862
     MeanOverProcs: 0.000425418
     MaxOverProcs: 0.000426054
     MeanOverCallCounts: 0.000141806
   "Foo: timer 2": 
     MinOverProcs: 0.000425816
     MeanOverProcs: 0.000425895
     MaxOverProcs: 0.000426054
     MeanOverCallCounts: 0.000141965
   "Foo: timer 3": 
     MinOverProcs: 0.0004251
     MeanOverProcs: 0.000425418
     MaxOverProcs: 0.000426054
     MeanOverCallCounts: 0.000141806
 Call counts:
   "Foo: timer 1": 
     MinOverProcs: 3
     MeanOverProcs: 3
     MaxOverProcs: 3
     MeanOverCallCounts: 3
   "Foo: timer 2": 
     MinOverProcs: 3
     MeanOverProcs: 3
     MaxOverProcs: 3
     MeanOverCallCounts: 3
   "Foo: timer 3": 
     MinOverProcs: 3
     MeanOverProcs: 3
     MaxOverProcs: 3
     MeanOverCallCounts: 3

 pos = 209 != std::string::npos = 18446744073709551615 : passed
 pos = 228 != std::string::npos = 18446744073709551615 : passed
 pos = 247 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 =============================================================================================

                             TimeMonitor results over 3 processors

 Timer Name      MinOverProcs     MeanOverProcs    MaxOverProcs     MeanOverCallCounts    
 ---------------------------------------------------------------------------------------------
 Bar: timer 1    0.0004249 (3)    0.0004249 (3)    0.0004251 (3)    0.0001416 (3)         
 Bar: timer 2    0.0004251 (3)    0.0004255 (3)    0.0004261 (3)    0.0001418 (3)         
 =============================================================================================

 pos = 346 != std::string::npos = 18446744073709551615 : passed
 pos = 436 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: compact
 Number of processes: 3
 Time unit: s
 Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
 Timer names:  ["Bar: timer 1", "Bar: timer 2"]
 Total times:  {"Bar: timer 1": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000424941, MaxOverProcs: 0.0004251, MeanOverCallCounts: 0.000141647}, "Bar: timer 2": {MinOverProcs: 0.0004251, MeanOverProcs: 0.000425498, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141833}}
 Call counts: {"Bar: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Bar: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}

 pos = 194 != std::string::npos = 18446744073709551615 : passed
 pos = 210 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: spacious
 Number of processes: 3
 Time unit: s
 Statistics collected: 
   - MinOverProcs
   - MeanOverProcs
   - MaxOverProcs
   - MeanOverCallCounts
 Timer names: 
   - "Bar: timer 1"
   - "Bar: timer 2"
 Total times: 
   "Bar: timer 1": 
     MinOverProcs: 0.000424862
     MeanOverProcs: 0.000424941
     MaxOverProcs: 0.0004251
     MeanOverCallCounts: 0.000141647
   "Bar: timer 2": 
     MinOverProcs: 0.0004251
     MeanOverProcs: 0.000425498
     MaxOverProcs: 0.000426054
     MeanOverCallCounts: 0.000141833
 Call counts:
   "Bar: timer 1": 
     MinOverProcs: 3
     MeanOverProcs: 3
     MaxOverProcs: 3
     MeanOverCallCounts: 3
   "Bar: timer 2": 
     MinOverProcs: 3
     MeanOverProcs: 3
     MaxOverProcs: 3
     MeanOverCallCounts: 3

 pos = 209 != std::string::npos = 18446744073709551615 : passed
 pos = 228 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 =============================================================================================

                             TimeMonitor results over 3 processors

 Timer Name      MinOverProcs     MeanOverProcs    MaxOverProcs     MeanOverCallCounts    
 ---------------------------------------------------------------------------------------------
 Baz: timer 1    0.0004251 (3)    0.0004257 (3)    0.0004261 (3)    0.0001419 (3)         
 =============================================================================================

 pos = 346 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: compact
 Number of processes: 3
 Time unit: s
 Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
 Timer names:  ["Baz: timer 1"]
 Total times:  {"Baz: timer 1": {MinOverProcs: 0.0004251, MeanOverProcs: 0.000425657, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141886}}
 Call counts: {"Baz: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}

 pos = 194 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: spacious
 Number of processes: 3
 Time unit: s
 Statistics collected: 
   - MinOverProcs
   - MeanOverProcs
   - MaxOverProcs
   - MeanOverCallCounts
 Timer names: 
   - "Baz: timer 1"
 Total times: 
   "Baz: timer 1": 
     MinOverProcs: 0.0004251
     MeanOverProcs: 0.000425657
     MaxOverProcs: 0.000426054
     MeanOverCallCounts: 0.000141886
 Call counts:
   "Baz: timer 1": 
     MinOverProcs: 3
     MeanOverProcs: 3
     MaxOverProcs: 3
     MeanOverCallCounts: 3

 pos = 209 != std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (0.00819 sec) TimeMonitor_TimerLabelFiltering_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:430

4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... 
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name                  MinOverProcs     MeanOverProcs    MaxOverProcs     MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 FUNC_TIME_MONITOR2          4.053e-06 (1)    4.371e-06 (1)    5.007e-06 (1)    4.371e-06 (1)         
 FUNC_TIME_MONITOR2_inner    0 (1)            0 (1)            0 (1)            0 (1)                 
 =====================================================================================================

 substr_i = 378 != std::string::npos = 18446744073709551615 : passed
 substr_inner_i = 480 != std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: compact
 Number of processes: 3
 Time unit: s
 Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
 Timer names:  [FUNC_TIME_MONITOR2, FUNC_TIME_MONITOR2_inner]
 Total times:  {FUNC_TIME_MONITOR2: {MinOverProcs: 4.05312e-06, MeanOverProcs: 4.37101e-06, MaxOverProcs: 5.00679e-06, MeanOverCallCounts: 4.37101e-06}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
 Call counts: {FUNC_TIME_MONITOR2: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}

 substr_i = 193 != std::string::npos = 18446744073709551615 : passed
 substr_inner_i = 213 != std::string::npos = 18446744073709551615 : passed
 # Teuchos::TimeMonitor report
 ---
 Output mode: spacious
 Number of processes: 3
 Time unit: s
 Statistics collected: 
   - MinOverProcs
   - MeanOverProcs
   - MaxOverProcs
   - MeanOverCallCounts
 Timer names: 
   - FUNC_TIME_MONITOR2
   - FUNC_TIME_MONITOR2_inner
 Total times: 
   FUNC_TIME_MONITOR2: 
     MinOverProcs: 4.05312e-06
     MeanOverProcs: 4.37101e-06
     MaxOverProcs: 5.00679e-06
     MeanOverCallCounts: 4.37101e-06
   FUNC_TIME_MONITOR2_inner: 
     MinOverProcs: 0
     MeanOverProcs: 0
     MaxOverProcs: 0
     MeanOverCallCounts: 0
 Call counts:
   FUNC_TIME_MONITOR2: 
     MinOverProcs: 1
     MeanOverProcs: 1
     MaxOverProcs: 1
     MeanOverCallCounts: 1
   FUNC_TIME_MONITOR2_inner: 
     MinOverProcs: 1
     MeanOverProcs: 1
     MaxOverProcs: 1
     MeanOverCallCounts: 1

 substr_i = 208 != std::string::npos = 18446744073709551615 : passed
 substr_inner_i = 231 != std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (0.00127 sec) TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:639

5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... 

 Testing intersection of timers:
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs    MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer A       0.09191 (3)     0.09193 (3)      0.09195 (3)     0.03064 (3)           
 =====================================================================================================

 substr_i = 362 != std::string::npos = 18446744073709551615 : passed
 substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed

 Testing union of timers:
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs       MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer A       0.09191 (3)     0.09193 (3)         0.09195 (3)     0.03064 (3)           
 Timer B       0 (0)           0.02421 (0.3333)    0.07262 (1)     0.07262 (0.3333)      
 =====================================================================================================

 substr_i = 365 != std::string::npos = 18446744073709551615 : passed
 substr_i = 454 != std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (0.165 sec) TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:711

6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... 
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs    MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer A       0.2178 (3)      0.2179 (3)       0.2179 (3)      0.07263 (3)           
 Timer B       0.1419 (1)      0.1419 (1)       0.1419 (1)      0.1419 (1)            
 =====================================================================================================

 substr_i = 362 != std::string::npos = 18446744073709551615 : passed
 substr_i = 448 != std::string::npos = 18446744073709551615 : passed
 substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (0.36 sec) TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:808

7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... 
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs    MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer A       0 (0)           0.07262 (1)      0.2179 (3)      0.07262 (1)           
 Timer B       1.135 (1)       1.135 (1)        1.135 (1)       1.135 (1)             
 =====================================================================================================

 substr_i = 362 != std::string::npos = 18446744073709551615 : passed
 substr_i = 448 != std::string::npos = 18446744073709551615 : passed
 substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (1.35 sec) TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:869

8. TimeMonitor_IgnoreMissingTimers_UnitTest ... 
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs      MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer Z       0 (0)           0.6668 (0.6667)    1 (1)           1 (0.6667)            
 =====================================================================================================

 substr_i = 378 != std::string::npos = 18446744073709551615 : passed
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs      MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer Z       0 (0)           0.6668 (0.6667)    1 (1)           1 (0.6667)            
 =====================================================================================================

 substr_i = 378 != std::string::npos = 18446744073709551615 : passed
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs    MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer Z       1 (1)           1 (1)            1 (1)           1 (1)                 
 =====================================================================================================

 substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 =====================================================================================================

                                 TimeMonitor results over 3 processors

 Timer Name    MinOverProcs    MeanOverProcs    MaxOverProcs    MeanOverCallCounts    
 -----------------------------------------------------------------------------------------------------
 Timer Z       1 (1)           1 (1)            1 (1)           1 (1)                 
 =====================================================================================================

 substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)
 [FAILED]  (1 sec) TimeMonitor_IgnoreMissingTimers_UnitTest
 Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:951


The following tests FAILED:
    0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... 
    1. TimeMonitor_enableTimer_UnitTest ... 
    3. TimeMonitor_TimerLabelFiltering_UnitTest ... 
    4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... 
    5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... 
    6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... 
    7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... 
    8. TimeMonitor_IgnoreMissingTimers_UnitTest ... 

Total Time: 2.91 sec

Summary: total = 9, run = 9, passed = 1, failed = 8

End Result: TEST FAILED
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[59811,1],0]
  Exit code:    1
--------------------------------------------------------------------------

  • This shows in all the unit tests:

 NOTE: Global reduction shows failures on other processes!
 (rerun with --output-to-root-rank-only=-1 to see output
 from other processes to see what process failed!)
 NOTE: Unit test failed on processes = {1, 2}
 (rerun with --output-to-root-rank-only=<procID> to see output
 from individual processes where the unit test is failing!)

If I run with:

$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
  "--teuchos-suppress-startup-banner" --output-show-proc-rank --output-to-root-rank-only=1

it produces the output show below.



detailed test output on rank=1 with parallel test driver (click to expand)

NVIDIA: no NVIDIA devices found
p=1 | 
p=1 | ***
p=1 | *** Unit test suite ...
p=1 | ***
p=1 | 
p=1 | 
p=1 | Sorting tests by group name then by the order they were added ... (time = 6.2e-06)
p=1 | 
p=1 | Running unit tests ...
p=1 | 
p=1 | 0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:156
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  [FUNC_TIME_MONITOR1]
p=1 |  Total times:  {FUNC_TIME_MONITOR1: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 |  Call counts: {FUNC_TIME_MONITOR1: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |  
p=1 |  substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - FUNC_TIME_MONITOR1
p=1 |  Total times: 
p=1 |    FUNC_TIME_MONITOR1: 
p=1 |      MinOverProcs: 0
p=1 |      MeanOverProcs: 0
p=1 |      MaxOverProcs: 0
p=1 |      MeanOverCallCounts: 0
p=1 |  Call counts:
p=1 |    FUNC_TIME_MONITOR1: 
p=1 |      MinOverProcs: 1
p=1 |      MeanOverProcs: 1
p=1 |      MaxOverProcs: 1
p=1 |      MeanOverCallCounts: 1
p=1 |  
p=1 |  substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (0.00136 sec) TimeMonitor_FUNC_TIME_MONITOR_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:137
p=1 |  
p=1 | 1. TimeMonitor_enableTimer_UnitTest ... 
p=1 |  Testing TimeMonitor's disableTimer() and enableTimer() methods
p=1 |  Creating timers
p=1 |  Running all timers without disabling any
p=1 |  timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  Disabling one timer and trying again
p=1 |  Test that code {TimeMonitor::disableTimer ("Timer 0");} does not throw : passed
p=1 |  timers[0]->numCalls() = 5 == numTrials = 5 : passed
p=1 |  timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 |  timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 |  Reenabling the timer and trying again
p=1 |  Test that code {TimeMonitor::enableTimer ("Timer 0");} does not throw : passed
p=1 |  timers[0]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 |  timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 |  timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 |  Test that summarize() reports enabled and disabled timers
p=1 |  Test that code {TimeMonitor::disableTimer ("Timer 1");} does not throw : passed
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (0.00686 sec) TimeMonitor_enableTimer_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:202
p=1 |  
p=1 | 2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.0096 sec)
p=1 | 3. TimeMonitor_TimerLabelFiltering_UnitTest ... 
p=1 |  
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  ["Foo: timer 1", "Foo: timer 2", "Foo: timer 3"]
p=1 |  Total times:  {"Foo: timer 1": {MinOverProcs: 0.000428677, MeanOverProcs: 0.00043265, MaxOverProcs: 0.000438213, MeanOverCallCounts: 0.000144217}, "Foo: timer 2": {MinOverProcs: 0.000424147, MeanOverProcs: 0.000424941, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141647}, "Foo: timer 3": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425259, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.000141753}}
p=1 |  Call counts: {"Foo: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 3": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |  
p=1 |  pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 226 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - "Foo: timer 1"
p=1 |    - "Foo: timer 2"
p=1 |    - "Foo: timer 3"
p=1 |  Total times: 
p=1 |    "Foo: timer 1": 
p=1 |      MinOverProcs: 0.000428677
p=1 |      MeanOverProcs: 0.00043265
p=1 |      MaxOverProcs: 0.000438213
p=1 |      MeanOverCallCounts: 0.000144217
p=1 |    "Foo: timer 2": 
p=1 |      MinOverProcs: 0.000424147
p=1 |      MeanOverProcs: 0.000424941
p=1 |      MaxOverProcs: 0.000426054
p=1 |      MeanOverCallCounts: 0.000141647
p=1 |    "Foo: timer 3": 
p=1 |      MinOverProcs: 0.000424862
p=1 |      MeanOverProcs: 0.000425259
p=1 |      MaxOverProcs: 0.000425816
p=1 |      MeanOverCallCounts: 0.000141753
p=1 |  Call counts:
p=1 |    "Foo: timer 1": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |    "Foo: timer 2": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |    "Foo: timer 3": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |  
p=1 |  pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 247 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  ["Bar: timer 1", "Bar: timer 2"]
p=1 |  Total times:  {"Bar: timer 1": {MinOverProcs: 0.000426054, MeanOverProcs: 0.000426054, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000142018}, "Bar: timer 2": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425418, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141806}}
p=1 |  Call counts: {"Bar: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Bar: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |  
p=1 |  pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - "Bar: timer 1"
p=1 |    - "Bar: timer 2"
p=1 |  Total times: 
p=1 |    "Bar: timer 1": 
p=1 |      MinOverProcs: 0.000426054
p=1 |      MeanOverProcs: 0.000426054
p=1 |      MaxOverProcs: 0.000426054
p=1 |      MeanOverCallCounts: 0.000142018
p=1 |    "Bar: timer 2": 
p=1 |      MinOverProcs: 0.000424862
p=1 |      MeanOverProcs: 0.000425418
p=1 |      MaxOverProcs: 0.000426054
p=1 |      MeanOverCallCounts: 0.000141806
p=1 |  Call counts:
p=1 |    "Bar: timer 1": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |    "Bar: timer 2": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |  
p=1 |  pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  ["Baz: timer 1"]
p=1 |  Total times:  {"Baz: timer 1": {MinOverProcs: 0.000425577, MeanOverProcs: 0.000425975, MaxOverProcs: 0.000426292, MeanOverCallCounts: 0.000141992}}
p=1 |  Call counts: {"Baz: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |  
p=1 |  pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - "Baz: timer 1"
p=1 |  Total times: 
p=1 |    "Baz: timer 1": 
p=1 |      MinOverProcs: 0.000425577
p=1 |      MeanOverProcs: 0.000425975
p=1 |      MaxOverProcs: 0.000426292
p=1 |      MeanOverCallCounts: 0.000141992
p=1 |  Call counts:
p=1 |    "Baz: timer 1": 
p=1 |      MinOverProcs: 3
p=1 |      MeanOverProcs: 3
p=1 |      MaxOverProcs: 3
p=1 |      MeanOverCallCounts: 3
p=1 |  
p=1 |  pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (0.00819 sec) TimeMonitor_TimerLabelFiltering_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:430
p=1 |  
p=1 | 4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:657
p=1 |  substr_inner_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:659
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: compact
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected:  [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 |  Timer names:  [FUNC_TIME_MONITOR2, FUNC_TIME_MONITOR2_inner]
p=1 |  Total times:  {FUNC_TIME_MONITOR2: {MinOverProcs: 3.8147e-06, MeanOverProcs: 4.60943e-06, MaxOverProcs: 5.00679e-06, MeanOverCallCounts: 4.60943e-06}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 |  Call counts: {FUNC_TIME_MONITOR2: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |  
p=1 |  substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 |  substr_inner_i = 213 != std::string::npos = 18446744073709551615 : passed
p=1 |  # Teuchos::TimeMonitor report
p=1 |  ---
p=1 |  Output mode: spacious
p=1 |  Number of processes: 3
p=1 |  Time unit: s
p=1 |  Statistics collected: 
p=1 |    - MinOverProcs
p=1 |    - MeanOverProcs
p=1 |    - MaxOverProcs
p=1 |    - MeanOverCallCounts
p=1 |  Timer names: 
p=1 |    - FUNC_TIME_MONITOR2
p=1 |    - FUNC_TIME_MONITOR2_inner
p=1 |  Total times: 
p=1 |    FUNC_TIME_MONITOR2: 
p=1 |      MinOverProcs: 3.8147e-06
p=1 |      MeanOverProcs: 4.60943e-06
p=1 |      MaxOverProcs: 5.00679e-06
p=1 |      MeanOverCallCounts: 4.60943e-06
p=1 |    FUNC_TIME_MONITOR2_inner: 
p=1 |      MinOverProcs: 0
p=1 |      MeanOverProcs: 0
p=1 |      MaxOverProcs: 0
p=1 |      MeanOverCallCounts: 0
p=1 |  Call counts:
p=1 |    FUNC_TIME_MONITOR2: 
p=1 |      MinOverProcs: 1
p=1 |      MeanOverProcs: 1
p=1 |      MaxOverProcs: 1
p=1 |      MeanOverCallCounts: 1
p=1 |    FUNC_TIME_MONITOR2_inner: 
p=1 |      MinOverProcs: 1
p=1 |      MeanOverProcs: 1
p=1 |      MaxOverProcs: 1
p=1 |      MeanOverCallCounts: 1
p=1 |  
p=1 |  substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 |  substr_inner_i = 231 != std::string::npos = 18446744073709551615 : passed
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (0.00113 sec) TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:639
p=1 |  
p=1 | 5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... 
p=1 |  
p=1 |  Testing intersection of timers:
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:776
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  Testing union of timers:
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:796
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:798
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (0.165 sec) TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:711
p=1 |  
p=1 | 6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:852
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:854
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (0.36 sec) TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:808
p=1 |  
p=1 | 7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:936
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:938
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (1.35 sec) TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:869
p=1 |  
p=1 | 8. TimeMonitor_IgnoreMissingTimers_UnitTest ... 
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1019
p=1 |  
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1030
p=1 |  
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  
p=1 |  substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |  NOTE: Unit test failed on processes = {1, 2}
p=1 |  (rerun with --output-to-root-rank-only=<procID> to see output
p=1 |  from individual processes where the unit test is failing!)
p=1 |  [FAILED]  (1 sec) TimeMonitor_IgnoreMissingTimers_UnitTest
p=1 |  Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:951
p=1 |  
p=1 | 
p=1 | The following tests FAILED:
p=1 |     0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... 
p=1 |     1. TimeMonitor_enableTimer_UnitTest ... 
p=1 |     3. TimeMonitor_TimerLabelFiltering_UnitTest ... 
p=1 |     4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... 
p=1 |     5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... 
p=1 |     6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... 
p=1 |     7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... 
p=1 |     8. TimeMonitor_IgnoreMissingTimers_UnitTest ... 
p=1 | 
p=1 | Total Time: 2.91 sec
p=1 | 
p=1 | Summary: total = 9, run = 9, passed = 1, failed = 8
p=1 | 
p=1 | End Result: TEST FAILED
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[63321,1],2]
  Exit code:    1
--------------------------------------------------------------------------

*

The unique failures showin in that output are:

p=1 |  pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1019
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1030
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:156
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:657
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:776
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:796
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:798
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:852
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:854
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:936
p=1 |  substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:938
p=1 |  substr_inner_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:659

I think that the problem is that the time monitor really only collects data to the root process. Therefore, the other processes should not be checking the ouptut at all. Instead, they should just not crash.

The PR #2488 fixes this test for all platforms.

So it turns out that this test was mess up on a bunch of builds and platforms, it was just that this Power8 system actually caused the test to hang in some cases due flaky MPI implementation.

This is an example of why we need to be testing on these platforms.

@bartlettroscoe YOU'RE AWESOME thanks for finding this & fixing it! :D

The test TeuchosComm_TimeMonitor_UnitTests_MPI_3 ran and passed in all 18 ATDM builds where it ran today as shown at:

We have not seen it fail since a481846 was merged.

I am confident this is fixed. I am closing as complete.

Was this page helpful?
0 / 5 - 0 ratings