CC: @trilinos/teuchos
Commit a481846 merged to 'develop' on 3/30/2018 and no timeouts seen failing since then.
As shown in the query from 2018-02-01 to 2018-03-31:
the test TeuchosComm_TimeMonitor_UnitTests_MPI_3 appears to be failing randomly. It failed 17 times in the following ATDM four builds on 'white' and 'ride':
Trilinos-atdm-white-ride-cuda-debugTrilinos-atdm-white-ride-cuda-optTrilinos-atdm-white-ride-gnu-debug-openmpTrilinos-atdm-white-ride-gnu-opt-openmp
Table of failing tests (sorted by date) (click to expand)
| Site | Build Name | Status | Time | Details | Build Time |
| :-- | :-- | :-- | :-- | --: | :-- |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-30T06:20:33 UTC |
| ride | gnu-opt-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-30T06:11:48 UTC |
| ride | cuda-opt | Failed | 10m 20ms | Completed (Timeout) | 2018-03-27T06:50:29 UTC |
| white | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-27T06:16:00 UTC |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-22T06:17:49 UTC |
| ride | gnu-opt-openmp | Failed | 10m 60ms | Completed (Timeout) | 2018-03-20T06:18:19 UTC |
| ride | gnu-opt-openmp | Failed | 1s 650ms | Completed (Failed) | 2018-03-16T06:12:14 UTC |
| ride | cuda-opt | Failed | 2s 180ms | Completed (Failed) | 2018-03-15T06:50:23 UTC |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-15T06:16:34 UTC |
| ride | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-14T06:58:58 UTC |
| ride | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-14T06:32:20 UTC |
| white | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-14T06:16:13 UTC |
| ride | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-11T18:37:16 UTC |
| white | cuda-debug | Failed | 10m 20ms | Completed (Timeout) | 2018-03-10T07:17:18 UTC |
| white | cuda-opt | Failed | 10m 20ms | Completed (Timeout) | 2018-03-07T08:13:51 UTC |
| white | gnu-debug-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-07T05:07:23 UTC |
| white | gnu-opt-openmp | Failed | 10m 20ms | Completed (Timeout) | 2018-03-07T05:02:21 UTC |
NOTE: All of the above builds really start with Trilinos-atdm-white-ride- but that was removed to shorten the "Build Name" column.
Note taht all but two above the the above tests timed out. The two that faikled completed in under 3 sec.
When the test times out, it looks like it passes, for example as shown today at:
which shows:
***
*** Unit test suite ...
***
Sorting tests by group name then by the order they were added ... (time = 3.1e-06)
Running unit tests ...
0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00122 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.00648 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00905 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00778 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.00117 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.156 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.341 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (1.28 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)
Total Time: 2.8 sec
Summary: total = 9, run = 9, passed = 9, failed = 0
End Result: TEST PASSED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
So it looks like on of the other processes is failing for some reason and that causes the other processes to hang.
When it did fail, for example on 2018-03-16 shown at:
it showed:
--------------------------------------------------------------------------
A request for multiple cpus-per-proc was given, but a directive
was also give to map to an object level that has less cpus than
requested ones:
#cpus-per-proc: 8
number of cpus: 7
map-by: BYSOCKET:OVERSUBSCRIBE
Please specify a mapping level that has more cpus, or else let us
define a default mapping that will allow multiple cpus-per-proc.
--------------------------------------------------------------------------
I think those failures were addressed by commit 114ca53 from "Fri Mar 16 10:17:14 2018" pushed that day. Therefore, I think we can ignore these failures. Since 2018-03-16, there have been only random timeouts so we will focus just on those timeouts.
Looking the the full history for this test for the build Trilinos-atdm-white-ride-cuda-debug in white (which is targeted for a Trilinos auto PR build in #2464) in the query:
You can see that other than timing out at 10 minutes today, the test timed out three other days which were 2018-03-22, 2018-03-15 and 2018-03-10. When the test does pass, it does so in less than 6 sec in all cases. When the test does pass, for example yesterday in the above query, it shows:
***
*** Unit test suite ...
***
Sorting tests by group name then by the order they were added ... (time = 5.96e-06)
Running unit tests ...
0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00168 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.00647 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00907 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00782 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.00117 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.156 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.34 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (1.28 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)
Total Time: 2.8 sec
Summary: total = 9, run = 9, passed = 9, failed = 0
End Result: TEST PASSED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[51006,1],2]
Exit code: 1
--------------------------------------------------------------------------
Wow, that is the same output for when the test times out!
Should be able to reproduce this by logging onto 'white' or 'ride' and then running any of the supported builds on this platform shown above and then enable and run the Teuchos tests as documented at:
I was able to reproduce the failing test by loging on to 'white', updating my Trilinos 'develop' branch and then doing:
$ cd ~/Trilinos.base/BUILD/WHITE/CHECKIN/
$ ./checkin-test-atdm.sh gnu-debug-openmp --enable-packages=Teuchos --local-do-all
$ cd gnu-debug-openmp/
$ source load-env.sh
$ cd packages/teuchos/comm/test/time/
$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
"--teuchos-suppress-startup-banner"
NVIDIA: no NVIDIA devices found
***
*** Unit test suite ...
***
Sorting tests by group name then by the order they were added ... (time = 5.96e-06)
Running unit tests ...
0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00142 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.00686 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00956 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00819 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.00123 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.165 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.36 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (1.35 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)
Total Time: 2.91 sec
Summary: total = 9, run = 9, passed = 9, failed = 0
End Result: TEST PASSED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[51789,1],2]
Exit code: 1
--------------------------------------------------------------------------
$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe "--teuchos-suppress-startup-banner"
I then ran the test with verbose output on the other processes and the test actually fails on the other processes. For example, running:
$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
"--teuchos-suppress-startup-banner" --output-show-proc-rank --output-to-root-rank-only=1
produces the output show below
detailed test output (click to expand)
NVIDIA: no NVIDIA devices found
p=1 |
p=1 | ***
p=1 | *** Unit test suite ...
p=1 | ***
p=1 |
p=1 |
p=1 | Sorting tests by group name then by the order they were added ... (time = 6.2e-06)
p=1 |
p=1 | Running unit tests ...
p=1 |
p=1 | 0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:156
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: [FUNC_TIME_MONITOR1]
p=1 | Total times: {FUNC_TIME_MONITOR1: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 | Call counts: {FUNC_TIME_MONITOR1: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |
p=1 | substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - FUNC_TIME_MONITOR1
p=1 | Total times:
p=1 | FUNC_TIME_MONITOR1:
p=1 | MinOverProcs: 0
p=1 | MeanOverProcs: 0
p=1 | MaxOverProcs: 0
p=1 | MeanOverCallCounts: 0
p=1 | Call counts:
p=1 | FUNC_TIME_MONITOR1:
p=1 | MinOverProcs: 1
p=1 | MeanOverProcs: 1
p=1 | MaxOverProcs: 1
p=1 | MeanOverCallCounts: 1
p=1 |
p=1 | substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 | [FAILED] (0.00131 sec) TimeMonitor_FUNC_TIME_MONITOR_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:137
p=1 |
p=1 | 1. TimeMonitor_enableTimer_UnitTest ...
p=1 | Testing TimeMonitor's disableTimer() and enableTimer() methods
p=1 | Creating timers
p=1 | Running all timers without disabling any
p=1 | timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 | timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 | timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 | Disabling one timer and trying again
p=1 | Test that code {TimeMonitor::disableTimer ("Timer 0");} does not throw : passed
p=1 | timers[0]->numCalls() = 5 == numTrials = 5 : passed
p=1 | timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 | timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 | Reenabling the timer and trying again
p=1 | Test that code {TimeMonitor::enableTimer ("Timer 0");} does not throw : passed
p=1 | timers[0]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 | timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 | timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 | Test that summarize() reports enabled and disabled timers
p=1 | Test that code {TimeMonitor::disableTimer ("Timer 1");} does not throw : passed
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 | [FAILED] (0.00674 sec) TimeMonitor_enableTimer_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:202
p=1 |
p=1 | 2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00958 sec)
p=1 | 3. TimeMonitor_TimerLabelFiltering_UnitTest ...
p=1 |
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: ["Foo: timer 1", "Foo: timer 2", "Foo: timer 3"]
p=1 | Total times: {"Foo: timer 1": {MinOverProcs: 0.00042367, MeanOverProcs: 0.000424862, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.000141621}, "Foo: timer 2": {MinOverProcs: 0.000425339, MeanOverProcs: 0.000425895, MaxOverProcs: 0.000426292, MeanOverCallCounts: 0.000141965}, "Foo: timer 3": {MinOverProcs: 0.000423908, MeanOverProcs: 0.0004251, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.0001417}}
p=1 | Call counts: {"Foo: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 3": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |
p=1 | pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 226 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - "Foo: timer 1"
p=1 | - "Foo: timer 2"
p=1 | - "Foo: timer 3"
p=1 | Total times:
p=1 | "Foo: timer 1":
p=1 | MinOverProcs: 0.00042367
p=1 | MeanOverProcs: 0.000424862
p=1 | MaxOverProcs: 0.000425816
p=1 | MeanOverCallCounts: 0.000141621
p=1 | "Foo: timer 2":
p=1 | MinOverProcs: 0.000425339
p=1 | MeanOverProcs: 0.000425895
p=1 | MaxOverProcs: 0.000426292
p=1 | MeanOverCallCounts: 0.000141965
p=1 | "Foo: timer 3":
p=1 | MinOverProcs: 0.000423908
p=1 | MeanOverProcs: 0.0004251
p=1 | MaxOverProcs: 0.000425816
p=1 | MeanOverCallCounts: 0.0001417
p=1 | Call counts:
p=1 | "Foo: timer 1":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 | "Foo: timer 2":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 | "Foo: timer 3":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 |
p=1 | pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 247 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: ["Bar: timer 1", "Bar: timer 2"]
p=1 | Total times: {"Bar: timer 1": {MinOverProcs: 0.000430107, MeanOverProcs: 0.000455062, MaxOverProcs: 0.000503778, MeanOverCallCounts: 0.000151687}, "Bar: timer 2": {MinOverProcs: 0.000426054, MeanOverProcs: 0.000426054, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000142018}}
p=1 | Call counts: {"Bar: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Bar: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |
p=1 | pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - "Bar: timer 1"
p=1 | - "Bar: timer 2"
p=1 | Total times:
p=1 | "Bar: timer 1":
p=1 | MinOverProcs: 0.000430107
p=1 | MeanOverProcs: 0.000455062
p=1 | MaxOverProcs: 0.000503778
p=1 | MeanOverCallCounts: 0.000151687
p=1 | "Bar: timer 2":
p=1 | MinOverProcs: 0.000426054
p=1 | MeanOverProcs: 0.000426054
p=1 | MaxOverProcs: 0.000426054
p=1 | MeanOverCallCounts: 0.000142018
p=1 | Call counts:
p=1 | "Bar: timer 1":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 | "Bar: timer 2":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 |
p=1 | pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: ["Baz: timer 1"]
p=1 | Total times: {"Baz: timer 1": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425259, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.000141753}}
p=1 | Call counts: {"Baz: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |
p=1 | pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - "Baz: timer 1"
p=1 | Total times:
p=1 | "Baz: timer 1":
p=1 | MinOverProcs: 0.000424862
p=1 | MeanOverProcs: 0.000425259
p=1 | MaxOverProcs: 0.000425816
p=1 | MeanOverCallCounts: 0.000141753
p=1 | Call counts:
p=1 | "Baz: timer 1":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 |
p=1 | pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | [FAILED] (0.00825 sec) TimeMonitor_TimerLabelFiltering_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:430
p=1 |
p=1 | 4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:657
p=1 | substr_inner_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:659
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: [FUNC_TIME_MONITOR2, FUNC_TIME_MONITOR2_inner]
p=1 | Total times: {FUNC_TIME_MONITOR2: {MinOverProcs: 4.05312e-06, MeanOverProcs: 4.6889e-06, MaxOverProcs: 5.00679e-06, MeanOverCallCounts: 4.6889e-06}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 | Call counts: {FUNC_TIME_MONITOR2: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |
p=1 | substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 | substr_inner_i = 213 != std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - FUNC_TIME_MONITOR2
p=1 | - FUNC_TIME_MONITOR2_inner
p=1 | Total times:
p=1 | FUNC_TIME_MONITOR2:
p=1 | MinOverProcs: 4.05312e-06
p=1 | MeanOverProcs: 4.6889e-06
p=1 | MaxOverProcs: 5.00679e-06
p=1 | MeanOverCallCounts: 4.6889e-06
p=1 | FUNC_TIME_MONITOR2_inner:
p=1 | MinOverProcs: 0
p=1 | MeanOverProcs: 0
p=1 | MaxOverProcs: 0
p=1 | MeanOverCallCounts: 0
p=1 | Call counts:
p=1 | FUNC_TIME_MONITOR2:
p=1 | MinOverProcs: 1
p=1 | MeanOverProcs: 1
p=1 | MaxOverProcs: 1
p=1 | MeanOverCallCounts: 1
p=1 | FUNC_TIME_MONITOR2_inner:
p=1 | MinOverProcs: 1
p=1 | MeanOverProcs: 1
p=1 | MaxOverProcs: 1
p=1 | MeanOverCallCounts: 1
p=1 |
p=1 | substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 | substr_inner_i = 231 != std::string::npos = 18446744073709551615 : passed
p=1 | [FAILED] (0.0011 sec) TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:639
p=1 |
p=1 | 5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ...
p=1 |
p=1 | Testing intersection of timers:
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:776
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | Testing union of timers:
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:796
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:798
p=1 | [FAILED] (0.165 sec) TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:711
p=1 |
p=1 | 6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:852
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:854
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | [FAILED] (0.36 sec) TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:808
p=1 |
p=1 | 7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:936
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:938
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | [FAILED] (1.35 sec) TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:869
p=1 |
p=1 | 8. TimeMonitor_IgnoreMissingTimers_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1019
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1030
p=1 |
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | [FAILED] (1 sec) TimeMonitor_IgnoreMissingTimers_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:951
p=1 |
p=1 |
p=1 | The following tests FAILED:
p=1 | 0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ...
p=1 | 1. TimeMonitor_enableTimer_UnitTest ...
p=1 | 3. TimeMonitor_TimerLabelFiltering_UnitTest ...
p=1 | 4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ...
p=1 | 5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ...
p=1 | 6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ...
p=1 | 7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ...
p=1 | 8. TimeMonitor_IgnoreMissingTimers_UnitTest ...
p=1 |
p=1 | Total Time: 2.91 sec
p=1 |
p=1 | Summary: total = 9, run = 9, passed = 1, failed = 8
p=1 |
p=1 | End Result: TEST FAILED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[51238,1],2]
Exit code: 1
--------------------------------------------------------------------------
Looking at this output, I just realized that this test likely fails in other builds on other platforms but it is being ignored!
Looking at the all of the runs of this test today at:
and looking at some of the results, it is definitely also failing on other builds and platforms. For example, for the "Clean" build Linux-gcc-4.8.4-MPI_Release_gcc_4.8.4_openmpi_1.8.7_DEV today the output at:
shows:
***
*** Unit test suite ...
***
Sorting tests by group name then by the order they were added ... (time = 5.96e-06)
Running unit tests ...
0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ... [Passed] (0.00111 sec)
1. TimeMonitor_enableTimer_UnitTest ... [Passed] (0.000257 sec)
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00166 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ... [Passed] (0.00165 sec)
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ... [Passed] (0.000447 sec)
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ... [Passed] (0.000245 sec)
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ... [Passed] (0.000167 sec)
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ... [Passed] (0.000155 sec)
8. TimeMonitor_IgnoreMissingTimers_UnitTest ... [Passed] (1 sec)
Total Time: 1.01 sec
Summary: total = 9, run = 9, passed = 9, failed = 0
End Result: TEST PASSED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[31998,1],1]
Exit code: 1
--------------------------------------------------------------------------
That is bad.
The problem is that this unit test is not using the parallel unit test driver. It is using the standard serial driver.
I just pushed the branch 2487-time-monitor-ride-white to my GitHub fork with the commit f395fe5:
commit f395fe51761658b8326b2473f075b304bef97a42
Author: Roscoe A. Bartlett <[email protected]>
Date: Fri Mar 30 17:43:36 2018 -0600
Use parallel unit test driver to show non-root rank failures (#2487)
This test actally fails all of the unit tests on the non-root rank. Changing
to use the parallel unit test driver makes this clear.
M packages/teuchos/comm/test/Time/CMakeLists.txt
Now when I run the test on 'white' using:
$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
"--teuchos-suppress-startup-banner"
it fails a showns the output shown below.
detailed test output using parallel test driver (click to expand)
NVIDIA: no NVIDIA devices found
***
*** Unit test suite ...
***
Sorting tests by group name then by the order they were added ... (time = 5.01e-06)
Running unit tests ...
0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ...
=============================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
---------------------------------------------------------------------------------------------
FUNC_TIME_MONITOR1 0 (1) 0 (1) 0 (1) 0 (1)
=============================================================================================
substr_i = 350 != std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: compact
Number of processes: 3
Time unit: s
Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
Timer names: [FUNC_TIME_MONITOR1]
Total times: {FUNC_TIME_MONITOR1: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
Call counts: {FUNC_TIME_MONITOR1: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
substr_i = 193 != std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: spacious
Number of processes: 3
Time unit: s
Statistics collected:
- MinOverProcs
- MeanOverProcs
- MaxOverProcs
- MeanOverCallCounts
Timer names:
- FUNC_TIME_MONITOR1
Total times:
FUNC_TIME_MONITOR1:
MinOverProcs: 0
MeanOverProcs: 0
MaxOverProcs: 0
MeanOverCallCounts: 0
Call counts:
FUNC_TIME_MONITOR1:
MinOverProcs: 1
MeanOverProcs: 1
MaxOverProcs: 1
MeanOverCallCounts: 1
substr_i = 208 != std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (0.00142 sec) TimeMonitor_FUNC_TIME_MONITOR_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:137
1. TimeMonitor_enableTimer_UnitTest ...
Testing TimeMonitor's disableTimer() and enableTimer() methods
Creating timers
Running all timers without disabling any
timers[i]->numCalls() = 5 == numTrials = 5 : passed
timers[i]->numCalls() = 5 == numTrials = 5 : passed
timers[i]->numCalls() = 5 == numTrials = 5 : passed
Disabling one timer and trying again
Test that code {TimeMonitor::disableTimer ("Timer 0");} does not throw : passed
timers[0]->numCalls() = 5 == numTrials = 5 : passed
timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
Reenabling the timer and trying again
Test that code {TimeMonitor::enableTimer ("Timer 0");} does not throw : passed
timers[0]->numCalls() = 10 == 2*numTrials = 10 : passed
timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
Test that summarize() reports enabled and disabled timers
Test that code {TimeMonitor::disableTimer ("Timer 1");} does not throw : passed
=============================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
---------------------------------------------------------------------------------------------
Timer 0 0.001417 (10) 0.001418 (10) 0.001418 (10) 0.0001418 (10)
Timer 1 0.002124 (15) 0.002132 (15) 0.002146 (15) 0.0001421 (15)
Timer 2 0.002129 (15) 0.00213 (15) 0.002131 (15) 0.000142 (15)
=============================================================================================
substr_i = 344 != std::string::npos = 18446744073709551615 : passed
substr_i = 432 != std::string::npos = 18446744073709551615 : passed
substr_i = 520 != std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (0.00718 sec) TimeMonitor_enableTimer_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:202
2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.00963 sec)
3. TimeMonitor_TimerLabelFiltering_UnitTest ...
=============================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
---------------------------------------------------------------------------------------------
Foo: timer 1 0.0004249 (3) 0.0004254 (3) 0.0004261 (3) 0.0001418 (3)
Foo: timer 2 0.0004258 (3) 0.0004259 (3) 0.0004261 (3) 0.000142 (3)
Foo: timer 3 0.0004251 (3) 0.0004254 (3) 0.0004261 (3) 0.0001418 (3)
=============================================================================================
pos = 346 != std::string::npos = 18446744073709551615 : passed
pos = 436 != std::string::npos = 18446744073709551615 : passed
pos = 526 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: compact
Number of processes: 3
Time unit: s
Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
Timer names: ["Foo: timer 1", "Foo: timer 2", "Foo: timer 3"]
Total times: {"Foo: timer 1": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425418, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141806}, "Foo: timer 2": {MinOverProcs: 0.000425816, MeanOverProcs: 0.000425895, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141965}, "Foo: timer 3": {MinOverProcs: 0.0004251, MeanOverProcs: 0.000425418, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141806}}
Call counts: {"Foo: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 3": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
pos = 194 != std::string::npos = 18446744073709551615 : passed
pos = 210 != std::string::npos = 18446744073709551615 : passed
pos = 226 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: spacious
Number of processes: 3
Time unit: s
Statistics collected:
- MinOverProcs
- MeanOverProcs
- MaxOverProcs
- MeanOverCallCounts
Timer names:
- "Foo: timer 1"
- "Foo: timer 2"
- "Foo: timer 3"
Total times:
"Foo: timer 1":
MinOverProcs: 0.000424862
MeanOverProcs: 0.000425418
MaxOverProcs: 0.000426054
MeanOverCallCounts: 0.000141806
"Foo: timer 2":
MinOverProcs: 0.000425816
MeanOverProcs: 0.000425895
MaxOverProcs: 0.000426054
MeanOverCallCounts: 0.000141965
"Foo: timer 3":
MinOverProcs: 0.0004251
MeanOverProcs: 0.000425418
MaxOverProcs: 0.000426054
MeanOverCallCounts: 0.000141806
Call counts:
"Foo: timer 1":
MinOverProcs: 3
MeanOverProcs: 3
MaxOverProcs: 3
MeanOverCallCounts: 3
"Foo: timer 2":
MinOverProcs: 3
MeanOverProcs: 3
MaxOverProcs: 3
MeanOverCallCounts: 3
"Foo: timer 3":
MinOverProcs: 3
MeanOverProcs: 3
MaxOverProcs: 3
MeanOverCallCounts: 3
pos = 209 != std::string::npos = 18446744073709551615 : passed
pos = 228 != std::string::npos = 18446744073709551615 : passed
pos = 247 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
=============================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
---------------------------------------------------------------------------------------------
Bar: timer 1 0.0004249 (3) 0.0004249 (3) 0.0004251 (3) 0.0001416 (3)
Bar: timer 2 0.0004251 (3) 0.0004255 (3) 0.0004261 (3) 0.0001418 (3)
=============================================================================================
pos = 346 != std::string::npos = 18446744073709551615 : passed
pos = 436 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: compact
Number of processes: 3
Time unit: s
Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
Timer names: ["Bar: timer 1", "Bar: timer 2"]
Total times: {"Bar: timer 1": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000424941, MaxOverProcs: 0.0004251, MeanOverCallCounts: 0.000141647}, "Bar: timer 2": {MinOverProcs: 0.0004251, MeanOverProcs: 0.000425498, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141833}}
Call counts: {"Bar: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Bar: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
pos = 194 != std::string::npos = 18446744073709551615 : passed
pos = 210 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: spacious
Number of processes: 3
Time unit: s
Statistics collected:
- MinOverProcs
- MeanOverProcs
- MaxOverProcs
- MeanOverCallCounts
Timer names:
- "Bar: timer 1"
- "Bar: timer 2"
Total times:
"Bar: timer 1":
MinOverProcs: 0.000424862
MeanOverProcs: 0.000424941
MaxOverProcs: 0.0004251
MeanOverCallCounts: 0.000141647
"Bar: timer 2":
MinOverProcs: 0.0004251
MeanOverProcs: 0.000425498
MaxOverProcs: 0.000426054
MeanOverCallCounts: 0.000141833
Call counts:
"Bar: timer 1":
MinOverProcs: 3
MeanOverProcs: 3
MaxOverProcs: 3
MeanOverCallCounts: 3
"Bar: timer 2":
MinOverProcs: 3
MeanOverProcs: 3
MaxOverProcs: 3
MeanOverCallCounts: 3
pos = 209 != std::string::npos = 18446744073709551615 : passed
pos = 228 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
=============================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
---------------------------------------------------------------------------------------------
Baz: timer 1 0.0004251 (3) 0.0004257 (3) 0.0004261 (3) 0.0001419 (3)
=============================================================================================
pos = 346 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: compact
Number of processes: 3
Time unit: s
Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
Timer names: ["Baz: timer 1"]
Total times: {"Baz: timer 1": {MinOverProcs: 0.0004251, MeanOverProcs: 0.000425657, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141886}}
Call counts: {"Baz: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
pos = 194 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: spacious
Number of processes: 3
Time unit: s
Statistics collected:
- MinOverProcs
- MeanOverProcs
- MaxOverProcs
- MeanOverCallCounts
Timer names:
- "Baz: timer 1"
Total times:
"Baz: timer 1":
MinOverProcs: 0.0004251
MeanOverProcs: 0.000425657
MaxOverProcs: 0.000426054
MeanOverCallCounts: 0.000141886
Call counts:
"Baz: timer 1":
MinOverProcs: 3
MeanOverProcs: 3
MaxOverProcs: 3
MeanOverCallCounts: 3
pos = 209 != std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (0.00819 sec) TimeMonitor_TimerLabelFiltering_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:430
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ...
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
FUNC_TIME_MONITOR2 4.053e-06 (1) 4.371e-06 (1) 5.007e-06 (1) 4.371e-06 (1)
FUNC_TIME_MONITOR2_inner 0 (1) 0 (1) 0 (1) 0 (1)
=====================================================================================================
substr_i = 378 != std::string::npos = 18446744073709551615 : passed
substr_inner_i = 480 != std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: compact
Number of processes: 3
Time unit: s
Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
Timer names: [FUNC_TIME_MONITOR2, FUNC_TIME_MONITOR2_inner]
Total times: {FUNC_TIME_MONITOR2: {MinOverProcs: 4.05312e-06, MeanOverProcs: 4.37101e-06, MaxOverProcs: 5.00679e-06, MeanOverCallCounts: 4.37101e-06}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
Call counts: {FUNC_TIME_MONITOR2: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
substr_i = 193 != std::string::npos = 18446744073709551615 : passed
substr_inner_i = 213 != std::string::npos = 18446744073709551615 : passed
# Teuchos::TimeMonitor report
---
Output mode: spacious
Number of processes: 3
Time unit: s
Statistics collected:
- MinOverProcs
- MeanOverProcs
- MaxOverProcs
- MeanOverCallCounts
Timer names:
- FUNC_TIME_MONITOR2
- FUNC_TIME_MONITOR2_inner
Total times:
FUNC_TIME_MONITOR2:
MinOverProcs: 4.05312e-06
MeanOverProcs: 4.37101e-06
MaxOverProcs: 5.00679e-06
MeanOverCallCounts: 4.37101e-06
FUNC_TIME_MONITOR2_inner:
MinOverProcs: 0
MeanOverProcs: 0
MaxOverProcs: 0
MeanOverCallCounts: 0
Call counts:
FUNC_TIME_MONITOR2:
MinOverProcs: 1
MeanOverProcs: 1
MaxOverProcs: 1
MeanOverCallCounts: 1
FUNC_TIME_MONITOR2_inner:
MinOverProcs: 1
MeanOverProcs: 1
MaxOverProcs: 1
MeanOverCallCounts: 1
substr_i = 208 != std::string::npos = 18446744073709551615 : passed
substr_inner_i = 231 != std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (0.00127 sec) TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:639
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ...
Testing intersection of timers:
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer A 0.09191 (3) 0.09193 (3) 0.09195 (3) 0.03064 (3)
=====================================================================================================
substr_i = 362 != std::string::npos = 18446744073709551615 : passed
substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
Testing union of timers:
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer A 0.09191 (3) 0.09193 (3) 0.09195 (3) 0.03064 (3)
Timer B 0 (0) 0.02421 (0.3333) 0.07262 (1) 0.07262 (0.3333)
=====================================================================================================
substr_i = 365 != std::string::npos = 18446744073709551615 : passed
substr_i = 454 != std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (0.165 sec) TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:711
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ...
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer A 0.2178 (3) 0.2179 (3) 0.2179 (3) 0.07263 (3)
Timer B 0.1419 (1) 0.1419 (1) 0.1419 (1) 0.1419 (1)
=====================================================================================================
substr_i = 362 != std::string::npos = 18446744073709551615 : passed
substr_i = 448 != std::string::npos = 18446744073709551615 : passed
substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (0.36 sec) TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:808
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ...
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer A 0 (0) 0.07262 (1) 0.2179 (3) 0.07262 (1)
Timer B 1.135 (1) 1.135 (1) 1.135 (1) 1.135 (1)
=====================================================================================================
substr_i = 362 != std::string::npos = 18446744073709551615 : passed
substr_i = 448 != std::string::npos = 18446744073709551615 : passed
substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (1.35 sec) TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:869
8. TimeMonitor_IgnoreMissingTimers_UnitTest ...
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer Z 0 (0) 0.6668 (0.6667) 1 (1) 1 (0.6667)
=====================================================================================================
substr_i = 378 != std::string::npos = 18446744073709551615 : passed
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer Z 0 (0) 0.6668 (0.6667) 1 (1) 1 (0.6667)
=====================================================================================================
substr_i = 378 != std::string::npos = 18446744073709551615 : passed
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer Z 1 (1) 1 (1) 1 (1) 1 (1)
=====================================================================================================
substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
=====================================================================================================
TimeMonitor results over 3 processors
Timer Name MinOverProcs MeanOverProcs MaxOverProcs MeanOverCallCounts
-----------------------------------------------------------------------------------------------------
Timer Z 1 (1) 1 (1) 1 (1) 1 (1)
=====================================================================================================
substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
[FAILED] (1 sec) TimeMonitor_IgnoreMissingTimers_UnitTest
Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:951
The following tests FAILED:
0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ...
1. TimeMonitor_enableTimer_UnitTest ...
3. TimeMonitor_TimerLabelFiltering_UnitTest ...
4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ...
5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ...
6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ...
7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ...
8. TimeMonitor_IgnoreMissingTimers_UnitTest ...
Total Time: 2.91 sec
Summary: total = 9, run = 9, passed = 1, failed = 8
End Result: TEST FAILED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[59811,1],0]
Exit code: 1
--------------------------------------------------------------------------
This shows in all the unit tests:
NOTE: Global reduction shows failures on other processes!
(rerun with --output-to-root-rank-only=-1 to see output
from other processes to see what process failed!)
NOTE: Unit test failed on processes = {1, 2}
(rerun with --output-to-root-rank-only=<procID> to see output
from individual processes where the unit test is failing!)
If I run with:
$ mpiexec "-np" "3" "-map-by" "socket:PE=4" ./TeuchosComm_TimeMonitor_UnitTests.exe \
"--teuchos-suppress-startup-banner" --output-show-proc-rank --output-to-root-rank-only=1
it produces the output show below.
detailed test output on rank=1 with parallel test driver (click to expand)
NVIDIA: no NVIDIA devices found
p=1 |
p=1 | ***
p=1 | *** Unit test suite ...
p=1 | ***
p=1 |
p=1 |
p=1 | Sorting tests by group name then by the order they were added ... (time = 6.2e-06)
p=1 |
p=1 | Running unit tests ...
p=1 |
p=1 | 0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:156
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: [FUNC_TIME_MONITOR1]
p=1 | Total times: {FUNC_TIME_MONITOR1: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 | Call counts: {FUNC_TIME_MONITOR1: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |
p=1 | substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - FUNC_TIME_MONITOR1
p=1 | Total times:
p=1 | FUNC_TIME_MONITOR1:
p=1 | MinOverProcs: 0
p=1 | MeanOverProcs: 0
p=1 | MaxOverProcs: 0
p=1 | MeanOverCallCounts: 0
p=1 | Call counts:
p=1 | FUNC_TIME_MONITOR1:
p=1 | MinOverProcs: 1
p=1 | MeanOverProcs: 1
p=1 | MaxOverProcs: 1
p=1 | MeanOverCallCounts: 1
p=1 |
p=1 | substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (0.00136 sec) TimeMonitor_FUNC_TIME_MONITOR_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:137
p=1 |
p=1 | 1. TimeMonitor_enableTimer_UnitTest ...
p=1 | Testing TimeMonitor's disableTimer() and enableTimer() methods
p=1 | Creating timers
p=1 | Running all timers without disabling any
p=1 | timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 | timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 | timers[i]->numCalls() = 5 == numTrials = 5 : passed
p=1 | Disabling one timer and trying again
p=1 | Test that code {TimeMonitor::disableTimer ("Timer 0");} does not throw : passed
p=1 | timers[0]->numCalls() = 5 == numTrials = 5 : passed
p=1 | timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 | timers[i]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 | Reenabling the timer and trying again
p=1 | Test that code {TimeMonitor::enableTimer ("Timer 0");} does not throw : passed
p=1 | timers[0]->numCalls() = 10 == 2*numTrials = 10 : passed
p=1 | timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 | timers[i]->numCalls() = 15 == 3*numTrials = 15 : passed
p=1 | Test that summarize() reports enabled and disabled timers
p=1 | Test that code {TimeMonitor::disableTimer ("Timer 1");} does not throw : passed
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (0.00686 sec) TimeMonitor_enableTimer_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:202
p=1 |
p=1 | 2. TimeMonitor_YamlLabelQuoting_UnitTest ... [Passed] (0.0096 sec)
p=1 | 3. TimeMonitor_TimerLabelFiltering_UnitTest ...
p=1 |
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: ["Foo: timer 1", "Foo: timer 2", "Foo: timer 3"]
p=1 | Total times: {"Foo: timer 1": {MinOverProcs: 0.000428677, MeanOverProcs: 0.00043265, MaxOverProcs: 0.000438213, MeanOverCallCounts: 0.000144217}, "Foo: timer 2": {MinOverProcs: 0.000424147, MeanOverProcs: 0.000424941, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141647}, "Foo: timer 3": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425259, MaxOverProcs: 0.000425816, MeanOverCallCounts: 0.000141753}}
p=1 | Call counts: {"Foo: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Foo: timer 3": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |
p=1 | pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 226 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - "Foo: timer 1"
p=1 | - "Foo: timer 2"
p=1 | - "Foo: timer 3"
p=1 | Total times:
p=1 | "Foo: timer 1":
p=1 | MinOverProcs: 0.000428677
p=1 | MeanOverProcs: 0.00043265
p=1 | MaxOverProcs: 0.000438213
p=1 | MeanOverCallCounts: 0.000144217
p=1 | "Foo: timer 2":
p=1 | MinOverProcs: 0.000424147
p=1 | MeanOverProcs: 0.000424941
p=1 | MaxOverProcs: 0.000426054
p=1 | MeanOverCallCounts: 0.000141647
p=1 | "Foo: timer 3":
p=1 | MinOverProcs: 0.000424862
p=1 | MeanOverProcs: 0.000425259
p=1 | MaxOverProcs: 0.000425816
p=1 | MeanOverCallCounts: 0.000141753
p=1 | Call counts:
p=1 | "Foo: timer 1":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 | "Foo: timer 2":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 | "Foo: timer 3":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 |
p=1 | pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 247 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: ["Bar: timer 1", "Bar: timer 2"]
p=1 | Total times: {"Bar: timer 1": {MinOverProcs: 0.000426054, MeanOverProcs: 0.000426054, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000142018}, "Bar: timer 2": {MinOverProcs: 0.000424862, MeanOverProcs: 0.000425418, MaxOverProcs: 0.000426054, MeanOverCallCounts: 0.000141806}}
p=1 | Call counts: {"Bar: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}, "Bar: timer 2": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |
p=1 | pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 210 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - "Bar: timer 1"
p=1 | - "Bar: timer 2"
p=1 | Total times:
p=1 | "Bar: timer 1":
p=1 | MinOverProcs: 0.000426054
p=1 | MeanOverProcs: 0.000426054
p=1 | MaxOverProcs: 0.000426054
p=1 | MeanOverCallCounts: 0.000142018
p=1 | "Bar: timer 2":
p=1 | MinOverProcs: 0.000424862
p=1 | MeanOverProcs: 0.000425418
p=1 | MaxOverProcs: 0.000426054
p=1 | MeanOverCallCounts: 0.000141806
p=1 | Call counts:
p=1 | "Bar: timer 1":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 | "Bar: timer 2":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 |
p=1 | pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 228 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: ["Baz: timer 1"]
p=1 | Total times: {"Baz: timer 1": {MinOverProcs: 0.000425577, MeanOverProcs: 0.000425975, MaxOverProcs: 0.000426292, MeanOverCallCounts: 0.000141992}}
p=1 | Call counts: {"Baz: timer 1": {MinOverProcs: 3, MeanOverProcs: 3, MaxOverProcs: 3, MeanOverCallCounts: 3}}
p=1 |
p=1 | pos = 194 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - "Baz: timer 1"
p=1 | Total times:
p=1 | "Baz: timer 1":
p=1 | MinOverProcs: 0.000425577
p=1 | MeanOverProcs: 0.000425975
p=1 | MaxOverProcs: 0.000426292
p=1 | MeanOverCallCounts: 0.000141992
p=1 | Call counts:
p=1 | "Baz: timer 1":
p=1 | MinOverProcs: 3
p=1 | MeanOverProcs: 3
p=1 | MaxOverProcs: 3
p=1 | MeanOverCallCounts: 3
p=1 |
p=1 | pos = 209 != std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | pos = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (0.00819 sec) TimeMonitor_TimerLabelFiltering_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:430
p=1 |
p=1 | 4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:657
p=1 | substr_inner_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:659
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: compact
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected: [MinOverProcs, MeanOverProcs, MaxOverProcs, MeanOverCallCounts]
p=1 | Timer names: [FUNC_TIME_MONITOR2, FUNC_TIME_MONITOR2_inner]
p=1 | Total times: {FUNC_TIME_MONITOR2: {MinOverProcs: 3.8147e-06, MeanOverProcs: 4.60943e-06, MaxOverProcs: 5.00679e-06, MeanOverCallCounts: 4.60943e-06}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 0, MeanOverProcs: 0, MaxOverProcs: 0, MeanOverCallCounts: 0}}
p=1 | Call counts: {FUNC_TIME_MONITOR2: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}, FUNC_TIME_MONITOR2_inner: {MinOverProcs: 1, MeanOverProcs: 1, MaxOverProcs: 1, MeanOverCallCounts: 1}}
p=1 |
p=1 | substr_i = 193 != std::string::npos = 18446744073709551615 : passed
p=1 | substr_inner_i = 213 != std::string::npos = 18446744073709551615 : passed
p=1 | # Teuchos::TimeMonitor report
p=1 | ---
p=1 | Output mode: spacious
p=1 | Number of processes: 3
p=1 | Time unit: s
p=1 | Statistics collected:
p=1 | - MinOverProcs
p=1 | - MeanOverProcs
p=1 | - MaxOverProcs
p=1 | - MeanOverCallCounts
p=1 | Timer names:
p=1 | - FUNC_TIME_MONITOR2
p=1 | - FUNC_TIME_MONITOR2_inner
p=1 | Total times:
p=1 | FUNC_TIME_MONITOR2:
p=1 | MinOverProcs: 3.8147e-06
p=1 | MeanOverProcs: 4.60943e-06
p=1 | MaxOverProcs: 5.00679e-06
p=1 | MeanOverCallCounts: 4.60943e-06
p=1 | FUNC_TIME_MONITOR2_inner:
p=1 | MinOverProcs: 0
p=1 | MeanOverProcs: 0
p=1 | MaxOverProcs: 0
p=1 | MeanOverCallCounts: 0
p=1 | Call counts:
p=1 | FUNC_TIME_MONITOR2:
p=1 | MinOverProcs: 1
p=1 | MeanOverProcs: 1
p=1 | MaxOverProcs: 1
p=1 | MeanOverCallCounts: 1
p=1 | FUNC_TIME_MONITOR2_inner:
p=1 | MinOverProcs: 1
p=1 | MeanOverProcs: 1
p=1 | MaxOverProcs: 1
p=1 | MeanOverCallCounts: 1
p=1 |
p=1 | substr_i = 208 != std::string::npos = 18446744073709551615 : passed
p=1 | substr_inner_i = 231 != std::string::npos = 18446744073709551615 : passed
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (0.00113 sec) TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:639
p=1 |
p=1 | 5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ...
p=1 |
p=1 | Testing intersection of timers:
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:776
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | Testing union of timers:
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:796
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:798
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (0.165 sec) TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:711
p=1 |
p=1 | 6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:852
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:854
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (0.36 sec) TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:808
p=1 |
p=1 | 7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:936
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:938
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (1.35 sec) TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:869
p=1 |
p=1 | 8. TimeMonitor_IgnoreMissingTimers_UnitTest ...
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1019
p=1 |
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1030
p=1 |
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 |
p=1 | substr_i = 18446744073709551615 == std::string::npos = 18446744073709551615 : passed
p=1 | NOTE: Unit test failed on processes = {1, 2}
p=1 | (rerun with --output-to-root-rank-only=<procID> to see output
p=1 | from individual processes where the unit test is failing!)
p=1 | [FAILED] (1 sec) TimeMonitor_IgnoreMissingTimers_UnitTest
p=1 | Location: /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:951
p=1 |
p=1 |
p=1 | The following tests FAILED:
p=1 | 0. TimeMonitor_FUNC_TIME_MONITOR_UnitTest ...
p=1 | 1. TimeMonitor_enableTimer_UnitTest ...
p=1 | 3. TimeMonitor_TimerLabelFiltering_UnitTest ...
p=1 | 4. TimeMonitor_FUNC_TIME_MONITOR_tested_UnitTest ...
p=1 | 5. TimeMonitor_SUMMARIZE_diffTimerSets_UnitTest ...
p=1 | 6. TimeMonitor_FILTER_ZERO_TIMERS_sameParallelCallCounts_UnitTest ...
p=1 | 7. TimeMonitor_FILTER_ZERO_TIMERS_differentParallelCallCounts_UnitTest ...
p=1 | 8. TimeMonitor_IgnoreMissingTimers_UnitTest ...
p=1 |
p=1 | Total Time: 2.91 sec
p=1 |
p=1 | Summary: total = 9, run = 9, passed = 1, failed = 8
p=1 |
p=1 | End Result: TEST FAILED
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[63321,1],2]
Exit code: 1
--------------------------------------------------------------------------
*
The unique failures showin in that output are:
p=1 | pos = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:512
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1019
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:1030
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:156
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:291
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:657
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:776
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:796
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:798
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:852
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:854
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:936
p=1 | substr_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:938
p=1 | substr_inner_i = 18446744073709551615 != std::string::npos = 18446744073709551615 : FAILED ==> /home/rabartl/Trilinos.base/Trilinos/packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp:659
I think that the problem is that the time monitor really only collects data to the root process. Therefore, the other processes should not be checking the ouptut at all. Instead, they should just not crash.
The PR #2488 fixes this test for all platforms.
So it turns out that this test was mess up on a bunch of builds and platforms, it was just that this Power8 system actually caused the test to hang in some cases due flaky MPI implementation.
This is an example of why we need to be testing on these platforms.
@bartlettroscoe YOU'RE AWESOME thanks for finding this & fixing it! :D
The test TeuchosComm_TimeMonitor_UnitTests_MPI_3 ran and passed in all 18 ATDM builds where it ran today as shown at:
We have not seen it fail since a481846 was merged.
I am confident this is fixed. I am closing as complete.