When run locally, this completes relatively quickly. But in some number of runs, it seems to hang forever, triggering a 360-second test timeout in Travis.
tests/python/pants_test/base:exception_sink_integration .....Command '['/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/bin/python3.6', '/Users/travis/build/pantsbuild/pants/.pants.d/test/pytest-prep/CPython-3.6.5/929c23cae3b600b495a5d319ae6c47e8b41a2667', '-c', '/dev/null', '-ocache_dir=/Users/travis/build/pantsbuild/pants/.pants.d/test/pytest/.pytest_cache', '--junitxml', '/Users/travis/build/pantsbuild/pants/.pants.d/test/pytest/tests.python.pants_test.base.exception_sink_integration/junitxml/TEST-tests.python.pants_test.base.exception_sink_integration.xml', '--confcutdir', '/Users/travis/build/pantsbuild/pants', '--continue-on-collection-errors', '--color', 'yes', '-q', '-rfa', '--rootdir', '/Users/travis/build/pantsbuild/pants', '-p', '__pants_backend_python_tasks_pytest_prep_pytest_plugin__', '--pants-sources-map-path', '/Users/travis/build/pantsbuild/pants/.pants.d/test/pytest/tmpg79dyv4r/sources_map.json', '/Users/travis/build/pantsbuild/pants/.pants.d/pyprep/sources/49f3f1d9d9dc377d027f9fb364db7fffbb6a5ab9/pants_test/base/test_exception_sink_integration.py']' timed out after 360 seconds
Seen in #8123.
Seen again in #8099.
Seen again on master.
Seen in #8143.
Seen again in master.
This is probably our highest priority flaky test, as it seems to just hang fairly frequently.
Seen again on the OSX shard in #8153. The timeout for this one is now 540, and it takes about 30 seconds to run locally on OSX, so there is something strange happening. Maybe we're being forced to re-bootstrap or recompile? Or it is just hanging.
Seen again in both #8165 and #8166 on the OSX shard.
Seen in #8150.
Seen in #8192. It's not the first time I've seen it, but it is the first time I've commented here. Overall, I think there is no doubt this one regularly exceeds its timeout.
Seen again in master.
Seen again in #8201.
Seen again in #8221 on the OSX shard.
Seen in #8223 in the OSX platform-specific tests shard.
Seen again in the OSX platform-specific tests shard, with a timeout of 540.
@stuhood we should probably lower the timeout to less than 540, since this appears to be an issue with hanging forever? That way it fails eagerly.
I do not think this is an issue with trying to re-bootstrap ./pants. Now that https://github.com/pantsbuild/pants/pull/8183 has landed, we only ever use ./pants.pex for integration tests so I don't think this would even be possible.
Seen in #8233 in the OSX platform-specific tests shard.
Seen in #8276 in the OSX platform-specific tests shard.
Seen again in #8113.
Seen again in #8088.
Seen again in https://github.com/pantsbuild/pants/pull/8406
Seen again in https://github.com/pantsbuild/pants/pull/8452
I'm looking into this today. I agree with Stu that this is likely our highest priority flake.
Locally, I ran a script to repeat until failure. On the first run, it took 71 attempts to fail. On the second run, it took 131 attempts. This translates to roughly 1.4% and 0.8% of runs failing, respectively (1/71 and 1/131). In CI, it seems the number is closer to 20%. I'm going to try debugging in CI instead.
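For anyone who wants to reproduce the repeat-until-failure approach, here is a minimal sketch. It is not the script used above; `run_until_failure` and `estimated_failure_rate` are hypothetical helper names, and the command to run (for example, whatever invocation you use for the exception sink integration test) is left to the caller. A per-attempt timeout is included so that a hang counts as a failure rather than blocking the loop.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_until_failure(
    cmd: Sequence[str], max_attempts: int, timeout_secs: float
) -> Optional[int]:
    """Run `cmd` repeatedly and return the 1-based attempt number on which it
    first exited non-zero or exceeded `timeout_secs`. Return None if every
    attempt up to `max_attempts` succeeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # if the timeout elapses, so a hang is reported as a failure.
            result = subprocess.run(cmd, timeout=timeout_secs)
        except subprocess.TimeoutExpired:
            return attempt
        if result.returncode != 0:
            return attempt
    return None


def estimated_failure_rate(attempts_until_failure: int) -> float:
    """Rough point estimate: one failure observed in N attempts."""
    return 1.0 / attempts_until_failure
```

Usage would look something like `run_until_failure(cmd, max_attempts=500, timeout_secs=120)`, where `cmd` is the test invocation as a list of arguments. A single observed failure gives only a very rough estimate of the flake rate, so several runs are needed before comparing against CI.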
On a successful OSX shard, the test takes 5 minutes to run. Locally on OSX, it takes 30-35 seconds. Something seems to be going on with Travis.
These were the individual tests that took longer than local execution:
prints_traceback_on_sigusr2
keyboardinterrupt
dumps_traceback_on_sigabrt
dumps_logs_on_signal
EDIT: the common denominator for all of these tests is _make_waiter_handle:
I'm monitoring this one to decide what to do.