Yarp: Random crashes in unit tests

Created on 19 Aug 2016 · 16 Comments · Source: robotology/yarp

There are a few tests that sometimes fail with random crashes. I'll use this issue to keep track of them.

Broken

YARP_OS

OS::PortTest (#621)
  • [ ] Exception: SegFault

    • 2016/08/08 - master - 2442.26 - ***Exception: SegFault 3.67 sec
    • 2016/07/30 - devel - 2360.23 - ***Exception: SegFault 3.59 sec
OS::PublisherTest
  • [ ] Exception: Other

    • 2016/11/09 - devel - 2900.6 - ***Exception: Other 0.26 sec
    • 2016/08/07 - devel - 2502.6 - ***Exception: Other 0.80 sec
OS::RateThreadTest
  • [ ] Exception: SegFault

    • 2017/01/10 - master - 3407.18 - *Exception: SegFault 6.78 sec
    • 2017/04/21 - master - 3869.8 - *Exception: SegFault 6.78 sec
    • 2017/10/27 - devel - traversaro 24.13 - *Exception: SegFault 9.64 sec
    • 2018/03/30 - devel - 5588.10 - *Exception: SegFault 9.63 sec

YARP_dev

dev::PolyDriverTest::Valgrind::MemCheck
  • [ ] Timeout

    • 2016/09/07 - master - 2625.26 - ***Timeout 1500.00 sec
    • 2016/08/18 - master - 2521.28 - ***Timeout 1500.00 sec
    • 2016/12/03 - devel - 3039.22 - ***Timeout 1500.00 sec
    • 2016/12/14 - master - 3162.24 - ***Timeout 1500.00 sec
  • [ ] No output received

    • 2016/08/05 - devel - 2429.28 - No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
dev::ControlBoardRemapperTest (#881)
  • [ ] Exception: Other

  • [ ] Exception: SegFault

    • 2016/08/19 - devel - 2533.10 - ***Exception: SegFault 1.30 sec
    • 2016/11/24 - devel - 3016.16 - ***Exception: SegFault 1.64 sec
    • 2016/12/08 - master - 3133.6 - ***Exception: SegFault 0.68 sec
    • 2016/12/12 - master - 3149.5 - ***Exception: SegFault 2.15 sec
    • 2018/09/02 - master - 6425.6 - ***Exception: SegFault 1.78 sec
  • [ ] Failed (yarp: could not create udp server)

    • 2016/09/03 - cleanup-gsl - 2616.29 - ***Failed 4.63 sec
    • 2016/12/07 - master - 3130.25 - ***Failed 0.13 sec
    • 2016/12/07 - devel - 3132.26 - ***Failed 0.15 sec
  • [ ] Timeout

    • 2016/08/07 - devel - 2502.22 - ***Timeout 1500.00 sec
    • 2016/10/17 - devel - 2966.17 - ***Timeout 1500.00 sec
    • 2016/12/12 - master - 3147.1 - ***Timeout 1500.00 sec
    • 2016/12/20 - master - 3236.23 - ***Timeout 1500.00 sec
dev::ControlBoardRemapperTest::Valgrind::MemCheck
  • [ ] Failed

    • 2018/01/22 - devel - 5232.18 - ***Failed 7.39 sec
    • 2016/12/07 - master - 3130.25 - ***Failed 3.70 sec
    • 2016/12/07 - devel - 3132.26 - ***Failed 5.18 sec
    • 2018/04/09 - master - 5632.18 - ***Failed 7.40 sec
dev::FrameTransformClientTest
  • [ ] Failed (cids.size() == 0 (0) == true (1))

    • 2016/12/14 - master - 3162.24 - ***Failed 4.06 sec
dev::MultipleAnalogSensorsInterfacesTest::Valgrind::MemCheck
  • [ ] Failed

    • 2018/08/22 - master - 6040.19 - ***Failed 3.19 sec
dev::FrameTransformClientTest::Valgrind::MemCheck (#953)
  • [ ] Failed

    • 2016/10/26 - devel - 2851.24 - ***Failed 6.14 sec
    • 2016/09/21 - devel - 2642.30 - ***Failed 5.97 sec

YARP_math

math::RandTest (https://github.com/robotology/yarp/issues/1478)
  • [ ] Failed

Others

idl_thrift_demo_test
  • [ ] Failed

    • 2016/08/02 - devel - 2384.23 - ***Failed 1.48 sec
integration::rpc
  • [ ] Failed (Cannot make connection)

    • 2016/08/19 - master - 2535.19 - ***Failed 12.97 sec (Cannot make connection)

Fixed (?)

  • [x] OS::NodeTest

  • [x] dev::FrameTransformClientTest

Labels: Continuous Integration, Library - YARP_dev, Library - YARP_os, Tests, Bug, Low, Minor

All 16 comments

Edit: Moved to the main list

Edit: Moved to the main list

The OS::NodeTest seems to be failing quite often recently on builds 11 (gcc) and 15 (clang), i.e. macOS, TRAVIS_WITH_ACE=false, TRAVIS_WITH_CXX11=false. Interestingly, it happens on both master and devel, so it should not be related to the latest changes for ROS...

I just saw it on build 16 (clang, TRAVIS_WITH_ACE=false, TRAVIS_WITH_CXX11=true), but I had to restart the build since it is on a pull request.

At this point I think it might be something specific to macOS with SKIP_ACE enabled; I had never seen this before, but recently it happens quite often.
Perhaps it is related to one of these commits that modified macOS-specific code:

59c42ae Fixed MCAST on macOS
8c72b50 Fixed UDP os macOS
e075ff2 Updated OSX System Info
7673218 Added draft of ProcessInfo
11b0de7 Initial implementation of SystemInfo for OSX
75c8318 YARP_OS: Fix PlatformThread for OSX, no ACE, no c++11

@traversaro @francesco-romano Can you please try to reproduce this on a Mac and see whether it disappears when reverting one or more of these commits?

Tests (on my machine):
On master and devel, the failure can be reproduced in roughly 20 runs with ctest -R NodeTest --output-on-failure --repeat-until-fail 100. Going back in time:

  • 75c8318 Fails in the same way. Commits before it fail to compile.
  • v2.3.65 , v2.3.66 Fail to compile.
  • v2.3.64 Fails in the same way.
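ctest's --repeat-until-fail stops at the first failing run, which is enough to reproduce the crash but does not say how often it happens (the "roughly 20 runs" estimate). The following is a minimal Python sketch, not part of YARP or CMake, that reruns any command and either stops at the first failure or counts failures over a fixed number of runs. The helper names repeat_until_fail and count_failures are hypothetical.

```python
import subprocess
from typing import Optional, Sequence


def _run_once(cmd: Sequence[str]) -> int:
    """Run cmd once, discarding its output, and return its exit status.

    On POSIX a crash such as a segfault shows up as a negative return
    code (e.g. -11 for SIGSEGV), so it is still counted as a failure.
    """
    return subprocess.run(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def repeat_until_fail(cmd: Sequence[str], max_runs: int) -> Optional[int]:
    """Return the 1-based index of the first failing run, or None if all pass."""
    for run in range(1, max_runs + 1):
        if _run_once(cmd) != 0:
            return run
    return None


def count_failures(cmd: Sequence[str], runs: int) -> int:
    """Run cmd exactly `runs` times and return how many runs failed."""
    return sum(1 for _ in range(runs) if _run_once(cmd) != 0)


# Hypothetical usage from a YARP build directory:
# failures = count_failures(["ctest", "-R", "NodeTest"], 100)
# print(f"NodeTest failed {failures}/100 runs")
```

Dividing the failure count by the number of runs gives a rough per-run failure rate, which makes it easier to compare commits when bisecting a flaky test like this one.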

Maybe we should move the macOS + SKIP_ACE builds to the allowed failures; this crash happens very often...

OK, this is getting _REALLY_ annoying. I moved the macOS + SKIP_ACE builds to the allowed failures and will re-enable them later if we find a fix. Anyway, I don't think the macOS + SKIP_ACE combination is really useful; we might also consider disabling these builds completely...

Yes, given the amount of work required, we could simply declare macOS + SKIP_ACE "unsupported".

The FrameTransformClientTest failure (the race-condition one) should be fixed by commit 20f8c792c0288878862159b0ac10c32055c529e6. I'm investigating the memcheck one; the two could have the same cause, in which case it would be fixed as well.

@aerydna Unfortunately, the memory leak is not fixed yet; see 2910.21 and 2910.22.

Just a thought: m_transform_storage is not initialized to nullptr in the FrameTransformClient constructor. Not sure whether this could cause trouble...

Have we stopped updating this page, or have the random crashes decreased over time? Apparently most of them never occurred in 2017.

I'm afraid we just stopped updating the page... 💩

The OS::RateThreadTest segfault just happened on a build; I updated the page.

New entry: dev::MultipleAnalogSensorsInterfacesTest::Valgrind::MemCheck, with 2018/08/22 - master - 6040.19 - ***Failed 3.19 sec.