yarprobotinterface segfaults

Created on 17 Jul 2019 · 14 Comments · Source: robotology/yarp

Describe the bug
Every so often the yarprobotinterface crashes on iCubGenova01.

To Reproduce
Unfortunately, this happens very rarely, hence it is difficult to reproduce.

Do you think it could be an option to compile yarp on the pc104 using RelWithDebInfo and then use the useful tool coredumpctl to recover the dump the next time the crash happens?

Configuration (please complete the following information):

  • OS: Ubuntu 18.04
  • yarp version: 3.2.0 (7ee03f8e042d31dcad271b6782dc1de8398a8b12)

All 14 comments

@xEnVrE I think it's an option we'd definitely need to explore 👍

Does it crash while running or on closure?

We should investigate it, but usually the problems with yarprobotinterface are not yarprobotinterface's fault.
Usually some device crashes; recovering the dump could help us understand which device is buggy.

> Does it crash while running or on closure?

While running.

> We should investigate it, but usually the problems with yarprobotinterface are not yarprobotinterface's fault.
> Usually some device crashes; recovering the dump could help us understand which device is buggy.

I have some hints regarding this. Several people have had the same issue, i.e. the yarprobotinterface crashing while closing/opening a cartesian interface. The typical scenario is that you are using some module that opened a cartesian interface (I mean through a PolyDriver), then you close the module and everything is fine. When you try to re-open the module, the call to PolyDriver::open() gets stuck and never returns. At that point, sometimes but not always, the yarprobotinterface has already crashed. I hope this helps somehow in understanding the problem.

> We should investigate it, but usually the problems with yarprobotinterface are not yarprobotinterface's fault.

👍

> Usually some device crashes; recovering the dump could help us understand which device is buggy.

Just out of sheer curiosity: why does the yarprobotinterface crash? Is this the intended behaviour? If so, why? Is it possible to "catch" errors, report them and safely counteract this?

cc @barbalberto

I think this is because it spawns devices as plugins: if one of them throws an exception or segfaults, the entire process crashes.

I honestly don't know the yarprobotinterface implementation, but the opening and closing of the devices could be wrapped in a try-catch block.

But unfortunately it is not our silver bullet 😞
Which exceptions would we try to catch? Catching all exceptions with the catch(...) syntax is bad practice.

Moreover: "C++ try-catch blocks only handle C++ exceptions. Errors like segmentation faults are lower-level, and try-catch ignores these events and behaves the same as if there was no try-catch block."
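To make the distinction concrete, here is a minimal sketch of what wrapping a device open in try-catch could look like. It does not use the real yarprobotinterface code: `OpenResult`, `guardedOpen` and the `openDevice` callable are hypothetical names introduced only for illustration. It catches specific exception types (including `std::bad_alloc`) rather than `catch(...)`, and it would still be bypassed by a segmentation fault.

```cpp
#include <exception>
#include <functional>
#include <new>
#include <stdexcept>
#include <string>

// Result of trying to open one (hypothetical) device plugin.
struct OpenResult {
    bool ok;
    std::string error;  // empty when ok is true
};

// Runs `openDevice` and converts C++ exceptions derived from
// std::exception into an error report instead of letting them escape.
// NOTE: this cannot intercept a segmentation fault, which is delivered
// as a signal (SIGSEGV) and never passes through a catch clause.
OpenResult guardedOpen(const std::function<bool()>& openDevice) {
    try {
        if (!openDevice()) {
            return {false, "open() returned false"};
        }
        return {true, ""};
    } catch (const std::bad_alloc& e) {
        // Most specific handler first: memory exhaustion.
        return {false, std::string("out of memory: ") + e.what()};
    } catch (const std::exception& e) {
        // Any other standard exception thrown by the plugin.
        return {false, std::string("exception: ") + e.what()};
    }
}
```

A caller would pass a lambda that invokes the plugin's own open routine and then log `error` together with the device name, which at least identifies the offending plugin when the failure is a C++ exception.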

I see, so the problem is related to our implementations rather than some missing features.
It might be a good chance to take action! 🚀

As @Nicogene said, when the robotInterface crashes it is _always_ because one of the loaded plugins crashes.
Using coredump for post-mortem analysis is definitely the best way to go if the problem happens rarely.
The first thing is to identify which plugin is crashing, and this idea can help.

A segfault can be caught, the same way we catch CTRL+C inside RFModules, but you have no information about who raised the signal and why, so in a complex program such as robotInterface I think it would not be very useful, except for attempting a graceful shutdown.
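For reference, a minimal POSIX sketch of catching the signal is shown below. It is not taken from YARP: `onSegfault`, `installSegfaultHandler` and the exit code are hypothetical names and values chosen for illustration. The handler only calls async-signal-safe functions and, as noted above, has no way of telling which plugin caused the fault.

```cpp
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Hypothetical exit code used to mark "terminated after SIGSEGV".
const int kSegfaultExitCode = 70;

// Signal handler: restricted to async-signal-safe calls (write, _exit).
// It cannot know which plugin faulted; it only reports and exits.
extern "C" void onSegfault(int /*signum*/) {
    static const char msg[] = "fatal: SIGSEGV caught, shutting down\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    _exit(kSegfaultExitCode);
}

// Registers onSegfault for SIGSEGV. Returns true on success.
bool installSegfaultHandler() {
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = onSegfault;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGSEGV, &action, nullptr) == 0;
}
```

Note that exiting from the handler skips the default core dump, so a handler like this would work against the coredumpctl approach unless it re-raises the signal with the default disposition restored.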

> Several people have had the same issue, i.e. the yarprobotinterface crashing while closing/opening a cartesian interface. The typical scenario is that you are using some module that opened a cartesian interface (I mean through a PolyDriver), then you close the module and everything is fine. When you try to re-open the module, the call to PolyDriver::open() gets stuck and never returns. At that point, sometimes but not always, the yarprobotinterface has already crashed. I hope this helps somehow in understanding the problem.

One possibility is writing a simple stress test that opens and closes a cartesian client at high frequency. If there are race conditions or similar, a stress test like that could make them easier to reproduce.
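A rough sketch of such a stress loop follows. To keep it self-contained it does not link against YARP: `StressReport`, `stressOpenClose`, `openClient` and `closeClient` are hypothetical names, and in a real test the two callables would presumably wrap PolyDriver::open() and PolyDriver::close() on a driver configured for the cartesian client.

```cpp
#include <cstddef>
#include <functional>

// Outcome of a stress run.
struct StressReport {
    std::size_t cycles;        // open/close cycles fully completed
    std::size_t openFailures;  // iterations in which open returned false
};

// Repeatedly opens and closes a client with no delay between cycles.
// `openClient` and `closeClient` are placeholders supplied by the caller;
// with YARP they would wrap the open/close calls of the driver under test
// (an assumption for illustration, not shown here).
StressReport stressOpenClose(std::size_t iterations,
                             const std::function<bool()>& openClient,
                             const std::function<void()>& closeClient) {
    StressReport report{0, 0};
    for (std::size_t i = 0; i < iterations; ++i) {
        if (!openClient()) {
            ++report.openFailures;
            continue;  // nothing to close if open failed
        }
        closeClient();
        ++report.cycles;
    }
    return report;
}
```

Since the reported symptom is an open call that never returns, a loop like this would simply hang at the failing iteration; logging the iteration index before each open, or running the loop under an external timeout, would show where it got stuck.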

Today I had a similar issue with iTeenGenova01: the yarprobotinterface crashed at some point.
I was running the iKinGazeCtrl and the two iKinCartesianSolver-s and some other modules.

You can find the entire log attached here.

At some point some warnings started to appear, namely

2876    2603,221548 WARNING Performance warning: You are using positionMove commands at high rate (< 80  ms). Probably position control mode is not the right control mode to use.

After many messages like that, the last messages were

3114    3001,520266     yarp: Removing output from /icub/cartesianController/left_arm/state:o to /iolReachingCalibrationCollector/left/hand_fk:i
3115    3001,530339     yarp: Removing output from /icub/cartesianController/right_arm/state:o to /iolReachingCalibrationCollector/right/hand_fk:i
3116    3001,530391     terminate called after throwing an instance of 'std::bad_alloc'
3117    3001,530410       what():  std::bad_alloc

where /iolReachingCalibrationCollector/right/hand_fk:i and /iolReachingCalibrationCollector/left/hand_fk:i are ports opened by one of my modules.

The fact that those connections were removed might suggest that the cartesianController loaded within the yarprobotinterface had a failure?

@pattacini @traversaro @Nicogene

Is there any chance to know which joint of which part was controlled in position at a high rate?

Unfortunately I have no insight in the cartesianController :disappointed:

The Cartesian Control does not rely on position control but rather on position-direct control.
Was ARE running? Fingers' movements are controlled in position instead.

No, ARE was not running. I was just collecting data from some ports and not doing any control task in particular.

So, which component was requesting movements in position?

I was using the gaze controller, but it relies on position-direct for the neck and mixed mode for the eyes.
