Describe the bug
On iCubGenova04, the IMU measurements estimated from the BOSCH IMU in the robot head and published on the port /icub/inertial are either zero or NaN.
After a preliminary analysis, we suspect that the yarprobotinterface somehow fails to communicate with the head IMU, receives a vector of zeros instead of the correct IMU values, and then these values impair the computations inside the dcm2rpy method.
The main problem is that dcm2rpy just prints a warning message saying that, being in a singular configuration, multiple solutions were found, and therefore one of them is taken. But this is not actually true, as the problem is that the original data describing the IMU orientation are not a valid rotation matrix or quaternion. Without an explicit error, the IMU port silently fails and streams NaN.
To Reproduce
It seems the problem occurs when turning the icub-head PC on and then starting the yarprobotinterface. It does not occur when one starts the yarprobotinterface without restarting also icub-head.
Screenshots
Data streamed from the IMU

Warning message in the logger

Logger when the failure occurred
imu_not_working.txt
no messages are printed besides the warning.
Configuration (please complete the following information):
cc @traversaro @prashanthr05 @GiulioRomualdi
yarp device name is: imuBosch_BNO055
The main problem is that dcm2rpy just prints a warning message saying that, being in a singular configuration, multiple solutions were found, and therefore one of them is taken. But this is not actually true, as the problem is that the original data describing the IMU orientation are not a valid rotation matrix or quaternion. Without an explicit error, the IMU port silently fails and streams NaN.
To be explicit, since the output data buffer is not being filled when we read from the IMU register through i2c, the output data buffer all remains zero.
Consequently, we use a quaternion representation for rotation before converting into DCM and RPY. For a meaningful rotation, it must be made sure that we pass an unit quaternion, which is being violated in case of zero data buffer.
This error should be handled properly.
This seems to be a problem in the sensor, for some reason. Given that this is apparently something that can happen, perhaps it could make sense to read the sensor once during the open method, and return false is the read fails or if the quaternion read does not have unit norm.
It could make sense to read the sensor once during the open method, and return false is the read fails or if the quaternion read does not have unit norm.
I agree. :+1:
I don't know if it is related but I noticed on iCubErzelli02 that the first yarprobotinterface launched after the power on of icub-head usually fails to open the imuBosch_BNO055, usually with the second trial it run smoothly.
I checked a bit the Linux kernel source code, and apparently an improved check on the correctness of our call to i2c_smbus_read_i2c_block_data that we do in https://github.com/robotology/yarp/blob/master/src/devices/imuBosch_BNO055/imuBosch_BNO055.cpp#L418 would be to check if the return value (the number of bytes read) is actually len, as done for example in:
https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/drivers/rtc/rtc-rx8010.c#L128 .
Another thing that we could look in improving is the initialization part in https://github.com/robotology/yarp/blob/24eb21146091e4cc3d5bba0856f1ab8fe9b202df/src/devices/imuBosch_BNO055/imuBosch_BNO055.cpp#L444 . There seems that there are several commands separated by rather long waits, but what is dictating the length of that weights, i.e. the SWITCHING_TIME parameter? If it is possible to check if the configuration has been correctly set, a better strategy may be to avoid the long wait, and just continuously check if the set was successful (with a timeout) and proceed to the next command when the configuration was correctly set.
An improved check on the correctness of our call to i2c_smbus_read_i2c_block_data that we do in https://github.com/robotology/yarp/blob/master/src/devices/imuBosch_BNO055/imuBosch_BNO055.cpp#L418 would be to check if the return value (the number of bytes read) is actually
len
:+1:
There seems that there are several commands separated by rather long waits, but what is dictating the length of that weights, i.e. the SWITCHING_TIME parameter? If it is possible to check if the configuration has been correctly set, a better strategy may be to avoid the long wait, and just continuously check if the set was successful (with a timeout) and proceed to the next command when the configuration was correctly set.
Honestly when I wrote the i2c part of the device I started copying-pasting the serial implementation and I left the delays, but I am not sure if they are necessary also for the i2c configuration. Maybe @barbalberto knows it better.
Dealing with wrong hw read is always painful because there is no perfect solution.
I checked a bit the Linux kernel source code, and apparently an improved check on the correctness of our call to i2c_smbus_read_i2c_block_data that we do in https://github.com/robotology/yarp/blob/master/src/devices/imuBosch_BNO055/imuBosch_BNO055.cpp#L418 would be to check if the return value (the number of bytes read) is actually len.
We may, but I think it does not solve the issue. If I understood the point here, the problem is that we feed the computation with a quaternion full of zeros. Even if we change the check, we still fall in a situation with an invalid read.
So, assume we get an invalid read, the question is how shall we handle it?
1) fill it with zeros -> no good for the reason of this issue
2) fill it with NaN?
3) keep last values -> no good because values will be used as real values, when they are not. This is the reason why we switched from serial to i2c in the first place.
The solution of run a first smoke test when the device starts is a valid approach if the problem arise only at startup, but if an invalid read happens after some time, how do we deal with it?
yarp::os::SystemClock::delaySystem(SWITCHING_TIME);
. There seems that there are several commands separated by rather long waits, but what is dictating the length of that weights, i.e. the SWITCHING_TIME parameter?
That is mandatory from device's datasheet.
So, assume we get an invalid read, the question is how shall we handle it?
My two cents:
IGenericSensor this would mean just returning false to the read call, while if we use also the new MAS interfaces (see https://github.com/robotology/yarp/issues/1854) the status of the sensor should be set to MAS_ERRORHow to handle the failures in the open or during runtime is up to the devices attached to the imuBosch_BNO055, I think that the duty of the imuBosch_BNO055 device is just to make sure that the error is correctly propagated.
Fine by me, I can do these change.
So the values shall not set to NaN or whatever? We just leave old values or whatever garbage cames out from the failing read? I have no preference, it is just to have a shared agreement on this.
We just leave old values or whatever garbage cames out from the failing read? I have no preference, it is just to have a shared agreement on this.
I would just not set anything in the read method, but just directly return false.
@barbalberto note that the devel version of the devices has been already ported to MAS, so if you want to change the device I suggest to modify that version, as YARP 3.2 is immenenent, thanks!
I would just not set anything in the read method, but just directly return false.
Just to be 100% sure I get what you mean: to not set anything means do not change the current vector, which means keeping the previous values.
Example: consider the previous read was successful, the vector is filled with data. In the following run the read fails and I just return an error. The vector will still hold old values, is this what you mean?
Or do you mean free the vector's memory and set the vector size to zero? I think this is ricky because user code calling the read may not be smart enough to handle vector which change size at run time.
Good point. I would probably just return false without touching the input vector at all, I am afraid that trying to implement some kind of "smart" handling of the input vector for users that ignore the return value of the read is going to be an ill posed problem otherwise.
Same problem occurred on the purple (iCubGenova02) today.
cc: @Yeshasvitvs @lrapetti
Fixed by @barbalberto in #2034