Hello,
I am doing a 3 IRIS quadcopters simulation in gazebo.
I have noticed that after running simulation for 4 days at low quadcopters speed (1 m/s), and after updating to the last SITL commit in which iris SDF file has been update with new plugins for magnometers and gyroscope, that gazebo is slowing down, the real time factor is going down to 0.22
Also the communication between the SDK and the PX4 is slowing. I am even losing heart beats during take off and landing.
The output from the SDK is the following :
11:06:27|Debug] MAVLink: info: DISARMED by auto disarm on land (system_impl.cpp:300)
[11:06:28|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:28|Debug] Lost 5283920058631409232 (mavsdk_impl.cpp:388)
[11:06:29|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:29|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:29|Debug] Lost 5283920058631409233 (mavsdk_impl.cpp:388)
[11:06:29|Debug] Lost 5283920058631409234 (mavsdk_impl.cpp:388)
[11:06:29|Debug] Discovered 1 component(s) (UUID: 5283920058631409232) (system_impl.cpp:557)
[11:06:29|Info ] Discovered system with UUID: 5283920058631409232
[11:06:31|Debug] Discovered 1 component(s) (UUID: 5283920058631409233) (system_impl.cpp:557)
[11:06:31|Info ] Discovered system with UUID: 5283920058631409233
[11:06:31|Debug] Discovered 1 component(s) (UUID: 5283920058631409234) (system_impl.cpp:557)
[11:06:31|Info ] Discovered system with UUID: 5283920058631409234
[11:06:32|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:32|Debug] Lost 5283920058631409232 (mavsdk_impl.cpp:388)
[11:06:34|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:34|Debug] Lost 5283920058631409233 (mavsdk_impl.cpp:388)
[11:06:34|Info ] heartbeats timed out (system_impl.cpp:305)
[11:06:34|Debug] Lost 5283920058631409234 (mavsdk_impl.cpp:388)
[11:06:34|Debug] Discovered 1 component(s) (UUID: 5283920058631409232) (system_impl.cpp:557)
[11:06:34|Info ] Discovered system with UUID: 5283920058631409232
[11:06:35|Debug] Discovered 1 component(s) (UUID: 5283920058631409233) (system_impl.cpp:557)
[11:06:35|Info ] Discovered system with UUID: 5283920058631409233
[11:06:35|Debug] Discovered 1 component(s) (UUID: 5283920058631409234) (system_impl.cpp:557)
[11:06:35|Info ] Discovered system with UUID: 5283920058631409234
I am using Arch Linux with 2 Intel(R) Xeon(R) CPU E5-2609 v4 @ 1.70GHz with 16 cores and 32 GB of RAM.
Also using htop, I have noticed that gzserver is using only 70% of only one core of CPU, there is no over consumption at any level, even though it is a little bit reduced over the four days
Does any one have an idea about what is happening??
@JonasVautherin @julianoes, Please check here for the issue,
Note: it seems like it is not MAVSDK-related, but rather comes from an update in SITL-gazebo.
I have noticed that after running simulation for 4 days
Does that mean that gazebo has to run for 4 days before the issue appears? Or can you reproduce in a small example that does not run for so long? :blush:
after updating to the last SITL commit in which iris SDF file
Could you confirm that when reverting that SDF file, then the problem disappears?
@JonasVautherin Not really, just run it for 3 hours, and the real time factor will go down to 0.72
You would have to wait for another 3 hours until it goes down to 0.55
I will try revert to the old SDF file, the only thing that I do not understand it that this SDK file is not tracked by git
@shrit does this happen with the default Gazebo sim with only one vehicle? Does it need to be flying or does it happen on the ground as well? The easier it is to reproduce, the better the chance that we can reproduce it.
@julianoes, It does happen with default gazebo, I only do several quadcopters vehicles simulation in gazebo. Never tried only one vehicles. I have no idea if leaving the quadcopters on the ground might reduce or not the real time factor.
I will try to have more information and post it here
@julianoes Maybe my modified offboard example (yesterday's post on #mavsdk slack channel) can help you with easily reproducing the error. Directly after an actuator control message is sent, the heartbeat is lost. The real time factor drops from 0.9-1 during takeoff (on my machine) to 0.04 during offboard with actuator control.
Actuator_controls.zip
I exchanged a few emails with Ildar Sadykov, according to him my modified example is correct and it is indeed a Gazebo issue.
I'm not sure if it is the exact same issue though as in my case it always happens immediately after calling 'set_actuator_control()' and not over a longer period of time.
@ThomasRigi what do you mean with Gazebo issue? Would that be an issue of a Gazebo version, or something about the gazebe_mavlink_interface plugin?
I can reproduce this with iris just hovering. After about 3 hours the real time factor drops from 0.98 to around 0.9.
@julianoes my best guess is that it's something in the gazebo_mavlink_interface plugin. But I wasn't able to pin down the root cause. The reported error in the terminal is poll timeout in gazebo_mavlink_interface.cpp, line 1080.
Running my modified example takes about 30s until the real time factor drops to 0.04; giving it less time for takeoff would shorten the duration even more.
@ThomasRigi ok I wonder if the effect that you see is the same issue but it is certainly faster to observe :smile:. I will try to reproduce it, thanks for sharing that code.
@julianoes Sorry, In fact gzserver is consuming more CPU when the real time factor decreases, I have misread it with gzclient, In fact it does increase from 47% of one core CPU to 70% at the same core.
@shrit good ovservation. I can see the same thing, gzserver starts off at 50% and then goes to 70% later.
@julianoes @JonasVautherin, I have went back to an old commit in SITL gazebo, I have started a simulation yesterday around 19:00h, until now every thing seems to be fine, real time factor is still 1 or 0.99
The commit I am using now is this one: 0e43ddafa0c70a190a9080c17c9540d8f7ab5261
After this commit, sdf files started to change because they added separate plugin for magnetometer.
I can not be sure which of the following commit has affected the real time factor. But this one is working perfectly.
I'm doing a git bisect in the sitl_gazebo submodule but it obviously takes a long time.
Any new observation about this issue ?
No, what about you? :smile: Sorry, this fell off my radar.
@julianoes, Well after an enormous update for my arch system. Everything needs to be rebuild.
Since I have to rebuild the binaries for px4. I have to relook into it since the last working commit before this issue can not be compiled because of a broken python dependecy
So, yes, I have to relook into it today.
@julianoes After finishing compiling px4 and gazebo 11, I am using the master version for sitl_gazebo and for the firmware.
I have just started the simulator and I was not able to even take off the quadrotors, only arming and then I receive all these set of errors:
I think these errors are related to this issue:
05:11:07|Debug] MAVLink: info: ARMED by Arm/Disarm component command (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: info: ARMED by Arm/Disarm component command (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: info: ARMED by Arm/Disarm component command (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: critical: Failsafe enabled: no RC (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: critical: Failsafe enabled: no RC (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: critical: Failsafe enabled: no RC (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: info: Failsafe mode activated (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: info: Failsafe mode activated (system_impl.cpp:309)
[05:11:07|Debug] MAVLink: info: Failsafe mode activated (system_impl.cpp:309)
[[1177::1111:09:|Info ] 09|Info ] Setting altitude takeoff to 15metersSetting altitude takeoff to 15meters
[17:11:09|Info ] Setting altitude takeoff to 15meters
[05:11:09|Warn ] command temporarily rejected (22). (mavlink_commands.cpp:170)
[05:11:09|Debug] MAVLink: critical: Takeoff denied! Please disarm and retry (system_impl.cpp:309)
[17:11:09|Error] Taking off has failed: Command denied
[05:11:09|Warn ] command temporarily rejected (22). (mavlink_commands.cpp:170)
[05:11:09|Debug] MAVLink: critical: Takeoff denied! Please disarm and retry (system_impl.cpp:309)
[17:11:09|Error] Taking off has failed: Command denied
[05:11:09|Warn ] command temporarily rejected (22). (mavlink_commands.cpp:170)
[05:11:09|Debug] MAVLink: critical: Takeoff denied! Please disarm and retry (system_impl.cpp:309)
[17:11:09|Error] Taking off has failed: Command denied
[05:11:10|Warn ] sending again, retries to do: 3 (176). (mavlink_commands.cpp:222)
[05:11:10|Warn ] sending again, retries to do: 2 (176). (mavlink_commands.cpp:222)
[05:11:11|Warn ] sending again, retries to do: 1 (176). (mavlink_commands.cpp:222)
[05:11:11|Error] Retrying failed (176) (mavlink_commands.cpp:239)
Offboard::start() failed: Timeout[05:11:11|Info ] heartbeats timed out (system_impl.cpp:314)
[05:11:11|Debug] Lost 5283920058631409232 (mavsdk_impl.cpp:417)
[05:11:11|Info ] heartbeats timed out (system_impl.cpp:314)
[05:11:11|Debug] Lost 5283920058631409233 (mavsdk_impl.cpp:417)
[05:11:13|Warn ] sending again, retries to do: 3 (176). (mavlink_commands.cpp:222)
[05:11:13|Warn ] sending again, retries to do: 3 (176). (mavlink_commands.cpp:222)
[05:11:13|Warn ] sending again, retries to do: 3 (176). (mavlink_commands.cpp:222)
[05:11:13|Warn ] sending again, retries to do: 2 (176). (mavlink_commands.cpp:222)
[05:11:13|Warn ] sending again, retries to do: 2 (176). (mavlink_commands.cpp:222)
[05:11:13|Warn ] sending again, retries to do: 2 (176). (mavlink_commands.cpp:222)
[05:11:14|Warn ] sending again, retries to do: 1 (176). (mavlink_commands.cpp:222)
[05:11:14|Warn ] sending again, retries to do: 1 (176). (mavlink_commands.cpp:222)
[05:11:14|Warn ] sending again, retries to do: 1 (176). (mavlink_commands.cpp:222)
[05:11:14|Error] Retrying failed (176) (mavlink_commands.cpp:239)
Offboard::stop() failed: Timeout[17:11:14|Error] Landing..., Even can not stop offboard mode
[05:11:14|Error] Retrying failed (176) (mavlink_commands.cpp:239)
Offboard::stop() failed: Timeout[17:11:14|Error] Landing..., Even can not stop offboard mode
[05:11:15|Warn ] sending again, retries to do: 3 (21). (mavlink_commands.cpp:222)
[05:11:15|Warn ] sending again, retries to do: 3 (21). (mavlink_commands.cpp:222)
[05:11:15|Warn ] sending again, retries to do: 3 (21). (mavlink_commands.cpp:222)
[05:11:15|Warn ] sending again, retries to do: 2 (21). (mavlink_commands.cpp:222)
[05:11:15|Warn ] sending again, retries to do: 2 (21). (mavlink_commands.cpp:222)
[05:11:16|Warn ] sending again, retries to do: 1 (21). (mavlink_commands.cpp:222)
[05:11:16|Warn ] sending again, retries to do: 1 (21). (mavlink_commands.cpp:222)
[05:11:16|Error] Retrying failed (21) (mavlink_commands.cpp:239)
[05:11:16|Error] Retrying failed (21) (mavlink_commands.cpp:239)
[17:11:16|Error] Landing command has failed:Timeout
[05:11:16|Error] Retrying failed (21) (mavlink_commands.cpp:239)
[17:11:16|Error] Landing command has failed:Timeout
[17:11:16|Info ] All quadrotors have been reset...
I'm still on Gazebo 10 :smile:. Looks like I should not upgrade! :confused:
The first thing to try is to set NAV_RCL_ACT to 0 so that it doesn not complain about "No RC".
@julianoes Actually, it is worse, it is gazebo-hg directly brought from bitbucket by yay
I am not going to mix things up. I am opening a new issue in sitl_gazebo repository https://github.com/PX4/sitl_gazebo/issues/381 since the original one is not yet resolved
Makes sense.
I'm using this Gazebo AUR package in Arch.
@julianoes I am going to test this version to see if the problem is originated from gazebo
@shrit and this works nicely to build it in Arch:
yay -S gazebo --noconfirm --rebuildtree
@julianoes Finally, it seems to be related to sitl_gazebo rather than gazebo.
Check this out https://github.com/PX4/sitl_gazebo/issues/381
This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.
Keep open as the issue is still not resolved
I think this issue is caused by a part of code below.
src/gazebo_magnetometer_plugin.cpp
mag_message_.add_magnetic_field_covariance() would be called when world update start event occurs (This event is called frequentry). So mag_message_'s contents keep growing.
This part should be written in Load event, if its contents is constant. But I'm not sure if it's constant.
Interesting, thanks @tnkkk! Have you tried changing it to see if it makes a difference?
@julianoes After I tried above, it was fixed. Simulation keeps the same speed.
Oh nice, can you make a pull request?
It's first time for me to make a pull request. I'm not sure if I did it in correct way. Please tell me if there are any problems.
Thanks @tnkkk!
Thanks @tnkkk
@julianoes Thanks for merging, this will help me a lot
Most helpful comment
I think this issue is caused by a part of code below.
src/gazebo_magnetometer_plugin.cpp
mag_message_.add_magnetic_field_covariance()would be called when world update start event occurs (This event is called frequentry). Somag_message_'s contents keep growing.This part should be written in
Loadevent, if its contents is constant. But I'm not sure if it's constant.