Px4-autopilot: Performance issues on Windows that didn't exist before

Created on 16 Oct 2018 · 11 Comments · Source: PX4/PX4-Autopilot

Describe the bug
Running SITL jMAVSim on Windows shows inconsistent and unexpected performance problems. They produce messages like the ones below and visible twitches in the simulation, ranging from very small to huge. They can even lead to loss of global position and a crash of the simulated vehicle.

ERROR [sensors] Accel #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Accel #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Accel #0 priority: 1
WARN  [sensors] Remaining sensors after failover event 0: Accel #1 priority: 1
ERROR [sensors] Gyro #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Gyro #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Gyro #0 priority: 1
WARN  [ekf2] accel id changed, resetting IMU bias
WARN  [ekf2] accel id changed, resetting IMU bias
ERROR [sensors] Accel #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Accel #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Accel #0 priority: 1
WARN  [sensors] Remaining sensors after failover event 0: Accel #1 priority: 1
ERROR [sensors] Gyro #0 fail:  TIMEOUT!
ERROR [sensors] Sensor Gyro #0 failed. Reconfiguring sensor priorities.
WARN  [sensors] Remaining sensors after failover event 0: Gyro #0 priority: 1
WARN  [ekf2] accel id changed, resetting IMU bias

To Reproduce
Steps to reproduce the behavior:

  1. Use a fast desktop computer with Windows 10
  2. Install the Cygwin toolchain
  3. Run make posix jmavsim on latest master
  4. Sometimes it works fine, even for longer periods; sometimes the errors and twitches occur

Expected behavior
The simulation should work fine on a modern computer that isn't overloaded. It also worked fine before: I'm not sure, but I think I had no obvious problems on 1.8, not even on a slower laptop.

Log Files and Screenshots
I can provide a simulation log if that helps.

Additional context
The description of this issue looks very similar: https://github.com/PX4/Firmware/issues/10098

The machine is fast enough and not overloaded. The problem doesn't always happen, but it recurs regularly.

If none of this happens on any other platform, I'd suspect thread priority problems on Windows, but there might be a general problem that just shows up on Windows. When I ported SITL to Cygwin I had to adjust some priority numbers. https://github.com/PX4/Firmware/pull/8407/files#diff-b3ac3b41912e12a40f44757b561a0e4dR216 could be related.

Another general question: if the physics simulation cannot run in real time, why doesn't the simulation simply run slower but otherwise work fine? I think this was discussed before.
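To make that question concrete, here is a minimal Python sketch (not PX4 code) of why a sensor timeout measured against wall-clock time fires when the simulator stalls, while the same check measured against simulator-provided time does not. The function name `count_timeouts`, the 4 ms sample period, the 20 ms timeout, and the arrival times are all assumptions chosen for illustration.

```python
def count_timeouts(arrival_wall_ms, sample_period_ms, timeout_ms, use_sim_time):
    """Count sensor timeouts for samples arriving at the given wall-clock times.

    Every sample advances simulated time by sample_period_ms, no matter how
    long it took to arrive in wall-clock terms.
    """
    timeouts = 0
    last_wall = arrival_wall_ms[0]
    for now_wall in arrival_wall_ms[1:]:
        if use_sim_time:
            # The simulated clock only moves when a sample arrives.
            gap = sample_period_ms
        else:
            # The wall clock keeps running while the simulator is stalled.
            gap = now_wall - last_wall
        if gap > timeout_ms:
            timeouts += 1
        last_wall = now_wall
    return timeouts


# Hypothetical arrival times: steady 4 ms samples with one 50 ms stall.
arrivals = [0, 4, 8, 12, 62, 66, 70]
wall_clock_timeouts = count_timeouts(arrivals, 4, 20, use_sim_time=False)
sim_time_timeouts = count_timeouts(arrivals, 4, 20, use_sim_time=True)
print(wall_clock_timeouts, sim_time_timeouts)
```

With the wall clock as the reference, the single stall is reported as a timeout; with simulated time as the reference, the run is slower but no timeout is reported.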

bug

All 11 comments

What do you see in task manager while this is happening? CPU, memory, and network utilization for both PX4 and jmavsim.

What do you see in uorb top?

@julianoes Will your SITL changes likely help with this issue?

Not in principle, but I'm now debugging issues like this, hoping to find out why uorb drops messages.

@dagar Thanks for the hints. CPU usage in Task Manager looks fine, with PX4 taking 0-0.2% and jmavsim 2-5%. Here's a uorb top output snapshot:

update: 1s, num topics: 83
TOPIC NAME                        INST #SUB #MSG #LOST #QSIZE
actuator_controls_0                  0    7  171   287 1
battery_status                       0    6   79   206 1
ekf2_innovations                     0    1  114     0 1
ekf2_timestamps                      0    1  169     0 1
ekf_gps_drift                        0    1    9     0 1
estimator_status                     0    4  114   392 1
geofence_result                      0    1    4     0 1
rate_ctrl_status                     0    1  171     0 1
sensor_accel                         0    2  174   133 1
sensor_accel                         1    1  250    86 1
sensor_baro                          0    1   99    16 1
sensor_bias                          0    4  114   221 1
sensor_combined                      0    4  169   236 1
sensor_gyro                          0    3  174   136 1
sensor_mag                           0    2   99    71 1
sensor_preflight                     0    1  169     0 1
vehicle_air_data                     0    5   83    91 1
vehicle_attitude                     0    9  169   334 1
vehicle_attitude_groundtruth         0    1  140     0 1
vehicle_global_position              0    5  114   271 1
vehicle_global_position_groundtruth  0    1  140     0 1
vehicle_gps_position                 0    5    9     8 1
vehicle_local_position               0    8  114   321 1
vehicle_local_position_groundtruth   0    1  140     0 1
vehicle_magnetometer                 0    4   83    95 1
vehicle_rates_setpoint               0    4  152     0 1

Is there something striking?
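One way to look for something striking is to rank topics by lost messages per published message. The following is a minimal Python sketch under stated assumptions: the helper names `parse_uorb_top` and `lost_per_message` are hypothetical, the parser only assumes the six-column row layout shown in the snapshot above, and how `#LOST` is counted (per subscriber or in total) is not stated in this thread.

```python
def parse_uorb_top(text):
    """Parse data rows of a `uorb top` snapshot; non-data lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has 6 columns: name, INST, #SUB, #MSG, #LOST, #QSIZE.
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue
        name, inst, subs, msgs, lost, qsize = parts
        rows.append({
            "topic": name, "inst": int(inst), "subs": int(subs),
            "msgs": int(msgs), "lost": int(lost), "qsize": int(qsize),
        })
    return rows


def lost_per_message(row):
    """Lost count divided by published count (0.0 if nothing was published)."""
    return row["lost"] / row["msgs"] if row["msgs"] else 0.0


# A few rows copied from the snapshot above.
SNAPSHOT = """\
update: 1s, num topics: 83
TOPIC NAME                        INST #SUB #MSG #LOST #QSIZE
actuator_controls_0                  0    7  171   287 1
ekf2_innovations                     0    1  114     0 1
estimator_status                     0    4  114   392 1
sensor_accel                         0    2  174   133 1
vehicle_attitude                     0    9  169   334 1
"""

rows = parse_uorb_top(SNAPSHOT)
ranked = sorted(rows, key=lost_per_message, reverse=True)
for row in ranked:
    print(f"{row['topic']:<24} subs={row['subs']} ratio={lost_per_message(row):.2f}")
```

For these rows, `estimator_status` ranks highest and `ekf2_innovations` lowest.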

Update: I talked to @Stifael, who uses jmavsim on Linux, and there seems to be a problem with the simulation in general; it probably just hides better on Linux. With a certain CPU load and timing jitter, the sensors can always produce errors, and in particular you can lose global position and not regain it until you restart the simulation...

I have seen that too. Let's hope that it will be fixed with #10648.

I think I have the same issue.

Environment:
MacOS 10.14.1
Macbook Pro, i7

Branch:
tag/v1.8.2

Command:

make posix jmavsim

Output:

ERROR [sensors] Accel #0 fail: TIMEOUT!
ERROR [sensors] Sensor Accel #0 failed. Reconfiguring sensor priorities.
WARN [sensors] Remaining sensors after failover event 0: Accel #0 priority: 1
WARN [sensors] Remaining sensors after failover event 0: Accel #1 priority: 1
ERROR [sensors] Gyro #0 fail: TIMEOUT!
ERROR [sensors] Sensor Gyro #0 failed. Reconfiguring sensor priorities.
WARN [sensors] Remaining sensors after failover event 0: Gyro #0 priority: 1

Performance:
JarSrcLoader takes about 40% of a core
px4 takes another 20%

Behavior:
Vehicle tends to crash into the ground randomly, although I'm not sure this is related as I am not familiar with the sim and I may be sending bad commands.

edit: the issue is gone in the same environment with latest master.

Just an update: since lockstep was introduced, the result of this problem is that the whole simulation stops. At least you can now assume that nothing unexpected, like a simulated drone crash, happens before it stops. I'm not sure why it should stop, but a first guess would be that a packet, and hence the "lockstep token", gets lost, e.g. in the simulator, and there's a deadlock.
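The guess above can be illustrated with a small, deterministic Python sketch. It is a toy model and not the PX4 or jMAVSim implementation: it assumes the simulator only advances after receiving a reply to its last sensor sample, and that the autopilot only replies when a sample arrives. The function `run_lockstep` and the parameter `drop_sensor_at` are hypothetical names.

```python
from collections import deque


def run_lockstep(max_steps, drop_sensor_at=None):
    """Return (steps_completed, stalled) for a toy lockstep exchange.

    drop_sensor_at: step index whose sensor sample is lost in transit
    (None means no packet is lost).
    """
    to_autopilot = deque()  # sensor samples, simulator -> autopilot
    to_simulator = deque()  # replies, autopilot -> simulator
    sim_step = 0
    to_autopilot.append(sim_step)  # the simulator sends the first sample

    while sim_step < max_steps:
        # Autopilot side: reply only when a sensor sample has arrived.
        if to_autopilot:
            to_simulator.append(to_autopilot.popleft())
        # Simulator side: advance only when a reply has arrived.
        if not to_simulator:
            return sim_step, True  # both sides are waiting: stalled
        to_simulator.popleft()
        sim_step += 1
        if sim_step != drop_sensor_at:
            to_autopilot.append(sim_step)  # next sample (unless it is lost)
    return sim_step, False


print(run_lockstep(10))                    # no loss
print(run_lockstep(10, drop_sensor_at=3))  # one sample lost
```

In this model a single lost message is enough to stop the exchange for good, because neither side sends again without first receiving something. A timeout with a resend on one side would be one way to break such a stall, but whether that applies to the real implementation is not established here.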

Fun :sob:, so when does it stop? Straight away or after a while of flying? And does it always happen consistently?

It always happens, but not consistently after the same amount of time. I just compared with Linux: there I can easily let jmavsim fly for an hour without problems, but on Windows the simulation quite often locks up completely before I finish my tests, which usually take a minute or two. It also happens when I don't fly at all: I start SITL jmavsim and do something else, and when I come back I see the "global position loss" symptom all the time.

In my tests after #11177 this issue is gone. Thanks to @bkueng!
For more details read: https://github.com/PX4/Firmware/pull/11177#issuecomment-453973931
