Px4-autopilot: Exiting under Windows is strange

Created on 12 Dec 2018 · 12 Comments · Source: PX4/PX4-Autopilot

I find it annoying that I can't easily quit SITL under Windows. When I press Ctrl+C, the SIGINT signal handler is called correctly, but then PX4 doesn't shut down cleanly. Instead it hangs, and you need to press either Enter or Ctrl+D. Presumably, the pxh shell is hanging at getchar.

FYI: @MaEtUgR

bug stale windows

All 12 comments

Related: https://github.com/PX4/Firmware/issues/10098#issuecomment-415426790
My issue with less explanation for the same issue: https://github.com/PX4/windows-toolchain/issues/3
I have to deal with this problem every day, so I would be grateful to have a solution, given that it cannot be that complicated. The shutdown command works just fine now. Can we not do the same thing on Ctrl+C?

I believe the shutdown command works because you won't call getchar anymore afterwards. However, when you catch the SIGINT signal and try to exit it still hangs at getchar.

Thanks for sharing your findings. So the expected behavior would be that it kills all threads, but since the pxh shell thread is still waiting for input, the exit ends up waiting for that thread as well. Maybe that's similar to the issue of waiting for the poll of the named pipe, which is solved now.
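One way to picture the hang and a possible way around it: a blocking getchar only returns once input arrives, so a flag set by the SIGINT handler is never looked at. The following is a minimal sketch, not PX4's actual shell code. It assumes a POSIX poll() is available (whether this behaves the same under Cygwin is untested), and the names `g_exit_requested`, `on_sigint` and `wait_for_input` are hypothetical.

```cpp
#include <cerrno>
#include <csignal>
#include <poll.h>
#include <unistd.h>

// Set from the SIGINT handler. Hypothetical name, not taken from PX4.
volatile std::sig_atomic_t g_exit_requested = 0;

extern "C" void on_sigint(int) { g_exit_requested = 1; }

enum class WaitResult { InputReady, ExitRequested, Error };

// Wait until `fd` has something to read (data, EOF or an error condition)
// or until an exit was requested. poll() returns at least every `slice_ms`
// milliseconds, so the flag is re-checked even if no key is ever pressed.
WaitResult wait_for_input(int fd, int slice_ms = 100)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    while (!g_exit_requested) {
        const int ret = poll(&pfd, 1, slice_ms);

        if (ret > 0) {
            return WaitResult::InputReady;
        }

        if (ret < 0 && errno != EINTR) {
            return WaitResult::Error;
        }

        // ret == 0 (timeout) or EINTR (interrupted by a signal):
        // loop around and look at the exit flag again.
    }

    return WaitResult::ExitRequested;
}
```

A shell loop could call `wait_for_input(STDIN_FILENO)` and only call getchar when the result is `InputReady`, leaving the loop on `ExitRequested`. The polling interval trades shutdown latency against wake-ups.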

Since #11177, simulation runs a lot more stably on Windows, but the shutdown command also hangs, so killing the process is the only way left.

I started debugging and found that:

  • With this line left in but a basically empty startup script the shutdown command still works but Ctrl + C requires an additional enter like described in OP despite no obvious problems:
    INFO [px4] Calling startup script: /bin/sh etc/init.d-posix/rcS 0
    INFO [px4] Startup script returned successfully

Aha
https://github.com/PX4/Firmware/blob/d8ab059ff3dcd50e0b8953da4de4150dffc2b75e/platforms/posix/src/main.cpp#L528
http://man7.org/linux/man-pages/man3/system.3.html

During execution of the command, SIGCHLD will be blocked, and SIGINT
and SIGQUIT will be ignored, in the process that calls system()
(these signals will be handled according to their defaults inside the
child process that executes command).

As mentioned, system() ignores SIGINT and SIGQUIT. This may make
programs that call it from a loop uninterruptible, unless they take
care themselves to check the exit status of the child.
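The man page's advice to check the child's exit status can be sketched roughly as follows. This is an illustration under assumptions, not code from PX4: the helper names `child_was_interrupted` and `run_command` are hypothetical, and the `128 + signal` exit code check relies on a common shell convention rather than on anything system() guarantees.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/wait.h>

// Returns true if a status value returned by system() indicates that the
// child ended because of SIGINT or SIGQUIT.
bool child_was_interrupted(int status)
{
    if (status == -1) {
        return false; // system() itself failed, no child status available
    }

    // Case 1: the shell started by system() was terminated by the signal.
    if (WIFSIGNALED(status)) {
        const int sig = WTERMSIG(status);
        return sig == SIGINT || sig == SIGQUIT;
    }

    // Case 2 (assumption, shell convention): the shell survived and reports
    // 128 + signal number for a command that was killed by a signal.
    if (WIFEXITED(status)) {
        const int code = WEXITSTATUS(status);
        return code == 128 + SIGINT || code == 128 + SIGQUIT;
    }

    return false;
}

// Runs `command` via system() and reports whether it was interrupted, so the
// caller can decide to stop instead of carrying on as if nothing happened.
int run_command(const char *command, bool &interrupted)
{
    const int status = std::system(command);
    interrupted = child_was_interrupted(status);
    return status;
}
```

A caller that runs commands in a loop would break out of the loop once `interrupted` is true, because its own SIGINT handler is not invoked while system() is executing.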

I'm currently assuming that newlib, which Cygwin uses, somehow screws up the signals in the process of implementing this specification.

https://github.com/mirror/newlib-cygwin/blob/17918cc6a6e6471162177a1125c6208ecce8a72e/newlib/libc/stdlib/system.c

Maybe I can find a simple workaround.

So the system() call is one thing that confuses the signal handling and keeps the application from exiting. Sadly, I found out that something in the lockstep scheduler (#10648) is the other thing.

So in short if I replace

  1. https://github.com/PX4/Firmware/blob/2ffb49b734fad6b6dfd6fcaee79bf31c6cf595a7/platforms/posix/src/main.cpp#L529 with popen(shell_command.c_str(), "r") and comment out this line
  2. https://github.com/PX4/Firmware/blob/2ffb49b734fad6b6dfd6fcaee79bf31c6cf595a7/boards/px4/sitl/default.cmake#L101

it all runs smooth in my takeoff Ctrl+C test.
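For readers unfamiliar with the swap described in step 1, here is a minimal sketch of running a startup command through popen() instead of system(). It is not the code from the linked patch. The function name `run_startup_script` is hypothetical, and reading the output plus calling pclose() is an assumption about how one would normally complete the call. As far as the man pages describe, popen() does not ignore SIGINT and SIGQUIT in the calling process the way system() does.

```cpp
#include <stdio.h>
#include <string>
#include <sys/wait.h>

// Runs `shell_command` via popen(), appends everything the command writes to
// stdout to `captured`, and returns the command's exit code.
// Returns -1 if the command could not be started or did not exit normally.
int run_startup_script(const std::string &shell_command, std::string &captured)
{
    FILE *pipe = popen(shell_command.c_str(), "r");

    if (pipe == nullptr) {
        return -1;
    }

    // Drain the pipe so the child never blocks on a full pipe buffer.
    char buffer[256];

    while (fgets(buffer, sizeof(buffer), pipe) != nullptr) {
        captured += buffer;
    }

    // pclose() waits for the child and returns its wait status.
    const int status = pclose(pipe);

    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }

    return WEXITSTATUS(status);
}
```

With `"r"` mode the script's stdout no longer goes straight to the terminal, so a caller that wants the usual console output would have to print `captured` itself.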

Now this is only a possible quick hotfix to reestablish functionality for currently blocked users, and not something desirable, since lockstep works (other than exiting) super well on Windows. I'll follow up with a more detailed explanation of what I tried, and I'd really appreciate it if I could ask @julianoes some further questions (he already helped me when we last met). 😇

So here's the patch I last described: https://github.com/PX4/Firmware/compare/master...MaEtUgR:exiting-experiments. Its biggest disadvantage is that it disables lockstep, so it is not a real solution.

I tracked the problem down to this call:
https://github.com/PX4/Firmware/blob/master/src/modules/simulator/simulator_mavlink.cpp#L295

In my tests it always returned 0 and, according to what I've seen, it does what it should: setting a clock that starts counting when the simulator starts. The frustrating part is this: if I only process the first 1 or 100 MAVLink messages from the simulator, exiting still works fine. If I process more than 300, exiting the application is broken. My debugging of why this happens is inconclusive so far. What's suspicious is that it breaks after processing around one second of simulator data, but not exactly to the millisecond.

@julianoes Could it be that something is triggered after the monotonic time in the simulator runs for one second that then starts other threads that lock everything up?

I just encountered a case on Linux where the first Ctrl+C didn't exit but also got stuck, and the second Ctrl+C then killed it:

pxh> INFO  [ecl/EKF] EKF aligned, (pressure height, IMU buf: 22, OBS buf: 14)
INFO  [ekf2] Mag sensor ID changed to 196616
INFO  [ecl/EKF] EKF GPS checks passed (WGS-84 origin set)
INFO  [ecl/EKF] EKF commencing GPS fusion
pxh> INFO  [mavlink] partner IP: 127.0.0.1
INFO  [commander] Takeoff detected

Exiting...
^C
Exiting...
pxh> Shutting down
ninja: build stopped: interrupted by user.
Makefile:190: recipe for target 'px4_sitl_default' failed
make: *** [px4_sitl_default] Interrupt

maetugr@ubuntuvm:~/Firmware$

It's not that one more SIGINT is needed for the simulator: the px4 binary processes both, as can be seen from the two "Exiting..." lines. This might give a hint.

I'm still relying on the hotfix https://github.com/PX4/Firmware/pull/11305, and with it Ctrl+C works as expected. I'm not sure how to track down why it gets stuck without the hotfix.

Still a problem

True, this is related: https://github.com/PX4/Firmware/pull/11654 It fixed the shutdown command in my tests.

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.
