For commands that stop processes (e.g. destroy, stop, stop_node, restart), add a -f/--force flag that kills them with SIGKILL. This is in response to the recent regression that left yb-ctl unable to stop postgres processes. As developers, we currently have to kill the processes manually each time (or write a script ourselves). Having this built into yb-ctl will help with this and future hiccups.
Hi, I'll work on this.
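To make the request concrete, here is a minimal sketch of how such a flag could be wired up with argparse. It is not yb-ctl's actual code: the program name `yb-ctl-sketch` and the helpers `build_parser`, `pick_signal`, and `stop_process` are hypothetical, and the use of SIGTERM as the non-forced default is an assumption.

```python
import argparse
import os
import signal

# Commands named in the request; each one gets the same -f/--force flag.
STOPPING_COMMANDS = ("destroy", "stop", "stop_node", "restart")


def build_parser():
    """Build a parser where every process-stopping command accepts -f/--force."""
    parser = argparse.ArgumentParser(prog="yb-ctl-sketch")  # hypothetical name
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in STOPPING_COMMANDS:
        sub = subparsers.add_parser(name)
        sub.add_argument(
            "-f", "--force", action="store_true",
            help="kill processes with SIGKILL instead of asking them to exit")
    return parser


def pick_signal(force):
    """Return SIGKILL when forced; SIGTERM otherwise (the default is an assumption)."""
    return signal.SIGKILL if force else signal.SIGTERM


def stop_process(pid, force=False):
    """Send the chosen signal to a single process ID."""
    os.kill(pid, pick_signal(force))
```

With this shape, `build_parser().parse_args(["stop", "--force"])` yields `force=True`, and the same boolean can be passed down to whatever code actually signals the daemons.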
Great! The "recent regression" is issue #6258, and yb-ctl is at scripts/installation/bin/yb-ctl. One way to test it would be to run
./bin/yb-ctl destroy --force
and see that it works. If you don't have easy access to a CentOS machine, maybe you can find some other condition that gets the processes badly stuck. Currently, index backfill with an ongoing long CREATE INDEX might qualify.
Thank you for the clear explanation, @jaki.
Does it only happen on CentOS, or could it happen on other Linux distributions?
@adzimzf, I heard that it does not happen on Mac. It may happen on Ubuntu. It definitely happens on CentOS. Those are our three supported OSes at the moment.
Okay, noted. I'll try on Ubuntu first; if it doesn't occur there, I'll try on CentOS.
Hi @jaki, I've created the PR.
I tested it by creating a table with 200K rows and then creating an index on it. While the index was still being built, I ran yb-ctl stop and the other commands.
@adzimzf, to be clear, you tried without --force and it didn't work (immediately) but with --force it did?
Yes, with --force it stopped immediately.
Is that enough to reproduce it?
We need a control test. If it stopped immediately without --force, then it doesn't demonstrate that --force is useful. It's important to find a situation that shows a clear improvement using --force vs not using it.
One example I found (after switching to using SIGQUIT for postmaster) was https://github.com/yugabyte/yugabyte-db/issues/6269#issue-735746516. However, that got fixed not too long ago (commit 5fd432910f5dd9061e90f90dda893048aa6fa4cb), so I prefer finding a different test. You could try really stressing out the system and hope that regular yb-ctl stop won't work (immediately) whereas --force does.
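The control-test idea can be illustrated outside of yb-ctl. The sketch below is only an analogy: it spawns a hypothetical child process that ignores SIGTERM (standing in for a badly stuck daemon, and assuming the non-forced stop uses SIGTERM) and checks whether it exits within a timeout, first without force and then with SIGKILL. All function names here are made up for the example.

```python
import signal
import subprocess
import sys

# Child program that ignores SIGTERM, standing in for a badly stuck process.
STUCK_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def spawn_stuck_child():
    """Start the child and wait until it has installed its signal disposition."""
    proc = subprocess.Popen([sys.executable, "-c", STUCK_CHILD],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()  # blocks until the child prints 'ready'
    return proc


def exits_within(proc, sig, timeout):
    """Send sig and report whether the process exits before the timeout."""
    proc.send_signal(sig)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def run_control_test(timeout=1.0):
    """Return (stopped_without_force, stopped_with_force) for the same process."""
    proc = spawn_stuck_child()
    try:
        without_force = exits_within(proc, signal.SIGTERM, timeout)
        with_force = exits_within(proc, signal.SIGKILL, timeout)
    finally:
        if proc.poll() is None:  # clean up if both attempts failed
            proc.kill()
            proc.wait()
        proc.stdout.close()
    return without_force, with_force
```

A useful demonstration is one where the first value is False and the second is True. If both come back True, the scenario does not show that forcing adds anything.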
In my testing, creating an index on a table with 200K rows takes around 1 minute. I then drop the index and re-create the same index, and while it is being created I run yb-ctl stop. It hangs (both yb-ctl stop and ysqlsh) for more than 5 minutes, until I hit CTRL+C in the yb-ctl stop terminal session. If I run yb-ctl stop again, it hangs for more than 5 minutes as well. After I hit CTRL+C and run yb-ctl stop --force, it stops immediately.
I'm not sure this is enough to show a clear improvement from using --force versus not using it, but I'll try other possibilities.
Whether yb-ctl hangs or not matters less than whether the yb-master, yb-tserver, and postgres processes hang. Pay particular attention to the postgres processes. With ysqlsh open, you should see it terminate the connection (if it was running a command) or terminate the connection when you attempt the next command (if it was not running a command). That's what I got when using SIGQUIT for Postmaster; with SIGKILL, it was not user friendly because input like \q didn't work.
An easy test, as I explained at the start, is issue #6258. It got fixed recently, but you can just revert the commit. SIGQUIT gives
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
which is good because the user can exit. SIGKILL makes existing ysqlsh hang, which is bad.
Yes, I agree.
I also saw ysqlsh hang after I ran yb-ctl stop --force. I'll update.