Consul: "consul lock" process doesn't handle signals until lock is acquired

Created on 24 Jul 2018 · 13 Comments · Source: hashicorp/consul

Overview of the Issue

If you start "consul lock", it attaches signal handlers so it can pass signals to the child process. However, until the lock is acquired, every signal is discarded, so there is no graceful way to stop it.

Reproduction Steps

Run "consul lock" on two nodes. On the node that didn't acquire the lock, press CTRL+C to kill it. It won't exit.

Operating system and Environment details

Ubuntu 18 LTS, nothing fancy

type/bug

All 13 comments

I'm opening this issue because I'm using "consul lock" in a systemd service. If Consul has the lock, everything works fine. However, if I want to stop the service and Consul doesn't currently hold the lock, systemd will time out and SIGTERM it.

@JohnKiller Thanks for the report. It sounds perfectly reasonable to expect consul lock to still handle signals and exit even when the lock isn't held.

I don't know Go very well, but I've noticed that the watch command has this line:
https://github.com/hashicorp/consul/blob/8cdba9611d4cda22691fb5715872e2fa538bb389/command/watch/watch.go#L20-L24

which is missing in lock:
https://github.com/hashicorp/consul/blob/8cdba9611d4cda22691fb5715872e2fa538bb389/command/lock/lock.go#L64-L68

However that variable is referenced here:
https://github.com/hashicorp/consul/blob/8cdba9611d4cda22691fb5715872e2fa538bb389/command/lock/lock.go#L207-L212

So maybe it's just missing that. Thanks

@JohnKiller Maybe you know more Go than you thought. That is a good catch and certainly looks suspect.

So I looked into it a bit, and that is part of the issue. Right now, not having a shutdown chan means that it will just keep issuing the lock request until it gets the lock.

There is a second piece to note: the blocking query issued to Consul to gain the lock could block for up to 15 seconds.

@JohnKiller do you know how long it takes before systemd times out and issues a SIGTERM?

Default is 90s. Until the timeout, it will just keep trying to get the lock, so I tried another thing:

  • Start lock 2 times
  • Press CTRL+C on the waiting node (no effect)
  • Press CTRL+C on the running node (closes process and correctly releases lock)
  • The other one gets the lock, but won't close
  • Press CTRL+C again, it now works

This means that there is in fact a signal handler that just discards everything.

Forked the repo, made the changes to have ShutdownCh with MakeShutdownCh() and now I observe two things:

  • On the process which holds the lock, 3 signals get sent to the child process instead of one
  • On the other process, the shutdown works but only after the timeout you mentioned.

Any suggestion on where to look?

We are having the same problem.

Any idea when this will be fixed? (I'm unfortunately not fluent in Go myself)

Sorry, I did not dig further since it's beyond my abilities. My workaround is a SIGKILL after a timeout.

Fixed by #5909

Hi @freddygv is this backported to 1.6 or should I wait for 1.7? Thanks

Hi @JohnKiller I just saw that it didn't get backported to 1.6.3. That means it will be in 1.7, which is coming very soon.

OK, just upgraded to 1.7.0 and the fix is working.
However, this is the output:

Setting up lock at path: test/.lock
Attempting lock acquisition
^CShutdown triggered or timeout during lock acquisition

Is there a way to abort immediately instead of waiting for the lock timeout? It still took about 10 seconds to quit.
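
One general way to get an immediate exit, sketched below in Go, is to run the blocking call in its own goroutine and `select` on both its result and the shutdown channel. This shows the technique only and is not how "consul lock" behaves in 1.7.0; `acquireOrAbort` and `blockingAcquire` are hypothetical names, and the abandoned call keeps running in the background until it returns.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errAborted is returned when shutdown wins the race against acquisition.
var errAborted = errors.New("aborted during lock acquisition")

// acquireOrAbort runs blockingAcquire in a goroutine and returns as soon as
// either it finishes or shutdownCh is closed, whichever happens first.
func acquireOrAbort(blockingAcquire func() bool, shutdownCh <-chan struct{}) (bool, error) {
	// Buffered so the goroutine can deliver its result and exit
	// even if nobody is reading any more.
	resultCh := make(chan bool, 1)
	go func() { resultCh <- blockingAcquire() }()

	select {
	case held := <-resultCh:
		return held, nil
	case <-shutdownCh:
		return false, errAborted
	}
}

func main() {
	shutdownCh := make(chan struct{})
	go func() {
		time.Sleep(50 * time.Millisecond) // simulated CTRL+C
		close(shutdownCh)
	}()

	slow := func() bool {
		time.Sleep(2 * time.Second) // simulated long blocking query
		return false
	}

	start := time.Now()
	_, err := acquireOrAbort(slow, shutdownCh)
	fmt.Println(err, "after roughly", time.Since(start).Round(10*time.Millisecond))
}
```

The trade-off is that the in-flight request is abandoned rather than cancelled, so a real implementation would also need a way to cancel it or to release a lock that arrives late.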
