Consul: Consul lock does not kill child process when it loses lock

Created on 5 May 2016 · 6 Comments · Source: hashicorp/consul

Server: 0.6.4, Ubuntu 14.04, 64-bit

When consul lock gets a lock, it launches /bin/sh -c <child-process>, which then launches the actual child process. When it loses the lock, it kills /bin/sh but leaves <child-process> running.

Steps to reproduce:

  1. launch consul
  2. start a service using consul lock, e.g. consul lock locks/replicate consul-replicate -config /etc/consul-replicate.d
  3. kill consul to release the lock
  4. run ps aux | grep consul and see that the child process is still running.

Relevant log fragments:

  1. With everything running, ps aux | grep consul:
consul    2750  0.1  3.3  25220 16776 ?        Ssl  20:52   0:04 /opt/consul/0.6.4/consul agent -config-file=/etc/consul/consul.json -config-dir=/etc/consul/conf.d
consul-+  2868  0.0  2.6  19200 13028 ?        Ssl  20:53   0:00 /opt/consul/0.6.4/consul lock -verbose locks/replicate consul-replicate -config /etc/consul-replicate.d
consul-+  2875  0.0  0.1   4448   800 ?        S    20:53   0:00 /bin/sh -c consul-replicate -config /etc/consul-replicate.d
consul-+  2876  0.0  0.9   8296  4656 ?        Sl   20:53   0:00 consul-replicate -config /etc/consul-replicate.d
  2. Log when losing lock:
Setting up lock at path: locks/replicate/.lock
Attempting lock acquisition
Starting handler 'consul-replicate -config /etc/consul-replicate.d'
Lock lost, killing child
Terminating child pid 2875
Error running handler: signal: terminated
signal: terminated
Child terminated
Lock release failed: failed to release lock: Put http://127.0.0.1:8500/v1/kv/locks/replicate/.lock?flags=3304740253564472344&release=c97cc39e-b5a7-0bd8-e8aa-baf9226a4ddb: dial tcp 127.0.0.1:8500: getsockopt: connection refused

Note: pid 2875 is the /bin/sh process

  3. ps aux | grep consul afterwards:
consul-+  2876  0.0  0.9   8296  4656 ?        Sl   20:53   0:00 consul-replicate -config /etc/consul-replicate.d
consul    2972  0.4  2.8  23108 14236 ?        Ssl  21:36   0:00 /opt/consul/0.6.4/consul agent -config-file=/etc/consul/consul.json -config-dir=/etc/consul/conf.d

The child process, pid 2876, is still running.
Pid 2972 is a new consul agent process restarted by upstart.
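
The leak described above can be reproduced without Consul. The following is a minimal Python sketch for Linux, not Consul's actual code: `sleep 30` stands in for the real handler, and every function name is hypothetical. It signals only the shell's pid, as the log shows consul lock doing, and then contrasts that with signalling the shell's whole process group, which is one possible way for a supervisor to reach the grandchild.

```python
import os
import signal
import subprocess
import time


def pid_alive(pid):
    """True if pid exists and is not a zombie (reads Linux /proc)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) state ..."; state is the first field after ")".
            return f.read().rsplit(")", 1)[1].split()[0] != "Z"
    except (FileNotFoundError, ProcessLookupError):
        return False


def spawn_via_shell(own_group):
    """Start /bin/sh -c, which starts `sleep` as a grandchild.

    Returns (shell Popen, grandchild pid). `sleep 30` is a stand-in for the
    real handler; the trailing `wait` keeps the shell alive as its parent.
    """
    shell = subprocess.Popen(
        ["/bin/sh", "-c", "sleep 30 & echo $!; wait"],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=own_group,  # setsid(): shell leads a new process group
    )
    grandchild = int(shell.stdout.readline())
    shell.stdout.close()
    return shell, grandchild


def kill_shell_only(shell):
    """Signal just the shell's pid, as in the log from the report."""
    shell.terminate()
    shell.wait()


def kill_group(shell):
    """Signal every process in the shell's group (needs own_group=True)."""
    os.killpg(shell.pid, signal.SIGTERM)
    shell.wait()


def wait_gone(pid, timeout=5.0):
    """Poll until pid has exited; return False if it outlives the timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True
        time.sleep(0.05)
    return False


if __name__ == "__main__":
    shell, grandchild = spawn_via_shell(own_group=False)
    kill_shell_only(shell)
    print("shell killed, grandchild alive:", pid_alive(grandchild))
    os.kill(grandchild, signal.SIGKILL)  # clean up the leaked process

    shell, grandchild = spawn_via_shell(own_group=True)
    kill_group(shell)
    print("group killed, grandchild gone:", wait_gone(grandchild))
```

The group kill only reaches processes that stay in the shell's group; a handler that calls setsid() or setpgid() itself would escape it.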

All 6 comments

If you are seeing this on ubuntu, it may be caused by "sh == dash". Switching from dash to bash as the sh interpreter fixed this behavior for me.

That looks like it did the trick.

It might be worthwhile to allow the shell to be overridden so that we don't have to change the default shell globally. https://wiki.ubuntu.com/DashAsBinSh claims that there are many speed improvements from using dash which are often incorrectly attributed to upstart.

Alternatively, you can ask the kernel to kill your process when its parent (shell) dies with prctl(PR_SET_PDEATHSIG, SIGKILL).
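
As a rough illustration of the prctl suggestion, here is a Linux-only Python sketch that calls prctl through ctypes. The helper names are hypothetical, and the constant value 1 for PR_SET_PDEATHSIG is taken from <linux/prctl.h>. The process that should die has to make the call itself (or a small wrapper has to make it just before exec), because the setting is cleared in the child of a fork but is normally preserved across execve.

```python
import ctypes
import os
import signal

PR_SET_PDEATHSIG = 1  # value from <linux/prctl.h>


def die_with_parent(expected_parent=None, sig=signal.SIGKILL):
    """Ask the Linux kernel to send `sig` to this process when its parent exits.

    If the caller knows the parent's pid (captured before fork), pass it as
    expected_parent to close the race where the parent dies before prctl runs.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig))) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # If the parent is already gone, no signal will ever arrive, so exit now.
    if expected_parent is not None and os.getppid() != expected_parent:
        os._exit(1)


def exec_bound_to_parent(argv, expected_parent=None):
    """Replace this process with argv after binding its life to the parent.

    When used as a wrapper started by a shell, the parent pid is not known in
    advance, so a small race window remains (assumption: acceptable here).
    """
    die_with_parent(expected_parent)
    os.execvp(argv[0], argv)
```

A wrapper script that calls exec_bound_to_parent(sys.argv[1:]) could then be placed in front of the handler command. Note that the kernel ties the signal to the thread that created the process, so a multi-threaded parent can trigger it earlier than expected.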

However, consul should not launch processes that it controls in a shell.

We experienced corruption today using consul-replicate due to this issue (multiple consul-replicates running due to leaks over time). We run consul-replicate as an Upstart job under Ubuntu with consul lock. I suspect that this is a common pattern.

Not explicitly setting SHELL in the Upstart job leads to this leak. The behavior is both obscure and dangerous. I think it warrants a special callout in the docs or something. I think that this is the real solution though: https://github.com/hashicorp/consul/issues/1692

I'll send a PR to update the docs if it will be accepted.

@evan2645 sorry about that - I'd definitely take a PR to update the docs and push that out while we work on the fix.

@slackpad no apology needed :) opened https://github.com/hashicorp/consul/issues/2090
