Nomad: Ability to send Unix Signals to processes Nomad is running

Created on 18 Feb 2016 · 26 comments · Source: hashicorp/nomad

Many of our services respond to Unix Signals (SIGHUP, SIGUSR1, etc.) to change their behavior. The most important one is to allow a service to "drain" (stop accepting new work, complete current in-flight work) before it is fully shut down.

The question is how this will work once Nomad is running our services. How can we send signals or achieve the equivalent behavior?

Happy to provide more detail and discuss as needed. :+1:

theme/driver/docker theme/driver/exec theme/driver/java theme/driver/raw_exec type/enhancement

Most helpful comment

@schmichael @dadgar sorry to comment on a closed ticket, but I have a use case that would really benefit from this capability.

The use case is a runtime controlled by the nomad client that would benefit from a chance to shut down cleanly when its machine is being decommissioned. Specifically, it's a Kafka broker process run as a nomad job on an AWS EC2 machine (Amazon Linux 2, with the nomad client run as a systemd unit).

When the EC2 machine gets killed without nomad being in the loop (say because of degraded hardware, AWS scheduled maintenance, etc.), neither the job-level kill_signal nor nomad's migration feature can help. Such cases would benefit from a clean, controlled shutdown process initiated by the nomad client on the affected machine.

We can use an ExecStop hook in the nomad client's systemd unit to initiate such a shutdown process. But without the ability to send the client a Unix signal to trigger it (I understand this is a *nix-specific feature that does not fit well in the Windows world), we'd have to resort to the node draining API. To enable that, we'd have to grant nomad worker nodes the ability to obtain a nomad token with the "node:write" ACL. If we turned this into a generic clean-shutdown config for all nomad worker nodes, any nomad node could drain, purge, or toggle the eligibility of any other node in the same cluster - an unnecessary escalation of privilege, IMHO.

On the other hand, with the ability to send the nomad client a signal as a trigger, each node could only trigger draining for itself.

I understand that not all systems NEED such clean-shutdown capability (I'm a fan of crash-only systems, in fact). But our production Kafka on nomad would really benefit from such local node draining in unexpected machine shutdown cases.

All 26 comments

@phinze We can do this for our exec based drivers. Docker doesn't have any API AFAIK to send signals to a running pid inside a container.

@c4milo The kill API you referenced assumes that the user wants to kill the container by sending a signal, and the call waits until the container exits.

The use case for this ticket is that a user might want to send an arbitrary signal to a pid asynchronously, so the kill API won't work in this case.

@diptanu it looks to me, from the docs, like you can send arbitrary signals via the signal query parameter?

@diptanu, Docker has no assumption that a signal would stop the process inside a container. We use the Docker kill API extensively to send signals to processes.

@skozin @lord2800 Yes, just re-read the docs. It looks like the Docker kill API waits for the container to exit only when the SIGTERM signal is passed. So we should be able to use this API to pass any arbitrary signal to the pid inside a Docker container.
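For reference, the same capability is exposed through the Docker CLI; the container name below is hypothetical, and this requires a running Docker daemon:

```
# Send an arbitrary signal (here SIGHUP) to PID 1 of a running container,
# without waiting for the container to exit
docker kill --signal=SIGHUP mynginx
```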

@phinze Is there any reason you can't put an HTTP API on top of the services? It is nice to limit the surface area of the scheduler to do just what is required. We already send a soft-kill (SIGTERM) signal before SIGKILL to let you do cleanup work, and the kill timeout between these two is configurable.

@dadgar Definitely understand the desire to keep the scheduler simple!

So I'm coming from the opposite direction - trying to minimize the number of per-service changes I have to make as I lift an existing microservices architecture into Nomad. So the idea of a sweep through two dozen services to flip upstart / Unix-signal-driven behavior over to an HTTP API doesn't sound super appealing at first glance. :grinning:

We might be able to make things work with the TERM/KILL window, though. (We have some very long-running jobs - 6-12 hours - that need to drain from some of our services.)

From where I sit, the interrupt-driven use cases - pause/resume, immediate config reload, and other behavior triggered by Unix signals - seem core enough to warrant inclusion in Nomad. But happy to discuss further!

@dadgar, I agree with @phinze here. The ability to send signals to a task process would be a really useful addition to Nomad.

For example, Nginx, like a lot of other widely used software, uses signals for a number of essential actions: HUP for zero-downtime config reloading, QUIT and WINCH for graceful shutdown, USR2 for upgrading the executable, etc. And this is not configurable.

One could definitely put some kind of HTTP API on top of Nginx by running a coprocess that listens for HTTP requests and communicates with Nginx via signals, but that would require non-trivial effort.

@skozin So I think in the world of cluster schedulers some of the use cases you have described change -
For example, if you want to upgrade the Nginx binary, you will probably deploy a new Docker container or change the artifact source of your exec-driver-based task and do a rolling upgrade. So the need to send USR2 to upgrade the executable goes away.

On the topic of config reloading, if you use something like consul template or any other co-process which re-generates the nginx config, I would imagine that the co-process is going to send a signal to the Nginx pid to reload the config and not the operator.
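As a sketch of that co-process pattern, a consul-template configuration can render the Nginx config and run a reload command after each render (paths are illustrative; `nginx -s reload` sends SIGHUP to the Nginx master process):

```
template {
  source      = "/etc/templates/nginx.conf.ctmpl"
  destination = "/etc/nginx/nginx.conf"
  # Run after each render; this is how the co-process, not the operator,
  # ends up delivering the reload signal to Nginx
  command     = "nginx -s reload"
}
```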

I don't disagree that sending signals can sometimes be handy, but I agree with @dadgar that in an environment where services run on cluster schedulers, the need to send signals to processes diminishes.

@diptanu we're attempting to co-schedule a task group with two docker tasks: a consul-template container and an nginx container. I'm curious as to your statement:

I would imagine that the co-process is going to send a signal to the Nginx pid to reload the config and not the operator.

It seems a bit tricky in this scenario for the consul-template container to send a signal to the nginx container.

Might a proper Nomad HTTP API for sending signals simplify this problem?

Fictional example: Nomad injects metadata (env var) for a unique endpoint to POST a signal to specific sibling tasks in the group. My consul-template task might then curl -X POST -d SIGHUP ${NOMAD_signal_frontend} to signal nginx in the "frontend" task to reload.

@dadgar, you wrote

kill timeout between these two is configurable.

Could you maybe post a reference to this? I couldn't find anything in https://www.nomadproject.io/docs/drivers/docker.html.

@JensRantil I think you can add a kill_timeout parameter on the task object. Docs can be found here: https://www.nomadproject.io/docs/jobspec/index.html#kill_timeout. It is not docker specific.
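A sketch of that parameter in a jobspec, with illustrative values:

```
task "web" {
  driver = "docker"

  config {
    image = "nginx:1.25"  # illustrative image
  }

  # Wait up to 30s between the soft kill (SIGTERM) and the hard kill (SIGKILL)
  kill_timeout = "30s"
}
```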

@diptanu, you wrote

We can do this for our exec based drivers

Any news on that? I can't find any reference in the documentation

@maruina I think he meant in the abstract. We haven't done this because not only is it driver-specific, it is also operating-system-specific. It requires more thought as to whether we want to support this.

We would be happy to be able to send signals to jobs/groups/individual tasks (via HTTP API as described in one of the comments above) as well!

I'd propose adding a kill_signal parameter, analogous to the template update signal.

The background is that different signals lead to different exit behaviour; in my case, e.g. for the gitlab-ci runner, I want to send SIGQUIT instead of SIGINT: https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/blob/master/docs/commands/README.md#signals
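Nomad did later grow a task-level kill_signal parameter along these lines; a sketch for the gitlab-runner case above (paths and timeout are illustrative):

```
task "runner" {
  driver = "exec"

  config {
    command = "/usr/local/bin/gitlab-runner"  # illustrative path
    args    = ["run"]
  }

  # SIGQUIT asks the runner to finish in-flight builds before exiting
  kill_signal  = "SIGQUIT"
  kill_timeout = "1h"
}
```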

Closing. Nomad v0.9.2 added the nomad alloc signal ... command and corresponding API via #5515.
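For example, against a running cluster (the allocation ID below is illustrative):

```
# Send SIGHUP to the "frontend" task of a specific allocation
nomad alloc signal -s SIGHUP 8a3b1e42 frontend
```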

Feel free to open a new issue if there are use cases we didn't cover. Thanks and sorry for the delay in closing this issue!

@schmichael @dadgar sorry to comment on a closed ticket, but I have a use case that would really benefit from this capability.

The use case is a runtime controlled by the nomad client that would benefit from a chance to shut down cleanly when its machine is being decommissioned. Specifically, it's a Kafka broker process run as a nomad job on an AWS EC2 machine (Amazon Linux 2, with the nomad client run as a systemd unit).

When the EC2 machine gets killed without nomad being in the loop (say because of degraded hardware, AWS scheduled maintenance, etc.), neither the job-level kill_signal nor nomad's migration feature can help. Such cases would benefit from a clean, controlled shutdown process initiated by the nomad client on the affected machine.

We can use an ExecStop hook in the nomad client's systemd unit to initiate such a shutdown process. But without the ability to send the client a Unix signal to trigger it (I understand this is a *nix-specific feature that does not fit well in the Windows world), we'd have to resort to the node draining API. To enable that, we'd have to grant nomad worker nodes the ability to obtain a nomad token with the "node:write" ACL. If we turned this into a generic clean-shutdown config for all nomad worker nodes, any nomad node could drain, purge, or toggle the eligibility of any other node in the same cluster - an unnecessary escalation of privilege, IMHO.

On the other hand, with the ability to send the nomad client a signal as a trigger, each node could only trigger draining for itself.

I understand that not all systems NEED such clean-shutdown capability (I'm a fan of crash-only systems, in fact). But our production Kafka on nomad would really benefit from such local node draining in unexpected machine shutdown cases.

@rmlsun you might be able to cover this case with the stop_after_client_disconnect stanza and an appropriately configured kill signal. If that won't do the trick, please feel free to open a new issue. Thanks!
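A sketch of that combination for the Kafka case (durations, image, and names are illustrative):

```
group "broker" {
  # If the client node stops heartbeating, stop the allocation after this
  # window so it can be rescheduled elsewhere
  stop_after_client_disconnect = "10m"

  task "kafka" {
    driver = "docker"

    config {
      image = "my-kafka:latest"  # hypothetical image
    }

    # Give the broker a graceful-shutdown signal and time to drain
    kill_signal  = "SIGTERM"
    kill_timeout = "5m"
  }
}
```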

@rmlsun Hm, what you're describing sounds like https://github.com/hashicorp/nomad/issues/2052 ?

Would a client.drain_shutdown = true agent configuration parameter fit your use case? The idea being that when the nomad client received the signal to shutdown it would block exiting until it had drained all running allocations?

If so please leave a comment over on that issue. If not please file a new issue like @tgross said.

@rmlsun you might be able to cover this case with the stop_after_client_disconnect stanza and an appropriately configured kill signal. If that won't do the trick, please feel free to open a new issue. Thanks!

@tgross thanks for the pointer. In most of our cases, we want the exact opposite of that stop_after_client_disconnect behavior, which is leaving the task runtimes alone instead of taking them down. We did quite a bit of destructive testing of nomad and were very happy to observe that task runtimes continue to run in the face of nomad server failure, network partitions between nomad server and client, etc. To that end, I really like what @schmichael mentioned in #2052:

A guiding principle in Nomad's design is in the face of errors: do not stop user services! Nomad downtime should prevent further scheduling, but it should avoid causing service downtime as much as possible.

@schmichael thanks for the pointer. Yes, I think that kind of configurable nomad client shutdown behavior would be helpful in this particular case:

Would a client.drain_shutdown = true agent configuration parameter fit your use case? The idea being that when the nomad client received the signal to shutdown it would block exiting until it had drained all running allocations?

Basically what we want is: if nomad itself runs into unexpected issues, leave the task runtime alone and confine a nomad issue to being just a nomad issue as much as possible (the smallest blast radius possible). On the other hand, if it's an intentional shutdown of the nomad client, provide a way to trigger a clean shutdown of the task runtimes.

I think there might be a fine line here, @schmichael.

Ideally, if the nomad client itself crashes or shuts down for reasons not initiated by an operator, it should not trigger task shutdown. Only an operator-initiated shutdown should trigger (and wait for the completion of) a clean shutdown of all tasks.

So would a signal be a good way to indicate an intentional shutdown? Like, instead of having client.drain_shutdown = true, how about client.drain_shutdown_signal = SIGINT, something along those lines?
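To make the proposal concrete, a sketch of what that agent configuration might look like - note this is entirely hypothetical, not an actual Nomad option:

```
client {
  enabled = true

  # Hypothetical: drain all running allocations before exiting, but only
  # when shutdown was triggered by this specific signal
  drain_shutdown_signal = "SIGINT"
}
```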
