It would be great to be able to register a task health check in a job definition (a la Consul) that, optionally, could be leveraged by Nomad to kill and replace a failed task.
This will be included in our Consul integration (probably in 0.2).
This would be really nice to have. Are there any updates @cbednarski?
@cbednarski is this already implemented?
@abhidrona No it is not! Nomad will register services and checks but does not use the state of the checks to restart tasks.
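To illustrate the gap, here is a minimal sketch of the kind of external watchdog logic you would need today. It assumes you have already fetched check entries shaped like the response of Consul's `/v1/health/checks/<service>` endpoint (the `ServiceID` and `Status` fields). The function names and the `restart` callback are hypothetical: Nomad does not provide this behavior, and how a task actually gets restarted is left to the caller.

```python
from typing import Callable, Iterable, List, Mapping


def critical_service_ids(checks: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the sorted, de-duplicated IDs of services with a critical check.

    Each entry is assumed to carry "ServiceID" and "Status" keys, as in
    Consul's health check responses. Node-level checks, which have an empty
    ServiceID, are ignored.
    """
    return sorted(
        {
            check["ServiceID"]
            for check in checks
            if check.get("Status") == "critical" and check.get("ServiceID")
        }
    )


def restart_unhealthy(
    checks: Iterable[Mapping[str, str]],
    restart: Callable[[str], None],
) -> List[str]:
    """Call the hypothetical `restart` callback once per critical service."""
    unhealthy = critical_service_ids(checks)
    for service_id in unhealthy:
        restart(service_id)
    return unhealthy
```

Keeping the decision logic separate from the restart action means it can be tested without a running Consul or Nomad agent, and the callback can be swapped for whatever restart mechanism is available.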
I'm not sure what the priority is for this, but IMHO it should be of highest priority.
With service health checks but no task health checks, Nomad creates a zombie fleet of tasks.
What exactly is the point of having unhealthy jobs non-discoverable but still running?
This is also one of the very few issues that keep us from switching over from Mesos to Nomad. (We run Nomad in our staging environment, but can't afford to manually restart tasks in production.)
Closing in favor of https://github.com/hashicorp/nomad/issues/876 given more people have seen that one.
@dadgar this is #164
Fixed reference! My bad!