Some of our jobs include tasks whose processes cannot be monitored with HTTP or TCP health checks. This usually applies to async workers that process some tasks. There is no really good way to tell how such a process is doing, but so far we've been monitoring whether the process is alive with ianitor or consul-announcer. While migrating to Nomad, we were able to let Nomad take care of all health checks except this one: we still have to wrap our process in some "monitor" that uses a TTL-based health check to report the process's health. I'm wondering whether such a health check can be added to Nomad.
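For anyone unfamiliar with the "monitor" wrapper pattern, here is a rough sketch of what such a wrapper does. This is not how ianitor or consul-announcer are implemented. It assumes a local Consul agent at `127.0.0.1:8500` and a TTL check that is already registered; `monitor`, `ttl_url`, and the check id are made-up names for illustration. It uses Consul's agent endpoints `/v1/agent/check/pass/<check_id>` and `/v1/agent/check/fail/<check_id>`.

```python
import subprocess
import urllib.parse
import urllib.request

# Assumption: a Consul agent is listening locally on the default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def ttl_url(check_id, status, agent=CONSUL_AGENT):
    """Build the agent URL that sets a TTL check to pass, warn, or fail."""
    if status not in ("pass", "warn", "fail"):
        raise ValueError(f"unknown status: {status}")
    return f"{agent}/v1/agent/check/{status}/{urllib.parse.quote(check_id, safe='')}"


def send_heartbeat(url):
    """PUT to the Consul agent; a 'pass' call resets the check's TTL timer."""
    req = urllib.request.Request(url, method="PUT")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def monitor(cmd, check_id, interval=10.0, send=send_heartbeat):
    """Run cmd, heartbeat while it is alive, and mark the check failed on exit."""
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        send(ttl_url(check_id, "pass"))
        try:
            # Sleep for one interval, but wake up early if the process exits.
            proc.wait(timeout=interval)
        except subprocess.TimeoutExpired:
            pass
    send(ttl_url(check_id, "fail"))
    return proc.returncode
```

The interval has to be comfortably shorter than the check's TTL, otherwise the check goes critical between heartbeats. Note that this only tells you the process is alive, not that it is making progress.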
@dadgar Could you maybe bump this up in terms of priority? Nomad currently removes any check associated with a process if it doesn't know about that check, so it's not possible to add a TTL check manually and set up the heartbeat.
My preferred implementation would be a service-associated env interpolation token exposing the TTL check id that I could then heartbeat to. What I was attempting to do was manually construct the service id and register an associated check.
As an interim measure, maybe TTL checks could just be ignored by the synchronization? I don't think that would necessarily break anything.
In the meantime, I guess I have to avoid using Nomad's service registration (which I find to be a bummer).
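To make the manual approach concrete, this is roughly what registering a TTL check against an existing service looks like through Consul's agent API (`PUT /v1/agent/check/register`). It is only a sketch: the agent address, the helper names, and the `<service_id>-ttl` check id convention are assumptions, and the service id has to be whatever Nomad actually registered, which is not derived here. As described above, Nomad's sync currently removes a check registered this way.

```python
import json
import urllib.request

# Assumption: a Consul agent is listening locally on the default HTTP port.
CONSUL_AGENT = "http://127.0.0.1:8500"


def build_ttl_check(service_id, ttl="30s", check_id=None):
    """Build the JSON body for a TTL check attached to an existing service."""
    return {
        "ID": check_id or f"{service_id}-ttl",  # hypothetical naming convention
        "Name": "process heartbeat",
        "ServiceID": service_id,
        "TTL": ttl,
    }


def build_register_request(payload, agent=CONSUL_AGENT):
    """Build the PUT request for Consul's check registration endpoint."""
    return urllib.request.Request(
        f"{agent}/v1/agent/check/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def register_ttl_check(service_id, ttl="30s"):
    """Register the check with the local agent and return the HTTP status."""
    req = build_register_request(build_ttl_check(service_id, ttl))
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Once the check exists, the process (or a wrapper around it) has to hit `/v1/agent/check/pass/<check_id>` more often than the TTL, which is why having Nomad expose the check id to the task would help.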
Not sure why this is marked as an enhancement. My Nomad jobs register themselves as services (within my code) and register checks associated with those services (also within my code). When Nomad runs a garbage collection cycle, it decides to deregister my services and checks. What gives Nomad the right to do that? What if my job is registering services external to itself (this job just happens to be registering itself)? Seems like a bug to me.
EDIT: this only occurs if you set the service-name or service-id to Nomad's defaults.
Bumping this for 2019. We have a bunch of long-running daemons that need better health checking. They cannot currently open an HTTP port, though we are looking into that. A TTL check could be useful.