The health check currently runs after some delay, so a newly registered service appears unhealthy even when it is actually healthy. To solve this, the health check should run immediately after the service is registered.
Would it be possible to perform the healthcheck _before_ the service appears in the registry, so its initial status reflects the true status of the service?
Hi @janisz and @kshep, some health checks can be expensive, and multiple services might get registered all in one go, such as with a consul restart, so we always stagger them to try to randomize their phase. You can set the initial health state, so you can default healthy or unhealthy until the first check. Having it not show up until it has run the first check is an interesting idea. It's practically the same as starting it unhealthy, but I could see how you might care about that if you have monitoring set up.
> You can set the initial health state, so you can default healthy or unhealthy until the first check.
How do I set the initial health state?
There's a status field described in the "Initial Health Check Status" section of the checks guide:
By default, when checks are registered against a Consul agent, the state is set immediately to "critical". This is useful to prevent services from being registered as "passing" and entering the service pool before they are confirmed to be healthy. In certain cases, it may be desirable to specify the initial state of a health check. This can be done by specifying the status field in a health check definition, like so:
{
  "check": {
    "id": "mem",
    "script": "/bin/check_mem",
    "interval": "10s",
    "status": "passing"
  }
}
The above service definition would cause the new "mem" check to be registered with its initial state set to "passing".
> You can set the initial health state, so you can default healthy or unhealthy until the first check. Having it not show up until it has run the first check is an interesting idea.
It'd be slick if there were a fourth(?) initial health status, like 'check', indicating that the corresponding service shouldn't be registered until a check has been performed.
> It's practically the same as starting it unhealthy, but I could see how you might care about that if you have monitoring set up.
That's exactly our use case.
This is related to #2450. My take is that when a new health check is added, Consul should keep the service's existing state until the health check has run. Perhaps this could be implemented with an initial state of "unknown" that makes Consul ignore the check when determining service state? Of course, this applies only when the check is added while Consul is running; otherwise the old behavior is fine.
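A rough sketch of how the proposed "unknown" state could behave, written in Go. Everything here is hypothetical: the "unknown" status, the `aggregate` function, and the severity ranking are assumptions made for illustration and do not describe how Consul computes service state.

```go
package main

import "fmt"

// severity ranks the existing check states from best to worst.
var severity = map[string]int{"passing": 0, "warning": 1, "critical": 2}

// aggregate returns the worst status among checks that have reported.
// Checks in the hypothetical "unknown" state are skipped, and if no
// check has reported yet, the service keeps its previous state.
func aggregate(previous string, checks []string) string {
	worst, reported := "passing", false
	for _, status := range checks {
		if status == "unknown" {
			continue // newly added check that has not run yet
		}
		reported = true
		if severity[status] > severity[worst] {
			worst = status
		}
	}
	if !reported {
		return previous
	}
	return worst
}

func main() {
	// A new check is added to a passing service: the state is unchanged.
	fmt.Println(aggregate("passing", []string{"passing", "unknown"}))
	// An existing critical check still dominates.
	fmt.Println(aggregate("passing", []string{"critical", "unknown"}))
}
```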
Btw, I've found that the initial status is not always respected. I set the initial status to "passing":
service := &api.AgentServiceRegistration{
	Name: c.groupName,
	Port: 50511,
	Checks: api.AgentServiceChecks{
		&api.AgentServiceCheck{
			HTTP:     "http://127.0.0.1:50513/healthcheck",
			Interval: "10s",
			Status:   "passing",
		},
	},
}
err := agent.ServiceRegister(service)
The service randomly starts in the "critical" state until the first health check has run (this actually happens more often than not). Tried on 0.7.3.
I have a script check whose frequency can be very low (either because the check is expensive or because we don't need immediate feedback), for example a check on a disk array. If I set the interval to 1 hour, then for 1 hour after startup my check will be critical (or passing if I use the status field, but in my use case the user will check the health right after boot).
What I would need is a way to specify an interval plus a way to express that the first check should be done immediately and not at the end of the first interval.