It looks like we run GET _cat/nodes?local in order to check a pod's readiness:
This API call does two things: it gets a list of nodes from the local cluster state, and then it reaches out to all the nodes in the cluster and obtains some extra info from them. If a single node in the cluster fails to respond in this second step within a few seconds then I think this means we consider all the pods to be failing their readiness checks.
I think it's a bug to accept ?local on this API at all given this gotcha (see https://github.com/elastic/elasticsearch/issues/50088) but also think we should be using a different API for our readiness checks. For instance, GET _cluster/health?timeout=0s returns 200 OK iff the node is part of a cluster with a master, and GET / returns (something) if the node is actually alive and responding.
(I say "something" because in certain versions GET / return 503 Service Unavailable if there is no master, but more recent versions always return 200 OK here).
We changed this in https://github.com/elastic/cloud-on-k8s/issues/1748. The intention was to bind pod readiness to a node local request so that we can evaluate pod readiness independently from overall cluster health/cluster membership as we are only interested in the status of the single node that is being requested and want to know whether it is principally ready to enter into operation.
From what you are saying GET / seems the request that gets us closest to that?
[Update] actually we had the health check at GET / originally and changed it because of the 503s we were seeing.
but more recent versions always return 200 OK here
So do you know which versions that are? I think we saw it on 6.x clusters IIRC
So do you know which versions that are? I think we saw it on 6.x clusters IIRC
It changed in 7.0.0, see https://github.com/elastic/elasticsearch/pull/29045.
From what you are saying GET / seems the request that gets us closest to that?
Maybe, but note that this will consider a running node to be ready even if it cannot contact the master of the cluster, and arguably a node without a master isn't "ready" in any meaningful sense since it can't really serve any useful requests.
In the light of https://github.com/elastic/cloud-on-k8s/issues/1748 maybe we do indeed want GET /, treating 503 as a benign misspelling of 200. Alternatively GET _cluster/health?local would suffice here since that _is_ a local request that always returns 200.
I will admit I don't fully understand the surrounding context -- deleting temporarily-unhealthy pods so enthusiastically seems like it might cause other problems. For instance a node could stop responding even to GET / for a while if it enters a long GC pause, and in that case it seems best to wait.
I'm encountering this readiness problem.
I start 3 nodes in a benchmark, each pod will get readiness probe failed every three or five minutes(not same time). if one of pod get failed, ingress(traefix or nginx) will return 503 service unavailable error, it cause write process terminated immediately.
I will admit I don't fully understand the surrounding context -- deleting temporarily-unhealthy pods so enthusiastically seems like it might cause other problems
The purpose of this readiness check as I see it is only to see if a node has booted successfully to such an extent that it is ready to accept API calls. This readiness state then predicates whether or not the node is included in the k8s service so that clients can reach it.
The purpose of this readiness check as I see it is only to see if a node has booted successfully to such an extent that it is ready to accept API calls. This readiness state then predicates whether or not the node is included in the k8s service so that clients can reach it.
These are two different purposes. A node without a master can technically accept API calls, but can't really handle most of them, so I think it's debatable whether a node without a master should be included in the service so that clients can reach it.
If I'm understanding https://github.com/elastic/elasticsearch/pull/29045 correctly, we might want to:
/ returns 200/ returns 200||503 and contains a body that looks like something we expect (e.g.{
"name" : "quickstart-es-default-0",
"cluster_name" : "quickstart",
"cluster_uuid" : "7Apxo0XXSD6CTNsY_UNNew",
...
Does that make sense?
We discussed offline and are going to implement as proposed in https://github.com/elastic/cloud-on-k8s/issues/2248#issuecomment-564753629
We made this an issue on the Elasticsearch repo: https://github.com/elastic/elasticsearch/issues/50187.
Most helpful comment
If I'm understanding https://github.com/elastic/elasticsearch/pull/29045 correctly, we might want to:
/returns 200/returns 200||503 and contains a body that looks like something we expect (e.g.Does that make sense?