It would be helpful to be able to specify a custom readiness check in certain cases, such as CentOS and RHEL 7 (3.10 kernel) with kmem accounting, to avoid the pod being improperly charged for memory consumed repeatedly by curl against an HTTPS endpoint.
Disabling curl is currently the workaround to avoid the pod being killed; once an alternative is identified, it will be useful to be able to specify it.
You can do this today by specifying your own readiness probe via the pod template:
spec:
  nodes:
  - podTemplate:
      spec:
        containers:
        - name: elasticsearch
          readinessProbe:
            ...
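As a hypothetical illustration of filling in that readinessProbe, a probe that avoids curl entirely could use a Kubernetes `tcpSocket` check against the Elasticsearch HTTP port. This is only a sketch: it assumes the default port 9200, the timing values are arbitrary, and a TCP check only confirms the port accepts connections, not that the node is healthy.

```yaml
spec:
  nodes:
  - podTemplate:
      spec:
        containers:
        - name: elasticsearch
          # Hypothetical alternative probe: a plain TCP check instead of
          # curl over HTTPS. It only verifies that port 9200 accepts connections.
          readinessProbe:
            tcpSocket:
              port: 9200
            initialDelaySeconds: 10  # arbitrary example values
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
```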
If you have multiple nodes entries (say, one set of data nodes and one set of master-eligible nodes), you can avoid repeating the readinessProbe by using YAML anchors and aliases:
spec:
  nodes:
  - count: 6
    podTemplate: &custom
      spec:
        containers:
        - name: elasticsearch
          readinessProbe:
            ...
  - count: 3
    podTemplate: *custom
This uses a YAML anchor (`&custom`) and an alias (`*custom`) to tell the YAML parser that the second podTemplate entry should take its value from the first podTemplate.
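For illustration, once the parser resolves the `*custom` alias, the anchored example parses to the same structure as writing the pod template out twice. The `tcpSocket` probe body below is a hypothetical stand-in for the elided `...`:

```yaml
spec:
  nodes:
  - count: 6
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          readinessProbe:
            # Hypothetical probe body standing in for the elided "..."
            tcpSocket:
              port: 9200
  - count: 3
    # What *custom expands to: an identical copy of the first podTemplate
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          readinessProbe:
            tcpSocket:
              port: 9200
```

Because kubectl parses the YAML on the client side, the API server only ever sees this expanded form. Note that the alias copies the entire podTemplate, so both node sets end up with identical pod templates.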
I was considering closing this out, but I wonder if it might be worth calling this out in the OpenShift doc. @barkbay do you have an opinion one way or the other?
@mikeh-elastic if I recall correctly, this was brought up because it was OOMing on v3 kernels (so RHEL/CentOS 7) with an HTTPS health check but not with an HTTP health check, is that right?
Yes, CentOS/RHEL 7 kernels have a bug, see also https://github.com/elastic/cloud-on-k8s/issues/1076#issuecomment-503894627
As Peter mentioned you can specify your own readiness check.
Regarding OpenShift, I would not add a note to the doc unless the problem also occurs there. It's not clear to me whether that is the case here; if it is, we need another issue.
> certain cases such as Centos and RHEL 7 3.10 kernel with kmem accounting
For posterity, I'm writing this here...
I'm seeing a similar issue on Google Container OS 69, kernel 4.14.127+, GKE 1.12.8-gke.10
Tests performed:
With the symptoms (memory growth with default readinessProbe), I see the following:
- dentries growing in /proc/slabinfo (viewable with slabtop if desired)
- Memory is released by `echo 2 > /proc/sys/vm/drop_caches` but then rises again.

There appears to be a kernel bug here that remains unfixed in at least some newer kernels. We're tracking this elsewhere as an internal ticket, but in case anyone ends up here from an internet search... you aren't alone ;)
> being accounted for memory consumption repeatedly by curl against an https endpoint
@mikeh-elastic good news! Teaming up with @pmoust, we got a solution that doesn't blow up memory usage (in my testing):
Citation: https://bugzilla.redhat.com/show_bug.cgi?id=1571183
If you set `export NSS_SDB_USE_CACHE=no`, this problem goes away because it disables the relevant behavior in NSS (libnss), the library curl is linked against on CentOS/RHEL 7. I believe there's still a bug in Kubernetes (or the Linux kernel) causing memory accounting to be incorrect, but at least this setting prevents the curl in the readinessProbe from making the problem worse.
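A minimal sketch of applying this workaround by hand, assuming the same `podTemplate` layout used earlier in this thread: setting `NSS_SDB_USE_CACHE` as a container environment variable on the `elasticsearch` container means processes started in that container, including the curl run by an exec readiness probe, should inherit it.

```yaml
spec:
  nodes:
  - podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          # Turns off the NSS behavior from the Red Hat bugzilla cited above
          # for curl (and anything else in this container that uses NSS).
          - name: NSS_SDB_USE_CACHE
            value: "no"
```

The value is quoted on purpose: unquoted `no` is read as the boolean `false` by YAML 1.1 parsers such as the one kubectl uses, and Kubernetes expects a string for `env[].value`.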
Relevant Kubernetes issue (filesystem cache is counting _against_ pod memory usage): https://github.com/kubernetes/kubernetes/issues/43916
After some discussion with Google, our strongest hypothesis is that there's a kernel bug in how memory is accounted (or a bug in how Kubernetes interprets memory accounting). Tracking it here: https://issuetracker.google.com/issues/140577001
https://github.com/elastic/cloud-on-k8s/pull/1716 sets an environment variable that should help minimize the symptoms of the kernel/Kubernetes memory accounting bug triggered by the readinessProbe's curl.