We should enable the following feature for weave, canal, etc. Currently, networking pods can get evicted under disk pressure. This feature was released in k8s 1.5.3:
https://github.com/kubernetes/kubernetes/pull/41052
This is a further improvement on top of the memory improvements in #1874.
We are frequently seeing weave fail to get scheduled under CPU pressure. The strange thing is that the DaemonSet does not consider this an error.
Assuming a 6-node cluster, if one node does not get weave-net due to exhausted CPU resources, the DaemonSet will happily report 5/5 pods running.
Our workaround so far is to set the CPU requests to 0, as the Guaranteed QoS approach (#1894) doesn't seem to work anyway.
This just caused a 30-minute outage on one of our Kubernetes clusters during an upgrade from 1.5.4 to 1.6.0, as the base resource requests on the master nodes increased. I see it the same way as tazjin: set the CPU requests to 0 to be sure it gets scheduled.
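Roughly, that workaround looks like the snippet below on the weave-net DaemonSet (a sketch only; the image tag and memory request are placeholders):

```yaml
# Sketch of the CPU-requests-to-0 workaround described above. Only the
# resources stanza matters; the rest of the DaemonSet is abbreviated and the
# image/memory values are illustrative.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: weave-net
    spec:
      containers:
      - name: weave
        image: weaveworks/weave-kube      # placeholder tag
        resources:
          requests:
            cpu: "0"                      # never block scheduling on CPU
            memory: 200Mi                 # illustrative memory request
```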
The CriticalPod annotation should resolve your issue - https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-add-on-as-critical
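For reference, marking an add-on as critical per that page means running it in the kube-system namespace with the critical-pod annotation on the pod template; depending on the release, an ExperimentalCriticalPodAnnotation feature gate may also need to be enabled. A minimal sketch (the DaemonSet name and container are illustrative):

```yaml
# Sketch of the critical-pod marking from the linked doc: the pod runs in
# kube-system and carries the scheduler.alpha.kubernetes.io/critical-pod
# annotation with an empty value.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: weave-net
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
    spec:
      containers:
      - name: weave
        image: weaveworks/weave-kube      # placeholder
```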
I believe we have these annotations on the pods now; need to double-check.
/assign
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with a /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
There is a new Priority feature which should help with evictions.
Currently in Alpha, and off by default.
(and I'm led to believe the CriticalPod annotation is not meant for add-ons)
/remove-lifecycle stale
/lifecycle frozen
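For context, the Priority feature mentioned above is a PriorityClass object referenced by name from the pod spec. A minimal sketch, assuming the feature gate is enabled (the class name, value, and pod are illustrative, and the scheduling.k8s.io API version has changed across releases):

```yaml
# Sketch of the Priority feature: a custom PriorityClass plus a pod that
# references it. Class name, value, and the pod itself are illustrative; the
# apiVersion has moved from v1alpha1 through v1beta1 to v1 across releases.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: networking-critical
value: 1000000                      # higher value = higher priority
globalDefault: false
description: "Priority for cluster networking add-ons."
---
apiVersion: v1
kind: Pod
metadata:
  name: example-addon
  namespace: kube-system
spec:
  priorityClassName: networking-critical
  containers:
  - name: addon
    image: k8s.gcr.io/pause:3.1     # placeholder image
```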
This needs to be addressed.
This also needs to be addressed in clusters that have PodPriority enabled, in which case the CriticalPod annotation isn't used.
For example, the kube-proxy manifest seems to be hard-coded in nodeup and is thus somewhat difficult to reliably change to include a priority setting that prevents kube-proxy from being evicted (this problem would also be fixed by moving to running kube-proxy as a DaemonSet).
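Illustratively, the fix being asked for amounts to the kube-proxy manifest that nodeup renders (or a future kube-proxy DaemonSet) carrying one of the built-in critical priority classes, roughly:

```yaml
# Rough sketch of a kube-proxy static pod manifest carrying a built-in
# critical priority class so priority-based eviction leaves it alone. The
# image tag and command are placeholders; only priorityClassName is the point.
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  priorityClassName: system-node-critical
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: k8s.gcr.io/kube-proxy:v1.11.0   # placeholder
    command:
    - /usr/local/bin/kube-proxy
```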
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
kube-dns should also have "system-cluster-critical" or "system-node-critical" set when PodPriority is enabled:
https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
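A sketch of what that would look like on the kube-dns Deployment (labels and image are illustrative; only the priorityClassName line is the point):

```yaml
# Illustrative: kube-dns pod template carrying the built-in cluster-critical
# class when PodPriority is enabled.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      priorityClassName: system-cluster-critical
      containers:
      - name: kubedns
        image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10   # placeholder
```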
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
I think this was intended earlier.
These all have critical PodPriority now.
/close
@johngmyers: Closing this issue.
In response to this:
These all have critical PodPriority now.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.