Kops: Ensure critical networking pods don't get evicted

Created on 15 Feb 2017  ·  16 Comments  ·  Source: kubernetes/kops

We should enable the following feature for weave, canal, etc. Currently, under disk pressure, networking pods can get evicted. This feature was released in k8s 1.5.3:

https://github.com/kubernetes/kubernetes/pull/41052

This is in addition to the memory-related improvements in #1874.
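For reference, marking a networking DaemonSet as critical boils down to the `scheduler.alpha.kubernetes.io/critical-pod` annotation on the pod template. A minimal sketch, not the manifest kops actually ships (image and version are placeholders):

```yaml
# Hypothetical fragment of a weave-net DaemonSet; not the kops-generated manifest.
apiVersion: extensions/v1beta1      # apps/v1 on newer clusters
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system            # the critical-pod annotation is only honored in kube-system
spec:
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
    spec:
      containers:
      - name: weave
        image: weaveworks/weave-kube   # tag omitted; illustrative only
```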

P1 lifecycle/frozen

All 16 comments

We are seeing weave frequently not getting scheduled under CPU pressure. The strange thing is that the DaemonSet will not consider this an error.

Assuming a 6-node cluster, if one node does not get weave-net due to exhausted CPU resources, the DaemonSet will happily report 5/5 pods running.

Our workaround so far is to set the CPU requests to 0, as the guaranteed QoS (#1894) doesn't seem to work anyway.
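For anyone else hitting this, the workaround is just zeroing the request in the weave container spec (a sketch; the exact layout of the kops-managed manifest may differ):

```yaml
# Sketch of the workaround: request zero CPU so the DaemonSet pod always fits.
spec:
  template:
    spec:
      containers:
      - name: weave
        resources:
          requests:
            cpu: "0"   # was a non-zero request; zeroing it avoids the scheduling failure
          # note: changing the requests also changes the pod's QoS class
```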

This just caused a 30-minute outage on one of our Kubernetes clusters during the upgrade from 1.5.4 to 1.6.0, as the base resource requests on the master nodes increased. I see it the same way as tazjin: set the CPU requests to 0 to be sure it gets scheduled.

The CriticalPod annotation should resolve your issue - https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-add-on-as-critical

I believe we have these on the pods now. Need to double check

/assign

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

There is a new Priority feature which should help with evictions.

Currently in Alpha, and off by default.

(and I'm led to believe the CriticalPod annotation is not meant for add-ons)
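For those who haven't looked at it yet, the Priority feature works by defining PriorityClass objects and referencing them from pods via `priorityClassName`. Roughly (a sketch, assuming the alpha `scheduling.k8s.io/v1alpha1` API with the PodPriority feature gate enabled; the class name and value are made up):

```yaml
# Sketch only: requires the PodPriority feature gate while the feature is alpha.
apiVersion: scheduling.k8s.io/v1alpha1
kind: PriorityClass
metadata:
  name: networking-critical          # hypothetical class name
value: 1000000                       # higher value = preempted/evicted later
globalDefault: false
description: "Priority for cluster networking add-ons"
```

Pods would then set `spec.priorityClassName: networking-critical` instead of relying on the CriticalPod annotation.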

/remove-lifecycle stale

/lifecycle frozen

This needs to get addressed

This also needs to be addressed in clusters that have PodPriority enabled, in which case the CriticalPod annotation isn't used.

For example, the kube-proxy manifest appears to be hard-coded in nodeup and is thus somewhat difficult to change reliably to include a priority setting that prevents kube-proxy from being evicted (this would also be fixed by running kube-proxy as a DaemonSet).
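If the nodeup template could be changed, the fix would be roughly a one-line addition to the static pod spec (a hypothetical fragment, not the actual nodeup output):

```yaml
# Hypothetical kube-proxy static pod manifest fragment.
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  priorityClassName: system-node-critical   # built-in class; protects kube-proxy from eviction/preemption
  containers:
  - name: kube-proxy
    image: k8s.gcr.io/kube-proxy             # tag omitted; illustrative only
```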

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

kube-dns should also have "system-cluster-critical" or "system-node-critical" set when PodPriority is enabled:
https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/lifecycle frozen

I think this was intended earlier.

These all have critical PodPriority now.
/close

@johngmyers: Closing this issue.

In response to this:

These all have critical PodPriority now.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

