We should enable the following feature for weave, canal, etc. Currently, networking pods can get evicted under disk pressure. This feature was released in k8s 1.5.3:
https://github.com/kubernetes/kubernetes/pull/41052
This is a further improvement on top of the memory improvements in #1874.
We are frequently seeing weave fail to get scheduled under CPU pressure. The strange thing is that the DaemonSet does not consider this an error.
Assuming a 6-node cluster, if one node does not get weave-net due to exhausted CPU resources, the DaemonSet will happily report 5/5 pods running.
Our workaround so far is to set the CPU requests to 0, as the Guaranteed QoS approach (#1894) doesn't seem to work anyway.
This just caused a 30-minute outage on one of our Kubernetes clusters during an upgrade from 1.5.4 to 1.6.0, as the base resource requests on the master nodes increased. I see it the same way as tazjin: set the CPU requests to 0 to be sure it gets scheduled.
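Roughly, that workaround looks like the snippet below on the weave-net DaemonSet (a sketch only; the image tag and memory request are placeholders):

```yaml
# Sketch of the CPU-requests-to-0 workaround described above. Only the
# resources stanza matters; the rest of the DaemonSet is abbreviated and the
# image/memory values are illustrative.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: weave-net
    spec:
      containers:
      - name: weave
        image: weaveworks/weave-kube      # placeholder tag
        resources:
          requests:
            cpu: "0"                      # never block scheduling on CPU
            memory: 200Mi                 # illustrative memory request
```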
The CriticalPod annotation should resolve your issue - https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-add-on-as-critical
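For reference, marking an add-on as critical per that page means running it in the kube-system namespace with the critical-pod annotation on the pod template; depending on the release, an ExperimentalCriticalPodAnnotation feature gate may also need to be enabled. A minimal sketch (the DaemonSet name and container are illustrative):

```yaml
# Sketch of the critical-pod marking from the linked doc: the pod runs in
# kube-system and carries the scheduler.alpha.kubernetes.io/critical-pod
# annotation with an empty value.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: weave-net
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: weave-net
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
    spec:
      containers:
      - name: weave
        image: weaveworks/weave-kube      # placeholder
```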
I believe we have these annotations on the pods now; need to double-check.
/assign
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with a /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
There is a new Priority feature which should help with evictions.
Currently in Alpha, and off by default.
(and I'm led to believe the CriticalPod annotation is not meant for add-ons)
/remove-lifecycle stale
/lifecycle frozen
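For context, the Priority feature mentioned above is a PriorityClass object referenced by name from the pod spec. A minimal sketch, assuming the feature gate is enabled (the class name, value, and pod are illustrative, and the scheduling.k8s.io API version has changed across releases):

```yaml
# Sketch of the Priority feature: a custom PriorityClass plus a pod that
# references it. Class name, value, and the pod itself are illustrative; the
# apiVersion has moved from v1alpha1 through v1beta1 to v1 across releases.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: networking-critical
value: 1000000                      # higher value = higher priority
globalDefault: false
description: "Priority for cluster networking add-ons."
---
apiVersion: v1
kind: Pod
metadata:
  name: example-addon
  namespace: kube-system
spec:
  priorityClassName: networking-critical
  containers:
  - name: addon
    image: k8s.gcr.io/pause:3.1     # placeholder image
```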
This needs to be addressed.
This also needs to be addressed in clusters that have PodPriority enabled, in which case the CriticalPod annotation isn't used.
For example, the kube-proxy manifest seems to be hard-coded in nodeup and is thus somewhat difficult to reliably change to include a priority setting that prevents kube-proxy from being evicted (this problem would also be fixed by moving to running kube-proxy as a DaemonSet).
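Illustratively, the fix being asked for amounts to the kube-proxy manifest that nodeup renders (or a future kube-proxy DaemonSet) carrying one of the built-in critical priority classes, roughly:

```yaml
# Rough sketch of a kube-proxy static pod manifest carrying a built-in
# critical priority class so priority-based eviction leaves it alone. The
# image tag and command are placeholders; only priorityClassName is the point.
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  priorityClassName: system-node-critical
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: k8s.gcr.io/kube-proxy:v1.11.0   # placeholder
    command:
    - /usr/local/bin/kube-proxy
```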
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
kube-dns should also have "system-cluster-critical" or "system-node-critical" set when PodPriority is enabled:
https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
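A sketch of what that would look like on the kube-dns Deployment (labels and image are illustrative; only the priorityClassName line is the point):

```yaml
# Illustrative: kube-dns pod template carrying the built-in cluster-critical
# class when PodPriority is enabled.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-dns
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      priorityClassName: system-cluster-critical
      containers:
      - name: kubedns
        image: k8s.gcr.io/k8s-dns-kube-dns-amd64:1.14.10   # placeholder
```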
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
I think this was intended earlier.
These all have critical PodPriority now.
/close
@johngmyers: Closing this issue.
In response to this:
These all have critical PodPriority now.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.