Kops: Kubernetes node down when there are insufficient resources

Created on 29 Aug 2018 · 9 comments · Source: kubernetes/kops

1. What kops version are you running? The command kops version will display
this information.

Version 1.9.1

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T22:29:25Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.0", GitCommit:"91e7b4fd31fcd3d5f436da26c980becec37ceefe", GitTreeState:"clean", BuildDate:"2018-06-27T20:08:34Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
I deployed a cluster with three nodes on AWS and installed Istio 1.0 with automatic sidecar injection. When a new pod is scheduled onto a node that does not have enough resources, the node goes down with status NotReady.
NAME STATUS ROLES AGE VERSION
ip-172-20-32-136.ap-southeast-1.compute.internal Ready node 10d v1.11.0
ip-172-20-49-210.ap-southeast-1.compute.internal Ready master 10d v1.11.0
ip-172-20-52-223.ap-southeast-1.compute.internal NotReady node 1h v1.11.0

Node detail events:
Warning SystemOOM 58m kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal System OOM encountered
Normal NodeHasSufficientMemory 58m (x2 over 1h) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 58m (x2 over 1h) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeHasNoDiskPressure
Normal NodeReady 58m (x2 over 1h) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeReady
Normal NodeHasSufficientDisk 58m (x2 over 1h) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeHasSufficientDisk
Normal NodeAllocatableEnforced 56m kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Updated Node Allocatable limit across pods
Normal Starting 56m kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Starting kubelet.
Normal NodeHasSufficientPID 56m kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeHasSufficientPID
Warning Rebooted 56m kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal has been rebooted, boot id: 1894c080-4ec4-4226-a532-fb364d783ddd
Normal Starting 56m kube-proxy, ip-172-20-52-223.ap-southeast-1.compute.internal Starting kube-proxy.
Warning ContainerGCFailed 28m kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal rpc error: code = DeadlineExceeded desc = context deadline exceeded
Normal NodeNotReady 28m (x2 over 56m) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeNotReady
Warning SystemOOM 24m (x3 over 28m) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal System OOM encountered
Normal NodeReady 24m (x4 over 56m) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeReady
Normal NodeHasSufficientMemory 24m (x5 over 56m) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeHasSufficientMemory
Normal NodeHasSufficientDisk 24m (x5 over 56m) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeHasSufficientDisk
Normal NodeHasNoDiskPressure 24m (x5 over 56m) kubelet, ip-172-20-52-223.ap-southeast-1.compute.internal Node ip-172-20-52-223.ap-southeast-1.compute.internal status is now: NodeHasNoDiskPressure

But the EC2 instance for this node has a 120GB disk.

6. What did you expect to happen?
The node should not go down.

7. Please provide your cluster manifest.

cluster.txt (attached)

All 9 comments

Does this error occur when you have nothing deployed other than the basic pods and istio? If not, then you should probably look at your resource definitions in your deployments.

Hi @huang-jy, I just deploy some Java services using the Istio auto sidecar. I destroyed this cluster and deployed a new cluster with the default Kubernetes version 1.10.3 to test again.

Hi @huang-jy, I don't define resources for my deployments on the cluster. When a node has insufficient resources, it never goes back to having enough resources; all services on the node go to Unknown and the node status becomes NotReady:

ip-.compute.internal Ready master 3d v1.10.3 Container Linux by CoreOS 1800.7.0 (Rhyolite) 4.14.63-coreos docker://18.3.1
ip-.compute.internal NotReady node 1d v1.10.3 Container Linux by CoreOS 1800.7.0 (Rhyolite) 4.14.63-coreos docker://18.3.1
ip-.compute.internal Ready node 1d v1.10.3 Container Linux by CoreOS 1800.7.0 (Rhyolite) 4.14.63-coreos docker://18.3.1

I have seen this happen before when an application was able to claim more resources than normal and did not have a limit set on its resource claim. While not enough to cause Kubernetes to evict it, it claimed enough to leave the kube-system pods with less to work with, causing the node to slow down (and report back as NotReady). Increasing the node size helped in my case, as did setting specific, conservative limits on the maximum resources (memory/CPU) the application can claim.

Hi @huang-jy,
Thank you, you're right. I have a question I hope you are free to answer:
Can we configure a limit on the resources that can be used on a Kubernetes node?

I want to reserve resources so that the kubelet keeps working well on the node and can report status to the master. That would avoid the NotReady status in the node details.

It is generally good practice to have (at minimum) a spec.containers[].resources.requests clause to allow your container to be scheduled even if there isn't much memory, and this works well if your application generally trundles along with a fairly constant memory usage with no spikes or minimal spikes.

It is also a very good idea to set a limit on your container's resources. If you know your application isn't going to go higher than 2G, then set the limit at 2G -- this allows Kubernetes to schedule more containers on that host, as you're telling it upfront you won't use more than that. If your app misbehaves and goes beyond that, you'll see it get killed and it'll show something like "OOMKilled" in the kubectl describe pod output -- telling you that you've either underestimated your resource limit, or your node is too small to handle your container. If you underprovision your CPU limit, your container gets throttled instead, but doesn't get killed.
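
To illustrate the two points above, here is a minimal sketch of a deployment with both requests and limits (the name my-app, the image, and the concrete values are placeholders, not taken from this issue):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                                   # hypothetical name, for illustration only
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:latest   # placeholder image
        resources:
          requests:                              # what the scheduler reserves on the node
            cpu: "250m"
            memory: "512Mi"
          limits:                                # hard cap enforced at runtime
            cpu: "500m"
            memory: "2Gi"

With this in place, going past 2Gi of memory gets the container OOM-killed and restarted, while going past the CPU limit only throttles it, exactly as described above.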

Be warned that if you don't set a limit, the container COULD claim all the memory on the node, which, if the node is the master and you have allowed scheduling on the master node, could make your entire cluster unresponsive (yes, I did that as an experiment and had to restart the node).

On standard worker nodes, it'll slow them down enough for them to start reporting NotReady to the master (as happened in your case)

Here's a nice video from Google that explains resource requests and limits:

https://www.youtube.com/watch?v=xjpHggHKm78

Hey @truyet

First, try to set resource limits on all your deployments, pods, etc., as described by @huang-jy. Then your pods should not allocate too much memory and your system should run without impact. In the case of an OOM kill, only one pod will be killed and not the complete node.

Additionally, you could set resource reservations for the internal system components:

spec:
    kubelet:
        kubeReserved:
            cpu: "100m"
            memory: "256Mi"
        systemReserved:
            cpu: "100m"
            memory: "768Mi"

You can add this to your ClusterSpec with kops edit cluster.
With these settings in place, the system will reserve the given resources for the internal components.

@truyet @dhemeier you can also specify namespace defaults to cover the times when an application doesn't declare upfront what it wants
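
As a sketch of that idea, a LimitRange in the application namespace can supply defaults (the name, namespace, and values below are placeholders):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources          # hypothetical name
  namespace: my-namespace          # placeholder namespace
spec:
  limits:
  - type: Container
    defaultRequest:                # applied when a container declares no requests
      cpu: "100m"
      memory: "256Mi"
    default:                       # applied when a container declares no limits
      cpu: "500m"
      memory: "1Gi"

Apply it with kubectl apply -f and any container created in that namespace without its own requests/limits picks up these defaults.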

Thanks @dhemeier for the kubelet configuration. I solved my case using the resource limits suggested by @huang-jy.
In my case, because I deploy Java applications on Docker and the JVM does not respect the Docker memory limit by default, it used system memory instead. The node ran out of memory with no swap, so it hung waiting for other services to release resources. I reconfigured the JVM options to work with the Docker memory limit and the node now works well.
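
For context on the JVM point, here is a sketch of one way to pass container-aware memory flags in a pod spec (the pod name and image are placeholders; the exact flags depend on the JVM version):

apiVersion: v1
kind: Pod
metadata:
  name: java-app                                 # hypothetical name
spec:
  containers:
  - name: java-app
    image: registry.example.com/java-app:latest  # placeholder image
    env:
    - name: JAVA_TOOL_OPTIONS                    # read automatically by the JVM at startup
      # Java 10+ (and 8u191+) size the heap from the cgroup memory limit with these flags:
      value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
    resources:
      limits:
        memory: "2Gi"                            # the heap is then sized relative to this limit

On older Java 8 builds, setting -Xmx explicitly to a value below the container memory limit achieves the same effect.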

