Amazon-vpc-cni-k8s: Make Maximum Pods ENIConfig aware

Created on 23 Feb 2019 · 12 Comments · Source: aws/amazon-vpc-cni-k8s

Summary

If you specify an ENIConfig that differs from the instance's primary ENI configuration at startup, the plugin correctly does not allocate IP addresses on the primary ENI. However, the maximum number of pods assignable to the worker node is still derived on the assumption that all secondary IP addresses can be consumed. As a result, pods get stuck in ContainerCreating status.

We should dynamically adjust the maximum number of pods when using an ENIConfig to reflect the maximum number of IPs that can actually be consumed under that ENIConfig.
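A minimal sketch of the adjustment being asked for, assuming the usual default formula (every ENI, including the primary, hands out all addresses except its own primary IP to pods, plus 2 for the hostNetwork aws-node and kube-proxy pods) and the custom-networking formula given later in this thread. The function name and its parameters are hypothetical.

```python
def max_pods(num_enis: int, ips_per_eni: int, uses_eniconfig: bool) -> int:
    """Hypothetical helper: ENIConfig-aware max pods for a worker node."""
    # Each ENI keeps one address (its own primary IP) that pods cannot use.
    pod_ips_per_eni = ips_per_eni - 1
    # With an ENIConfig in another subnet, the primary ENI hands out no pod IPs.
    usable_enis = num_enis - 1 if uses_eniconfig else num_enis
    # +2 for aws-node and kube-proxy, which use hostNetwork and need no pod IP.
    return usable_enis * pod_ips_per_eni + 2


# t2.medium: 3 ENIs with 6 IPv4 addresses each (figures from this issue).
print(max_pods(3, 6, uses_eniconfig=False))  # 17
print(max_pods(3, 6, uses_eniconfig=True))   # 12
```

The gap between the two results is exactly the capacity of the primary ENI that the plugin no longer uses.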

Reproduction Steps

  1. Configure instances to use an ENIConfig in a different subnet from the primary ENI.
  2. Start a single t2.medium or similar instance (it must support a healthy number of secondary IP addresses; a t2.medium has 3 ENIs with 6 IPv4 addresses each).
  3. Start a deployment of a basic pod (e.g. nginx) with a large number of replicas (e.g. 200).

Observe that while most pods stay in Pending, one ENI's worth of pods will be stuck in ContainerCreating. For a t2.medium, this is 5:

▶ kubectl get pod -o wide | grep nginx-deployment | grep ContainerCreating | wc -l
       5
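The count of 5 follows from simple arithmetic, assuming kubelet was started with the default limit of 17 pods for a t2.medium and that aws-node and kube-proxy are the only hostNetwork pods on the node. The constant names below are hypothetical.

```python
# Hypothetical walk-through of the t2.medium numbers from the reproduction.
NUM_ENIS = 3           # t2.medium network interfaces
IPS_PER_ENI = 6        # IPv4 addresses per interface
HOST_NETWORK_PODS = 2  # aws-node and kube-proxy (assumed to be the only ones)

# Limit kubelet was started with: every ENI is assumed to serve pods.
default_max_pods = NUM_ENIS * (IPS_PER_ENI - 1) + HOST_NETWORK_PODS
# Scheduled pods that actually need an IP from the CNI plugin.
scheduled_app_pods = default_max_pods - HOST_NETWORK_PODS
# Pod IPs really available: the primary ENI is skipped under an ENIConfig.
available_pod_ips = (NUM_ENIS - 1) * (IPS_PER_ENI - 1)

stuck = scheduled_app_pods - available_pod_ips
print(default_max_pods, scheduled_app_pods, available_pod_ips, stuck)  # 17 15 10 5
```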

Looking at these pods closely, we can see they are stuck looping on the following event:

  Warning  FailedCreatePodSandBox  74s (x2710 over 55m)    kubelet, ip-10-100-0-228.ap-southeast-2.compute.internal  (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "54f42a6a4ce9184111402ed1bbe2b97390abb1f8f70f1b566afca1f688d26727" network for pod "nginx-deployment-67594d6bf6-6zcs9": NetworkPlugin cni failed to set up pod "nginx-deployment-67594d6bf6-6zcs9_default" network: add cmd: failed to assign an IP address to container
2.x CNI plugin enhancement

All 12 comments

Going to work on a PR to see if I can modify this behaviour next week. Will link here if I come up with a solution.

Thanks @taylorb-syd, I'm trying to catch up a bit on the issues again.

Okay, digging a little deeper, the maximum pod setting is actually set when kubelet starts. While making the CNI plugin aware of the max IP addresses is desirable, it is lower priority than correctly setting the maximum pod setting. I am therefore focusing my attention on modifying the bootstrap script. I will link issues and PRs as/when I create them.
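A sketch of what such a bootstrap change could produce, assuming the Amazon EKS optimized AMI's /etc/eks/bootstrap.sh and its --use-max-pods and --kubelet-extra-args options (present in the AMI versions I am aware of; check yours). The value itself is carried by kubelet's --max-pods flag. The function name and cluster name are hypothetical.

```python
import shlex


def bootstrap_command(cluster_name: str, max_pods: int) -> str:
    """Hypothetical helper: user-data line that pins kubelet's pod limit at startup."""
    # kubelet only reads --max-pods when it starts, so the value must be
    # decided before bootstrap runs. "--use-max-pods false" is assumed to stop
    # the AMI from applying its own per-instance-type limit.
    args = [
        "/etc/eks/bootstrap.sh",
        cluster_name,
        "--use-max-pods", "false",
        "--kubelet-extra-args", f"--max-pods={max_pods}",
    ]
    # shlex.join (Python 3.8+) quotes any argument that needs it.
    return shlex.join(args)


print(bootstrap_command("my-cluster", 12))
# /etc/eks/bootstrap.sh my-cluster --use-max-pods false --kubelet-extra-args --max-pods=12
```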

> We should dynamically adjust the maximum number of pods when using an ENIConfig to reflect the maximum number of IPs that can actually be consumed under that ENIConfig.

Do you plan to handle pods with hostNetwork: true in this effort? It's not critical, but I hate to see nodes underutilized because a bunch of hostNetwork daemonset containers are counted against max pods even though they don't use an additional IP.

@cnelson Unfortunately, by the looks of it, there is no way to dynamically set kubelet's max pods; it can only be set on startup. Therefore it's not possible to say "don't count this pod towards the max pod limits".

However, that being said, if you know ahead of time how many hostNetwork: true daemon sets you are going to use on a given instance, we might as well add an option to adjust the max pods. I'll add this into my script modifications.
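If the operator wants to derive that number rather than guess it, one hypothetical approach is to count the DaemonSets in the manifests they deploy whose pod template sets hostNetwork: true (the field lives at spec.template.spec.hostNetwork). The helper name and the sample manifests, including "log-agent", are made up for illustration.

```python
from typing import Any, Iterable, Mapping


def count_host_network_daemonsets(manifests: Iterable[Mapping[str, Any]]) -> int:
    """Hypothetical helper: count DaemonSets whose pods use hostNetwork: true."""
    count = 0
    for obj in manifests:
        if obj.get("kind") != "DaemonSet":
            continue
        # Walk spec.template.spec, tolerating missing levels.
        pod_spec = obj.get("spec", {}).get("template", {}).get("spec", {})
        if pod_spec.get("hostNetwork", False):
            count += 1
    return count


daemonsets = [
    {"kind": "DaemonSet", "metadata": {"name": "aws-node"},
     "spec": {"template": {"spec": {"hostNetwork": True}}}},
    {"kind": "DaemonSet", "metadata": {"name": "kube-proxy"},
     "spec": {"template": {"spec": {"hostNetwork": True}}}},
    {"kind": "DaemonSet", "metadata": {"name": "log-agent"},
     "spec": {"template": {"spec": {}}}},
]
print(count_host_network_daemonsets(daemonsets))  # 2
```

This only covers DaemonSets known at node build time; anything added later would still need the limit to be recomputed and kubelet restarted.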

The basic solution is simple. If we have access to an accurate export of this[^1], then we can do:

maxPods = (numInterfaces - 1) * maxIpv4PerInterface + daemonSetPrediction

... in the bootstrap script, where daemonSetPrediction is the operator-provided educated guess.
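The proposed formula, written out as a function so it can be checked against the t2.medium numbers. The function and parameter names are hypothetical; the arithmetic is exactly the formula above.

```python
def proposed_max_pods(num_interfaces: int, max_ipv4_per_interface: int,
                      daemon_set_prediction: int) -> int:
    """maxPods as proposed above, with an operator-supplied DaemonSet guess."""
    if daemon_set_prediction < 0:
        raise ValueError("daemon_set_prediction must be non-negative")
    # The primary interface stays in the node's subnet, so it is excluded.
    pod_capacity = (num_interfaces - 1) * max_ipv4_per_interface
    # Room for hostNetwork pods, which count against max pods but use no pod IP.
    return pod_capacity + daemon_set_prediction


# t2.medium (3 interfaces x 6 IPv4 addresses), expecting aws-node and kube-proxy.
print(proposed_max_pods(3, 6, daemon_set_prediction=2))  # 14
```

This counts every address on each secondary interface, including that interface's own primary IP; the custom-networking formula later in the thread subtracts that address and gives 12 for the same instance.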

I'd be happy with that as an initial break-fix, but it doesn't seem very good in the long run.

@taylorb-syd curious if you did anything clever yet?

@frimik Happy for you to put attention into this; I was swamped last month and couldn't put any energy or effort into it.

In the case of custom CNI networking:

maxPods = (numInterfaces - 1) * (maxIpv4PerInterface - 1) + 2
  • Subtract 1 from numInterfaces: one interface (the primary ENI) is used for the node's own network.
  • Subtract 1 from maxIpv4PerInterface: on each ENI in the custom CNI subnet, one IP goes to the worker node (the ENI's own address) rather than to a pod.
  • Add 2 as a constant to account for aws-node and kube-proxy, which both use hostNetwork.
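As a worked check, plugging in the t2.medium figures from the reproduction steps (3 interfaces with 6 IPv4 addresses each) gives a limit of 12, compared with the 17 that results when the primary ENI is also assumed to serve pods (the same formula without subtracting 1 from numInterfaces):

```latex
\begin{aligned}
\text{maxPods} &= (\text{numInterfaces} - 1)\,(\text{maxIpv4PerInterface} - 1) + 2 \\
               &= (3 - 1)\,(6 - 1) + 2 = 2 \cdot 5 + 2 = 12
\end{aligned}
```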

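I use this script to generate max-pods.json to paste into my worker node's user-data:

import json

import requests
from bs4 import BeautifulSoup

DOCS_URL = "https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI"
# Auto-generated ID of the IP-addresses-per-ENI table; it can change when the
# docs page is regenerated, so check it before running.
TABLE_ID = "w299aac23c19c19b5"

response = requests.get(DOCS_URL, timeout=30)
response.raise_for_status()

parsed_html = BeautifulSoup(response.text, features="html.parser")
table = parsed_html.find("table", attrs={"id": TABLE_ID})
if table is None:
    raise SystemExit(f"Table {TABLE_ID!r} not found; the docs page layout may have changed")

instance_max_pods = {}

for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) < 3:
        continue  # skip rows without data cells (e.g. the header row)
    try:
        max_enis = int(cells[1].text)
        ips_per_eni = int(cells[2].text)
    except ValueError:
        continue  # skip rows whose counts are not plain integers
    # (# of ENIs - 1) * (Max IPs per ENI - 1) + 2
    # IPs per ENI - 1: one IP address is allocated to the host ENI itself.
    # + 2: aws-node and kube-proxy are hostNetwork pods.
    instance_max_pods[cells[0].text.strip()] = (max_enis - 1) * (ips_per_eni - 1) + 2

print(json.dumps(instance_max_pods))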
The table ID w299aac23c19c19b5 has now changed to w295aac21c13c15b5, so this needs to be handled as well.
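One way to handle that, sketched as an assumption rather than a tested fix: instead of hard-coding the generated table ID, locate the table by its first header cell. This version uses only the standard library's html.parser so it runs offline against a small sample. The header text "Instance type", the class and function names, and the sample HTML are all hypothetical, and the real docs markup may differ (for instance, this assumes explicit closing </td>, </th> and </tr> tags).

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell texts."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._depth = 0  # number of currently open <table> elements
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            self.tables.append([])
        elif tag == "tr" and self._depth > 0:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
        elif tag == "table" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def find_table_by_header(html: str, first_header: str) -> list[list[str]]:
    """Return the first table whose header row starts with `first_header`."""
    collector = TableCollector()
    collector.feed(html)
    for table in collector.tables:
        if table and table[0] and table[0][0] == first_header:
            return table
    raise LookupError(f"no table whose first header is {first_header!r}")


def max_pods_from_table(table: list[list[str]]) -> dict[str, int]:
    """Apply (# of ENIs - 1) * (IPs per ENI - 1) + 2 to every data row."""
    result: dict[str, int] = {}
    for row in table[1:]:  # skip the header row
        if len(row) < 3:
            continue
        try:
            enis, ips = int(row[1]), int(row[2])
        except ValueError:
            continue
        result[row[0]] = (enis - 1) * (ips - 1) + 2
    return result


SAMPLE_HTML = (
    '<table id="w295aac21c13c15b5">'
    "<tr><th>Instance type</th><th>Maximum network interfaces</th>"
    "<th>IPv4 addresses per interface</th></tr>"
    "<tr><td>t2.medium</td><td>3</td><td>6</td></tr>"
    "</table>"
)
print(max_pods_from_table(find_table_by_header(SAMPLE_HTML, "Instance type")))
# {'t2.medium': 12}
```

In the script, response.text would be passed to find_table_by_header in place of the find("table", attrs={"id": ...}) lookup, so a regenerated table ID no longer breaks it (a renamed header still would).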
