Amazon-vpc-cni-k8s: VPC CIDR is cached forever causing pod routing issues when secondary VPC CIDR ranges are created afterwards

Created on 2 Dec 2019 · 10 Comments · Source: aws/amazon-vpc-cni-k8s

Problem statement:
As of today, the VPC CIDR ranges are cached during initialization. If new CIDR ranges are added afterwards to address IP space issues, the CNI must be restarted so that it fetches the new CIDR ranges, updates the cache, and adds the ip rules/routes needed to reach other pods in the cluster that use the new subnet IP range.

Solution:
Refresh the VPC CIDR range cache every 2 seconds to avoid staleness.

Steps to replicate the issue:
1) Create an EKS cluster in a VPC that has just one CIDR range (10.10.0.0/16).
2) Create worker nodes in the above VPC CIDR range.
3) Add a secondary CIDR range (100.10.0.0/16) to the existing VPC.
4) Launch new worker nodes in a subnet that has the 100.10.0.0/24 CIDR.
5) Pod 1, with IP 100.10.12.13 from the secondary VPC CIDR subnet, cannot talk to the coreDNS pod with IP 10.10.23.45, which is part of the primary VPC CIDR subnet.
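
The effect of a stale cache in the steps above can be illustrated with a small Go sketch. It is a simplified model, not the plugin's actual code: `parseCIDRs` and `inVPC` are hypothetical helpers, and the assumption is only that a node decides whether an address belongs to the VPC by checking it against its cached CIDR list.

```go
package main

import (
	"fmt"
	"net/netip"
)

// parseCIDRs converts CIDR strings into prefixes (hypothetical helper).
// It panics on malformed input, which is acceptable for a sketch.
func parseCIDRs(cidrs ...string) []netip.Prefix {
	out := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		out = append(out, netip.MustParsePrefix(c))
	}
	return out
}

// inVPC reports whether ip falls inside any of the cached VPC CIDR ranges.
func inVPC(cached []netip.Prefix, ip string) bool {
	addr := netip.MustParseAddr(ip)
	for _, p := range cached {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	// Cache on a node whose CNI started before the secondary CIDR was added.
	stale := parseCIDRs("10.10.0.0/16")
	// Cache after a refresh or a restart of the CNI.
	fresh := parseCIDRs("10.10.0.0/16", "100.10.0.0/16")

	pod1 := "100.10.12.13"   // pod in the secondary CIDR
	coreDNS := "10.10.23.45" // pod in the primary CIDR

	fmt.Println("stale cache knows coreDNS:", inVPC(stale, coreDNS))
	fmt.Println("stale cache knows pod 1:  ", inVPC(stale, pod1))
	fmt.Println("fresh cache knows pod 1:  ", inVPC(fresh, pod1))
}
```

With the stale list, the address of pod 1 is treated as outside the VPC, which is the situation in which the required rules/routes are missing.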

enhancement priority/P1

nithu0115

👍4

All 10 comments

Every two seconds may be a bit excessive. Certainly we should refresh the VPC CIDR ranges periodically, but I think 15 or 30 seconds might be a better interval. Alternately, is there a way we can be notified of VPC subnets being created, deleted or modified instead of periodically refreshing our view?

jaypipes on 3 Dec 2019

> Every two seconds may be a bit excessive. Certainly we should refresh the VPC CIDR ranges periodically, but I think 15 or 30 seconds might be a better interval.

Agreed! I will go with 15 seconds then.

> Alternately, is there a way we can be notified of VPC subnets being created, deleted or modified instead of periodically refreshing our view?

I did some research around adding a watch/poll to get notifications about VPC changes without making AWS API calls, but did not find a way.

nithu0115 on 5 Dec 2019

Until the fix is rolled out in one of the next couple of versions, I would like to post a couple of suggestions/workarounds for users who hit the cache issue after adding a secondary VPC CIDR:

Restarting the CNI plugin should remedy this issue. This can be done by adding a label to the CNI plugin or updating one of its ENV variables, which triggers a restart.
For production use, it is better to launch new worker nodes, so that the aws-node Pods started on them are aware of the secondary VPC CIDR, and then replace the old nodes, so that the existing environment is not impacted.

0xlen on 30 Dec 2019

👍2

Just to add a note: simply refreshing the cache every 2 seconds or every 15 seconds won't solve this problem.
It will only fix new pods, not old pods.

M00nF1sh on 18 Mar 2020

@M00nF1sh, I was planning to add a ticker function that updates the cache every x seconds and, if there are any updates such as additional CIDRs, re-configures the rules/routes, which should fix both old and new pods.

nithu0115 on 18 Mar 2020
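
The ticker approach discussed in the two comments above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code from the eventual fix: `fetchFunc`, `diffCIDRs`, `refreshLoop`, and the `onChange` callback are hypothetical names, and the fetch function stands in for whatever AWS lookup the plugin performs. The point taken from the thread is that a detected change has to trigger re-configuration of rules/routes for existing pods, not only a cache update.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// fetchFunc returns the current VPC CIDR ranges. Hypothetical stand-in
// for the AWS API or metadata call used by the real plugin.
type fetchFunc func() ([]string, error)

// diffCIDRs returns the CIDRs that appear only in next (added) and
// only in prev (removed), each sorted for stable output.
func diffCIDRs(prev, next []string) (added, removed []string) {
	prevSet := make(map[string]bool, len(prev))
	for _, c := range prev {
		prevSet[c] = true
	}
	nextSet := make(map[string]bool, len(next))
	for _, c := range next {
		nextSet[c] = true
		if !prevSet[c] {
			added = append(added, c)
		}
	}
	for _, c := range prev {
		if !nextSet[c] {
			removed = append(removed, c)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

// refreshLoop polls fetch on every tick and calls onChange whenever the
// CIDR list differs from the cached one. It returns when ctx is done.
func refreshLoop(ctx context.Context, interval time.Duration, cached []string,
	fetch fetchFunc, onChange func(added, removed []string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			latest, err := fetch()
			if err != nil {
				continue // keep the old cache if the lookup fails
			}
			added, removed := diffCIDRs(cached, latest)
			if len(added) > 0 || len(removed) > 0 {
				cached = latest
				// This is where rules/routes for existing pods would be re-configured.
				onChange(added, removed)
			}
		}
	}
}

func main() {
	calls := 0
	// Fake lookup: the secondary CIDR appears from the third poll onwards.
	fetch := func() ([]string, error) {
		calls++
		if calls < 3 {
			return []string{"10.10.0.0/16"}, nil
		}
		return []string{"10.10.0.0/16", "100.10.0.0/16"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	// A short interval keeps the demo fast; the thread settles on 15 seconds.
	refreshLoop(ctx, 10*time.Millisecond, []string{"10.10.0.0/16"}, fetch,
		func(added, removed []string) {
			fmt.Println("added:", added, "removed:", removed)
		})
}
```

Comparing the old and new lists, rather than overwriting the cache blindly, is what lets the callback run only when something changed, so a 15-second poll does not rewrite rules on every tick.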

We have also been facing this problem recently and have temporarily solved it using https://github.com/giantswarm/aws-cni-restarter. We can help test this once PR #903 has been merged.

paurosello on 25 May 2020

❤1

@paurosello #903 has been merged, along with some follow-up PRs, so in order to test these changes you would have to use a build of the latest master branch. Be sure to use the configs in /config/master, since the configs have changed quite a bit, including the addition of an init container. Not sure yet when we will have a v1.7.0-rc1 ready for testing.