Current scale:
I did a rollout deploy on my autoscaling group, which destroyed everything all at once. Now no deployments are getting scheduled. I found this in /var/log/aws-routed-eni/ipamd.log:
2018-10-11T18:01:48Z [DEBUG] AssignPodIPv4Address, skip ENI eni-0f07ae33f1742927b that does not have available addresses
2018-10-11T18:01:48Z [DEBUG] AssignPodIPv4Address, skip ENI eni-05614d11f3c1ac652 that does not have available addresses
2018-10-11T18:01:48Z [DEBUG] AssignPodIPv4Address, skip ENI eni-0bf0c79aaf7e90493 that does not have available addresses
2018-10-11T18:01:48Z [INFO] DataStore has no available IP addresses
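For anyone triaging the same symptom, here is a minimal sketch of how the log lines above could be summarized. It only matches the two message formats quoted in this report; the function name `summarize_ipamd_log` and the regex are my own illustration, not part of the CNI plugin, and other ipamd versions may log differently.

```python
import re

# Matches the "skip ENI ..." DEBUG message exactly as quoted in this report.
SKIP_RE = re.compile(r"skip ENI (eni-[0-9a-f]+) that does not have available addresses")
# The INFO message that indicates the node has no free pod IPs left.
EXHAUSTED_MARKER = "DataStore has no available IP addresses"


def summarize_ipamd_log(lines):
    """Return (sorted list of skipped ENI ids, whether exhaustion was logged).

    Hypothetical helper for illustration only.
    """
    skipped = set()
    exhausted = False
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            skipped.add(match.group(1))
        elif EXHAUSTED_MARKER in line:
            exhausted = True
    return sorted(skipped), exhausted


# Sample input: the four lines from the report above.
SAMPLE = [
    "2018-10-11T18:01:48Z [DEBUG] AssignPodIPv4Address, skip ENI eni-0f07ae33f1742927b that does not have available addresses",
    "2018-10-11T18:01:48Z [DEBUG] AssignPodIPv4Address, skip ENI eni-05614d11f3c1ac652 that does not have available addresses",
    "2018-10-11T18:01:48Z [DEBUG] AssignPodIPv4Address, skip ENI eni-0bf0c79aaf7e90493 that does not have available addresses",
    "2018-10-11T18:01:48Z [INFO] DataStore has no available IP addresses",
]

skipped_enis, exhausted = summarize_ipamd_log(SAMPLE)
print(f"ENIs with no free addresses: {len(skipped_enis)}")
print(f"IP pool exhausted: {exhausted}")
```

To run it against a real node, you would pass the lines of /var/log/aws-routed-eni/ipamd.log instead of `SAMPLE`. If every ENI on the node is skipped and the exhaustion message appears, the node has no pod IPs to hand out, which matches the scheduling failure described here.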
This is an example of the error we are seeing in the events of all the pods:
Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "redis-6f4b649bff-85j4z_dev" network: add cmd: failed to assign an IP address to container
Update: Problem solved by upgrading the CNI from 1.0.0 to 1.2.1. See the guide: https://docs.aws.amazon.com/eks/latest/userguide/cni-upgrades.html
For anyone else having similar issues:
We also found that updating the CNI to the latest recommended version (see the EKS docs) fixed this.
However, we also saw the same or a very similar issue caused by small subnets: the subnet for our worker nodes ran out of IPs.
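To get a rough feel for how quickly a small subnet runs out, here is a sketch using Python's standard `ipaddress` module. It relies on the fact that AWS reserves five addresses in every VPC subnet (the first four and the last one). The CIDR blocks below are made-up examples, not the subnets from this thread, and the helper name `usable_subnet_ips` is hypothetical.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of each subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_subnet_ips(cidr):
    """Upper bound on addresses available to nodes and pods in a VPC subnet.

    Hypothetical helper; it ignores addresses already taken by other
    resources in the subnet, so the real number is usually lower.
    """
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


# Example CIDRs are assumptions chosen only to show the difference in size.
for example_cidr in ("10.0.0.0/28", "10.0.0.0/24", "10.0.0.0/20"):
    print(example_cidr, usable_subnet_ips(example_cidr))
```

Since the VPC CNI gives every pod an address from the subnet, node addresses and pod addresses all come out of this same budget, so a small worker subnet can be exhausted well before the nodes run out of CPU or memory.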