I built a cluster on EC2 and use amazon-vpc-cni-k8s as the CNI plugin, but when I create a pod, it can't get an IP.
k8s version:v1.13.2
cni version: amazon-vpc-cni-k8s:v1.5.3
I set AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG: "true" and ENI_CONFIG_LABEL_DEF: failure-domain.beta.kubernetes.io/zone, and then created an ENIConfig for each availability zone, like this:
---
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ap-south-1a
spec:
  securityGroups:
    - sg-6953xxxx
  subnet: subnet-0de8838bc5072xxxx
---
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ap-south-1b
spec:
  securityGroups:
    - sg-6953xxxx
  subnet: subnet-0fb2a0cb71f99xxxx
---
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ap-south-1c
spec:
  securityGroups:
    - sg-6953xxxx
  subnet: subnet-02bc53365ae50xxxx
I found some clues in ipamd.log.
2019-09-06T06:39:20.662Z [ERROR] Failed to get pod ENI config
2019-09-06T06:00:02.479Z [INFO] Handle corev1.Node: node01.awsind, map[kubeadm.alpha.kubernetes.io/cri-socket:/var/run/dockershim.sock node.alpha.kubernetes.io/ttl:0 volumes.kubernetes.io/controller-managed-attach-detach:true], map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux kubernetes.io/hostname:node01.awsind]
2019-09-06T06:00:02.479Z [INFO] Setting myENI to: default
It seems that ipamd didn't find the right ENIConfig and then picked default, but unfortunately I didn't create an ENIConfig named default.
As a workaround, I deleted the CNI configuration variable ENI_CONFIG_LABEL_DEF and then labeled each node to set which ENIConfig to use.
kubectl label no node01.awsind k8s.amazonaws.com/eniConfig=ap-south-1a
Should there be some code that chooses the ENIConfig for the current zone when ENI_CONFIG_LABEL_DEF is set to failure-domain.beta.kubernetes.io/zone? Or am I misusing this feature? @mogren
// Handle handles ENIConfig updates from API Server and store them in local cache
func (h *Handler) Handle(ctx context.Context, event sdk.Event) error {
	switch o := event.Object.(type) {
	case *v1alpha1.ENIConfig:
		eniConfigName := o.GetName()
		if event.Deleted {
			log.Debugf("Deleting ENIConfig: %s", eniConfigName)
			h.controller.eniLock.Lock()
			defer h.controller.eniLock.Unlock()
			delete(h.controller.eni, eniConfigName)
			return nil
		}
		curENIConfig := o.DeepCopy()
		log.Debugf("Handle ENIConfig Add/Update: %s, %v, %s", eniConfigName, curENIConfig.Spec.SecurityGroups, curENIConfig.Spec.Subnet)
		h.controller.eniLock.Lock()
		defer h.controller.eniLock.Unlock()
		h.controller.eni[eniConfigName] = &curENIConfig.Spec
	case *corev1.Node:
		log.Debugf("Handle corev1.Node: %s, %v, %v", o.GetName(), o.GetAnnotations(), o.GetLabels())
		// Get annotations if not found get labels if not found fallback use default
		if h.controller.myNodeName == o.GetName() {
			val, ok := o.GetAnnotations()[h.controller.eniConfigAnnotationDef]
			if !ok {
				val, ok = o.GetLabels()[h.controller.eniConfigLabelDef]
				if !ok {
					val = eniConfigDefault
				}
				// Should have some code like this?
				if val == "failure-domain.beta.kubernetes.io/zone" {
					// get available zone for current node
					// set val to current az to use eniConfig of current zone
				}
			}
			if h.controller.myENI != val {
				h.controller.eniLock.Lock()
				defer h.controller.eniLock.Unlock()
				h.controller.myENI = val
				log.Debugf("Setting myENI to: %s", val)
			}
		}
	}
	return nil
}
@xvdy, I think you are missing the failure-domain.beta.kubernetes.io/zone label on the worker node, which is why the controller is defaulting the myENI variable to default.
To verify whether the node has the label: kubectl describe nodes <node name> |grep failure
I was able to replicate the issue by deleting the failure-domain.beta.kubernetes.io/zone label on my worker node, which is part of an EKS cluster, by running the command below:
kubectl label node ip-1-2-3-102.us-west-2.compute.internal failure-domain.beta.kubernetes.io/zone-
[ec2-user@ip-1-2-3-102 aws-routed-eni]$ grep -i setting ipamd.log.2019-09-10-19 |tail -n 10
2019-09-10T19:11:33.744Z [INFO] Setting myENI to: default
2019-09-10T19:11:38.488Z [INFO] Setting myENI to: default
Could you try adding the failure-domain.beta.kubernetes.io/zone label to the node, then deleting the running aws-node CNI pod so that the daemon set recreates it, and see if that resolves your issue?
To add the label, I used the command below:
kubectl label node <worker node name> failure-domain.beta.kubernetes.io/zone=<ap-south-1c>
Thank you for your answer. I used kubeadm to build the cluster without setting the cloud provider to aws, so the nodes have no label named failure-domain.beta.kubernetes.io/zone. This is why my ipamd can't find the right ENIConfig.
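For anyone who hits the same confusion: the lookup order in the Handle function quoted earlier in this thread can be sketched as a small standalone function. ENI_CONFIG_LABEL_DEF names a label key, and the value of that label on the node is used as the ENIConfig name, so no extra zone lookup code is needed once the label exists. The function name `resolveENIConfigName` is made up for this sketch and is not the plugin's API; the annotation key is taken from the `kubectl label` workaround above purely for illustration.

```go
package main

import "fmt"

// eniConfigDefault mirrors the fallback seen in ipamd.log ("Setting myENI to: default").
const eniConfigDefault = "default"

// resolveENIConfigName (hypothetical helper) returns the ENIConfig name for a node:
// the annotation value if present, otherwise the value of the label whose key
// is labelKey, otherwise the default name.
func resolveENIConfigName(annotations, labels map[string]string, annotationKey, labelKey string) string {
	if v, ok := annotations[annotationKey]; ok {
		return v
	}
	if v, ok := labels[labelKey]; ok {
		return v
	}
	return eniConfigDefault
}

func main() {
	// Keys assumed for illustration only.
	const annotationKey = "k8s.amazonaws.com/eniConfig"
	const labelKey = "failure-domain.beta.kubernetes.io/zone"

	// A kubeadm node without a cloud provider: no zone label is set.
	kubeadmNode := map[string]string{"kubernetes.io/hostname": "node01.awsind"}
	fmt.Println(resolveENIConfigName(nil, kubeadmNode, annotationKey, labelKey)) // prints "default"

	// The same node after the zone label has been added by hand.
	labeledNode := map[string]string{
		"kubernetes.io/hostname": "node01.awsind",
		labelKey:                 "ap-south-1a",
	}
	fmt.Println(resolveENIConfigName(nil, labeledNode, annotationKey, labelKey)) // prints "ap-south-1a"
}
```

With the label missing, the result is default, which matches the log line above; with the label present, the result is the zone name, which must match the metadata.name of one of the ENIConfig objects.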