Amazon-vpc-cni-k8s: setting ENI_CONFIG_LABEL_DEF to failure-domain.beta.kubernetes.io/zone does not work

Created on 6 Sep 2019 · 4 Comments · Source: aws/amazon-vpc-cni-k8s

Question

I built a cluster on EC2 and use amazon-vpc-cni-k8s as the CNI plugin, but when I create a pod, it cannot get an IP address.

Environment

k8s version: v1.13.2
cni version: amazon-vpc-cni-k8s:v1.5.3

I set AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG: "true" and ENI_CONFIG_LABEL_DEF: failure-domain.beta.kubernetes.io/zone, and then created an ENIConfig for each availability zone like this:

---
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ap-south-1a
spec:
  securityGroups:
  - sg-6953xxxx
  subnet: subnet-0de8838bc5072xxxx
---
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ap-south-1b
spec:
  securityGroups:
  - sg-6953xxxx
  subnet: subnet-0fb2a0cb71f99xxxx
---
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: ap-south-1c
spec:
  securityGroups:
  - sg-6953xxxx
  subnet: subnet-02bc53365ae50xxxx

Logs

I found some clues in ipamd.log.

2019-09-06T06:39:20.662Z [ERROR]        Failed to get pod ENI config
2019-09-06T06:00:02.479Z [INFO] Handle corev1.Node: node01.awsind, map[kubeadm.alpha.kubernetes.io/cri-socket:/var/run/dockershim.sock node.alpha.kubernetes.io/ttl:0 volumes.kubernetes.io/controller-managed-attach-detach:true], map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux kubernetes.io/hostname:node01.awsind]
2019-09-06T06:00:02.479Z [INFO] Setting myENI to: default

It seems that ipamd did not find the right ENIConfig and fell back to default, but I did not create an ENIConfig named default.
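To illustrate why the fallback fails here, the following is a minimal Go sketch of a name-based lookup against a cache of ENIConfigs. It is not the plugin's code: eniConfigSpec and lookupENIConfig are hypothetical names, and the error text is made up for the example. The cache entries mirror the three ENIConfigs defined above.

```go
package main

import "fmt"

// eniConfigSpec is a simplified, hypothetical stand-in for the ENIConfig spec.
type eniConfigSpec struct {
	SecurityGroups []string
	Subnet         string
}

// lookupENIConfig returns the spec cached under name, or an error when no
// ENIConfig with that name has been created.
func lookupENIConfig(cache map[string]eniConfigSpec, name string) (eniConfigSpec, error) {
	spec, ok := cache[name]
	if !ok {
		return eniConfigSpec{}, fmt.Errorf("no ENIConfig named %q", name)
	}
	return spec, nil
}

func main() {
	sgs := []string{"sg-6953xxxx"}
	// One entry per availability zone, keyed by the ENIConfig metadata.name.
	cache := map[string]eniConfigSpec{
		"ap-south-1a": {SecurityGroups: sgs, Subnet: "subnet-0de8838bc5072xxxx"},
		"ap-south-1b": {SecurityGroups: sgs, Subnet: "subnet-0fb2a0cb71f99xxxx"},
		"ap-south-1c": {SecurityGroups: sgs, Subnet: "subnet-02bc53365ae50xxxx"},
	}

	// "default" is what the node resolved to in the log; it is not in the cache.
	for _, name := range []string{"default", "ap-south-1a"} {
		spec, err := lookupENIConfig(cache, name)
		if err != nil {
			fmt.Println("lookup failed:", err)
			continue
		}
		fmt.Println(name, "->", spec.Subnet)
	}
}
```

With only zone-named entries in the cache, the lookup for default fails while the lookup for ap-south-1a succeeds, which matches the symptom in the log.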

All 4 comments

Temp Solution

I removed the CNI configuration variable ENI_CONFIG_LABEL_DEF and then labeled each node to select which ENIConfig to use:
kubectl label no node01.awsind k8s.amazonaws.com/eniConfig=ap-south-1a

Should there be some code that chooses the ENIConfig for the current zone when ENI_CONFIG_LABEL_DEF is set to failure-domain.beta.kubernetes.io/zone? Or am I misusing this feature? @mogren

// Handle handles ENIConfig updates from API Server and store them in local cache
func (h *Handler) Handle(ctx context.Context, event sdk.Event) error {
    switch o := event.Object.(type) {
    case *v1alpha1.ENIConfig:
        eniConfigName := o.GetName()
        if event.Deleted {
            log.Debugf("Deleting ENIConfig: %s", eniConfigName)
            h.controller.eniLock.Lock()
            defer h.controller.eniLock.Unlock()
            delete(h.controller.eni, eniConfigName)
            return nil
        }

        curENIConfig := o.DeepCopy()

        log.Debugf("Handle ENIConfig Add/Update: %s, %v, %s", eniConfigName, curENIConfig.Spec.SecurityGroups, curENIConfig.Spec.Subnet)

        h.controller.eniLock.Lock()
        defer h.controller.eniLock.Unlock()
        h.controller.eni[eniConfigName] = &curENIConfig.Spec

    case *corev1.Node:
        log.Debugf("Handle corev1.Node: %s, %v, %v", o.GetName(), o.GetAnnotations(), o.GetLabels())
        // Get annotations if not found get labels if not found fallback use default
        if h.controller.myNodeName == o.GetName() {
            val, ok := o.GetAnnotations()[h.controller.eniConfigAnnotationDef]
            if !ok {
                val, ok = o.GetLabels()[h.controller.eniConfigLabelDef]
                if !ok {
                    val = eniConfigDefault
                }
                // Should there be some code like this?
                if val == "failure-domain.beta.kubernetes.io/zone" {
                    // get the availability zone of the current node
                    // set val to that zone so the ENIConfig for the current zone is used
                }
            }

            if h.controller.myENI != val {
                h.controller.eniLock.Lock()
                defer h.controller.eniLock.Unlock()
                h.controller.myENI = val
                log.Debugf("Setting myENI to: %s", val)
            }
        }
    }
    return nil
}

@xvdy, I think the failure-domain.beta.kubernetes.io/zone label is missing on the worker node, which is why the controller falls back to default for the myENI variable.

To verify whether the node has the label: kubectl describe nodes <node name> | grep failure

I was able to reproduce the issue by deleting the failure-domain.beta.kubernetes.io/zone label from a worker node in my EKS cluster with the command below:

kubectl label node ip-1-2-3-102.us-west-2.compute.internal failure-domain.beta.kubernetes.io/zone-
[ec2-user@ip-1-2-3-102 aws-routed-eni]$ grep -i setting ipamd.log.2019-09-10-19 |tail -n 10
2019-09-10T19:11:33.744Z [INFO] Setting myENI to: default
2019-09-10T19:11:38.488Z [INFO] Setting myENI to: default

Could you try adding the failure-domain.beta.kubernetes.io/zone label to the node, then deleting the running aws-node pod so that the DaemonSet recreates it, and see if that resolves your issue?

To add the label, I used the command below:

kubectl label node <worker node name> failure-domain.beta.kubernetes.io/zone=<availability zone, e.g. ap-south-1c>

Thank you for your answer. I used kubeadm to build the cluster without setting the cloud provider to aws, so the nodes have no failure-domain.beta.kubernetes.io/zone label. This is why ipamd can't find the right ENIConfig.
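The lookup order discussed in this thread (node annotation, then node label, then the fallback name default) can be sketched as a small standalone Go function. This is a simplified illustration rather than the plugin's actual code: resolveENIConfigName and its parameters are hypothetical names, and the annotation key used in main is an assumption about the plugin default.

```go
package main

import "fmt"

// eniConfigDefault mirrors the fallback name seen in the ipamd log
// ("Setting myENI to: default").
const eniConfigDefault = "default"

// resolveENIConfigName is a hypothetical helper that mimics the lookup order
// of the handler quoted earlier: annotation first, then label, then default.
// The label's VALUE (for example "ap-south-1a") becomes the ENIConfig name;
// the label KEY only says where to look.
func resolveENIConfigName(annotations, labels map[string]string, annotationKey, labelKey string) string {
	if val, ok := annotations[annotationKey]; ok {
		return val
	}
	if val, ok := labels[labelKey]; ok {
		return val
	}
	return eniConfigDefault
}

func main() {
	const annotationKey = "k8s.amazonaws.com/eniConfig" // assumed default annotation key
	const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

	// Labels of the kubeadm-built node from the log: no zone label is present.
	kubeadmNode := map[string]string{
		"beta.kubernetes.io/arch": "amd64",
		"beta.kubernetes.io/os":   "linux",
		"kubernetes.io/hostname":  "node01.awsind",
	}
	fmt.Println(resolveENIConfigName(nil, kubeadmNode, annotationKey, zoneLabel))

	// The same lookup once the zone label has been added by hand.
	labeledNode := map[string]string{zoneLabel: "ap-south-1a"}
	fmt.Println(resolveENIConfigName(nil, labeledNode, annotationKey, zoneLabel))
}
```

The first call falls through to default because the node carries no zone label; the second returns ap-south-1a, which matches the name of one of the ENIConfigs in the question.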
