Amazon-vpc-cni-k8s: CNI plugin 1.0.0 crashes while 0.1.4 works fine in the same env

Created on 15 Jun 2018 · 13 comments · Source: aws/amazon-vpc-cni-k8s

Hello,

We are using this plugin in our own k8s cluster, and everything worked fine until we upgraded to 1.0.0 (we redeployed the full cluster rather than upgrading in place).

A few unusual things regarding our network/cluster:

  • Our VPC has multiple secondary IPv4 CIDR blocks (set up as per the docs).
  • All outbound connections go through a proxy.
  • We use a PodPreset to inject the proxy settings into pods.
  1. Plugin crashes with the following logs:
=====Starting installing AWS-CNI =========
=====Starting amazon-k8s-agent ===========
ERROR: logging before flag.Parse: W0615 02:36:27.560592       9 client_config.go:533] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
  2. Logs in /var/log/aws-routed-eni
  • ls -al /var/log/aws-routed-eni
total 920
drwxr--r-x+  2 root root     67 Jun 15 12:12 .
drwxr-xr-x+ 17 root root   4096 Jun 15 12:12 ..
-rw-r--r--+  1 root root   1652 Jun 15 12:36 ipamd.log.2018-06-15-02
-rw-r--r--+  1 root root 814228 Jun 15 12:40 plugin.log.2018-06-15-02
  • cat /var/log/aws-routed-eni/ipamd.log.2018-06-15-02
2018-06-15T02:12:30Z [INFO] Starting L-IPAMD 1.0.0  ...
2018-06-15T02:12:30Z [INFO] Testing communication with server
[ ... skipped many duplicates ...]
2018-06-15T02:36:27Z [INFO] Starting L-IPAMD 1.0.0  ...
2018-06-15T02:36:27Z [INFO] Testing communication with server
  • cat /var/log/aws-routed-eni/plugin.log.2018-06-15-02 (thousands of similar lines skipped)
2018-06-15T02:26:38Z [INFO] Received CNI add request: ContainerID(c11bf6d18d938bf9e64a48b889358ef1f9d919e3e7af70d44971d60e57167a8b) Netns(/proc/7106/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=c11bf6d18d938bf9e64a48b889358ef1f9d919e3e7af70d44971d60e57167a8b) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:39Z [INFO] Received CNI add request: ContainerID(cdf5d2dc0d4c449fe81f95e7e1179d0de4b23497e85ebda29abc9069ea98166e) Netns(/proc/7195/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=cdf5d2dc0d4c449fe81f95e7e1179d0de4b23497e85ebda29abc9069ea98166e) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:40Z [INFO] Received CNI add request: ContainerID(4d80aefed852bcda93348b54f33a74fbe32feec9a50129ff3b0439f5048e10a4) Netns(/proc/7282/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=4d80aefed852bcda93348b54f33a74fbe32feec9a50129ff3b0439f5048e10a4) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:41Z [INFO] Received CNI add request: ContainerID(f2d7900a1939205c589abf4e021c49270e8ee0999eda9b364525efc603a3c15b) Netns(/proc/7366/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=f2d7900a1939205c589abf4e021c49270e8ee0999eda9b364525efc603a3c15b) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:42Z [INFO] Received CNI add request: ContainerID(aa143a27a4097346862475f597b2933eef573487051bfac0e10b2d7feff1baca) Netns(/proc/7458/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=aa143a27a4097346862475f597b2933eef573487051bfac0e10b2d7feff1baca) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:42Z [ERROR] Error received from AddNetwork grpc call for pod kube-dns-599dbfffb4-mpg6x namespace kube-system container aa143a27a4097346862475f597b2933eef573487051bfac0e10b2d7feff1baca: rpc error: code = Unavailable desc = grpc: the connection is unavailable
2018-06-15T02:26:43Z [INFO] Received CNI add request: ContainerID(2b1fbb2744ae7d5e72c52230346f9a331650847724ceb68ad72f2de373847a5e) Netns(/proc/7546/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=2b1fbb2744ae7d5e72c52230346f9a331650847724ceb68ad72f2de373847a5e) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:44Z [INFO] Received CNI add request: ContainerID(ebadb4daf43fd330f1182b4b5b9797e12406e68c1156a9e5fe75311cd9f26cec) Netns(/proc/7633/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=ebadb4daf43fd330f1182b4b5b9797e12406e68c1156a9e5fe75311cd9f26cec) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:45Z [INFO] Received CNI add request: ContainerID(f6b511e25f47017d8439c4c90af6e3a222064fd3b2c2c16eb17694a251e9e466) Netns(/proc/7721/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=f6b511e25f47017d8439c4c90af6e3a222064fd3b2c2c16eb17694a251e9e466) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:46Z [INFO] Received CNI add request: ContainerID(2ffaa10f5ba964324a90f995f1a8d03a36a04fcfc75e697ece416c5aae3684a7) Netns(/proc/7806/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=2ffaa10f5ba964324a90f995f1a8d03a36a04fcfc75e697ece416c5aae3684a7) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:26:46Z [ERROR] Error received from AddNetwork grpc call for pod kube-dns-599dbfffb4-mpg6x namespace kube-system container 2ffaa10f5ba964324a90f995f1a8d03a36a04fcfc75e697ece416c5aae3684a7: rpc error: code = Unavailable desc = grpc: the connection is unavailable
2018-06-15T02:26:48Z [INFO] Received CNI add request: ContainerID(3c3ca85bdaa36f50790d0143225581ef7681fb699c11920b670d89d284ab9630) Netns(/proc/7896/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=3c3ca85bdaa36f50790d0143225581ef7681fb699c11920b670d89d284ab9630) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
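The plugin log shows kubelet retrying the same kube-dns pod sandbox roughly once a second, with AddNetwork gRPC calls periodically failing with code Unavailable. Because the CNI plugin asks ipamD for an IP over gRPC, this is consistent with ipamD itself not running. For a log with thousands of such lines a summary is easier to read. The Python sketch below assumes the line format shown in the excerpts above (an observed format, not a documented schema) and counts ADD requests and gRPC failures per pod.

```python
import re
import sys
from collections import Counter

# Regexes are derived from the sample plugin.log lines above (assumed format).
ADD_RE = re.compile(r"\[INFO\] Received CNI add request: .*?K8S_POD_NAME=([^;)]+)")
GRPC_ERR_RE = re.compile(
    r"\[ERROR\] Error received from AddNetwork grpc call for pod (\S+) .*code = (\w+)"
)


def summarize_plugin_log(lines):
    """Count CNI ADD requests and failed AddNetwork gRPC calls per pod."""
    adds, failures, codes = Counter(), Counter(), Counter()
    for line in lines:
        m = ADD_RE.search(line)
        if m:
            adds[m.group(1)] += 1
            continue
        m = GRPC_ERR_RE.search(line)
        if m:
            failures[m.group(1)] += 1
            codes[m.group(2)] += 1  # gRPC status code, e.g. "Unavailable"
    return adds, failures, codes


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        adds, failures, codes = summarize_plugin_log(f)
    for pod, n in adds.most_common():
        print(f"{pod}: {n} ADD requests, {failures[pod]} gRPC failures")
    print("gRPC status codes:", dict(codes))
```

For example: python3 summarize_plugin_log.py /var/log/aws-routed-eni/plugin.log.2018-06-15-02 (the script name is arbitrary).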
  3. Pod:

    • kubectl get pod aws-node-9dm8j -o yaml -n kube-system

apiVersion: v1
kind: Pod
metadata:
  annotations:
    podpreset.admission.kubernetes.io/podpreset-proxy-preset: "230"
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: 2018-06-15T02:22:02Z
  generateName: aws-node-
  labels:
    controller-revision-hash: "993007391"
    k8s-app: aws-node
    pod-template-generation: "1"
  name: aws-node-9dm8j
  namespace: kube-system
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: aws-node
    uid: 8af02149-7041-11e8-b62b-02877b22b514
  resourceVersion: "1640"
  selfLink: /api/v1/namespaces/kube-system/pods/aws-node-9dm8j
  uid: e367fa34-7042-11e8-b62b-02877b22b514
spec:
  containers:
  - env:
    - name: AWS_VPC_K8S_CNI_LOGLEVEL
      value: DEBUG
    - name: MY_NODE_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: spec.nodeName
    - name: WARM_ENI_TARGET
      value: "1"
    envFrom:
    - configMapRef:
        name: proxy-config
    image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.0.0
    imagePullPolicy: IfNotPresent
    name: aws-node
    resources:
      requests:
        cpu: 10m
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /host/opt/cni/bin
      name: cni-bin-dir
    - mountPath: /host/etc/cni/net.d
      name: cni-net-dir
    - mountPath: /host/var/log
      name: log-dir
    - mountPath: /var/run/docker.sock
      name: dockersock
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: aws-node-token-g4hp6
      readOnly: true
  dnsPolicy: ClusterFirst
  hostNetwork: true
  nodeName: ip-10-8-208-183.ap-southeast-2.compute.internal
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: aws-node
  serviceAccountName: aws-node
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - hostPath:
      path: /opt/cni/bin
      type: ""
    name: cni-bin-dir
  - hostPath:
      path: /etc/cni/net.d
      type: ""
    name: cni-net-dir
  - hostPath:
      path: /var/log
      type: ""
    name: log-dir
  - hostPath:
      path: /var/run/docker.sock
      type: ""
    name: dockersock
  - name: aws-node-token-g4hp6
    secret:
      defaultMode: 420
      secretName: aws-node-token-g4hp6
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-06-15T02:22:02Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-06-15T02:42:38Z
    message: 'containers with unready status: [aws-node]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-06-15T02:22:02Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa
    image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:1.0.0
    imageID: docker://sha256:7e6390decb990137bdb11335c5d8c3f6b08fed446ec6c5283d3dac7bf3bd70ae
    lastState:
      terminated:
        containerID: docker://2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa
        exitCode: 1
        finishedAt: 2018-06-15T02:42:38Z
        reason: Error
        startedAt: 2018-06-15T02:42:08Z
    name: aws-node
    ready: false
    restartCount: 8
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=aws-node pod=aws-node-9dm8j_kube-system(e367fa34-7042-11e8-b62b-02877b22b514)
        reason: CrashLoopBackOff
  hostIP: 10.8.208.183
  phase: Running
  podIP: 10.8.208.183
  qosClass: Burstable
  startTime: 2018-06-15T02:22:02Z
  4. Container:
  • docker ps -a | grep aws-node-9dm8j
2a79a400f494        7e6390decb99                                                   "/bin/sh -c /app/ins…"   About a minute ago       Exited (1) About a minute ago                       k8s_aws-node_aws-node-9dm8j_kube-system_e367fa34-7042-11e8-b62b-02877b22b514_8
91c0e4bc827d        gcrio.artifactory.ai.cba/google_containers/pause:latest        "/pause"                 22 minutes ago           Up 22 minutes                                       k8s_POD_aws-node-9dm8j_kube-system_e367fa34-7042-11e8-b62b-02877b22b514_0
  • docker inspect 2a79a400f494
[
    {
        "Id": "2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa",
        "Created": "2018-06-15T02:42:08.310748437Z",
        "Path": "/bin/sh",
        "Args": [
            "-c",
            "/app/install-aws.sh"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 1,
            "Error": "",
            "StartedAt": "2018-06-15T02:42:08.478002133Z",
            "FinishedAt": "2018-06-15T02:42:38.586149666Z"
        },
        "Image": "sha256:7e6390decb990137bdb11335c5d8c3f6b08fed446ec6c5283d3dac7bf3bd70ae",
        "ResolvConfPath": "/var/lib/docker/containers/91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290/hostname",
        "HostsPath": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/etc-hosts",
        "LogPath": "/var/lib/docker/containers/2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa/2a79a400f49413b8367c6ea918f5865ab0c228a41b88f1c7f59f7737013c14aa-json.log",
        "Name": "/k8s_aws-node_aws-node-9dm8j_kube-system_e367fa34-7042-11e8-b62b-02877b22b514_8",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/opt/cni/bin:/host/opt/cni/bin",
                "/etc/cni/net.d:/host/etc/cni/net.d",
                "/var/log:/host/var/log",
                "/var/run/docker.sock:/var/run/docker.sock",
                "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/volumes/kubernetes.io~secret/aws-node-token-g4hp6:/var/run/secrets/kubernetes.io/serviceaccount:ro,Z",
                "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/etc-hosts:/etc/hosts:Z",
                "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/containers/aws-node/da09d6ea:/dev/termination-log:Z"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "container:91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290",
            "PortBindings": null,
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "container:91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 999,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "seccomp=unconfined",
                "label=disable"
            ],
            "UTSMode": "host",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 10,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "/kubepods/burstable/pode367fa34-7042-11e8-b62b-02877b22b514",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02-init/diff:/var/lib/docker/overlay2/c30cba254228c12f8cc1c2fd3f52356da0eebffc059e1615f906166ed139154d/diff:/var/lib/docker/overlay2/391f2b24d1f995a56d1b26efaec30eccb4c888d9c49c999f096dd7d8ae1aa37f/diff:/var/lib/docker/overlay2/7dd99ed23bff31b4f7edc34fd94b4af8f4d45ee8e74418552b7d503b6487f741/diff:/var/lib/docker/overlay2/b49c7dca7adebea285beedeeb4c1582995fdb24d410b752e75fa9061f384023c/diff:/var/lib/docker/overlay2/da29fe35e1be44003dc292f07a417fd349fc3696a5284ca3bd9ec83750c173f5/diff:/var/lib/docker/overlay2/af3dfca75d11fd663b8cfb64c3368377a62334f6555fb90c73fde785ca076f8a/diff:/var/lib/docker/overlay2/0ed9dd21485dacd3b0b38831dbf20e676b9ac612662ca09db01804bfb0ef5104/diff:/var/lib/docker/overlay2/1dfd12e815496f9dc05974cc75f3b7806acad57f4c8386839766f27b36e9cc9f/diff",
                "MergedDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02/merged",
                "UpperDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02/diff",
                "WorkDir": "/var/lib/docker/overlay2/8aee7f4dd35211ea18b313fa9775eb538ccee611b9920bbe3a85312266e93b02/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/volumes/kubernetes.io~secret/aws-node-token-g4hp6",
                "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
                "Mode": "ro,Z",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/etc-hosts",
                "Destination": "/etc/hosts",
                "Mode": "Z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/e367fa34-7042-11e8-b62b-02877b22b514/containers/aws-node/da09d6ea",
                "Destination": "/dev/termination-log",
                "Mode": "Z",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/opt/cni/bin",
                "Destination": "/host/opt/cni/bin",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/etc/cni/net.d",
                "Destination": "/host/etc/cni/net.d",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/log",
                "Destination": "/host/var/log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "ANL05300084",
            "Domainname": "",
            "User": "0",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "AWS_VPC_K8S_CNI_LOGLEVEL=DEBUG",
                "MY_NODE_NAME=ip-10-8-208-183.ap-southeast-2.compute.internal",
                "WARM_ENI_TARGET=1",
                "HTTPS_PROXY=http://proxy:3128",
                "HTTP_PROXY=http://proxy:3128",
                "NO_PROXY=169.254.169.254, localhost, 127.0.0.1, s3.ap-southeast-2.amazonaws.com, s3-ap-southeast-2.amazonaws.com, dynamodb.ap-southeast-2.amazonaws.com, 10.8.192.0/25, 10.8.200.0/25, 10.8.248.0/24, 10.8.224.0/23, 10.8.240.0/23, 10.8.208.0/22, 10.12.210.0/24, 10.12.210.1, 10.8.208.183, 10.12.210.2",
                "KUBERNETES_PORT_443_TCP=tcp://10.12.210.1:443",
                "KUBE_DNS_SERVICE_HOST=10.12.210.2",
                "KUBE_DNS_SERVICE_PORT=53",
                "KUBE_DNS_SERVICE_PORT_DNS=53",
                "KUBE_DNS_PORT=udp://10.12.210.2:53",
                "KUBE_DNS_PORT_53_UDP_PORT=53",
                "KUBE_DNS_PORT_53_TCP_PROTO=tcp",
                "KUBERNETES_SERVICE_PORT=443",
                "KUBERNETES_PORT_443_TCP_PROTO=tcp",
                "KUBERNETES_PORT_443_TCP_ADDR=10.12.210.1",
                "KUBE_DNS_PORT_53_UDP=udp://10.12.210.2:53",
                "KUBERNETES_SERVICE_HOST=10.12.210.1",
                "KUBE_DNS_PORT_53_TCP_PORT=53",
                "KUBE_DNS_PORT_53_TCP_ADDR=10.12.210.2",
                "KUBE_DNS_PORT_53_TCP=tcp://10.12.210.2:53",
                "KUBERNETES_PORT=tcp://10.12.210.1:443",
                "KUBERNETES_PORT_443_TCP_PORT=443",
                "KUBE_DNS_SERVICE_PORT_DNS_TCP=53",
                "KUBE_DNS_PORT_53_UDP_PROTO=udp",
                "KUBE_DNS_PORT_53_UDP_ADDR=10.12.210.2",
                "KUBERNETES_SERVICE_PORT_HTTPS=443",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": null,
            "Healthcheck": {
                "Test": [
                    "NONE"
                ]
            },
            "ArgsEscaped": true,
            "Image": "sha256:7e6390decb990137bdb11335c5d8c3f6b08fed446ec6c5283d3dac7bf3bd70ae",
            "Volumes": null,
            "WorkingDir": "/app",
            "Entrypoint": [
                "/bin/sh",
                "-c",
                "/app/install-aws.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "annotation.io.kubernetes.container.hash": "16e6c0d7",
                "annotation.io.kubernetes.container.restartCount": "8",
                "annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
                "annotation.io.kubernetes.container.terminationMessagePolicy": "File",
                "annotation.io.kubernetes.pod.terminationGracePeriod": "30",
                "io.kubernetes.container.logpath": "/var/log/pods/e367fa34-7042-11e8-b62b-02877b22b514/aws-node/8.log",
                "io.kubernetes.container.name": "aws-node",
                "io.kubernetes.docker.type": "container",
                "io.kubernetes.pod.name": "aws-node-9dm8j",
                "io.kubernetes.pod.namespace": "kube-system",
                "io.kubernetes.pod.uid": "e367fa34-7042-11e8-b62b-02877b22b514",
                "io.kubernetes.sandbox.id": "91c0e4bc827dbfccebd7d4f3b69211a002e990a85feead95ca3d3e44c9285290"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {}
        }
    }
]
  5. Current fix:

    • If I simply set the image back to version 0.1.4, everything works fine in the same cluster with the same configuration:

kubectl -n kube-system set image ds/aws-node aws-node=602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:0.1.4
  • kubectl describe ds aws-node -n kube-system
Name:           aws-node
Selector:       k8s-app=aws-node
Node-Selector:  <none>
Labels:         k8s-app=aws-node
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"extensions/v1beta1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"k8s-app":"aws-node"},"name":"aws-node","namespace":"kube-...
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 1
Number of Nodes Scheduled with Up-to-date Pods: 1
Number of Nodes Scheduled with Available Pods: 1
Number of Nodes Misscheduled: 0
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=aws-node
  Annotations:      scheduler.alpha.kubernetes.io/critical-pod=
  Service Account:  aws-node
  Containers:
   aws-node:
    Image:      602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:0.1.4
    Port:       <none>
    Host Port:  <none>
    Requests:
      cpu:  10m
    Environment:
      AWS_VPC_K8S_CNI_LOGLEVEL:  DEBUG
      MY_NODE_NAME:               (v1:spec.nodeName)
      WARM_ENI_TARGET:           1
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /host/var/log from log-dir (rw)
      /var/run/docker.sock from dockersock (rw)
  Volumes:
   cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
   cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
   log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  
   dockersock:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/docker.sock
    HostPathType:  
Events:
  Type    Reason            Age   From                  Message
  ----    ------            ----  ----                  -------
  Normal  SuccessfulCreate  43m   daemonset-controller  Created pod: aws-node-9rznc
  Normal  SuccessfulCreate  33m   daemonset-controller  Created pod: aws-node-9dm8j
  Normal  SuccessfulDelete  3m    daemonset-controller  Deleted pod: aws-node-9dm8j
  Normal  SuccessfulCreate  2m    daemonset-controller  Created pod: aws-node-zsqn9
  • tail /var/log/aws-routed-eni/plugin.log.2018-06-15-02
2018-06-15T02:53:36Z [INFO] Received CNI add request: ContainerID(f1fb5197777177af62fc93851dd451b48884bb5b8c8c247909a55dbc43d4721b) Netns(/proc/13285/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system;K8S_POD_NAME=kube-dns-599dbfffb4-mpg6x;K8S_POD_INFRA_CONTAINER_ID=f1fb5197777177af62fc93851dd451b48884bb5b8c8c247909a55dbc43d4721b) Path(/opt/aws-cni/bin:/opt/cni/bin) argsStdinData({"cniVersion":"","name":"aws-cni","type":"aws-cni","vethPrefix":"eni"})
2018-06-15T02:53:36Z [INFO] Received add network response for pod kube-dns-599dbfffb4-mpg6x namespace kube-system container f1fb5197777177af62fc93851dd451b48884bb5b8c8c247909a55dbc43d4721b: 10.8.208.169, table 0 
2018-06-15T02:53:36Z [INFO] Added toContainer rule for 10.8.208.169/32

Please let me know if you need any additional information.

Best,
Ruslan.


All 13 comments

@xdrus, is this an EKS cluster?
Also, can you do the following:

  • SSH into a worker node,
  • run /opt/cni/bin/aws-cni-support.sh,
  • collect /var/log/aws-routed-eni/aws-cni-support.tar.gz and attach it to the issue.

@xdrus you can also send me /var/log/aws-routed-eni/aws-cni-support.tar.gz to [email protected]

Hi @liwenwu-amazon, I've sent you the details by email. A few more things:

  • I had to disable this line in the support script, otherwise it fails on the first curl because the plugin isn't up (it is probably better to remove set -e altogether): https://github.com/aws/amazon-vpc-cni-k8s/blob/master/scripts/aws-cni-support.sh#L20
  • You are right, it is not an EKS cluster.
  • I found that if I explicitly add the real master IP to the NO_PROXY environment variable (the master instance's IP, not the ClusterIP of the kubernetes service), plugin version 1.0.0 works too. That said, version 0.1.4 works fine without this.

Best,
Ruslan.

@xdrus Glad to hear your problem is resolved. Yes, in version 1.0.0 ipamD requires access to the API server, whereas in 0.1.4 and 0.1.5 ipamD only talks to the local kubelet's insecure port.
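In other words, in 1.0.0 ipamD has to reach the API server over HTTPS, and because the PodPreset injects HTTPS_PROXY into the aws-node pod, any API server address that NO_PROXY does not cover is sent through the proxy. The reporter's fix was to list the master instance's IP explicitly. One plausible reason an explicit IP helps even when a CIDR entry such as 10.8.192.0/25 may already cover it is that not every HTTP client honours CIDR entries in NO_PROXY; to our knowledge Go's net/http only gained CIDR support there in Go 1.11. The Python sketch below is a simplified model of common NO_PROXY matching, not the client's actual code, and lets you check an address both ways. The master IP 10.8.192.10 used in the test and usage example is hypothetical.

```python
import ipaddress
import os
import sys


def bypasses_proxy(host, no_proxy, cidr_aware=True):
    """Return True if a request to `host` would skip the proxy.

    Simplified model of common NO_PROXY rules: "*" matches everything,
    entries match a host exactly or as a domain suffix, and (only when
    cidr_aware is True) entries like 10.8.192.0/25 match IPs in that range.
    Real HTTP clients differ in details such as ports and IPv6 handling.
    """
    host = host.strip().lower()
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a hostname, not an IP literal
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return True
        if "/" in entry:
            # CIDR entry: only honoured by CIDR-aware clients.
            if cidr_aware and ip is not None:
                try:
                    if ip in ipaddress.ip_network(entry, strict=False):
                        return True
                except ValueError:
                    pass  # malformed CIDR, ignore it
            continue
        if host == entry:
            return True
        # Domain suffix: "example.com" or ".example.com" covers "api.example.com".
        if ip is None and host.endswith("." + entry.lstrip(".")):
            return True
    return False


if __name__ == "__main__":
    no_proxy = os.environ.get("NO_PROXY") or os.environ.get("no_proxy", "")
    for host in sys.argv[1:]:
        print(host,
              "cidr-aware:", bypasses_proxy(host, no_proxy),
              "no-cidr:", bypasses_proxy(host, no_proxy, cidr_aware=False))
```

With NO_PROXY exported as in the pod, python3 noproxy_check.py 10.12.210.1 10.8.192.10 prints both results for each address (the script name is arbitrary). In this setup the durable fix is to add the master IP to the proxy-config ConfigMap that the PodPreset injects (kubectl -n kube-system edit configmap proxy-config) and then delete the aws-node pods so the DaemonSet recreates them, because environment variables taken from a ConfigMap are only read when the container starts.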

I am going to close this issue for now. Please re-open it if you still think this is a problem.
Thanks

@liwenwu-amazon I'm fine with the current solution, but can we have better logs for this error? Right now it is completely unclear why the plugin fails.

@xdrus, /var/log/aws-routed-eni/ipamd.log.xxx has detailed logs on why ipamD failed or is in a crash loop in your case.

@liwenwu-amazon
In ipamd.log I can see only two lines, repeated hundreds of times:

2018-06-15T02:36:27Z [INFO] Starting L-IPAMD 1.0.0  ...
2018-06-15T02:36:27Z [INFO] Testing communication with server

with no explicit message about why it failed. I also believe the error message should appear in the pod's log, because the first thing you do when investigating a failed pod is run kubectl logs.
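Grouping ipamd.log by startup attempt makes this pattern explicit: every attempt ends right after "Testing communication with server" with no error logged, so ipamD exits during its first API server check. The docker inspect output in the original report shows the container ran for about 30 seconds (02:42:08 to 02:42:38) before exiting with code 1, which fits a timeout in that check, although the log alone does not prove it. A minimal Python sketch, assuming the line format in the excerpt above:

```python
import sys
from collections import Counter


def startup_attempts(lines):
    """Split ipamd.log lines into startup attempts.

    An attempt begins at each "Starting L-IPAMD" line (format assumed from
    the excerpt above). Each attempt is a list of "[LEVEL] message" strings
    with the leading timestamp removed.
    """
    attempts = []
    for line in lines:
        line = line.rstrip("\n")
        if "Starting L-IPAMD" in line:
            attempts.append([])
        if attempts and line:
            attempts[-1].append(line.split(" ", 1)[-1])
    return attempts


def where_attempts_stop(attempts):
    """Count the last logged message of each attempt."""
    return Counter(a[-1] for a in attempts if a)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        attempts = startup_attempts(f)
    print(len(attempts), "startup attempts")
    for msg, n in where_attempts_stop(attempts).most_common():
        print(f"{n:5d}  last message: {msg}")
```

If nearly every attempt stops at the same message, that step is where to look; here it points at connectivity to the API server rather than at ENI or IP allocation.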

Thanks,
Ruslan.

I have opened #122 so that ipamD logs an explicit message when it fails to start.

For those of us spinning up our first EKS clusters, what's a real fix? I changed the aws-node image back to 0.1.4 and it works, but what does the comment above about NO_PROXY mean?

Thanks, I spent many hours on this. Just changing the image version to 0.1.4 fixed it; everything is OK.

@liwenwu-amazon
I'm using 1.1.0 and my pod is in a crash loop. I have spent many hours on it and still cannot fix it. Here are the details:
kubectl -n kube-system logs aws-node-cc5br

=====Starting installing AWS-CNI =========
=====Starting amazon-k8s-agent ===========
ERROR: logging before flag.Parse: W0831 00:44:45.172982      10 client_config.go:533] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
ERROR: logging before flag.Parse: E0831 00:44:45.185063      10 reflector.go:205] github.com/aws/amazon-vpc-cni-k8s/pkg/k8sapi/discovery.go:268: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-system:default" cannot list pods at the cluster scope

In ipamd.log I got this:

2018-08-31T00:42:21Z [ERROR] Failed to setup host networkfailed to configure eth0 RPF check: open /proc/sys/net/ipv4/conf/eth0/rp_filter: no such file or directory
2018-08-31T00:42:21Z [ERROR] initialization failureipamd init: failed to setup host network: failed to configure eth0 RPF check: open /proc/sys/net/ipv4/conf/eth0/rp_filter: no such file or directory

Any help would be appreciated!

Thanks,

@shutingnir Is this an EKS cluster or a kops cluster?
It looks like RBAC is not set up correctly.
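The forbidden error names the identity system:serviceaccount:kube-system:default, which suggests the aws-node pods are running under the namespace's default service account rather than a dedicated one allowed to list pods cluster-wide. The 1.0.0 pod spec in the original report uses serviceAccountName: aws-node, so the fix is to create that account, bind it to a ClusterRole, and reference it from the DaemonSet. The Python sketch below builds those three objects as a JSON List, which kubectl apply accepts. It grants only get/list/watch on pods (the permission named in the error), so compare it with the upstream manifest for the version you deploy, which may require more.

```python
import json


def aws_node_rbac(namespace="kube-system", name="aws-node"):
    """Build a ServiceAccount, ClusterRole and ClusterRoleBinding for aws-node.

    Minimal sketch: grants read access to pods only; the upstream manifest
    for a given CNI version may need additional rules.
    """
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": name},
        "subjects": [{"kind": "ServiceAccount", "name": name, "namespace": namespace}],
    }
    # A v1 List lets one `kubectl apply` create all three objects.
    return {"apiVersion": "v1", "kind": "List", "items": [service_account, cluster_role, binding]}


if __name__ == "__main__":
    print(json.dumps(aws_node_rbac(), indent=2))
```

For example: python3 aws_node_rbac.py | kubectl apply -f - (the script name is arbitrary), then make sure the DaemonSet pod template sets serviceAccountName: aws-node.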

@liwenwu-amazon
It's not an EKS cluster; I created the cluster manually.
After configuring RBAC, it seems to reach the API server successfully.
The issue now is that I want to apply the fix from #130 to my cluster:
9d05e90a52d7ed72d24b6527346a2ea0366c358d
I built an image from that specific commit, and the pod falls into a crash loop.

kubectl -n kube-system logs aws-node-tcrq2

=====Starting installing AWS-CNI =========
=====Starting amazon-k8s-agent ===========
ERROR: logging before flag.Parse: W0901 01:02:49.423540      14 client_config.go:533] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.

cat /var/log/aws-routed-eni/ipamd.log.2018-09-01-01

2018-09-01T01:05:34Z [INFO] Setting up host network
2018-09-01T01:05:34Z [ERROR] Failed to setup host networkfailed to configure eth0 RPF check: open /proc/sys/net/ipv4/conf/eth0/rp_filter: no such file or directory
2018-09-01T01:05:34Z [ERROR] initialization failureipamd init: failed to setup host network: failed to configure eth0 RPF check: open /proc/sys/net/ipv4/conf/eth0/rp_filter: no such file or directory
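The rp_filter error is a separate problem from the proxy one: ipamD tries to open /proc/sys/net/ipv4/conf/eth0/rp_filter, and that path only exists if the node has a network interface named eth0. Distributions that use predictable interface names typically call the primary interface something like ens5 instead. The thread does not say whether #130 addresses this. A minimal Python sketch to run on the node (Python 3 assumed to be available there) that reports whether the eth0 entry exists and which interfaces do:

```python
import os

PROC_CONF = "/proc/sys/net/ipv4/conf"


def rp_filter_status(iface="eth0", conf_dir=PROC_CONF):
    """Check whether <conf_dir>/<iface>/rp_filter exists.

    Returns (exists, interfaces), where interfaces are the per-interface
    entries under conf_dir, excluding the "all" and "default" entries.
    """
    exists = os.path.exists(os.path.join(conf_dir, iface, "rp_filter"))
    interfaces = sorted(
        name for name in os.listdir(conf_dir) if name not in ("all", "default")
    )
    return exists, interfaces


if __name__ == "__main__" and os.path.isdir(PROC_CONF):
    found, names = rp_filter_status()
    if found:
        print("eth0 rp_filter exists; the error has a different cause")
    else:
        print("no eth0 on this node; interfaces:", ", ".join(names))
```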