External-dns: Unable to connect to EKS control plane endpoints

Created on 15 Apr 2019  ·  17 Comments  ·  Source: kubernetes-sigs/external-dns

When deploying external-dns within an EKS cluster, I encountered issues with external-dns connecting to the Kubernetes control plane endpoints.

time="2019-04-10T15:02:34Z" level=info msg="Created Kubernetes client https://172.20.0.1:443"
time="2019-04-10T15:03:34Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

EKS server version: v1.12.6-eks-d69f1b

The Kubernetes service was configured correctly, and other pods were able to communicate with the control plane endpoint.

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   1d

The resources deployed:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: external-dns
rules:
# "" denotes the core API group in RBAC rules
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get","watch","list"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get","watch","list"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: default
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: external-dns
spec:
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.opensource.zalan.do/teapot/external-dns:v0.5.12
        args:
        - --source=service
        - --domain-filter=example.com
        - --provider=aws
        - --policy=upsert-only
        - --aws-zone-type=private
        - --registry=txt
        - --txt-owner-id=example-com

All 17 comments

Also worth noting, external-dns version v0.5.11 works correctly in this environment.

I can confirm that 0.5.11 works in EKS.

Latest also has issues in EKS (1.11.9); I am not sure if it is related:

ERROR: logging before flag.Parse: W0418 05:26:24.282304       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work
....
ERROR: logging before flag.Parse: E0418 05:32:55.296176       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 05:32:55.296383       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0418 05:32:56.123149       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
ERROR: logging before flag.Parse: E0418 05:32:57.130803       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)

Reverting to 0.5.11 helps.

I think in the case @szymonpk described, you also have to allow external-dns to get nodes:

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list"]

Let me know if this helps @kristaxox 🙂.

@njuettner I have list on nodes. It isn't mentioned in https://github.com/kubernetes-incubator/external-dns/blob/master/docs/tutorials/aws.md#manifest-for-clusters-with-rbac-enabled. Also, v0.5.11 works.

Ok, this is a little bit embarrassing. Looks like I was running latest not 0.5.12, which I assume may have changes from the master. Sorry for hijacking this issue report!

We experience the same problems with external-dns v0.5.13 on AWS and adding the (undocumented, see https://github.com/kubernetes-incubator/external-dns/blob/master/docs/tutorials/aws.md#manifest-for-clusters-with-rbac-enabled) rbac rules helps with some but not all issues:

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]

Afterwards there are still errors:

ERROR: logging before flag.Parse: E0418 11:02:41.134592       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 11:02:41.233059       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body) 

@njuettner I can confirm that adding the suggested clusterrole works with external-dns:v0.5.12

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list"]

When bumping the version to v0.5.13 I get the following errors:

ERROR: logging before flag.Parse: W0418 14:40:43.932789       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2019-04-18T14:40:43Z" level=info msg="Created Kubernetes client https://172.20.0.1:443"
ERROR: logging before flag.Parse: E0418 14:40:43.961066       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
ERROR: logging before flag.Parse: E0418 14:40:44.970765       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
time="2019-04-18T14:40:45Z" level=info msg="All records are already up to date"
ERROR: logging before flag.Parse: E0418 14:40:45.974600       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
ERROR: logging before flag.Parse: E0418 14:42:43.959965       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 14:42:43.960084       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0418 14:42:43.964843       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 14:42:43.964963       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0418 14:42:44.540602       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
time="2019-04-18T14:42:45Z" level=info msg="All records are already up to date"

@kristaxox please also add watch for v0.5.13.

I can confirm seeing logs:

ERROR: logging before flag.Parse: E0420 09:42:44.246472       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0420 09:42:44.246646       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)

Those are non-critical, though annoying, and they seem to come from Kubernetes rather than from external-dns. The reason we see them is probably the merge of PR https://github.com/kubernetes-incubator/external-dns/pull/917, which was introduced in v0.5.12.

@cmur2 Did you see those logs in v0.5.12, or did you skip this version?

@njuettner we used (well, now use again) v0.5.12 and the CancelRequest not implemented messages did not occur there. They only appeared with v0.5.13.

Sorry, this PR isn't relevant; it was the switch to Go modules in v0.5.13.

When we used dep, we suppressed those logs by overriding the glog package, which was used in Kubernetes (now it's klog). We can suppress them again by simply overriding the package in go mod.

[[projects]]    
  branch = "master" 
  digest = "1:b12aff239810a9fa71e901a712a52f9da4c6e536852e943be693dec1d4519dfd" 
  name = "github.com/golang/glog"   
  packages = ["."]  
  pruneopts = ""    
  revision = "3fa5b9870d1d29f6d7907b29f1ae8c6eeb403829" 
  source = "github.com/kubermatic/glog-logrus"

I will create a PR for that. Again, those logs are non-critical, but I do understand they are pretty annoying and need to be fixed.

@njuettner with the suggested additions to the clusterrole I can confirm the issues around "unable to decode event" in v0.5.13.

ERROR: logging before flag.Parse: E0422 13:22:27.653595       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0422 13:22:27.653700       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)

Cluster role rules:

rules:
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch

At this point, is this just a duplicate of #990? Or are there additional RBAC rules that need to be added to the example ClusterRole, making it worth keeping this open?

@kristaxox We released v0.5.14 yesterday. This fixes the noisy logs you saw. Let me know if it works for you and we can close the ticket.

Works fine for us on Kubernetes 1.11 with External-DNS v0.5.14.

@njuettner looking good on my end! Thanks!
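
For readers skimming the thread, the RBAC rules it converged on can be summarized as a single ClusterRole. This is only a sketch assembled from the snippets posted above, not an official manifest. It assumes the `external-dns` name and the `rbac.authorization.k8s.io/v1beta1` API version from the original report, and it merges the services and pods rules into one entry for brevity. According to the comments above, it was used together with bumping the image tag to v0.5.14 to quiet the noisy logs.

```yaml
# Sketch only: consolidates the rules reported as working in this thread.
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: external-dns            # name taken from the original report
rules:
- apiGroups: [""]               # "" denotes the core API group
  resources: ["services", "pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]   # "watch" was requested for v0.5.13 and later
```

The existing ClusterRoleBinding from the original report can stay unchanged, since it references the role by name.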
