External-dns: Unable to connect to EKS control plane endpoints

Created on 15 Apr 2019 · 17 Comments · Source: kubernetes-sigs/external-dns

When deploying external-dns within an EKS cluster, I encountered issues with external-dns connecting to the Kubernetes control plane endpoints.

time="2019-04-10T15:02:34Z" level=info msg="Created Kubernetes client https://172.20.0.1:443"
time="2019-04-10T15:03:34Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

EKS server version: v1.12.6-eks-d69f1b

The Kubernetes service was configured correctly, and other pods were able to communicate with the control plane endpoint.

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   1d

The resources deployed:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: external-dns
rules:
- apiGroups: ["core"]
  resources: ["services"]
  verbs: ["get","watch","list"]
- apiGroups: ["core"]
  resources: ["pods"]
  verbs: ["get","watch","list"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get","watch","list"]
- apiGroups: ["core"]
  resources: ["nodes"]
  verbs: ["list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns
subjects:
- kind: ServiceAccount
  name: external-dns
  namespace: default
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: external-dns
spec:
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      serviceAccountName: external-dns
      containers:
      - name: external-dns
        image: registry.opensource.zalan.do/teapot/external-dns:v0.5.12
        args:
        - --source=service
        - --domain-filter=example.com
        - --provider=aws
        - --policy=upsert-only
        - --aws-zone-type=private
        - --registry=txt
        - --txt-owner-id=example-com

kristaxox
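
A note on the ClusterRole in the report above: in Kubernetes RBAC the core API group (services, pods, nodes) is written as the empty string "", so a rule with apiGroups: ["core"] matches none of those resources. Whether that contributed to the failure reported here is not confirmed anywhere in the thread. A minimal sketch of the same ClusterRole written with "", assuming the get/watch/list verbs on nodes that later comments report as necessary for v0.5.12 and v0.5.13:

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: external-dns
rules:
# "" denotes the core API group (services, pods, nodes)
- apiGroups: [""]
  resources: ["services", "pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: ["extensions"]
  resources: ["ingresses"]
  verbs: ["get", "watch", "list"]
# Assumption: node access as suggested later in this thread
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]
```

The ServiceAccount, ClusterRoleBinding, and Deployment from the report would stay unchanged under this sketch.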

All 17 comments

Also worth noting, external-dns version v0.5.11 works correctly in this environment.

kristaxox on 15 Apr 2019

👍2

I can confirm that 0.5.11 works in EKS

johan-smits on 17 Apr 2019

👍1

Latest also has issues on EKS (1.11.9); I am not sure whether it is related:

ERROR: logging before flag.Parse: W0418 05:26:24.282304       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work
....
ERROR: logging before flag.Parse: E0418 05:32:55.296176       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 05:32:55.296383       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0418 05:32:56.123149       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
ERROR: logging before flag.Parse: E0418 05:32:57.130803       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)

Reverting to 0.5.11 helps.

szymonpk on 18 Apr 2019

👍2

I think in the case @szymonpk described, you also have to allow external-dns to get nodes:

- apiGroups: ["core"]
  resources: ["nodes"]
  verbs: ["get", "list"]

njuettner on 18 Apr 2019

👍2

Let me know if this helps @kristaxox 🙂.

njuettner on 18 Apr 2019

@njuettner I have list on nodes. It isn't mentioned in https://github.com/kubernetes-incubator/external-dns/blob/master/docs/tutorials/aws.md#manifest-for-clusters-with-rbac-enabled. Also, v0.5.11 works.

szymonpk on 18 Apr 2019

Ok, this is a little bit embarrassing. Looks like I was running latest not 0.5.12, which I assume may have changes from the master. Sorry for hijacking this issue report!

szymonpk on 18 Apr 2019

We experience the same problems with external-dns v0.5.13 on AWS and adding the (undocumented, see https://github.com/kubernetes-incubator/external-dns/blob/master/docs/tutorials/aws.md#manifest-for-clusters-with-rbac-enabled) rbac rules helps with some but not all issues:

- apiGroups: ["core"]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]

Afterwards there are still errors:

ERROR: logging before flag.Parse: E0418 11:02:41.134592       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 11:02:41.233059       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)

cmur2 on 18 Apr 2019

@njuettner I can confirm that adding the suggested clusterrole works with external-dns:v0.5.12

- apiGroups: ["core"]
  resources: ["nodes"]
  verbs: ["get", "list"]

When bumping the version to v0.5.13 I get the following errors:

ERROR: logging before flag.Parse: W0418 14:40:43.932789       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2019-04-18T14:40:43Z" level=info msg="Created Kubernetes client https://172.20.0.1:443"
ERROR: logging before flag.Parse: E0418 14:40:43.961066       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
ERROR: logging before flag.Parse: E0418 14:40:44.970765       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
time="2019-04-18T14:40:45Z" level=info msg="All records are already up to date"
ERROR: logging before flag.Parse: E0418 14:40:45.974600       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
ERROR: logging before flag.Parse: E0418 14:42:43.959965       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 14:42:43.960084       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0418 14:42:43.964843       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0418 14:42:43.964963       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
ERROR: logging before flag.Parse: E0418 14:42:44.540602       1 reflector.go:322] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99: Failed to watch *v1.Node: unknown (get nodes)
time="2019-04-18T14:42:45Z" level=info msg="All records are already up to date"

kristaxox on 18 Apr 2019

@kristaxox please also add watch for v0.5.13.

I can confirm seeing logs:

ERROR: logging before flag.Parse: E0420 09:42:44.246472       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0420 09:42:44.246646       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)

Those are non-critical. They are annoying, but they seem to come from Kubernetes and not from external-dns. The reason we see them is probably the merge of PR https://github.com/kubernetes-incubator/external-dns/pull/917, which was introduced in v0.5.12.

@cmur2 Did you see those logs in v0.5.12, or did you skip this version?

njuettner on 20 Apr 2019

@njuettner we used (well, now use again) v0.5.12 and the CancelRequest not implemented messages did not occur there. They only appeared with v0.5.13.

cmur2 on 20 Apr 2019

Sorry, that PR isn't relevant; the cause was the switch to Go modules in v0.5.13.

When we used dep, we suppressed those logs by overriding the glog package, which was used in Kubernetes (now it's klog). We can suppress them again by simply overriding the package in go mod.

[[projects]]
  branch = "master"
  digest = "1:b12aff239810a9fa71e901a712a52f9da4c6e536852e943be693dec1d4519dfd"
  name = "github.com/golang/glog"
  packages = ["."]
  pruneopts = ""
  revision = "3fa5b9870d1d29f6d7907b29f1ae8c6eeb403829"
  source = "github.com/kubermatic/glog-logrus"

I will create a PR for that. Again, those logs are non-critical, but I do understand they are pretty annoying and need to be fixed.

njuettner on 20 Apr 2019

👍1

@njuettner with the suggested additions to the clusterrole I can confirm the issues around "unable to decode event" in v0.5.13.

ERROR: logging before flag.Parse: E0422 13:22:27.653595       1 round_trippers.go:291] CancelRequest not implemented
ERROR: logging before flag.Parse: E0422 13:22:27.653700       1 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)

Cluster role rules:

rules:
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - get
  - watch
  - list
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch

kristaxox on 22 Apr 2019

At this point, is this just a duplicate of #990? Or are there additional RBAC rules that need to be added to the example ClusterRole, which would make it worth keeping this open?

kristaxox on 26 Apr 2019

@kristaxox We released v0.5.14 yesterday. This fixes the noisy logs you saw. Let me know if it works for you and we can close the ticket.

njuettner on 15 May 2019

👍1

Works fine for us on Kubernetes 1.11 with External-DNS v0.5.14.

cmur2 on 15 May 2019

@njuettner looking good on my end! Thanks!

kristaxox on 15 May 2019
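
The resolution above amounts to pinning the image to a release that contains the fix rather than running latest. A minimal sketch of the relevant part of the Deployment, assuming v0.5.14 is published under the same registry path as the v0.5.12 image in the original report (not verified here):

```yaml
spec:
  template:
    spec:
      containers:
      - name: external-dns
        # Assumption: same registry path as in the report, with the tag bumped to the fixed release
        image: registry.opensource.zalan.do/teapot/external-dns:v0.5.14
```

Pinning an explicit tag also avoids the confusion described earlier in the thread, where latest was running instead of the intended release.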
