external-dns fails with "failed to sync cache: timed out waiting for the condition"

Created on 3 Apr 2019 · 19 Comments · Source: kubernetes-sigs/external-dns

We are facing a situation where external-dns is not working at all. We are running it as a pod in our OpenShift 3.11 cluster. The pod starts up, but fails after 60 seconds with:

time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

Version:
kubernetes 1.11.0
external-dns 0.5.12
Configuration:

- --source=service
- --provider=pdns
- --pdns-server=http://192.168.128.15:8081/api
- --pdns-api-key=xxx
- --txt-owner-id=external-dns
- --log-level=debug
- --interval=30s

It doesn't matter which DNS provider is configured; external-dns dies before it starts working on zones.

The complete log looks like this:

time="2019-04-03T13:19:28Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[service] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:pdns GoogleProject: DomainFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://192.168.128.15:8081/api PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:txt TXTOwnerID:okddev01 TXTPrefix: Interval:30s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false}"
time="2019-04-03T13:19:28Z" level=info msg="Created Kubernetes client https://10.127.0.1:443"
time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

We configured a service account and added the required role and role binding; the pod is running as the configured service account.

When running the pod with the default service-account we get the same error-message.
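
As a rough sketch of the setup described here (not the reporter's actual manifests), the service account is attached to the pod through `serviceAccountName` in the pod template. The namespace `my-namespace`, the `app: external-dns` label, and the image placeholder are assumptions for illustration; the flags are taken from the configuration listed above.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: my-namespace              # hypothetical namespace
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
  namespace: my-namespace              # same hypothetical namespace as the ServiceAccount
spec:
  selector:
    matchLabels:
      app: external-dns                # hypothetical label
  template:
    metadata:
      labels:
        app: external-dns
    spec:
      # The identity the pod runs as; RBAC bindings must name this account.
      serviceAccountName: external-dns
      containers:
        - name: external-dns
          image: EXTERNAL_DNS_IMAGE:TAG   # placeholder; substitute the image you deploy
          args:
            - --source=service
            - --provider=pdns
            - --log-level=debug
            # remaining flags as in the configuration above
```

Running as the right service account is only half of the picture: that account still needs a role bound to it that permits listing and watching the resources external-dns reads.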

I tried some other controller pods that use k8s informers, and those work without problems.

Any help would be appreciated.

kind/documentation

iceman91176

👍2

All 19 comments

OK, problem solved. It turned out that I had messed up the ClusterRoleBinding. As soon as I got it right, everything worked as expected.
So maybe it would help to hint at a possible RBAC problem in the error message?

iceman91176 on 3 Apr 2019

👍40

The second reply saved me many hours of head scratching, many thanks 👍

nitaigao on 8 Apr 2019

👍10

I had a similar problem trying to create the RBAC resources in a namespace other than "default". Is this by design - or is something incorrect in my configuration?

apigeeks-lee on 11 Apr 2019

👍7 ❤4

I had the same issue because I hadn't enabled RBAC for external-dns. After I did, it worked.

I am using the helm chart: https://github.com/helm/charts/tree/master/stable/external-dns

The option is: rbac.create = true
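
For anyone using a values file rather than a command-line override, the option above would look roughly like this. This is only a fragment, and it assumes the chart reads `rbac.create` exactly as named in this comment; other chart values are omitted.

```yaml
# values.yaml (fragment) for the external-dns Helm chart
rbac:
  create: true   # ask the chart to create the RBAC resources for external-dns
```

The same setting can be passed on the command line with Helm's `--set rbac.create=true`.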

sekka1 on 24 May 2019

👍15

I had a similar problem trying to create the RBAC resources in a namespace other than "default". Is this by design - or is something incorrect in my configuration?

@apigeeks-lee You're probably referencing the wrong service account in your role binding. Double-check the subject's name and namespace.
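
A minimal sketch of what to double-check, assuming a ClusterRole and a ServiceAccount that are both named `external-dns` and a deployment namespace called `my-namespace` (all three names are illustrative, as is the binding's own name):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer            # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns                   # must match the ClusterRole's metadata.name
subjects:
  - kind: ServiceAccount
    name: external-dns                 # must match the account the pod runs as
    namespace: my-namespace            # the namespace external-dns runs in, not necessarily "default"
```

If `subjects[].namespace` still says `default` while the pod runs elsewhere, the binding applies to a different service account and the pod's list/watch requests are denied, which matches the cache sync timeout reported in this thread.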

linki on 28 May 2019

👍3

For anyone bumping into this issue: if you're using Terraform, make sure to use kubernetes_cluster_role and kubernetes_cluster_role_binding instead of kubernetes_role and kubernetes_role_binding 😅

GeertJohan on 16 Sep 2019

Just spent a few hours on this until I saw this thread. The RBAC ClusterRoleBinding from the incubator documentation _explicitly_ binds to the _default_ namespace. Be wary if you try to deploy external-dns to a namespace _other_ than default.

StephanX on 29 Sep 2019

👍25 🎉5 🚀4 ❤3

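After I recreated some nodes, external-dns failed to start up again. It failed after printing the error message "failed to sync cache: timed out waiting for the condition". It seems that endpoints were added and external-dns now requires extra permissions.

Make sure you have added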

- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get","watch","list"]

to your external-dns ClusterRole. Adding this solved the problem for me.

GeertJohan on 28 Jul 2020

👍72 🎉26 ❤17 🚀9 😄7 👀6

@GeertJohan, thanks. It saved me time.

Viswa88 on 29 Jul 2020

😄1

@GeertJohan awesome, probably just saved me a couple of minutes / hours! xD

JnMik on 30 Jul 2020

😄1

This just bit me as well. One thing to check: the ClusterRoleBinding in the documentation binds to a service account in the default namespace. If you want to run external-dns in a different namespace, make sure you change the namespace from default to your new namespace before creating the ClusterRoleBinding.

ajgajg1134 on 19 Aug 2020

After I recreated some nodes, external-dns failed to startup again. It failed after printing the error message "failed to sync cache: timed out waiting for the condition". It seems that endpoints were added and external-dns now requires extra permissions.

Make sure you have added
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get","watch","list"]
to your external-dns ClusterRole. Adding this solved the problem for me.

Works like a charm! Thanks @GeertJohan

lcontini on 28 Aug 2020

😄1

@GeertJohan awesome man! Thank you!

evgenyvaganov on 22 Sep 2020

😄1

We follow the RFC2136 docs and found that this rule is missing from their RBAC section:

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list","watch"]

Once I added that, external-dns > 0.7.1 started working again without the "failed to sync cache" error.
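
Pulling together the rules mentioned in this thread, a combined ClusterRole might look roughly like the sketch below. The `endpoints` and `nodes` rules come from the comments here; the `services` rule is an assumption based on `--source=service`, and the role name is illustrative. Other sources or versions may need additional resources (for example pods or ingresses), so treat this as a starting point and check the tutorial for your provider.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns                   # illustrative name
rules:
  - apiGroups: [""]
    resources: ["services"]            # assumption: needed for --source=service
    verbs: ["get", "watch", "list"]
  - apiGroups: [""]
    resources: ["endpoints"]           # rule reported earlier in this thread
    verbs: ["get", "watch", "list"]
  - apiGroups: [""]
    resources: ["nodes"]               # rule reported missing from the RFC2136 docs
    verbs: ["list", "watch"]
```

A missing rule for any resource that external-dns tries to list or watch appears to produce the same "failed to sync cache" timeout, so running with `--log-level=debug` and comparing the role against the sources in use is a reasonable first check.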

cprivite on 14 Oct 2020

👍3 🎉1

This issue should be reopened and not closed until all actual documentation has been updated with the relevant required RBAC bits. The only source of truth is each individual tutorial, there's no master RBAC document, so this isn't done till they're ALL done.

cprivite on 14 Oct 2020

/reopen
/kind documentation

seanmalloy on 14 Oct 2020

@seanmalloy: Reopened this issue.