external-dns fails with "failed to sync cache: timed out waiting for the condition"

Created on 3 Apr 2019 · 19 Comments · Source: kubernetes-sigs/external-dns

external-dns is not working at all for us. We are running it as a pod in our OpenShift 3.11 cluster. The pod starts up, but fails after 60 seconds with

time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

Version:
kubernetes 1.11.0
external-dns 0.5.12
Configuration:

- --source=service
- --provider=pdns
- --pdns-server=http://192.168.128.15:8081/api
- --pdns-api-key=xxx
- --txt-owner-id=external-dns
- --log-level=debug
- --interval=30s

It doesn't matter which DNS provider is configured; external-dns dies before it starts working on zones.

The complete log looks like this

time="2019-04-03T13:19:28Z" level=info msg="config: {Master: KubeConfig: RequestTimeout:30s IstioIngressGatewayServices:[istio-system/istio-ingressgateway] Sources:[service] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false Compatibility: PublishInternal:false PublishHostIP:false ConnectorSourceServer:localhost:8080 Provider:pdns GoogleProject: DomainFilter:[] ZoneIDFilter:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false CloudflareZonesPerPage:50 RcodezeroTXTEncrypt:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml InMemoryZones:[] PDNSServer:http://192.168.128.15:8081/api PDNSAPIKey: PDNSTLSEnabled:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:sync Registry:txt TXTOwnerID:okddev01 TXTPrefix: Interval:30s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug TXTCacheInterval:0s ExoscaleEndpoint:https://api.exoscale.ch/dns ExoscaleAPIKey: ExoscaleAPISecret: CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false}"
time="2019-04-03T13:19:28Z" level=info msg="Created Kubernetes client https://10.127.0.1:443"
time="2019-04-03T13:20:28Z" level=fatal msg="failed to sync cache: timed out waiting for the condition"

We configured a service account and added the required role and role binding; the pod is running as the configured service account.

When running the pod with the default service-account we get the same error-message.

I tried out some other controller pods that use k8s informers, those are working without problems.

Any help would be appreciated

kind/documentation

All 19 comments

OK, problem solved. It turned out that I had messed up the ClusterRoleBinding. As soon as I got it right, everything worked as expected.
So maybe it would help to hint at a possible RBAC problem in the error message?

The second reply saved me many hours of head scratching, many thanks :+1:

I had a similar problem trying to create the RBAC resources in a namespace other than "default". Is this by design, or is something incorrect in my configuration?

I had the same issue because I hadn't enabled RBAC for external-dns. After I did, it worked.

I am using the helm chart: https://github.com/helm/charts/tree/master/stable/external-dns

The option is: rbac.create = true
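
As a minimal sketch, the same option can be written in a Helm values file. Only rbac.create comes from the comment above; other keys in the chart may differ by chart version, so check the chart's own values file before relying on this.

```yaml
# values.yaml fragment for the stable/external-dns chart.
# rbac.create is the option named in the comment above; it asks the chart
# to create the RBAC resources for external-dns.
rbac:
  create: true
```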

@apigeeks-lee You're probably referencing the wrong service account in your role binding. Double-check the subject's name and namespace.

For anyone bumping into this issue: if you're using Terraform, make sure to use kubernetes_cluster_role and kubernetes_cluster_role_binding, instead of kubernetes_role and kubernetes_role_binding :sweat_smile:

Just spent a few hours, until I saw this. The RBAC clusterRole binding from the incubator documentation _explicitly_ binds to the _default_ namespace. Be wary if you try to deploy the external-dns to a namespace _other_ than default.

After I recreated some nodes, external-dns failed to start up again, exiting with the error message "failed to sync cache: timed out waiting for the condition". It seems that endpoints were added and external-dns now requires extra permissions.

Make sure you have added

- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get","watch","list"]

to your external-dns ClusterRole. Adding this solved the problem for me.

@GeertJohan, thanks, it saved me time.

@GeertJohan awesome, probably just saved me a couple of minutes / hours! xD

This just bit me as well. One thing to check: the ClusterRoleBinding in the documentation binds to a service account in the default namespace, so if you want to run external-dns in a different namespace, make sure you change the namespace from default to your new namespace before creating the ClusterRoleBinding.
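
A minimal sketch of such a binding, assuming external-dns runs in a namespace called dns-system under a service account named external-dns. Both names, and the binding name, are hypothetical placeholders and not taken from this thread; substitute your own.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: external-dns-viewer        # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: external-dns               # must match the name of your ClusterRole
subjects:
- kind: ServiceAccount
  name: external-dns               # must match the pod's serviceAccountName
  namespace: dns-system            # hypothetical; the namespace the pod actually runs in
```

If the subject's name or namespace does not match the service account the pod runs as, the binding grants nothing to the pod, which fits the cache sync timeout described in this thread.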

Works like a charm! thanks @GeertJohan

@GeertJohan awesome man! Thank you!

We followed the rfc2136 docs and found that this is missing from the RBAC section:

- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list","watch"]

Once I added that, external-dns > 0.7.1 started working again without the "failed to sync cache" error.
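
Putting the rules reported in this thread together, a ClusterRole for the service source might look like the sketch below. The endpoints and nodes rules come from the comments above. The services and pods entries and the name external-dns are assumptions for illustration. Other sources (for example ingress) would need their own rules, so treat this as a starting point rather than a definitive list.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-dns               # illustrative name
rules:
# Core resources read by the service source. "endpoints" is the rule
# reported as missing earlier in this thread; "services" and "pods"
# are assumptions.
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "watch", "list"]
# Reported as required for external-dns > 0.7.1 in the comment above.
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "watch"]
```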

This issue should be reopened and not closed until all of the current documentation has been updated with the relevant required RBAC bits. The only source of truth is each individual tutorial; there's no master RBAC document, so this isn't done till they're ALL done.

/reopen
/kind documentation

@seanmalloy: Reopened this issue.

@cprivite would you consider submitting a PR to fix some of the docs that need to be updated?

@GeertJohan You are the real MVP.
