External-dns: Unable to use IAM Service Account on GKE

Created on 29 Mar 2018 · 23 Comments · Source: kubernetes-sigs/external-dns

I'm seeing the following in the logs for my external-dns pod on GKE:

time="2018-03-29T00:57:30Z" level=info msg="config: &{Master: KubeConfig: Sources:[service ingress] Namespace: AnnotationFilter: FQDNTemplate: Compatibility: PublishInternal:false Provider:google GoogleProject:MY-PROJECT DomainFilter:[MY.MANAGED.ZONE] AWSZoneType: AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InMemoryZones:[] Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix:external-dns Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:debug}" 
time="2018-03-29T00:57:30Z" level=info msg="Connected to cluster at https://10.55.240.1:443" 
time="2018-03-29T00:57:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 
time="2018-03-29T00:58:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 
time="2018-03-29T00:59:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 
time="2018-03-29T01:00:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden" 
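For anyone triaging similar output, here is a minimal sketch (not part of the original report) that tallies error-level messages from logs in this text format. It assumes each line is a sequence of key=value fields with quoted values, as in the lines above; the function names are hypothetical.

```python
import shlex
from collections import Counter


def parse_line(line):
    """Split one text-format log line into a dict of key=value fields."""
    fields = {}
    # shlex handles the quoted values, including escaped quotes inside them.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_errors(lines):
    """Count error-level messages, keyed by message text."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "error":
            counts[fields.get("msg", "")] += 1
    return counts


sample = [
    'time="2018-03-29T00:57:30Z" level=info msg="Connected to cluster at https://10.55.240.1:443"',
    'time="2018-03-29T00:57:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"',
    'time="2018-03-29T00:58:30Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"',
]
print(count_errors(sample))
```

Feeding it the output of kubectl logs shows at a glance whether the 403 is the only error or whether something else (for example a token fetch failure) precedes it.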

I created an issue for this in the charts repo, but after doing so, I thought it might be better to raise the issue here. Feel free to close this one if it's the wrong place, or let me know if it's the right place and I'll close the other ticket.

Helm version:
"v2.8.2", GitCommit:"a80231648a1473929271764b920a8e346f6de844"

Kubernetes version:
1.9.4-gke.1

Installed via:
Helm chart stable/external-dns

Args:
(as passed through the helm chart to the container, obtained via kubectl describe pod foo)

      --log-level=debug
      --domain-filter=MY.MANAGED.ZONE
      --policy=upsert-only
      --provider=google
      --txt-prefix=external-dns
      --source=service
      --source=ingress
      --registry=txt
      --google-project=MY-PROJECT

Credentials:
Set via env var: GOOGLE_APPLICATION_CREDENTIALS: /etc/secrets/service-account/credentials.json
Credentials obtained from downloaded JSON file while creating a service account in Cloud Console Web UI.

Docker image:
registry.opensource.zalan.do/teapot/external-dns:v0.4.8

Helm command:

helm upgrade\
 --install\
 --recreate-pods\
 --namespace=kube-system\
 --set domainFilters[0]="MY.MANAGED.ZONE"\
 --set extraArgs.registry=txt\
 --set logLevel=debug\
 --set provider=google\
 --set google.serviceAccountSecret=external-dns\
 --set google.project=MY-PROJECT\
 --set txtPrefix="external-dns"\
 external-dns stable/external-dns

I am able to kubectl exec my way onto the pod and verify that the file /etc/secrets/service-account/credentials.json is in place. While troubleshooting, I granted the service account full owner permissions across the entire project, and it doesn't seem to have had any effect.

Steps to repro, as best as I can figure:

  • Create a GCP project
  • Create a managed zone in Cloud DNS (not sure if this part is strictly necessary to trigger the behavior)
  • Create a GKE cluster (K8S version: 1.9.4-gke.1)
  • Create a GKE node pool (K8S version: 1.9.4-gke.1)
  • Login to the cluster
  • Create an ingress which matches your managed zone (not sure if this part is strictly necessary to trigger the behavior)
  • Create an IAM service account.
  • Grant the IAM service account full project owner permissions.
  • Create a K8S secret with a data key of credentials.json and its value as the JSON object downloaded from the IAM service account creation dialog.
  • Run helm init to install tiller in the cluster.
  • Run helm repo update to get the latest version of the chart (I installed 0.5.2)
  • Run helm upgrade command I put further up in the issue.
  • Run kubectl logs -f THE_POD_NAME to see the error above.
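One thing worth ruling out in a setup like the one above is that the mounted key belongs to a different service account or project than the one that was granted roles. The following is a rough sketch, not something from the original report: it assumes the standard service account key layout (type, project_id, private_key, client_email fields), and the function names are hypothetical.

```python
import json

# Fields every downloaded service account key file is expected to carry.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key(key):
    """Return (client_email, project_id) for a parsed key.

    Raises ValueError if the dict does not look like a service account key.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("unexpected key type: %r" % key["type"])
    return key["client_email"], key["project_id"]


def describe_key(path):
    """Load a key file from disk and validate it with check_key."""
    with open(path, encoding="utf-8") as handle:
        return check_key(json.load(handle))
```

Running describe_key("/etc/secrets/service-account/credentials.json") against a copy of the mounted file should return the same service account email and project that the IAM role was granted to.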
lifecycle/rotten provider/google

Most helpful comment

Hi all,

I'm using the linked helm chart with GCE and a service account without any issue.

I'm supplying the helm chart with the following values:

provider: google

google:
  project: "projectName"
  serviceAccountSecret: "domain.tld"

rbac:
  create: true

Where domain.tld is a secret with key: credentials.json and value my downloaded service account credentials.

The container boots up, launches the external-dns binary, it has the following environment variables:

/ # xargs -0 -n 1 < /proc/1/environ 
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=external-dns-56bcd78f87-cxcnz
GOOGLE_APPLICATION_CREDENTIALS=/etc/secrets/service-account/credentials.json
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBE_DNS_PORT=udp://10.51.240.10:53
TILLER_DEPLOY_PORT_44134_TCP=tcp://10.51.240.227:44134
HEAPSTER_PORT_80_TCP_PROTO=tcp
HEAPSTER_PORT_80_TCP_PORT=80
KUBE_DNS_SERVICE_PORT_DNS=53
EXTERNAL_DNS_PORT_7979_TCP_PROTO=tcp
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_ADDR=10.51.240.1
TILLER_DEPLOY_PORT_44134_TCP_ADDR=10.51.240.227
EXTERNAL_DNS_SERVICE_HOST=10.51.244.129
TILLER_DEPLOY_SERVICE_PORT_TILLER=44134
TILLER_DEPLOY_PORT_44134_TCP_PROTO=tcp
KUBE_DNS_PORT_53_TCP_ADDR=10.51.240.10
TILLER_DEPLOY_SERVICE_PORT=44134
TILLER_DEPLOY_PORT=tcp://10.51.240.227:44134
EXTERNAL_DNS_PORT_7979_TCP_ADDR=10.51.244.129
KUBERNETES_PORT_443_TCP=tcp://10.51.240.1:443
HEAPSTER_SERVICE_PORT=80
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBE_DNS_PORT_53_UDP_PROTO=udp
KUBE_DNS_PORT_53_TCP=tcp://10.51.240.10:53
TILLER_DEPLOY_PORT_44134_TCP_PORT=44134
EXTERNAL_DNS_SERVICE_PORT=7979
EXTERNAL_DNS_PORT_7979_TCP=tcp://10.51.244.129:7979
KUBERNETES_PORT=tcp://10.51.240.1:443
KUBE_DNS_PORT_53_TCP_PORT=53
TILLER_DEPLOY_SERVICE_HOST=10.51.240.227
EXTERNAL_DNS_PORT_7979_TCP_PORT=7979
KUBE_DNS_SERVICE_HOST=10.51.240.10
KUBE_DNS_PORT_53_UDP=udp://10.51.240.10:53
KUBE_DNS_SERVICE_PORT=53
KUBE_DNS_PORT_53_UDP_PORT=53
KUBE_DNS_PORT_53_UDP_ADDR=10.51.240.10
KUBE_DNS_PORT_53_TCP_PROTO=tcp
HEAPSTER_SERVICE_HOST=10.51.252.134
HEAPSTER_PORT_80_TCP=tcp://10.51.252.134:80
HEAPSTER_PORT_80_TCP_ADDR=10.51.252.134
HEAPSTER_PORT=tcp://10.51.252.134:80
KUBE_DNS_SERVICE_PORT_DNS_TCP=53
EXTERNAL_DNS_PORT=tcp://10.51.244.129:7979
KUBERNETES_SERVICE_HOST=10.51.240.1
KUBERNETES_SERVICE_PORT=443
HOME=/root

If I cat out /etc/secrets/service-account/credentials.json I get back the credentials I submitted via the secret.

My logs state the following:

time="2018-05-14T18:56:30Z" level=info msg="config: {Master: KubeConfig: Sources:[service ingress] Namespace: AnnotationFilter: FQDNTemplate: CombineFQDNAndAnnotation:false Compatibility: PublishInternal:false Provider:google GoogleProject:domain DomainFilter:[] ZoneIDFilter:[] AWSZoneType: AWSAssumeRole: AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: CloudflareProxied:false InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 InMemoryZones:[] PDNSServer:http://localhost:8081 PDNSAPIKey: Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix: Interval:1m0s Once:false DryRun:false LogFormat:text MetricsAddress::7979 LogLevel:info}"
time="2018-05-14T18:56:30Z" level=info msg="Connected to cluster at https://10.51.240.1:443"
time="2018-05-14T18:56:31Z" level=info msg="Change zone: domain-tld"
time="2018-05-14T18:56:31Z" level=info msg="Add records: docker.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:56:31Z" level=info msg="Add records: nexus.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:56:31Z" level=info msg="Add records: docker.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:56:31Z" level=info msg="Add records: nexus.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:57:31Z" level=info msg="All records are already up to date"
time="2018-05-14T18:58:32Z" level=info msg="All records are already up to date"
time="2018-05-14T18:59:32Z" level=info msg="Change zone: domain-tld"
time="2018-05-14T18:59:32Z" level=info msg="Del records: docker.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:59:32Z" level=info msg="Del records: nexus.domain.tld. A [23.251.129.2] 300"
time="2018-05-14T18:59:32Z" level=info msg="Del records: docker.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:59:32Z" level=info msg="Del records: nexus.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: docker.domain.tld. A [35.195.152.45] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: nexus.domain.tld. A [35.195.152.45] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: docker.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T18:59:32Z" level=info msg="Add records: nexus.domain.tld. TXT [\"heritage=external-dns,external-dns/owner=default,external-dns/resource=ingress/infra/nexus-sonatype-nexus\"] 300"
time="2018-05-14T19:00:33Z" level=info msg="All records are already up to date"
time="2018-05-14T19:01:34Z" level=info msg="All records are already up to date"
time="2018-05-14T19:02:34Z" level=info msg="All records are already up to date"
time="2018-05-14T19:03:34Z" level=info msg="All records are already up to date"

All 23 comments

I am also having the same issue.

I'm not using the External DNS Helm chart, but I can personally attest that our GKE tutorial works: https://github.com/kubernetes-incubator/external-dns/blob/master/docs/tutorials/gke.md

Maybe "somebody" (:tm:) can check what is missing from the Helm chart (or its docs)?

Any updates on this issue? We're also seeing the same 403 error when deploying ExternalDNS to a GKE cluster (following the guide above, we're not using Helm).

Without using helm, I encountered this Error 403: Forbidden, forbidden error on GKE 1.10.5-gke.0 if my IAM service account wasn't set up correctly with the project role.

To reproduce:

# create a new IAM service account
$ gcloud iam service-accounts create gke-external-dns --display-name "Service account for ExternalDNS on GKE"

# create a new node pool to use the gke-external-dns service account
$ gcloud container node-pools create external-dns-pool --cluster=main --num-nodes=1 --service-account='gke-external-dns@<project_id>.iam.gserviceaccount.com'

# create the service account's key as a secret. credentials.json is downloaded from the GCP console
$ kubectl create secret generic external-dns-key --from-file=credentials.json

If I deploy external-dns now, I see these errors in the log:

time="2018-07-26T03:09:08Z" level=info msg="Connected to cluster at https://10.100.0.1:443"
time="2018-07-26T03:09:08Z" level=error msg="Get https://www.googleapis.com/dns/v1/projects/isim-default/managedZones?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: dial tcp 74.125.195.84:443: connect: connection refused"
time="2018-07-26T03:10:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:11:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:12:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"

Assign the project owner role:

$ gcloud projects add-iam-policy-binding <project_id> --member='serviceAccount:gke-external-dns@<project_id>.iam.gserviceaccount.com' --role='roles/owner'

Now it works:

time="2018-07-26T03:09:08Z" level=info msg="Connected to cluster at https://10.100.0.1:443"
time="2018-07-26T03:09:08Z" level=error msg="Get https://www.googleapis.com/dns/v1/projects/isim-default/managedZones?alt=json: oauth2: cannot fetch token: Post https://accounts.google.com/o/oauth2/token: dial tcp 74.125.195.84:443: connect: connection refused"
time="2018-07-26T03:10:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:11:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:12:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:13:08Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
time="2018-07-26T03:14:08Z" level=info msg="All records are already up to date"
time="2018-07-26T03:15:09Z" level=info msg="All records are already up to date"
time="2018-07-26T03:16:08Z" level=info msg="All records are already up to date"

PS: I also made the mistake of using gcloud iam service-accounts add-iam-policy-binding, which treats the service account as a _resource_, not an _identity_.
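To check which of the two bindings actually took effect, one option is to inspect the project-level policy for the service account as a member. The sketch below is an illustration rather than anything from this thread: it assumes the policy was exported with gcloud projects get-iam-policy <project_id> --format=json, and roles_for is a hypothetical helper name.

```python
import json


def roles_for(policy, member):
    """Return the sorted roles a project IAM policy grants to `member`."""
    return sorted(
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    )


# Example policy in the shape gcloud prints; the values are made up.
policy = json.loads("""
{
  "bindings": [
    {"role": "roles/owner",
     "members": ["user:someone@example.com"]},
    {"role": "roles/dns.admin",
     "members": ["serviceAccount:gke-external-dns@my-project.iam.gserviceaccount.com"]}
  ]
}
""")

member = "serviceAccount:gke-external-dns@my-project.iam.gserviceaccount.com"
print(roles_for(policy, member))
```

If the list comes back empty for the service account, the role was probably attached to the service account as a resource instead of being granted to it on the project.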

For me, the only way to make external-dns work was adding the scope "https://www.googleapis.com/auth/ndev.clouddns.readwrite" when creating the GKE cluster. Even the Owner role for the nodes did not help. Cluster version: 1.11.2-gke.18.
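Access scopes limit what the node's default credentials (the ones served by the metadata server) may call, regardless of IAM roles. A minimal sketch for checking them from inside a pod or node follows. It assumes the pod can reach the node's metadata server at the standard Compute Engine endpoint; the function names are hypothetical, and treating cloud-platform as sufficient is based on it being the all-APIs scope.

```python
import urllib.request

# Standard Compute Engine metadata endpoint listing the node's access scopes.
SCOPES_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/scopes"
)

# Scopes that permit Cloud DNS writes: the DNS-specific one from this thread,
# or the broad cloud-platform scope.
DNS_SCOPES = {
    "https://www.googleapis.com/auth/ndev.clouddns.readwrite",
    "https://www.googleapis.com/auth/cloud-platform",
}


def fetch_scopes(timeout=2.0):
    """Ask the metadata server for the scopes, one per line in the response."""
    request = urllib.request.Request(
        SCOPES_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").split()


def allows_dns_writes(scopes):
    """True if any of the given scopes permits Cloud DNS changes."""
    return bool(DNS_SCOPES.intersection(scopes))
```

Inside the cluster, allows_dns_writes(fetch_scopes()) returning False would be consistent with the 403 whenever external-dns ends up using the node's credentials instead of the mounted key.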

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

I may have more information pertaining to this issue.

I am able to use a cloud-dns secret key linked to a service account with the --role roles/dns.admin permission, configured on external-dns and cert-manager, in us-east1. :+1:

However, I am unable to do this in europe-west1, for example. There, the node scope "https://www.googleapis.com/auth/ndev.clouddns.readwrite" is necessary to make it work; otherwise, with the credentials.json secret set just as in us-east1, I get googleapi: Error 403: Forbidden, forbidden.

Just wanted to chime into the thread because it helped me, and if someone sees this: don't worry, you are not going crazy. Either the Google Cloud engineers have added this for extra security, or the cloud-dns API authorization is currently broken in certain regions.

This seems to be a problem with IAM roles and permission in GKE/Google Cloud for certain regions. :point_left: :broken_heart:

/remove-lifecycle rotten

After thinking about this further: could the IP/DNS of the Cloud DNS Google service be different in different regions? Maybe external-dns has this IP/hostname hardcoded somewhere?

/remove-lifecycle stale

Looking through the code and comparing it to Jetstack's cert-manager, which also uses a service account key to update Cloud DNS, it looks like a service account DNS provider needs to be explicitly created in order to authenticate.

https://github.com/jetstack/cert-manager/blob/07c34114e0ea737c35b9e54db87c3bf757ad2b13/pkg/issuer/acme/dns/clouddns/clouddns.go#L91

This works for GKE 1.14.7-gke.23 and the latest ExternalDNS (v0.5.17).
At first I had the same issue, with this message in the log:
time="2019-11-19T08:36:41Z" level=error msg="googleapi: Error 403: Forbidden, forbidden"
But after looking into it more deeply, I found that my problem was a wrongly configured Google service account.
If you recreate the SA, you have to remove all of its IAM role bindings and assign them to the new service account again, even if the SA has the same name.

@alekseydemidov What do you mean by "wrongly-configured service account"? Can you provide an example of a "properly-configured" one?

I had the following issue. All steps were done by scripts using gcloud, not the web UI:

  • Create an SA and assign roles to it.
  • After removing the SA, you have to remove its role assignments. If you don't, and then create a new SA with the same name, the new SA does not have the correct permissions.

So that was my fault: I forgot to clear the role assignments after removing the SA, and then created a new SA with the same name as before.
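A rough way to spot this situation is to look for leftover bindings of deleted principals in the project policy. This sketch assumes the policy was exported as JSON with gcloud projects get-iam-policy, and that bindings of a removed account show up with a "deleted:" prefix on the member, which is how IAM documents them; whether that held at the time of this comment is an assumption. The helper name is hypothetical.

```python
def stale_bindings(policy):
    """Return (role, member) pairs whose member is a deleted principal."""
    return [
        (binding["role"], member)
        for binding in policy.get("bindings", [])
        for member in binding.get("members", [])
        if member.startswith("deleted:")
    ]


# Made-up example: the old SA was deleted, a new one with the same name exists
# but has no binding of its own.
policy = {
    "bindings": [
        {
            "role": "roles/dns.admin",
            "members": [
                "deleted:serviceAccount:external-dns@my-project.iam.gserviceaccount.com?uid=123456789",
            ],
        },
        {"role": "roles/viewer", "members": ["user:someone@example.com"]},
    ]
}

for role, member in stale_bindings(policy):
    print(role, member)
```

Any pair it reports is a binding that still points at the removed account, so the role has to be granted again to the recreated one.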

After wasting a number of hours on this issue I managed to get it working without any overly generous permissions.

  1. Create the GKE cluster with workload identity enabled. With Terraform, use terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster with workload identity:

module "gke" {
  source     = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  project_id = module.variables.project_id
  ....
  identity_namespace = "${module.variables.project_id}.svc.id.goog"
}

module "my-workload-identity" {
  source              = "terraform-google-modules/kubernetes-engine/google//modules/workload-identity"
  name                = "${module.variables.name_prefix}-app-${terraform.workspace}"
  namespace           = "default"
  project_id          = module.variables.project_id
  use_existing_k8s_sa = false
}

  2. OAuth scopes should include the following:

"https://www.googleapis.com/auth/cloud-platform",
"https://www.googleapis.com/auth/ndev.clouddns.readwrite",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/service.management.readonly",

  3. Install external-dns based on the RBAC manifest here

  4. Add the policy binding and the service account annotation:

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:project_id.svc.id.goog[default/external-dns]" \
  gsa_id@project_id.iam.gserviceaccount.com

kubectl annotate serviceaccount \
  --namespace default \
  external-dns \
  iam.gke.io/gcp-service-account=gsa_id@project_id.iam.gserviceaccount.com

# Test
kubectl run --rm -it \
  --generator=run-pod/v1 \
  --image google/cloud-sdk:slim \
  --serviceaccount external-dns \
  --namespace default \
  workload-identity-test

# gcloud auth list from inside the container should print the service account

  5. Wait for the service token to refresh. external-dns will continue to report authentication errors for a few minutes before becoming fully functional.

Note:

If your dns belongs to a different project, then manually create the service account in the other project and assign DNS administrator access.
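The --member string and the annotation in the binding step both encode the project, namespace, and service account names, so a mismatch in any of them (for example a namespace other than default) leaves the binding pointing at the wrong identity. A small sketch that builds both values from the same inputs, following the formats used in the commands above; the function names are hypothetical.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """Member string for the roles/iam.workloadIdentityUser binding."""
    return "serviceAccount:%s.svc.id.goog[%s/%s]" % (project_id, namespace, ksa_name)


def gsa_annotation(gsa_id, project_id):
    """Annotation to place on the Kubernetes service account."""
    return "iam.gke.io/gcp-service-account=%s@%s.iam.gserviceaccount.com" % (
        gsa_id,
        project_id,
    )


print(workload_identity_member("project_id", "default", "external-dns"))
print(gsa_annotation("gsa_id", "project_id"))
```

When deploying to another namespace, the namespace argument here, the --namespace flag on kubectl annotate, and the namespace of the external-dns deployment all need to agree.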

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

Thanks for the clear steps @prabhu. I had trouble running this with workload identity in a namespace other than default. I did the steps you mentioned from scratch and it worked as expected. Not sure if there may be some hard-coded stuff regarding the token in this image: registry.opensource.zalan.do/teapot/external-dns:latest
