kops version:
Version 1.5.3
Kubernetes:
kubernetesVersion: 1.5.2
Provider AWS.
I followed https://github.com/kubernetes/kops/blob/master/docs/aws.md.
This worked in the past with k8s 1.4.8 and kops 1.4.4.
Am I missing something?
Thanks in advance
protokube:1.5.3 master logs:
I0327 13:09:46.349907 1 aws_volume.go:63] AWS API Request: ec2/DescribeVolumes
I0327 13:09:46.507239 1 tainter.go:53] Querying k8s for nodes with selector "kubernetes.io/role=master"
W0327 13:09:46.508558 1 kube_boot.go:117] error updating master taints: error querying nodes: Get http://localhost:8080/api/v1/nodes?labelSelector=kubernetes.io%2Frole%3Dmaster: dial tcp [::1]:8080: getsockopt: connection refused
I0327 13:09:46.508666 1 kube_boot.go:142] ensuring that kubelet systemd service is running
I0327 13:09:46.512474 1 channels.go:47] checking channel: "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
I0327 13:09:46.512575 1 channels.go:34] Running command: channels apply channel s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml --v=4 --yes
I0327 13:09:46.823853 1 channels.go:37] error running channels apply channel s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml --v=4 --yes:
I0327 13:09:46.824042 1 channels.go:38] I0327 13:09:46.650545 61 root.go:89] No client config found; will use default config
I0327 13:09:46.650822 61 addons.go:36] Loading addons channel from "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
I0327 13:09:46.765753 61 s3context.go:114] Found bucket "k8s-staging-state-store" in region "us-east-1"
I0327 13:09:46.765802 61 s3fs.go:162] Reading file "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
error checking for required update: error querying namespace "kube-system": Get http://localhost:8080/api/v1/namespaces/kube-system: dial tcp [::1]:8080: getsockopt: connection refused
I0327 13:09:46.825830 1 channels.go:50] apply channel output was: I0327 13:09:46.650545 61 root.go:89] No client config found; will use default config
I0327 13:09:46.650822 61 addons.go:36] Loading addons channel from "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
I0327 13:09:46.765753 61 s3context.go:114] Found bucket "k8s-staging-state-store" in region "us-east-1"
I0327 13:09:46.765802 61 s3fs.go:162] Reading file "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
error checking for required update: error querying namespace "kube-system": Get http://localhost:8080/api/v1/namespaces/kube-system: dial tcp [::1]:8080: getsockopt: connection refused
W0327 13:09:46.825959 1 kube_boot.go:131] error applying channel "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml": error running channels: exit status 1
Master is stuck with:
8145e83cf80 gcr.io/google_containers/kube-controller-manager:v1.5.2 "/bin/sh -c '/usr/loc" 11 minutes ago Up 11 minutes k8s_kube-controller-manager.df9be02c_kube-controller-manager-ip-172-20-56-37.ec2.internal_kube-system_d18429837a430d706f507b7f2ff2acd2_aea7212d
abba22a71fe0 gcr.io/google_containers/kube-scheduler:v1.5.2 "/bin/sh -c '/usr/loc" 11 minutes ago Up 11 minutes k8s_kube-scheduler.7f910df5_kube-scheduler-ip-172-20-56-37.ec2.internal_kube-system_d2c1003b8ede2fdc54bd4e080a69493e_a58938a9
a17c7bb08343 gcr.io/google_containers/kube-proxy:v1.5.2 "/bin/sh -c 'echo -99" 12 minutes ago Up 11 minutes k8s_kube-proxy.ce5415e7_kube-proxy-ip-172-20-56-37.ec2.internal_kube-system_8bf935143bf960de7c17908df3b36c83_51cea76e
78400919e09e gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-controller-manager-ip-172-20-56-37.ec2.internal_kube-system_d18429837a430d706f507b7f2ff2acd2_2bd243ee
d3a93ffe7f70 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-apiserver-ip-172-20-56-37.ec2.internal_kube-system_6091e94ddbe1685a7083e6df281698d6_0775b9a3
d918bbeaa632 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_etcd-server-ip-172-20-56-37.ec2.internal_kube-system_8188e100d3bf0d380bf077e521db91a1_a8f003ec
c75a8ebd1c97 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_etcd-server-events-ip-172-20-56-37.ec2.internal_kube-system_a28516be697cdf251b8b7f363bb37ef0_87104a10
a9b6bae0e432 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-scheduler-ip-172-20-56-37.ec2.internal_kube-system_d2c1003b8ede2fdc54bd4e080a69493e_6ee58aea
29d04a83b786 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-proxy-ip-172-20-56-37.ec2.internal_kube-system_8bf935143bf960de7c17908df3b36c83_23506289
7151f25e3d2a protokube:1.5.3 "/usr/bin/protokube -" 12 minutes ago Up 12 minutes thirsty_pasteur
@jorge07 - I had the same issue today getting my cluster up and running. I initially tried manually changing the Route 53 DNS to the correct EC2 external IPs, which nearly worked, but seemed to cause a slew of other issues with the internal DNS of the containers.
If you just wait after creating the cluster, kops will eventually figure out the correct IP itself and properly set up the internal DNS. For me it takes around 5-15 minutes for the cluster to properly set its Route 53 IPs.
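If it helps, a quick way to watch for that from the outside is something like the following (the cluster domain is just an example; substitute your own):
watch -n 60 dig +short api.k8s.staging.example.com
# re-runs every 60s; the placeholder IP should eventually be replaced
# by the master's public IP once kops updates Route 53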
I waited an hour and ten minutes without luck... I'll try again tomorrow, let's see if it works. Thanks
It was impossible to set up the cluster with kops on AWS. The k8s master containers stay stuck in pause forever.
Version 1.5.3
kubernetesVersion: 1.5.2
I created a cluster with another tool (StackPoint) to see if something was wrong with our VPC or account, and it was created successfully.
Any ideas or recommendations to debug this in depth?
Running into the same issue with 1.5.3
Is there any way to manually re-trigger DNS configuration?
No idea, but are your k8s containers up and running on the master?
yeah, the containers are fine, it's just DNS that is broken
Seems to be a duplicate of https://github.com/kubernetes/kops/issues/1599
Just had the same issue; it turns out the masters/nodes didn't have access to the S3 state bucket (we had a custom bucket policy that blocked it), and that prevented the master nodes from bootstrapping correctly. After fixing the bucket policy, the masters and all other DNS entries registered properly.
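A rough way to check this from a master (using the instance profile credentials; the bucket/cluster path below matches the logs earlier in this thread, substitute your own):
aws s3 ls s3://k8s-staging-state-store/k8s.staging.xxx.com/
# an AccessDenied error here points at the bucket policy / IAM role,
# which matches the failed-bootstrap symptom described above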
I tried to create the same cluster in a clean zone and found no issues. I'll close the issue, but I still don't know the root of the problem.
Hi @jorge07, same issue here after deleting and rebuilding clean several times (clean S3 bucket, clean hosted zone, new cluster, etc.).
The nodes come up (they are visible in the EC2 console per my config), but the master doesn't come up, so all kubectl commands time out. No errors on creating and deploying the cluster.
1.5.3
I created the cluster in another zone without issues. Still don't know the root cause...
No errors creating the cluster, but kubectl and kops validate cluster $NAME keep timing out; the DNS record IPs are not updating and are instead using a placeholder. Any suggestions for moving forward?
I'm having the same issue. I am creating a cluster for the first time and after the command terminates, I am seeing the following records in route53:
api.staging.mydomain.com. A 203.0.113.123
api.internal.staging.mydomain.com. A 203.0.113.123
etcd-a.internal.staging.mydomain.com. A 203.0.113.123
etcd-events-a.internal.staging.mydomain.com. A 203.0.113.123
So I assume 203.0.113.123 is a placeholder and the real records are supposed to be set when the EC2 instances boot? Unfortunately, the instances have been running for a while now and the records still haven't been updated.
Edit: oops, I waited a little longer and records were finally updated! :)
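For anyone else who wants to confirm whether the zone records have actually been updated (rather than waiting on DNS caches), querying the hosted zone directly works; something like this, with a placeholder zone id:
aws route53 list-resource-record-sets --hosted-zone-id ZXXXXXXXXXXXXX \
  | jq '.ResourceRecordSets[] | select(.Name | startswith("api.")) | {Name, ResourceRecords}'
# as long as these still show the placeholder IP, the master has not
# registered the real addresses yet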
So I checked the master nodes with journalctl, trying to figure out what went wrong. I could not find out what happened exactly, except perhaps something with the docker bridge setup after downloading the protokube image. Note that it is the only image loaded, and it seems the process stopped there.
TL;DR: I used the latest pre-release
Version 1.6.0-beta.1 (git-77f222d31)
and did not experience this problem further.
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout '/etc/sysconfig/network-scripts/ec2net.hotplug'
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout 'bridge-network-interface'
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout 'net.agent'
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout '/lib/systemd/systemd-sysctl --prefix=/proc/sys/net/ipv4/conf/
I am going to close as you got past the problem with the latest release, please reopen if you need to.
we have the same issue with kops 1.7.0 :(. Please consider reopening the issue.
we have the same issue with kops 1.6.1 :(. Please consider reopening the issue.
same story on 1.6.1
Same story for us too
I am also having this issue as we speak.
Version 1.7.0 (git-e04c29d)
Not sure if it's my configuration variables, or that I have been building it and tearing it down repeatedly while getting used to things, or what it is.
It did manage to set etcd-a.internal and etcd-events-a..., but the master DNS values for api and api.internal are not getting updated. Whatever triggers that DNS update does not seem to kick in. Any idea what I might be doing wrong?
I waited the better part of an hour on this, still nothing. To be clear, this isn't a propagation issue, as the actual DNS records themselves are still set to the initial placeholder IP address.
I will re-open this issue, but this is typically a problem with DNS. If someone can get us more logging, we can investigate this more.
A couple notes on diagnosis.
Set --dns-zone to your Route 53 zone id. You can also set this value in the API with a kops edit cluster command.
Get the id with:
aws route53 list-hosted-zones | jq '.HostedZones[] | select(.Name=="subdomain.example.com.") | .Id'
Replace subdomain.example.com with your cluster DNS subdomain and domain name. Use only the zone id component of the string provided which is after the forward slash.
This test needs to pass: dig ns subdomain.example.com. See here for more information: https://github.com/kubernetes/kops/blob/master/docs/aws.md#configure-dns
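Putting the above together, a minimal check sequence (subdomain.example.com stands in for your cluster domain):
# delegation must resolve; this should return the zone's name servers
dig ns subdomain.example.com
# then pin kops to the zone: pass --dns-zone ZXXXXXXXXXXXXX at create time,
# or edit the cluster and set spec.dnsZone to that zone id
kops edit cluster --name subdomain.example.com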
@Hypnocrit please use kops 1.7.1, as this is a release that contains a CVE fix.
I figured out what was causing my problem.
I was telling the system to use t2.nano machine sizes. I would wager that if I dug into those machines a little deeper I would have likely seen resource exhaustion, or some other size-related reason the machine failed to make it into the DNS A record set for the api and api.internal bindings.
Just to confirm this, I set it back to nano machines overnight and let it run all night. It did not complete. Prior to leaving the office yesterday I had been tinkering with the configs, assuming that perhaps one of them was breaking this. With those settings off, it did come up after about 10 minutes.
It looks like this one is wrapped up, so I'm going to close it.
If you continue to have problems using the latest released version of kops (currently 1.7.1), please create a new issue and include your cluster spec and, ideally, the logs from protokube, and we'll try to help resolve this! Using tiny instances does tend to delay the cluster startup significantly, but TBH, even with heftier instances, it still takes a few minutes (5? 7? 10?) to become "sentient", and the length of time it takes depends heavily on a few things, including your networking layout.
Issue is happening in 1.9.0
Had this issue with 1.9.0 too.
Confirming that kops cannot boot up and change the k8s API DNS records with a t2.nano master and a t2.nano node. Changing both to t2.micro got kops validate cluster working within one minute.
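For reference, bumping the master instance type is just an instance group edit plus an update; a sketch, assuming the default instance group name for a single-master cluster in us-east-1a and an example cluster name (yours will differ):
# assumes KOPS_STATE_STORE is exported and the cluster name matches
kops edit ig master-us-east-1a --name k8s.staging.example.com
#   -> change spec.machineType from t2.nano to t2.micro (or larger), then
kops update cluster k8s.staging.example.com --yes
kops rolling-update cluster k8s.staging.example.com --yes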
Having the same issue in 1.9.1, and the fix that
"Updates kube-dns to 1.14.10, fixes problems with externalName services. (Thanks @jjo)"
still does not work.
Any updates on this? If not:
I don't have any ELB/NLB created after this, so I'm guessing the record should point to one of the masters?
Guys, I got some success by delegating a zone so it is resolvable from the TLDs.
Basically, the name of the cluster has to be resolvable from the TLDs; I hope that makes sense.
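In practice that means the parent zone needs NS records pointing at the cluster's hosted zone; a sketch with placeholder zone id and domain:
# name servers assigned to the cluster's (sub)zone
aws route53 get-hosted-zone --id ZSUBZONEID | jq '.DelegationSet.NameServers'
# add those as an NS record for subdomain.example.com in the parent zone,
# then confirm the delegation is visible from outside:
dig ns subdomain.example.com +short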
Found the issue: it is related to C5/M5 AWS instances; sorry, I didn't know that before.
I just sorted out the images for the VMs and chose --image="kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-05-27" for "C" and "M" type instances, and now it works.
https://github.com/kubernetes/kops/blob/master/docs/releases/1.8-NOTES.md
AWS:
New instance types: P3, C5, M5, H1. Please note that NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types unless a stretch image is chosen (change stretch to jessie in the image name). Also kubernetes will not support mounting persistent volumes on NVME instances until Kubernetes v1.9.
Support for root provisioned IOPS.
Properly tag public and private subnets for ELB creation in advanced network topologies
Use SSL in ELB API server health check
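So, per the note above about NVME and the jessie image, the fix for C5/M5 masters boils down to the same kops edit ig pattern mentioned earlier, just changing the image (names below are placeholders):
kops edit ig master-us-east-1a --name k8s.staging.example.com
#   -> set spec.image to kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-05-27
kops update cluster k8s.staging.example.com --yes
kops rolling-update cluster k8s.staging.example.com --yes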
Thank you.
> Just had the same issue, turns out the masters/nodes didn't have access to the S3 state bucket (we had a custom bucket policy that blocked it), and that prevented the master nodes to bootstrap correctly. After fixing the bucket policy the masters and all other DNS entries registered properly.
What S3 bucket are you talking about?