kops version:
Version 1.5.3
Kubernetes:
kubernetesVersion: 1.5.2
Provider AWS.
I followed https://github.com/kubernetes/kops/blob/master/docs/aws.md.
This worked in the past with k8s 1.4.8 and kops 1.4.4.
Am I missing something?
Thanks in advance
protokube:1.5.3 master logs:
I0327 13:09:46.349907 1 aws_volume.go:63] AWS API Request: ec2/DescribeVolumes
I0327 13:09:46.507239 1 tainter.go:53] Querying k8s for nodes with selector "kubernetes.io/role=master"
W0327 13:09:46.508558 1 kube_boot.go:117] error updating master taints: error querying nodes: Get http://localhost:8080/api/v1/nodes?labelSelector=kubernetes.io%2Frole%3Dmaster: dial tcp [::1]:8080: getsockopt: connection refused
I0327 13:09:46.508666 1 kube_boot.go:142] ensuring that kubelet systemd service is running
I0327 13:09:46.512474 1 channels.go:47] checking channel: "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
I0327 13:09:46.512575 1 channels.go:34] Running command: channels apply channel s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml --v=4 --yes
I0327 13:09:46.823853 1 channels.go:37] error running channels apply channel s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml --v=4 --yes:
I0327 13:09:46.824042 1 channels.go:38] I0327 13:09:46.650545 61 root.go:89] No client config found; will use default config
I0327 13:09:46.650822 61 addons.go:36] Loading addons channel from "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
I0327 13:09:46.765753 61 s3context.go:114] Found bucket "k8s-staging-state-store" in region "us-east-1"
I0327 13:09:46.765802 61 s3fs.go:162] Reading file "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
error checking for required update: error querying namespace "kube-system": Get http://localhost:8080/api/v1/namespaces/kube-system: dial tcp [::1]:8080: getsockopt: connection refused
I0327 13:09:46.825830 1 channels.go:50] apply channel output was: I0327 13:09:46.650545 61 root.go:89] No client config found; will use default config
I0327 13:09:46.650822 61 addons.go:36] Loading addons channel from "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
I0327 13:09:46.765753 61 s3context.go:114] Found bucket "k8s-staging-state-store" in region "us-east-1"
I0327 13:09:46.765802 61 s3fs.go:162] Reading file "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml"
error checking for required update: error querying namespace "kube-system": Get http://localhost:8080/api/v1/namespaces/kube-system: dial tcp [::1]:8080: getsockopt: connection refused
W0327 13:09:46.825959 1 kube_boot.go:131] error applying channel "s3://k8s-staging-state-store/k8s.staging.xxx.com/addons/bootstrap-channel.yaml": error running channels: exit status 1
Master is stuck with:
8145e83cf80 gcr.io/google_containers/kube-controller-manager:v1.5.2 "/bin/sh -c '/usr/loc" 11 minutes ago Up 11 minutes k8s_kube-controller-manager.df9be02c_kube-controller-manager-ip-172-20-56-37.ec2.internal_kube-system_d18429837a430d706f507b7f2ff2acd2_aea7212d
abba22a71fe0 gcr.io/google_containers/kube-scheduler:v1.5.2 "/bin/sh -c '/usr/loc" 11 minutes ago Up 11 minutes k8s_kube-scheduler.7f910df5_kube-scheduler-ip-172-20-56-37.ec2.internal_kube-system_d2c1003b8ede2fdc54bd4e080a69493e_a58938a9
a17c7bb08343 gcr.io/google_containers/kube-proxy:v1.5.2 "/bin/sh -c 'echo -99" 12 minutes ago Up 11 minutes k8s_kube-proxy.ce5415e7_kube-proxy-ip-172-20-56-37.ec2.internal_kube-system_8bf935143bf960de7c17908df3b36c83_51cea76e
78400919e09e gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-controller-manager-ip-172-20-56-37.ec2.internal_kube-system_d18429837a430d706f507b7f2ff2acd2_2bd243ee
d3a93ffe7f70 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-apiserver-ip-172-20-56-37.ec2.internal_kube-system_6091e94ddbe1685a7083e6df281698d6_0775b9a3
d918bbeaa632 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_etcd-server-ip-172-20-56-37.ec2.internal_kube-system_8188e100d3bf0d380bf077e521db91a1_a8f003ec
c75a8ebd1c97 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_etcd-server-events-ip-172-20-56-37.ec2.internal_kube-system_a28516be697cdf251b8b7f363bb37ef0_87104a10
a9b6bae0e432 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-scheduler-ip-172-20-56-37.ec2.internal_kube-system_d2c1003b8ede2fdc54bd4e080a69493e_6ee58aea
29d04a83b786 gcr.io/google_containers/pause-amd64:3.0 "/pause" 12 minutes ago Up 12 minutes k8s_POD.d8dbe16c_kube-proxy-ip-172-20-56-37.ec2.internal_kube-system_8bf935143bf960de7c17908df3b36c83_23506289
7151f25e3d2a protokube:1.5.3 "/usr/bin/protokube -" 12 minutes ago Up 12 minutes thirsty_pasteur
@jorge07 - I had the same issue today getting my cluster up and running. I initially tried manually changing the Route 53 DNS to the correct EC2 external IPs, which nearly worked, but seemed to cause a slew of other issues with the internal DNS of the containers.
If you just wait after creating the cluster, kops will eventually figure out the correct IP itself and properly set up the internal DNS. For me it takes around 5-15 minutes for the cluster to properly set its Route 53 IPs.
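If it helps, a quick way to watch for that from the outside is something like the following (the cluster domain is just an example; substitute your own):
watch -n 60 dig +short api.k8s.staging.example.com
# re-runs every 60s; the placeholder IP should eventually be replaced
# by the master's public IP once kops updates Route 53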
I waited an hour and ten minutes without luck... I'll try again tomorrow, let's see if it works. Thanks
It was impossible to set up the cluster with kops on AWS. The k8s master containers stay stuck in pause forever.
Version 1.5.3
kubernetesVersion: 1.5.2
I created a cluster with another tool (StackPoint) to see if something was wrong with our VPC or account, and it was created successfully.
Any ideas or recommendations to debug this in depth?
Running into the same issue with 1.5.3
Is there any way to manually re-trigger DNS configuration?
No idea, but are your k8s containers up and running on the master?
yeah, the containers are fine, it's just DNS that is broken
Seems to be a duplicate of https://github.com/kubernetes/kops/issues/1599
Just had the same issue; it turns out the masters/nodes didn't have access to the S3 state bucket (we had a custom bucket policy that blocked it), and that prevented the master nodes from bootstrapping correctly. After fixing the bucket policy, the masters and all other DNS entries registered properly.
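A rough way to check this from a master (using the instance profile credentials; the bucket/cluster path below matches the logs earlier in this thread, substitute your own):
aws s3 ls s3://k8s-staging-state-store/k8s.staging.xxx.com/
# an AccessDenied error here points at the bucket policy / IAM role,
# which matches the failed-bootstrap symptom described above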
I tried to create the same cluster in a clean zone and found no issues. I'll close the issue, but I still don't know the root of the problem.
Hi @jorge07, same issue here after deleting and rebuilding clean several times (clean S3 bucket, clean hosted zone, new cluster, etc.).
The nodes come up (they are visible in the EC2 console per my config), but the master doesn't come up, so all kubectl commands time out. No errors on creating and deploying the cluster.
1.5.3
I created the cluster in another zone without issues. Still don't know the root cause...
No errors creating the cluster, but kubectl and kops validate cluster $NAME keep timing out; the DNS record IPs are not updating and are instead using a placeholder. Any suggestions for moving forward?
I'm having the same issue. I am creating a cluster for the first time and after the command terminates, I am seeing the following records in route53:
api.staging.mydomain.com. A 203.0.113.123
api.internal.staging.mydomain.com. A 203.0.113.123
etcd-a.internal.staging.mydomain.com. A 203.0.113.123
etcd-events-a.internal.staging.mydomain.com. A 203.0.113.123
So I assume 203.0.113.123 is a placeholder and the real records are supposed to be set when the EC2 instances boot? Unfortunately, the instances have been running for a while now and the records still haven't been updated.
Edit: oops, I waited a little longer and records were finally updated! :)
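For anyone else who wants to confirm whether the zone records have actually been updated (rather than waiting on DNS caches), querying the hosted zone directly works; something like this, with a placeholder zone id:
aws route53 list-resource-record-sets --hosted-zone-id ZXXXXXXXXXXXXX \
  | jq '.ResourceRecordSets[] | select(.Name | startswith("api.")) | {Name, ResourceRecords}'
# as long as these still show the placeholder IP, the master has not
# registered the real addresses yet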
So I checked the master nodes with journalctl, trying to figure out what went wrong. I could not find out what happened exactly, except perhaps something with the docker bridge setup after downloading the protokube image. Note that it is the only image loaded, and it seems the process stopped there.
TL;DR: I used the latest pre-release
Version 1.6.0-beta.1 (git-77f222d31)
and did not experience this problem further.
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout '/etc/sysconfig/network-scripts/ec2net.hotplug'
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout 'bridge-network-interface'
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout 'net.agent'
May 09 19:05:32 ip-172-20-53-219 systemd-udevd[597]: timeout '/lib/systemd/systemd-sysctl --prefix=/proc/sys/net/ipv4/conf/
I am going to close as you got past the problem with the latest release, please reopen if you need to.
we have the same issue with kops 1.7.0 :(. Please consider reopening the issue.
we have the same issue with kops 1.6.1 :(. Please consider reopening the issue.
same story on 1.6.1
Same story for us too
I am also having this issue as we speak.
Version 1.7.0 (git-e04c29d)
Not sure if it's my configuration variables, or that I have been building it and tearing it down repeatedly while getting used to things, or what it is.
It did manage to set etcd-a.internal and etcd-events-a..., but the master DNS values for api and api.internal are not getting updated. Whatever triggers that DNS update does not seem to kick in. Any idea what I might be doing wrong?
I waited the better part of an hour on this, still nothing. To be clear, this isn't a propagation issue, as the actual DNS records themselves are still set to the initial placeholder IP address.
I will re-open this issue, but this is typically a problem with DNS. If someone can get us more logging, we can investigate this more.
A couple notes on diagnosis.
Set --dns-zone to your Route 53 zone id. You can also set this value in the API with a kops edit cluster command.
Get the id with:
aws route53 list-hosted-zones | jq '.HostedZones[] | select(.Name=="subdomain.example.com.") | .Id'
Replace subdomain.example.com with your cluster DNS subdomain and domain name. Use only the zone id component of the string provided which is after the forward slash.
This test needs to pass: dig ns subdomain.example.com. See here for more information: https://github.com/kubernetes/kops/blob/master/docs/aws.md#configure-dns
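Putting the above together, a minimal check sequence (subdomain.example.com stands in for your cluster domain):
# delegation must resolve; this should return the zone's name servers
dig ns subdomain.example.com
# then pin kops to the zone: pass --dns-zone ZXXXXXXXXXXXXX at create time,
# or edit the cluster and set spec.dnsZone to that zone id
kops edit cluster --name subdomain.example.com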
@Hypnocrit please use kops 1.7.1, as this is a release that contains a CVE fix.
I figured out what was causing my problem.
I was telling the system to use t2.nano machine sizes. I would wager that if I dug into those machines a little deeper I would have likely seen resource exhaustion, or some other size-related reason the machine failed to make it into the DNS A record set for the api and api.internal bindings.
Just to confirm this, I set it back to nano machines overnight and let it run all night. It did not complete. Prior to leaving the office yesterday I had been tinkering with the configs, assuming that perhaps one of them was breaking this. With those settings off, it did come up after about 10 minutes.
It looks like this one is wrapped up, so I'm going to close it.
If you continue to have problems using the latest released version of kops (currently 1.7.1), please create a new issue and include your cluster spec and, ideally, the logs from protokube, and we'll try to help resolve this! Using tiny instances does tend to delay the cluster startup significantly, but TBH, even with heftier instances, it still takes a few minutes (5? 7? 10?) to become "sentient", and the length of time it takes depends heavily on a few things, including your networking layout.
Issue is happening in 1.9.0
Had this issue with 1.9.0 too.
Confirming that kops cannot boot up and change the k8s API DNS records with a t2.nano master and a t2.nano node. Changing both to t2.micro got kops validate cluster working within one minute.
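For reference, bumping the master instance type is just an instance group edit plus an update; a sketch, assuming the default instance group name for a single-master cluster in us-east-1a and an example cluster name (yours will differ):
# assumes KOPS_STATE_STORE is exported and the cluster name matches
kops edit ig master-us-east-1a --name k8s.staging.example.com
#   -> change spec.machineType from t2.nano to t2.micro (or larger), then
kops update cluster k8s.staging.example.com --yes
kops rolling-update cluster k8s.staging.example.com --yes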
Having the same issue in 1.9.1, and the fix that
"Updates kube-dns to 1.14.10, fixes problems with externalName services. (Thanks @jjo)"
still does not work.
Any updates on this? If not:
I don't have any ELB/NLB created after this, so I'm guessing the record should point to one of the masters?
Guys, I got some success by delegating a zone so it is resolvable from the TLDs.
Basically, the name of the cluster has to be resolvable from the TLDs; I hope that makes sense.
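In practice that means the parent zone needs NS records pointing at the cluster's hosted zone; a sketch with placeholder zone id and domain:
# name servers assigned to the cluster's (sub)zone
aws route53 get-hosted-zone --id ZSUBZONEID | jq '.DelegationSet.NameServers'
# add those as an NS record for subdomain.example.com in the parent zone,
# then confirm the delegation is visible from outside:
dig ns subdomain.example.com +short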
Found the issue: it is related to C5/M5 AWS instances; sorry, I didn't know that before.
I just sorted out the images for the VMs and chose --image="kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-05-27" for "C" and "M" type instances, and now it works.
https://github.com/kubernetes/kops/blob/master/docs/releases/1.8-NOTES.md
AWS:
New instance types: P3, C5, M5, H1. Please note that NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types unless a stretch image is chosen (change stretch to jessie in the image name). Also kubernetes will not support mounting persistent volumes on NVME instances until Kubernetes v1.9.
Support for root provisioned IOPS.
Properly tag public and private subnets for ELB creation in advanced network topologies
Use SSL in ELB API server health check
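So, per the note above about NVME and the jessie image, the fix for C5/M5 masters boils down to the same kops edit ig pattern mentioned earlier, just changing the image (names below are placeholders):
kops edit ig master-us-east-1a --name k8s.staging.example.com
#   -> set spec.image to kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-05-27
kops update cluster k8s.staging.example.com --yes
kops rolling-update cluster k8s.staging.example.com --yes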
Thank you.
> Just had the same issue, turns out the masters/nodes didn't have access to the S3 state bucket (we had a custom bucket policy that blocked it), and that prevented the master nodes to bootstrap correctly. After fixing the bucket policy the masters and all other DNS entries registered properly.
What S3 bucket are you talking about?