I have tried several times to get a cluster running with kops, but I am never able to connect to it. Apparently the same thing happened when I tried upgrading a cluster's master nodes from 3x [kops 1.8.0, k8s 1.8.6, image stable 2018-01-14] to 3x [kops 1.8.1, k8s 1.8.9, image stable 2018-02-08].
------------- BUG REPORT TEMPLATE --------------------
What kops version are you running? The command kops version will display this information.
kops 1.8.1
What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
kubectl 1.9.2
What cloud provider are you using?
aws
kops create cluster --cloud=aws --name=test-1.company.domain --nodes=3 --node-size=m5.large --zones=eu-central-1a,eu-central-1b,eu-central-1c --master-zones=eu-central-1a,eu-central-1b,eu-central-1c --master-size=m5.large --vpc=vpc-xxxxxxx --ssh-public-key=.ssh/id_rsa_kubernetes.pub
kops edit cluster -> set subnet sizes to /22s
kops update cluster test-1.company.domain
All instances were created, but the API servers were unavailable and actively refused connections. When I tried to upgrade the masters of an existing cluster, the first master was restarted with the new config, but it didn't connect to the other 2 masters and didn't seem to work.
The cluster should get stable and accessible from the user's point of view, or warn about being unable to perform certain operations.
When comparing the master nodes of my half-upgraded cluster, I noticed that the network interface of the new master had its "Source/dest check" set to "True", while the older ones had it set to "False". The new cluster also had that flag set to "True" on all its network interfaces.
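To compare this flag across instances without clicking through the console, a call along these lines could be used. This is only a sketch: the function name `source_dest_check` and the instance ID are placeholders invented for illustration, and it assumes the AWS CLI is installed and configured for the cluster's region.

```shell
# Hypothetical helper (name is made up for this example): print the ID and the
# "Source/dest check" flag of every network interface attached to one instance.
# $1 is the EC2 instance ID, e.g. a placeholder such as i-0123456789abcdef0.
source_dest_check() {
  aws ec2 describe-network-interfaces \
    --filters "Name=attachment.instance-id,Values=$1" \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,SourceDestCheck]' \
    --output text
}
```

Running it once per master (old and new) makes the difference described above visible side by side.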
SSH with the private key matching the public key passed to create cluster didn't work either; the connection was rejected.
I have fixed my problem: I didn't check the compatibility of the jessie image and the m5 instance types I was using. Upgrading to the stretch debian image fixed my problem.
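One compatibility check that can be done up front: m5 instances need an image with ENA networking support, and the AMI records whether it was registered with that flag. The sketch below queries it with the AWS CLI. The function name `ami_ena_support` and the AMI ID are placeholders, and the check does not cover NVMe driver support inside the image, which m5 instances also need.

```shell
# Hypothetical helper (name is made up for this example): print whether an AMI
# is registered with ENA support. Anything other than "True" means the flag is
# not set on the image. $1 is the AMI ID (placeholder, e.g. ami-xxxxxxxx).
ami_ena_support() {
  aws ec2 describe-images \
    --image-ids "$1" \
    --query 'Images[0].EnaSupport' \
    --output text
}
```

The AMI ID to pass in is whatever image the instance group is configured to use.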
I'm on kops version 1.9.0. When I edited my config to use m5.large instances for the nodes and masters and then ran a rolling update with the new config, all kops did was kill all my EC2 instances and tell me it was done. I went into panic mode... hahaha
Can anyone share some insights? Is there something I missed to get kops to work with AWS m5 instances?
There are multiple reasons I can think of that this could happen
Yes, m5 is supported in the Singapore region.
I found the issue. I need to use the Debian stretch base image to get it to work.
Here is the exact command in case anybody is looking:
kops create cluster \
--zones ${AWS_AVAILABILITY_ZONES} \
--master-size m5.xlarge \
--master-zones ${AWS_AVAILABILITY_ZONES} \
--node-count 5 \
--node-size m5.2xlarge \
--image 379101102735/debian-stretch-hvm-x86_64-gp2-2018-06-13-59294 \
--name mycluster.k8s.local \
--yes
You're using the upstream Stretch images, @arun-gupta? Does that work as expected? I don't know what magic kops adds to its images (I'm going to take a look right after posting this comment), but I would have expected that they're created for a reason...