Kops: Unable to create a working cluster using m5.large, k8s 1.8.9, kops 1.8.1, stable image

Created on 16 Mar 2018 · 6 Comments · Source: kubernetes/kops

I have tried several times to get a cluster running with kops, but I am still unable to connect to it. Apparently the same thing happened when I tried upgrading a cluster's master nodes from 3x [kops 1.8.0, k8s 1.8.6, image stable 2018-01-14] to 3x [kops 1.8.1, k8s 1.8.9, image stable 2018-02-08].

------------- BUG REPORT TEMPLATE --------------------

What kops version are you running? The command kops version, will display
this information.

kops 1.8.1

What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

kubectl 1.9.2

What cloud provider are you using?

aws

What commands did you run? What is the simplest way to reproduce this issue?

kops create cluster --cloud=aws --name=test-1.company.domain --nodes=3 --node-size=m5.large --zones=eu-central-1a,eu-central-1b,eu-central-1c --master-zones=eu-central-1a,eu-central-1b,eu-central-1c --master-size=m5.large --vpc=vpc-xxxxxxx --ssh-public-key=.ssh/id_rsa_kubernetes.pub

kops edit cluster -> set subnet sizes to /22s

kops update cluster test-1.company.domain

What happened after the commands executed?

All instances were created, but the API servers were unavailable and actively refused connections. When I tried to upgrade the masters of an existing cluster, the first master was restarted with the new config, but it didn't connect to the other 2 masters and didn't seem to work.

What did you expect to happen?

The cluster should become stable and accessible from the user's point of view, or warn about being unable to perform certain operations.

Anything else we need to know?

When comparing the master nodes of my half-upgraded cluster, I noticed that the network interface of the new master had its "Source/dest check" set to "True", while the older ones had it set to "False". The new cluster also had that flag set to "True" on all its network interfaces.

SSH with the private key matching the public key given in create cluster didn't work; the connection was rejected.

MMeent
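
The "Source/dest check" comparison described in the issue can also be done from the command line instead of the console. The following is only a sketch under assumptions: the function name `list_source_dest_check` is hypothetical, it assumes a configured AWS CLI, and the VPC ID and region are whatever your cluster uses.

```shell
#!/usr/bin/env bash
# Sketch: list the source/dest check flag for every instance in a VPC.
# Assumes a configured AWS CLI; the function name is illustrative only.

list_source_dest_check() {
  local vpc_id="$1" region="$2"
  # Prints one row per instance: ID, instance type, SourceDestCheck (True/False).
  aws ec2 describe-instances \
    --region "$region" \
    --filters "Name=vpc-id,Values=${vpc_id}" \
    --query 'Reservations[].Instances[].[InstanceId,InstanceType,SourceDestCheck]' \
    --output table
}
```

It would be called as `list_source_dest_check vpc-xxxxxxx eu-central-1`. Instances that show a different value from the known-good masters are the ones worth inspecting first.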

All 6 comments

I have fixed my problem: I didn't check the compatibility of the jessie image and the m5 instance types I was using. Upgrading to the stretch debian image fixed my problem.

MMeent on 19 Mar 2018

👍3

I'm on kops version 1.9.0. When I edited my config to use m5.large instances for nodes and masters and then ran a rolling update with the new config, all kops did was kill all my EC2 instances and tell me it was done. I went into panic mode... hahaha
Can anyone share some insight? Is there something I missed to get kops to work with AWS m5 instances?

badoet on 10 May 2018

There are multiple reasons I can think of why this could happen:

Are the m5.[nx]large-type nodes available in the region? This can be checked on e.g. the pricing list of EC2 instances (https://aws.amazon.com/ec2/pricing/, then select your region).
Did you use an m5-compatible image, and did you use Kubernetes 1.9+? Kubernetes 1.8 and lower do not support m5 volume mounting, so any pods started on m5 nodes do not have VolumeClaim support.
Did your instances start? Do you see the instances in your EC2 console?

MMeent on 10 May 2018
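
The checks listed in the comment above can be sketched as a small pre-flight script. This is only an illustration under assumptions: the function names (`offered_zones`, `needs_newer_image`, `k8s_at_least_1_9`, `preflight`) are hypothetical and not part of kops, the m5-only pattern and the "stretch" substring match simply encode what this thread reports, and `offered_zones` assumes a configured AWS CLI recent enough to provide `describe-instance-type-offerings`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; none of these functions are part of kops.

# Prints the availability zones in REGION that offer INSTANCE_TYPE.
offered_zones() {
  local instance_type="$1" region="$2"
  aws ec2 describe-instance-type-offerings \
    --region "$region" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=${instance_type}" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}

# Succeeds if the instance type is in a family this thread found to need
# a newer image (only m5 here; extend the pattern as needed).
needs_newer_image() {
  case "$1" in
    m5.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Succeeds if a version like "1.9.6" or "v1.9.6" is at least 1.9.
k8s_at_least_1_9() {
  local v="${1#v}" major minor
  major="${v%%.*}"
  minor="${v#*.}"; minor="${minor%%.*}"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 9 ]; }
}

# Prints "ok", or a short reason and returns 1.
preflight() {
  local instance_type="$1" image="$2" k8s_version="$3"
  if needs_newer_image "$instance_type"; then
    case "$image" in
      *stretch*) ;;
      *) echo "image: use a stretch-based image for $instance_type"; return 1 ;;
    esac
    if ! k8s_at_least_1_9 "$k8s_version"; then
      echo "kubernetes: use 1.9+ for volume support on $instance_type"; return 1
    fi
  fi
  echo "ok"
}
```

For example, `preflight m5.large "$IMAGE" 1.8.9` would flag the combination from the original report, while `offered_zones m5.large eu-central-1` covers the region question without opening the pricing page.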

Yes, m5 is supported in the Singapore region.
I found the issue: I need to use the stretch Debian base image to get it to work.

badoet on 11 May 2018
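
For an existing cluster, the image change described above might be applied roughly as follows. This is a sketch, not a verified procedure: the helper names `run` and `switch_image` are hypothetical, the instance group names depend on your cluster (`kops get ig` lists them), and it assumes the kops state store is already configured (for example via `KOPS_STATE_STORE`). With `DRY_RUN=1`, the default here, the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: switch an existing cluster's instance groups to a new image.
# DRY_RUN=1 (the default) prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: switch_image CLUSTER_NAME INSTANCE_GROUP...
# `kops edit ig` opens an editor; set spec.image (and spec.machineType) there.
switch_image() {
  local cluster="$1"; shift
  local ig
  for ig in "$@"; do
    run kops edit ig "$ig" --name "$cluster"
  done
  # Apply the changed configuration, then replace the running instances.
  run kops update cluster "$cluster" --yes
  run kops rolling-update cluster "$cluster" --yes
}
```

Running `switch_image mycluster.k8s.local nodes` first in dry-run mode shows the three commands; setting `DRY_RUN=0` executes them. Editing the image before the rolling update matters, since the rolling update replaces instances with whatever the instance group currently specifies.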

Here is the exact command in case anybody is looking:

kops create cluster \
    --zones ${AWS_AVAILABILITY_ZONES} \
    --master-size m5.xlarge \
    --master-zones ${AWS_AVAILABILITY_ZONES} \
    --node-count 5 \
    --node-size m5.2xlarge \
    --image 379101102735/debian-stretch-hvm-x86_64-gp2-2018-06-13-59294 \
    --name mycluster.k8s.local \
    --yes

arun-gupta on 20 Jun 2018

👍1

You're using the upstream Stretch images, @arun-gupta? Does that work as expected? I don't know what magic kops adds to its images (but I'm going to take a look right after posting this comment), but I would have expected that they're created for a reason...

timstoop on 8 Aug 2018
