I'd like to create a cluster with 3 master nodes and 3 workers in an existing VPC with existing private subnets. All nodes, masters and workers, should be in the same subnet. The reason? As far as I can see kops doesn't reuse NAT gateways, so the first cluster provisioning is going to hit all sorts of AWS limits.
Is it possible to do this with the current version?
Also, what's the reason 3 ASGs are being created?
Actually 4: one per master plus one for the 3 workers. That makes no sense to me.
topology:
  dns:
    type: Public
  masters: private
  nodes: private
And yet it creates a public load balancer for the masters :(
Our design assumes that we need to house one master in each AZ; with that assumption we need an ASG per master, one for a bastion, and another for the nodes. With 1.5 you have fine-grained control over this. You can edit the instance groups to your heart's content (see the sketch below).
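To make that concrete, here is a minimal sketch of what a single master's instance group might look like when opened with kops edit ig master-us-east-1a (the group name, zone, and machine type are examples, and field names can differ slightly between kops versions):

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: master-us-east-1a
spec:
  role: Master
  machineType: m3.medium   # example instance type
  minSize: 1               # exactly one master in this AZ
  maxSize: 1
  zones:
  - us-east-1a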
We have noted the subnet request, and that is an enhancement that we would like to see.
What would you like?
Actually, the subnets question is solved.
I simply added the subnet IDs to the config.yaml, both for the private subnets (workers) and the public ones (masters); a sketch is below.
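A rough sketch of what that part of the cluster spec can look like (the subnet IDs and zone name here are made up, and the exact field names differ between kops versions; newer releases use a subnets: section, older ones use zones:):

spec:
  subnets:
  - id: subnet-0aaa1111            # existing private subnet for the cluster nodes
    name: us-east-1a
    type: Private
    zone: us-east-1a
  - id: subnet-0bbb2222            # existing public (Utility) subnet, used by the ELB and bastion
    name: utility-us-east-1a
    type: Utility
    zone: us-east-1a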
It seems that having the API exposed is standard practice for k8s; it makes a lot of sense to me now.
Still not clear why there is 1 ASG per master, though. An ASG can manage instances across multiple AZs perfectly well; it's standard practice.
One ASG per AZ guarantees one master per AZ. Is there a way to ensure that the masters are spread across the AZs all the time with a single ASG?
Of course, that's what ASGs are for!
http://docs.aws.amazon.com/autoscaling/latest/userguide/auto-scaling-benefits.html#arch-AutoScalingMultiAZ
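For illustration only, a minimal CloudFormation sketch of one ASG spanning three AZs (the subnet IDs are made up, and the launch configuration name is a placeholder for an existing one). Note that AZ rebalancing is best effort: the group spreads instances across the AZs but does not strictly guarantee exactly one instance per AZ at all times.

Resources:
  MastersGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "3"
      MaxSize: "3"
      DesiredCapacity: "3"
      LaunchConfigurationName: masters-launch-config   # placeholder: name of an existing launch configuration
      VPCZoneIdentifier:                               # one subnet per AZ
        - subnet-0aaa1111
        - subnet-0bbb2222
        - subnet-0ccc3333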
@justinsb can you comment here? I think we created 3 ASGs for some edge cases that I do not recall.
The edge case is probably called 'kubernetes'.
Lol
Probably called etcd and EBS volumes.
I must be missing some implementation details, but what's special about those volumes?
Etcd is super critical, and we should attempt to keep the same volumes in the same AZ, even through AZ failures. We need @justinsb to chime in on this one, though.
We use 3 single-AZ ASGs so we get a stronger guarantee from AWS. We don't benefit from having multiple master instances in the same AZ, because each must mount its own etcd EBS volume (and an EBS volume is bound to a single AZ). Further, some people have asked to have 2 masters in the same zone and the 3rd in another; I don't know how _that_ would be expressed in one ASG.
But... I actually think our model is rich enough to express a single master InstanceGroup in 3 AZs. We probably have some sanity checks, but it might be interesting to turn them off and see what happens. How is your Go?
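Roughly, such a single InstanceGroup might look like the sketch below (zone names are examples, field names vary a little between kops versions, and the existing validation may well reject it today):

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: masters
spec:
  role: Master
  minSize: 3    # one desired master per AZ, but the spread is left to the ASG
  maxSize: 3
  zones:
  - us-east-1a
  - us-east-1b
  - us-east-1c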
Also, a note on tone: in the kubernetes community, we strive to be technical and polite. This phrasing here could have been more polite. But much more problematically, the issue was not expressed technically, and so it's not clear how we ever fix & close this issue i.e. what are the exit criteria. I'm going to change the title to what I believe the outstanding issue to be. Please feel free to open more issues, but focused issues with technical criteria are appreciated, even if that means opening more issues. For example, I'm not sure whether it is possible to have a single NAT gateway with multiple AZs, and I don't want that to be lost, so I'm going to open an issue for that.
Thanks!
More on only having 1 subnet: https://github.com/kubernetes/kops/issues/1368
Will point to the other issue as well.
I think it should be set up something like this in AWS:
Also, "instance group" is used in AWS for the EMR (managed Hadoop) service. It could simply be named "autoscaling group", so AWS users won't get confused.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close