_From @rsdcastro on February 28, 2018 20:52_
This issue tracks how fault zones (or availability zones) should be represented in the Cluster API. Should it be a top-level field? Should we be able to specify provider-specific config for them (separately from the current provider config)? This applies both to masters and to nodes that might have some affinity to resources specific to a zone.
@krousey @maisem @justinsb
_Copied from original issue: kubernetes/kube-deploy#627_
While not explicit above, this could also have pod scheduling implications that need to be considered.
_From @jessicaochen on February 28, 2018 22:16_
Something to keep in mind is that fault zone concepts exist even on-prem on bare metal. For example, I would assume that even if one owns a datacenter, one would want workloads spread across racks, across networking hardware, etc., so that no single failure takes out the whole workload. However fault zones are represented, it would be good if the representation allows expressing multiple types of fault zones to spread across.
_From @jhorwit2 on March 2, 2018 19:16_
> However fault zones are represented, it would be good if the representation allows expressing multiple types of fault zones to spread across.
@jessicaochen The scheduler's spread predicate by default only takes `failure-domain.beta.kubernetes.io/region` and `failure-domain.beta.kubernetes.io/zone` into account, so supporting more than that out of the box would require custom affinity/anti-affinity rules on the pod itself. We use those two labels for our on-prem clusters as well.
That also matches the pattern of other core components, like the cloud provider interface, which likewise only support zone and region.
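For illustration, a minimal sketch of the kind of custom affinity this implies: spreading a workload across a fault zone type the scheduler does not know about (racks) by using pod anti-affinity with a custom topology key. The `example.com/rack` label is hypothetical, and nodes would need to be labeled with it out of band:

```yaml
# Hedged sketch: spread replicas across racks via pod anti-affinity.
# The topology key "example.com/rack" is an assumed custom node label,
# not one of the built-in failure-domain labels.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web
              topologyKey: example.com/rack  # custom fault zone, one value per rack
      containers:
      - name: web
        image: nginx:1.25
```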
_From @roberthbailey on March 6, 2018 21:57_
Strawman: leave it inside the existing provider config for now, and see if common patterns emerge that would cause us to promote it to a top-level field.
Looking at GCP (this should be the same for any cloud), you already specify how you want machines spread across availability zones by putting the zone for each machine (or set) in the provider config. This works for masters just as well as for workers.
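As a rough sketch of what that looks like today (the kind and field names are approximate, loosely based on the GCE provider), the zone lives inside each Machine's embedded provider config:

```yaml
# Approximate sketch: the zone is a provider-specific field inside the
# Machine's providerConfig, so spreading masters means creating Machines
# with different zones. Kind/field names are illustrative, not exact.
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  name: master-us-central1-a
spec:
  providerConfig:
    value:
      apiVersion: gceproviderconfig/v1alpha1
      kind: GCEProviderConfig
      zone: us-central1-a        # the fault zone, chosen per machine
      machineType: n1-standard-2
```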
What is the advantage of duplicating the data at the top level (and then needing to ensure it stays in sync)? What environment-independent form do you imagine this would take? As @jessicaochen points out, on-prem you would likely want to specify failure domains based on racks (ToR redundancy), power domains, or maybe "clusters" in a datacenter, whereas on clouds you are really only given the "cluster"-level selection in the APIs.
_From @krousey on March 6, 2018 22:04_
@roberthbailey Just to clarify your strawman: you're saying that if I want 12 nodes across 3 availability zones, I should have 3 MachineSets of count 4, with the proper zone in each template's provider config?
_From @roberthbailey on March 6, 2018 22:14_
Yes, I think that way makes the most sense. I like having it explicitly in the MachineSet / deployment instead of a hidden internal multiplier that you have to change indirectly by tweaking availability zones in the MachineSet.
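To make that concrete, a hedged sketch of one of the three MachineSets from the example above (names and provider-config fields are illustrative); the other two would differ only in name, labels, and zone:

```yaml
# Illustrative sketch of the strawman: 12 nodes across 3 zones becomes
# 3 MachineSets of 4 replicas each, with the zone pinned in the
# template's provider config. Kind/field names are approximate.
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineSet
metadata:
  name: workers-us-central1-a
spec:
  replicas: 4
  selector:
    matchLabels:
      machineset: workers-us-central1-a
  template:
    metadata:
      labels:
        machineset: workers-us-central1-a
    spec:
      providerConfig:
        value:
          apiVersion: gceproviderconfig/v1alpha1
          kind: GCEProviderConfig
          zone: us-central1-a        # one zone per MachineSet
          machineType: n1-standard-2
```

Scaling into a new zone is then an explicit operation (add a fourth MachineSet) rather than an implicit multiplication of a single count.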
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/area api
cc @juan-lee @cecilerobertmichon
@timothysc @detiber @ncdc are you aware of anyone working on this?
I'm not aware of any activity
I have a draft proposal for this. Will share before Wednesday next week.
Since so much is in flux with the data model and implementation, I'm choosing to define an approach without taking any stance on specific data model changes. Naturally, that means the proposal is incomplete until we expose the knob(s) for users to tune.
Still, I think I've managed to capture a nugget of value in sketching out the story. Will follow up with the proposal.
@ncdc @vincepri @detiber collected some of my thoughts, looking for a gut check
/assign @alexeldeib @timothysc
We may not get to this in v1alpha2, but we'll try to make a decision during this timeframe.
/milestone Next
/unassign
There is a tentative proposal for how to do this with respect to control plane management here: https://github.com/kubernetes-sigs/cluster-api/issues/1647
Should we close this one in favor of the proposal?
@vincepri I don't think so; this issue is about deciding how to handle failure domains, and the linked issue is a concrete proposal. If we accept the proposal, then I think that would qualify as meeting the requirements to close this issue.
AFAIK, the proposal is in the milestone and waiting to be implemented; there wasn't much pushback on it.
/close
Closing, since we accepted the above-mentioned failure domain proposal.
@detiber: Closing this issue.
In response to this:
> /close
>
> Closing, since we accepted the above-mentioned failure domain proposal.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.