/kind feature
Describe the solution you'd like
In v1alpha1, you typically have to create two resources to have a minimally functional Kubernetes cluster: a Cluster and a Machine. Anything provider-specific, from kubeadm configuration to infrastructure details such as instance type, is stored in `providerSpec` in each resource.
In v1alpha2, now that providerSpec has moved to a standalone CRD, you have to create the following to get a functional cluster (using CAPA and kubeadm as an example):
- Cluster
- AWSCluster
- Machine
- AWSMachine
- KubeadmConfig

This is five resources now, instead of two.
I'm thinking we could simplify the UX by allowing, either via opt-in or opt-out, defaulting for both bootstrap and infrastructure resources. For example, with an opt-in flow, we could do something like this (annotation names TBD):
```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  annotations:
    bootstrap.cluster.x-k8s.io/default: kubeadm
    infrastructure.cluster.x-k8s.io/default: aws
    aws.infrastructure.cluster.x-k8s.io/region: us-east-1
spec:
  # ...
---
apiVersion: cluster.x-k8s.io
kind: Machine
metadata:
  namespace: foo
  name: controlplane-0
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
spec:
  bootstrap:
    configRef: nil
    data: nil
  infrastructureRef: nil/empty
```
The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machines that do not have `infrastructureRef` and/or bootstrap `configRef` filled in. The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations. This way, the user could probably go back to creating only two resources -- Cluster and Machine -- and end up with a functional cluster.
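To make that concrete, here is a sketch of what the AWS provider might generate to fill the missing `infrastructureRef` on the Cluster. This is purely illustrative; the exact fields and defaults would be up to each provider:

```yaml
# Hypothetical sketch only: a resource the AWS provider could generate
# for a Cluster carrying the defaulting annotations above.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSCluster
metadata:
  namespace: foo
  name: my-cluster      # mirrors the Cluster name
spec:
  region: us-east-1     # taken from the region annotation
  # everything else falls back to provider defaults
```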
The alternative to opt-in would be opt-out, which essentially means that a bootstrap and infra provider could do the logic mentioned in the previous paragraph, except the annotations wouldn't be required. That might end up being problematic (what happens if two infra providers both try to be the default?), but I figure it's worth mentioning as an alternative.
WDYT?
This is really interesting! I really like where you're going with this as a concept. Beyond what seems to me as providing a good experience, it would make (more) possible an idea that I've been not-so-seriously kicking around: having a control plane from one provider (say, a dynamically size-able cloud) manage nodes from a different one (say, expensive bare metal).
One detail I'm wondering about is:
`aws.infrastructure.cluster.x-k8s.io/region: us-east-1`
How would we avoid that becoming a vehicle for moving the `runtime.RawExtension` from the spec into the annotations' `map[string]string`?
Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?
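Sketching the idea out (the kind and every field name here are hypothetical), such an opt-in config CRD might look like:

```yaml
# Hypothetical opt-in defaults CRD, referenced by name from a Cluster
# annotation instead of packing individual values into annotations.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSClusterConfig          # hypothetical kind
metadata:
  namespace: foo
  name: us-east-defaults
spec:
  region: us-east-1
  sshKeyName: default           # assumed field, for illustration
```

The upside would be real schema validation instead of free-form annotation strings.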
> How would we avoid that becoming a vehicle for moving the `runtime.RawExtension` from the spec into the annotations' `map[string]string`?
Ideally, each provider that wants to support this would expose an absolute minimum number of annotations. Anything beyond a strict minimum must force the user to create their own resources (AWSMachine, etc).
> Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?
I feel like this is AWSCluster and AWSMachine, which we already have, and which are the full-feature CRDs you'd fill out. Right?
I really like the goal of a simpler UX.
This seems like a lot of boilerplate already, if our goal is to reduce boilerplate.
Perhaps it makes more sense to have these as first-class fields on a new Cluster API Operator.
```yaml
apiVersion: cluster.x-k8s.io
kind: ClusterOperator
metadata:
  namespace: foo
  name: my-cluster
spec:
  bootstrap:
    provider: kubeadm
    version: 0.1
  infrastructure:
    provider: aws
    version: 1.0
```
Region, I think, can be handled in a similar way to the proposed service topologies:
```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  labels:
    region: us-east-1
spec:
```
And then the AWS provider can define "topology" mappings from labels to region, zone, subnet, etc.
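For instance (the shape and names here are entirely made up), a provider could ship such a mapping as configuration it reads at startup:

```yaml
# Hypothetical topology mapping the AWS provider could define,
# translating a Cluster's labels into concrete infrastructure settings.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: capa-system        # assumed namespace
  name: aws-topology-mappings   # hypothetical name
data:
  us-east-1: |
    region: us-east-1
    zones: [us-east-1a, us-east-1b]
```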
@moshloop this issue isn't about having an operator install specific versions of specific providers (which is a fantastic, separate idea - see https://docs.google.com/document/d/1ox_2dVzxJtWy_zv1SID0bGIes-hQyOVlAIC5ARcVdak/edit if you haven't already). It's about making it easier to get up and running by removing the need to create the 3 extra resources, replacing them with a small set of annotations + a lot of defaulting logic.
clusteradm could start out by providing just this UX before growing into a full-blown provider lifecycle manager, or this could be implemented in a standalone _defaulting_ controller.
The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.
/assign @timothysc
> or this could be implemented in a standalone defaulting controller
This is essentially what I'm proposing.
> The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.
How - could you please explain?
I think this is where we diverge:
> The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machine that do not have infrastructureRef and/or bootstrap configRef filled in.
Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.
On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch `Machine` and create `AWSMachine`), and therefore easy to test.
> Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.
I'm not sure I see this adding as much complexity as you're suggesting.
> On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch `Machine` and create `AWSMachine`), and therefore easy to test.
If we add a standalone controller to CAPI, it then has to know about all the different providers and their data types. I'd rather have the logic live in each provider.
This might not be complex to create, but from a readability, e2e testing, and reasoning perspective, it adds complexity.
> it then has to know about all the different providers and their data types
Isn't that an eventuality/requirement for `clusteradm`?
> Isn't that an eventuality/requirement for `clusteradm`?
Yes, and I'll admit I haven't been thinking of this as a clusteradm feature.
> Isn't that an eventuality/requirement for `clusteradm`?
@moshloop I hope not, otherwise we remove support for custom providers that clusteradm does not know about, and we are back in a situation similar to v1alpha1, where users would have to custom-build their own clusteradm for non-blessed providers.
I think `clusteradm init --infra=aws,vsphere --bootstrap=kubeadm` would require knowing about these providers? Non-blessed providers could implement a provider plugin, similar to how packer does it today for custom provisioners/builders.
Let's table some of this for a high-bandwidth conversation on Wed. I generally like the core idea, but a question I would have is: "Shouldn't we be able to cache the defaults and not even have to specify them, e.g. added on admission?"
@ncdc I've been thinking a bit more on this. I'm not sure I'm a fan of encouraging users to set annotations for this type of behavior as it essentially creates an unversioned shadow API that exists parallel to the defined and versioned API.
I think it's one thing to leverage annotations for tracking internal state, since we do not expose that directly to the user.
For defaulting based on which provider to use, why not optional high level fields:
```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  defaultInfrastructureProvider: aws
  defaultBootstrapProvider: kubeadm
```
The provider configuration settings are potentially more compelling as an annotation, but it still feels like we are potentially trending back to opaque embedded configuration with the annotations. Would it make sense instead to leverage a single provider-defined type per provider that can be attached to the Cluster spec to provide cluster-wide defaults instead?
@detiber I like your suggestions better than mine - thanks!
Could we leverage the template resources in the example that @detiber suggested to let users specify the default configurations?
SGTM
I think we all agree that we want to improve the user experience by reducing the boilerplate that the client has to send to the server.
But so far we are discussing reducing this boilerplate server-side. I'd like to hear alternatives.
Here's a naive alternative: Reduce boilerplate client-side. Create a shell script, kubectl plugin, or standalone CLI that generates the boilerplate.
Advantages:
Known drawbacks:
Potential drawbacks (not yet sure if these are true):
@dlipovetsky the client-side solution you bring up seems like it falls in line with the clusteradm discussion.
@detiber Good point. I assumed clusteradm would not be available in time. I also feel like there's room for experimentation, which I expect would be easier to do with something like a kubectl plugin.
I think some of the confusion around the use cases comes from the word "default". It makes sense from our perspective as people developing Cluster API, but from an end-user perspective, the question would be '"default" in what sense?'
Is there better terminology?
> The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations
Some providers do not (or cannot) provide sensible defaults. For example, CAPO requires the end user to manually create some infrastructure.
As an end-user, how will I discover that setting defaults (using fields or annotations) for these providers will not work?
Reducing boilerplate is important for UX, but so is having consistent behavior. So some providers may need help to support defaults.
Following up on @detiber's and my suggestion from above, I'd like to see a way to specify default configuration values using the `<Provider>Template` resources, like `AWSMachineTemplate`.

```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate: <object-reference>
    bootstrapTemplate: <object-reference>
```

The documentation should make clear that the `bootstrapTemplate` reference is only going to be valid for worker nodes.
WDYT?
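For concreteness, a filled-in version might look like the following. The template names are made up, and the reference shape assumes a standard Kubernetes object reference (apiVersion/kind/name):

```yaml
# Hypothetical filled-in example; the referenced template names are
# made up and would need to exist in the same namespace.
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
      kind: AWSMachineTemplate
      name: default-worker
    bootstrapTemplate:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
      kind: KubeadmConfigTemplate
      name: default-worker
```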
> Following up on @detiber and my suggestion from above. I'd like to see a way to specify default configuration values using the `<Provider>Template` resources like `AWSMachineTemplate`. [...] The `bootstrapTemplate` reference should be clear in documentation that's going only to be valid for worker nodes. WDYT?
If I understand correctly, most of the UX improvement is gained when multiple clusters rely on the same template? There's still a lot to define for a single cluster. Is the goal to optimize the multi or single cluster case?
@ncdc @detiber @vincepri Another thought I had: when the user is using a gitops approach with kustomize, they'll be able to do their own templating. I haven't had a lot of time to think about this, but we should ask ourselves, are we trying to solve concerns that can be solved more effectively elsewhere?
@juan-lee I think the concern is more for those that are providing a managed cluster type experience.
For example, a centralized IT department wants to give developers access to request clusters using Cluster API, but wants to ensure that the clusters that are created are configured in a way that is compliant with their org requirements while still allowing users to customize other parts of the configuration.
Kubernetes already has defaulting today with the concept of LimitRanges, which provide defaults based on namespace. LimitRanges only default a parameter if it has not been specified, and they let us set a global, or "default", override. On inspection, I'd assert that you could easily cover 80+% of users by providing a single bootstrap config in the "default" namespace. LimitRanges also enable settings at per-namespace granularity, which would give administrators the ability to override policy.
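For comparison, this is what that existing pattern looks like with a real LimitRange: values apply only when the corresponding field is unset on a Pod's containers, which is the analogous fill-in-the-blanks behavior being proposed here for bootstrap/infrastructure defaults.

```yaml
# A real LimitRange: per-namespace defaults applied only when the
# container leaves the corresponding field unset.
apiVersion: v1
kind: LimitRange
metadata:
  namespace: foo
  name: resource-defaults
spec:
  limits:
  - type: Container
    default:           # applied when a container's limits are unset
      cpu: 500m
    defaultRequest:    # applied when a container's requests are unset
      cpu: 250m
```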
cc @andrewsykim
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
/help
/kind design
This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.
@vincepri:
This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
> /help
> /kind design
> This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/area ux
I think you can address both cases by using (infrastructure/bootstrap) templates, but by asking the provider devs to install a default template along with their provider installation. So, for example, the AWS provider would add an `AWSMachineTemplateWithSensibleDefaults.yaml` in their manifest.
That way, you can still get a cluster running with just the two resource definitions (i.e. optimized for the single cluster case), but you also enable admins to provide their own templates for users (i.e. optimized for the multi cluster case). And you even give them a jumping-off point for the template, with all the sensible defaults in it.
/lifecycle frozen