/kind feature
Describe the solution you'd like
In v1alpha1, you typically have to create two resources to have a minimally functional Kubernetes cluster: a Cluster and a Machine. Anything provider-specific, from kubeadm configuration to infrastructure details such as instance type, is stored in `providerSpec` in each resource.
In v1alpha2, now that providerSpec has moved to a standalone CRD, you have to create the following to get a functional cluster (using CAPA and kubeadm as an example):
- Cluster
- AWSCluster
- Machine
- AWSMachine
- KubeadmConfig

This is five resources now, instead of two.
I'm thinking we could simplify the UX by allowing, either via opt-in or opt-out, defaulting for both bootstrap and infrastructure resources. For example, with an opt-in flow, we could do something like this (annotation names TBD):
```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  annotations:
    bootstrap.cluster.x-k8s.io/default: kubeadm
    infrastructure.cluster.x-k8s.io/default: aws
    aws.infrastructure.cluster.x-k8s.io/region: us-east-1
spec:
  # ...
---
apiVersion: cluster.x-k8s.io
kind: Machine
metadata:
  namespace: foo
  name: controlplane-0
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
spec:
  bootstrap:
    configRef: nil
    data: nil
  infrastructureRef: nil/empty
```
The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machines that do not have `infrastructureRef` and/or bootstrap `configRef` filled in. The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations. This way, the user could probably go back to creating only two resources -- Cluster and Machine -- and end up with a functional cluster.
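To make that concrete, here is a sketch of what the AWS provider might generate to fill the missing `infrastructureRef` on the Cluster. This is purely illustrative; the exact fields and defaults would be up to each provider:

```yaml
# Hypothetical sketch only: a resource the AWS provider could generate
# for a Cluster carrying the defaulting annotations above.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSCluster
metadata:
  namespace: foo
  name: my-cluster      # mirrors the Cluster name
spec:
  region: us-east-1     # taken from the region annotation
  # everything else falls back to provider defaults
```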
The alternative to opt-in would be opt-out, which essentially means that a bootstrap and infra provider could do the logic mentioned in the previous paragraph, except the annotations wouldn't be required. That might end up being problematic (what happens if two infra providers both try to be the default?), but I figure it's worth mentioning as an alternative.
WDYT?
This is really interesting! I really like where you're going with this as a concept. Beyond what seems to me as providing a good experience, it would make (more) possible an idea that I've been not-so-seriously kicking around: having a control plane from one provider (say, a dynamically size-able cloud) manage nodes from a different one (say, expensive bare metal).
One detail I'm wondering about is:
`aws.infrastructure.cluster.x-k8s.io/region: us-east-1`
How would we avoid that becoming a vehicle for moving the `runtime.RawExtension` from the spec into the annotations' `map[string]string`?
Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?
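Sketching the idea out (the kind and every field name here are hypothetical), such an opt-in config CRD might look like:

```yaml
# Hypothetical opt-in defaults CRD, referenced by name from a Cluster
# annotation instead of packing individual values into annotations.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
kind: AWSClusterConfig          # hypothetical kind
metadata:
  namespace: foo
  name: us-east-defaults
spec:
  region: us-east-1
  sshKeyName: default           # assumed field, for illustration
```

The upside would be real schema validation instead of free-form annotation strings.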
> How would we avoid that becoming a vehicle for moving the `runtime.RawExtension` from the spec into the annotations' `map[string]string`?
Ideally, each provider that wants to support this would expose an absolute minimum number of annotations. Anything beyond a strict minimum must force the user to create their own resources (AWSMachine, etc).
> Put another way, I wonder if there aren't two more CRDs waiting in the system for us to extract. Maybe an opt-in AWSClusterConfig and AWSMachineConfig that could be referenced by name from the annotation? Does that split the difference between validation and opt-in piecemeal default overriding, or just confuse the issue?
I feel like this is AWSCluster and AWSMachine, which we already have, and which are the full-feature CRDs you'd fill out. Right?
I really like the goal of a simpler UX.
This seems like a lot of boilerplate already, if our goal is to reduce boilerplate.
Perhaps it makes more sense to have these as first-class fields on a new Cluster API Operator.
```yaml
apiVersion: cluster.x-k8s.io
kind: ClusterOperator
metadata:
  namespace: foo
  name: my-cluster
spec:
  bootstrap:
    provider: kubeadm
    version: 0.1
  infrastructure:
    provider: aws
    version: 1.0
```
Region, I think, can be handled in a similar way to the proposed service topologies:
```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
  labels:
    region: us-east-1
spec:
```
And then the AWS provider can define "topology" mappings from labels to region, zone, subnet, etc.
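For instance (the shape and names here are entirely made up), a provider could ship such a mapping as configuration it reads at startup:

```yaml
# Hypothetical topology mapping the AWS provider could define,
# translating a Cluster's labels into concrete infrastructure settings.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: capa-system        # assumed namespace
  name: aws-topology-mappings   # hypothetical name
data:
  us-east-1: |
    region: us-east-1
    zones: [us-east-1a, us-east-1b]
```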
@moshloop this issue isn't about having an operator install specific versions of specific providers (which is a fantastic, separate idea - see https://docs.google.com/document/d/1ox_2dVzxJtWy_zv1SID0bGIes-hQyOVlAIC5ARcVdak/edit if you haven't already). It's about making it easier to get up and running by removing the need to create the 3 extra resources, replacing them with a small set of annotations + a lot of defaulting logic.
clusteradm could start out by providing just this UX before growing into a full-blown provider lifecycle manager, or this could be implemented in a standalone _defaulting_ controller.
The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.
/assign @timothysc
> or this could be implemented in a standalone defaulting controller
This is essentially what I'm proposing.
> The problem I see is that this completely breaks the lifecycle phases that you so carefully put together - this will make it harder to reason about the system as a whole.
How - could you please explain?
I think this is where we diverge:
> The presence of the annotations on the Cluster above tells the kubeadm bootstrap provider and the AWS infra provider that they should take action on the Cluster and any Machine that do not have infrastructureRef and/or bootstrap configRef filled in.
Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.
On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch `Machine` and create `AWSMachine`), and therefore easy to test.
> Adding this logic into CAPA/CAPV/CAPKB/etc adds many new conditions, branches and watches to the existing state diagram.
I'm not sure I see this adding as much complexity as you're suggesting.
> On the other hand, adding this logic into a new or standalone controller leaves the existing state diagram unchanged and adds a new defaulting diagram that is easy to reason about (watch `Machine` and create `AWSMachine`), and therefore easy to test.
If we add a standalone controller to CAPI, it then has to know about all the different providers and their data types. I'd rather have the logic live in each provider.
This might not be complex to create, but from a readability, e2e testing, and reasoning perspective, it adds complexity.
> it then has to know about all the different providers and their data types
Isn't that an eventuality/requirement for `clusteradm`?
> Isn't that an eventuality/requirement for `clusteradm`?
Yes, and I'll admit I haven't been thinking of this as a clusteradm feature.
> Isn't that an eventuality/requirement for `clusteradm`?
@moshloop I hope not, otherwise we remove support for custom providers that clusteradm does not know about, and we are back in a situation similar to v1alpha1, where users would have to custom-build their own clusteradm for non-blessed providers.
I think `clusteradm init --infra=aws,vsphere --bootstrap=kubeadm` would require knowing about these providers? Non-blessed providers could implement a provider plugin, similar to how packer does it today for custom provisioners/builders.
Let's table some of this for a high-bandwidth conversation on Wed. I generally like the core idea, but a question I would have is: "Shouldn't we be able to cache the defaults and not even have to specify them, e.g. added on admission?"
@ncdc I've been thinking a bit more on this. I'm not sure I'm a fan of encouraging users to set annotations for this type of behavior as it essentially creates an unversioned shadow API that exists parallel to the defined and versioned API.
I think it's one thing to leverage annotations for tracking internal state, since we do not expose that directly to the user.
For defaulting based on which provider to use, why not optional high level fields:
```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  defaultInfrastructureProvider: aws
  defaultBootstrapProvider: kubeadm
```
The provider configuration settings are potentially more compelling as an annotation, but it still feels like we are potentially trending back to opaque embedded configuration with the annotations. Would it make sense instead to leverage a single provider-defined type per provider that can be attached to the Cluster spec to provide cluster-wide defaults instead?
@detiber I like your suggestions better than mine - thanks!
Could we leverage the template resources in the example that @detiber suggested to let users specify the default configurations?
SGTM
I think we all agree that we want to improve the user experience by reducing the boilerplate that the client has to send to the server.
But so far we are discussing reducing this boilerplate server-side. I'd like to hear alternatives.
Here's a naive alternative: Reduce boilerplate client-side. Create a shell script, kubectl plugin, or standalone CLI that generates the boilerplate.
Advantages:
Known drawbacks:
Potential drawbacks (not yet sure if these are true):
@dlipovetsky the client-side solution you bring up seems like it falls in line with the clusteradm discussion.
@detiber Good point. I assumed clusteradm would not be available in time. I also feel like there's room for experimentation, which I expect would be easier to do with something like a kubectl plugin.
I think some of the confusion around the use cases comes from the word "default". It makes sense from our perspective as people developing Cluster API, but from an end-user perspective, the question would be '"default" in what sense?'
Is there better terminology?
> The providers would generate new resources (AWSCluster, AWSMachine, KubeadmConfig) to fulfill the missing pieces, using a combination of sensible defaults along with values from the annotations
Some providers do not (or cannot) provide sensible defaults. For example, CAPO requires the end user to manually create some infrastructure.
As an end-user, how will I discover that setting defaults (using fields or annotations) for these providers will not work?
Reducing boilerplate is important for UX, but so is having consistent behavior. So some providers may need help to support defaults.
Following up on @detiber's and my suggestion from above, I'd like to see a way to specify default configuration values using the `<Provider>Template` resources, like `AWSMachineTemplate`.

```yaml
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate: <object-reference>
    bootstrapTemplate: <object-reference>
```

The documentation should make clear that the `bootstrapTemplate` reference is only going to be valid for worker nodes.
WDYT?
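For concreteness, a filled-in version might look like the following. The template names are made up, and the reference shape assumes a standard Kubernetes object reference (apiVersion/kind/name):

```yaml
# Hypothetical filled-in example; the referenced template names are
# made up and would need to exist in the same namespace.
apiVersion: cluster.x-k8s.io
kind: Cluster
metadata:
  namespace: foo
  name: my-cluster
spec:
  machineDefaults:
    infrastructureTemplate:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha2
      kind: AWSMachineTemplate
      name: default-worker
    bootstrapTemplate:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha2
      kind: KubeadmConfigTemplate
      name: default-worker
```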
> Following up on @detiber and my suggestion from above. I'd like to see a way to specify default configuration values using the `<Provider>Template` resources like `AWSMachineTemplate`. [...] The `bootstrapTemplate` reference should be clear in documentation that's going only to be valid for worker nodes. WDYT?
If I understand correctly, most of the UX improvement is gained when multiple clusters rely on the same template? There's still a lot to define for a single cluster. Is the goal to optimize the multi or single cluster case?
@ncdc @detiber @vincepri Another thought I had: when the user is using a gitops approach with kustomize, they'll be able to do their own templating. I haven't had a lot of time to think about this, but we should ask ourselves, are we trying to solve concerns that can be solved more effectively elsewhere?
@juan-lee I think the concern is more for those that are providing a managed cluster type experience.
For example, a centralized IT department wants to give developers access to request clusters using Cluster API, but wants to ensure that the clusters that are created are configured in a way that is compliant with their org requirements while still allowing users to customize other parts of the configuration.
Kubernetes already has defaulting today with the concept of LimitRanges, which provide defaults based on namespace. LimitRanges only default a parameter if it has not been specified, and they let us set a global, or "default", override. On inspection, I'd assert that you could easily cover 80+% of users by providing a single bootstrap config in the "default" namespace. LimitRanges also enable settings at per-namespace granularity, which would give administrators the ability to override policy.
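For comparison, this is what that existing pattern looks like with a real LimitRange: values apply only when the corresponding field is unset on a Pod's containers, which is the analogous fill-in-the-blanks behavior being proposed here for bootstrap/infrastructure defaults.

```yaml
# A real LimitRange: per-namespace defaults applied only when the
# container leaves the corresponding field unset.
apiVersion: v1
kind: LimitRange
metadata:
  namespace: foo
  name: resource-defaults
spec:
  limits:
  - type: Container
    default:           # applied when a container's limits are unset
      cpu: 500m
    defaultRequest:    # applied when a container's requests are unset
      cpu: 250m
```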
cc @andrewsykim
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/lifecycle frozen
/help
/kind design
This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.
@vincepri:
This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
> /help
> /kind design
> This might need a design document first. Also cc @fabriziopandini, @frapposelli because it's kind-ish related to the templating work.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/area ux
I think you can address both cases by using (infrastructure/bootstrap) templates, but by asking the provider devs to install a default template along with their provider installation. So, for example, the AWS provider would add an `AWSMachineTemplateWithSensibleDefaults.yaml` in their manifest.
That way, you can still get a cluster running with just the two resource definitions (i.e. optimized for the single cluster case), but you also enable admins to provide their own templates for users (i.e. optimized for the multi cluster case). And you even give them a jumping-off point for the template, with all the sensible defaults in it.
/lifecycle frozen