I've been thinking for a while now about how to improve the ergonomics of our config and extend it for more use cases. I intend to bring this up at the next meeting. Proposal is as follows:
Rename the Config type to Cluster
This is more precise: it specifically configures a kind cluster, whereas some of the sub-objects configure nodes etc. Additionally, config.Cluster is a better library type name than config.Config.
This is a bikeshed. They both work. Users will need to update (but the same is pretty much true for changing the API version).
Allow specifying extra ports on nodes
Specifically, allow config of the form:
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
nodes:
- role: worker
  nodePorts: [8080]
- role: worker
  nodePorts: [9090, 7070]
In single node mode this will be especially easy to use and allow serving traffic from workloads with Kubernetes nodePorts.
This should solve https://github.com/kubernetes-sigs/kind/issues/99, which is one of the earliest standing requests.
For multi-node we will likely need to label the nodes with the ports or make the node names predictable or configurable. Suggestions are welcome.
This may also not be the most favorable long-term solution versus, say, implementing a form of LoadBalancer / Ingress, but it should be much easier to solve and should unblock usage.
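For reference, a minimal sketch of the kind of workload this would serve, assuming the exposed host port maps straight through to the node's port. Note that NodePorts must fall in the API server's service-node-port-range (30000-32767 by default), so the node would expose e.g. nodePorts: [30080]; the Service name and selector below are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: demo            # hypothetical name
spec:
  type: NodePort
  selector:
    app: demo           # hypothetical selector
  ports:
  - port: 80            # in-cluster service port
    targetPort: 8080    # container port on the pods
    nodePort: 30080     # matches a port listed in the node's nodePorts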
Add a networking struct to Cluster
Specifically:
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
networking:
  # 127.0.0.1 if unset, affects all nodes
  listenAddress: '0.0.0.0'
  apiServer:
    # random if unset, applies to the control plane node or load balancer
    listenPort: 6443
nodes:
....
There's increasing demand for configuring this. Additionally, we need to change the default listen address to 127.0.0.1 for security reasons ... users will need a way to configure this.
These fields are similar to kubeadm's config.
I'm not aware of any drawbacks, aside from a minor complexity increase.
Drop the load-balancer node role
... but keep the node type internally, and mark it as "control-plane-load-balancer" (or something similarly precise).
We only actually support the control plane load balancer, which is implicitly necessary for multiple control-plane clusters. Nothing about the control plane is currently configurable, and if it were it would actually be cluster-wide details (like the api server listen address).
We already intend to implicitly add these to v1alpha2 https://github.com/kubernetes-sigs/kind/pull/274
Users need to stop explicitly specifying load-balancer nodes when they upgrade to v1alpha3.
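To illustrate the upgrade impact, a sketch of the difference (the v1alpha2 role name shown is assumed, and the v1alpha3 shape follows this proposal):

# v1alpha2: the load balancer had to be listed as a node explicitly
apiVersion: kind.sigs.k8s.io/v1alpha2
kind: Config
nodes:
- role: external-load-balancer
- role: control-plane
- role: control-plane
- role: worker

# v1alpha3: the load balancer is implied by having multiple control-plane nodes
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
nodes:
- role: control-plane
- role: control-plane
- role: worker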
Drop the replicas field from nodes
Having nodes internally copied is increasingly confusing as we add host mounts and other configuration knobs to nodes.
Writing one entry per node is more explicit and reduces some internal complexity.
The config will be slightly more verbose; users will need to copy and paste identical nodes.
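As a sketch of what this change means in practice (the replicas field shown is the current v1alpha2 form):

# before (v1alpha2): one entry stamped out N times
apiVersion: kind.sigs.k8s.io/v1alpha2
kind: Config
nodes:
- role: worker
  replicas: 3

# after (v1alpha3): one explicit entry per node
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
nodes:
- role: worker
- role: worker
- role: worker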
Make kubeadm config patches explicitly cluster-wide
Specifically, move the kubeadmConfigPatches and kubeadmConfigPatchesJson6902 fields off of the nodes in Cluster to the top level.
We already only generate one cluster-wide config, and it's unclear that it would ever make sense to have multiple configs in one cluster. If it does wind up making sense, we can add fields back to nodes later with higher precedence than the cluster-wide fields.
This should clarify the behavior.
No real drawbacks? It will be straightforward to add per-node patches or other improvements if and when the use cases become apparent.
Example Config
This is an example of a config with several nodes and every knob set after the above changes.
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  # 127.0.0.1 if unset, affects all nodes
  listenAddress: '0.0.0.0'
  apiServer:
    # random if unset, applies to the control plane node or load balancer
    listenPort: 6443
# patch the generated kubeadm config with some extra settings
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta1
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: 10.0.0.0/16
# patch it further using a JSON 6902 patch
kubeadmConfigPatchesJson6902:
- group: kubeadm.k8s.io
  version: v1beta1
  kind: ClusterConfiguration
  patch: |
    - op: add
      path: /apiServer/certSANs/-
      value: my-hostname
# 3 control plane nodes and 3 workers
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
  # expose extra ports on this node
  nodePorts: [9090, 7070]
  # add extra bind mounts to this node
  extraMounts:
  # readonly bind mount the kind source code at /kind-source
  - containerPath: /kind-source
    hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
    readOnly: true
    # explicitly set some advanced values to their defaults
    selinuxRelabel: false
    propagation: None
- role: worker
  # add extra bind mounts to this node
  extraMounts:
  # readonly bind mount the kind source code at /kind-source
  - containerPath: /kind-source
    hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
    readOnly: true
- role: worker
  # add extra bind mounts to this node
  extraMounts:
  # readonly bind mount the kind source code at /kind-source
  - containerPath: /kind-source
    hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
    readOnly: true
/priority important-soon
/assign
cc @neolit123 @fabriziopandini @munnerz @pablochacin
This is more precise, it specifically configures a kind cluster, whereas some of the sub-objects configure nodes etc. Additionally, config.Cluster is a better library type name than config.Config
support for multi-cluster topology e.g. clusters sitting behind a load balancer would have been fun, but out of scope so the change SGTM.
Drop the load-balancer node role
SGTM, but this drops eventual load-balancer configurations, if we ever want to support such.
probably not anytime soon.
Drop the replicas field from nodes
doesn't matter that much. slightly less friendly UX config wise.
Make kubeadm config patches explicitly cluster-wide
someone might complain about this eventually and file the request to have them per node.
Example Config:
SGTM.
BenTheElder added this to the 0.2 milestone 19 minutes ago
has the timeline of 0.2 shifted towards the 1.14 release and the 1.0 shifted for the future?
seems optimistic, but doable.
Thanks for the comment @neolit123
support for multi-cluster topology e.g. clusters sitting behind a load balancer would have been fun, but out of scope so the change SGTM.
We could probably still do this by having cluster objects set the name metadata and having another object reference them by name?
This does seem out of scope for now though. Kubernetes has no concepts for this AFAIK.
SGTM, but this drops eventual load-balancer configurations, if we ever want to support such.
probably not anytime soon.
Those settings can go under networking at the top level? (and we will be allowing setting eg the port)
doesn't matter that much. slightly less friendly UX config wise.
_very_ slightly I think. it's less confusing in the config format and the code but more verbose. It shouldn't be too unbearable to repeat node specs in your config, most multi-node clusters will just look like:
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
- role: worker
someone might complain about this eventually and file the request to have them per node.
If / when they do it will be useful to understand the use cases and why, and we can make per-node config override the cluster-global config at the node level.
This should be less confusing and avoid people repeating it for every node when really they just wanted to change something cluster wide anyhow (like say the API server cert SANs).
has the timeline of 0.2 shifted towards the 1.14 release and the 1.0 shifted for the future?
seems optimistic, but doable.
I shifted 0.2 to the end of next week, optimistically; most of these changes are quite trivial, and many of them, like the ports, can be added in a later revision of v1alpha3 without a breaking change.
1.0 is definitely shifted to the future, this was always an aggressive and overly optimistic timeline to push for core feature design & iteration.
We should discuss these dates in another venue though (perhaps at the meeting)?
I'm also OK to not make this in 0.2, but I think we can at least get the breaking changes done quickly once we've reviewed the proposal.
we might also want to move retain and wait into the config as part of this plan.
We could probably still do this by having cluster objects set the name metadata and having another object reference them by name?
This does seem out of scope for now though. Kubernetes has no concepts for this AFAIK.
there are definitely multi-cluster topologies in products out there, yet this seems out of scope for the project. External pivoting / tooling can be done to support such a scenario (if someone needs this).
Those settings can go under networking at the top level? (and we will be allowing setting eg the port)
yes, they can.
_very_ slightly I think. it's less confusing in the config format and the code but more verbose. It shouldn't be too unbearable to repeat node specs in your config, most multi-node clusters will just look like:
true.
If / when they do it will be useful to understand the use cases and why, and we can make per-node config override the cluster-global config at the node level.
This should be less confusing and avoid people repeating it for every node when really they just wanted to change something cluster wide anyhow (like say the API server cert SANs).
yes, this can work if there is demand.
1.0 is definitely shifted to the future, this was always an aggressive and overly optimistic timeline to push for core feature design & iteration.
We should discuss these dates in another venue though (perhaps at the meeting)?
sounds good.
I'm also OK to not make this in 0.2, but I think we can at least get the breaking changes done quickly once we've reviewed the proposal.
i don't have a strong preference on the versioning.
@BenTheElder thanks for keeping me in this loop!
The list of proposed changes SGTM; some comments / additional questions:
Rename the Config type to Cluster
Great to see moving cluster settings into a separate set of fields; this is consistent with the kubeadm config / Cluster API, and it was also in the original issue for multi-node config in kind (but then punted in the initial prototypes for the sake of simplicity). Now that the number of cluster settings is growing/consolidating, it definitely makes sense to rethink this.
Allow specifying extra ports on nodes
For multi-node we will likely need to label the nodes with the ports or make the node names predictable or configurable.
I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to the order in the config, and this makes this process more predictable / more evident in the logs
Add a networking struct to Cluster
What is the rationale behind the decision to assign random ports as a default?
Using well-known ports for the control plane node or load balancer in the docker internal network makes things more easily predictable; nevertheless, there will always be a port on the host machine...
Drop the load-balancer node role
support for multi-cluster topology e.g. clusters sitting behind a load balancer
is not fully clear to me. Is there an issue to look at?
what about external-etcd?
In the current config we are ready to manage an external-etcd node, but this is not implemented yet.
This is something that we want to test in CI as part of the kubeadm test matrix, so I would like to see this option preserved, but I think that also this should move at cluster level, similarly to the external lb. WDYT?
Drop the replicas field from nodes
Cluster API (last time I checked) was preserving both machines and machine sets, but I think that it is fine for kind to drop the concept of replicas ATM.
what about image settings?
Now image settings are in the CLI (with cluster scope) and in config (with node scope); does it make sense to have an image field in the new Cluster object, to be eventually overridden at node level?
I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to the order in the config, and this makes this process more predictable / more evident in the logs
As a user needing to schedule a workload to a machine with that nodeport available, I'm supposed to predict the node name...?
There needs to be a mechanism to pin workloads to the right node with the exposed port since they can't all bind to the same port.
What is the rational behind the decision to assigning random ports as a default?
Multiple clusters on the host? To be clear this is the port on the host, not the port inside the container. This is the existing behavior + being able to select a specific one if you want (which will fail if you create multiple with the same, but it's been requested and many will only create one cluster).
Using well-known ports for the control plane node or load balancer in the docker internal network makes things more easily predictable; nevertheless, there will always be a port on the host machine...
Yeah this is the host port. Not the internal port. Perhaps we should rename this field.
support for multi-cluster topology e.g. clusters sitting behind a load balancer
I don't think this is a problem for kind to solve. This can be done externally.
what about external-etcd?
In the current config we are ready to manage an external-etcd node, but this is not implemented yet.
This is something that we want to test in CI as part of the kubeadm test matrix, so I would like to see this option preserved, but I think that also this should move at cluster level, similarly to the external lb. WDYT?
SGTM, let's make this a cluster control instead too and keep nodes for Kubernetes nodes.
what about image settings?
Now image settings are in the CLI (with cluster scope) and in config (with node scope); does it make sense to have an image field in the new Cluster object, to be eventually overridden at node level?
I think that makes sense, and we can make the flag simply override this field :+1:
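Something like the following could work; this is just a sketch of the shape being discussed, not a settled API, and the image tags are placeholders:

apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
# hypothetical cluster-wide default, overridable by the --image flag
image: kindest/node:v1.13.3
nodes:
- role: control-plane
- role: worker
  # hypothetical per-node override of the cluster-wide default
  image: kindest/node:v1.12.5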
I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to the order in the config, and this makes this process more predictable / more evident in the logs
As a user needing to schedule a workload to a machine with that nodeport available
how about requiring a node name only if nodePorts: .... is set?
I'm not sure we should require one ever (we can always autogenerate) or that name should be the mechanism to pin. Extra user supplied node labels could work maybe (that we apply at the end of cluster up).
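As a sketch of the label idea: if kind applied a user-supplied label to the node that exposes the port, a workload could pin itself with a plain nodeSelector. The label key/value and image below are hypothetical, not part of the proposal:

apiVersion: v1
kind: Pod
metadata:
  name: port-pinned-app        # hypothetical
spec:
  # hypothetical label that kind would apply at the end of cluster up
  nodeSelector:
    exposes-ports: "true"
  containers:
  - name: app
    image: nginx               # placeholder image
    ports:
    - containerPort: 8080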
I'm not comfortable with the idea of dropping the replica number. For me, it makes sense to have a role with some common attributes, such as the image, and then easily change the number. I find having to add/remove entries in the node list much more inconvenient.
I've been digging into the code this weekend to understand what it would take to create a new node role and I haven't identified any issue regarding using replicas. Could you please elaborate?
I think it makes sense. Would it be possible to allow specifying an alternative image for this node at the cluster level, in case someone wants to experiment with another implementation?
I'm a little bit confused here. The approach we had been discussing so far, and I expected to be discussed at the next meeting, is different: Instead of exposing the NodePort on a particular node, create a kind of ingress controller which exposes the ports. In that case, we don't need to decide on which node to expose the ports. Exposing ports on particular nodes seems neither natural nor convenient to me. Maybe I'm missing something in the conversation.
Dropping replicas
I'm not comfortable with the idea of dropping the replica number. For me, it makes sense to have a role with some common attributes, such as the image, and then easily change the number. I find having to add/remove entries in the node list much more inconvenient.
How many entries are we talking about? kind doesn't really "scale" beyond a handful of nodes? (at least not without a pretty beefy machine, since they're all on the same host).
If you're only setting the image and the role then copying / removing is only 2 lines per node, that seems like a pretty small burden. The other fields don't work well with repeated nodes and the repeated node logic complicates our code.
If we keep replicas but continue to add useful per-node configuration, we're going to wind up needing some form of user-facing templating for the node configs (eg to make mounts have per-node host paths) and I think that logic should live outside of kind.
Similarly, some other yaml munging tool can generate / manipulate kind configurations and implement repetition, templating over the paths, etc..
I've been digging into the code this weekend to understand what it would take to create a new node role and I haven't identified any issue regarding using replicas. Could you please elaborate?
"roles" are unrelated, this has to do with all other node config, like mounts, ports etc.
I think we may have a terminology mismatch here, "roles" are just the node type, not any of the other fields? Replicas were essentially copying + pasting the rest of the fields, of which only image/role make any sense to copy.
Related: "Roles" are also not a super amazing config UX, we're dropping down to just control-planes and workers. Details like external-etcd and control-plane load-balancer are cluster wide and are not Kubernetes nodes.
Drop the load balancer role
I think it makes sense. Would it be possible to allow specifying an alternative image for this node at the cluster level, in case someone wants to experiment with another implementation?
We definitely _could_ add a cluster-wide field for the image used for this, but I think I'd rather this experimentation be done by modifying kind's source code. In particular I don't think it's reasonable for us to support multiple kinds of load balancers currently, and we need to know what implementation is used to generate the configuration, copy the certs, etc.
Given this, if someone experiments with different load balancers, they are going to need to touch the internals of kind anyhow.
Port naming
I'm a little bit confused here. The approach we had been discussing so far, and I expected to be discussed at the next meeting, is different: Instead of exposing the NodePort on a particular node, create a kind of ingress controller which exposes the ports. In that case, we don't need to decide on which node to expose the ports. Exposing ports on particular nodes seems neither natural nor convenient to me. Maybe I'm missing something in the conversation.
Right. I did some more experimentation, and I think we can unblock simple use cases in a fairly simple way by providing something like this, but it's not ideal.
Let's punt this one until further discussion.
How many entries are we talking about? kind doesn't really "scale" beyond a handful of nodes? (at least not without a pretty beefy machine, since they're all on the same host).
If you're only setting the image and the role then copying / removing is only 2 lines per node, that seems like a pretty small burden.
The point is not the number of nodes, but having to edit the files. There was a proposal for adding command line flags for setting the number of replicas, which would not be possible under this format.
The other fields don't work well with repeated nodes and the repeated node logic complicates our code.
If we keep replicas but continue to add useful per-node configuration, we're going to wind up needing some form of user-facing templating for the node configs (eg to make mounts have per-node host paths) and I think that logic should live outside of kind.
Maybe we should separate the role (template) from the replica. This would still allow generic settings and per-replica settings.
I think that making this decision considering the current situation is possibly not the best perspective. The framework is growing quickly, new features are being added, and more settings are to be expected.
The point is not the number of nodes, but having to edit the files. There was a proposal for adding command line flags for setting the number of replicas, which would not be possible under this format.
That's not what the PR does, it creates flags per role that set a number of _all default value_ nodes. Doing that is totally possible under this, it would just set the node list to contain N nodes of each role, as opposed to setting it to one entry per role with the replicas set. It won't really be affected by this.
I would expect that changing the number of configured nodes involved editing the file ..?
Maybe we should separate the role (template) from the replica. This would still allow generic settings and per-replica settings.
There are plenty of tools that can do this sort of thing to config files. I don't think kind needs to own a templating mechanism for its own nodes.
Template is a better term when discussing this fwiw; "role" already has a specific meaning in kind and kubeadm which is not this.
I don't think we should own a templating DSL.
I think that making this decision considering the current situation is possibly not the best perspective. The framework is growing quickly, new features are being added, and more settings are to be expected.
Except that none of these new settings make much sense in the face of replicas, and I expect more of that. Replicas really only makes sense for default nodes, which are only one line each (- role: worker). The other fields don't really replicate or are cluster wide.
If this turns out to be a real problem for some reason it will be easier to add replicas or something similar back later than it will be to remove it, especially once we try to go beta.
Add a networking struct to Cluster
To clarify, this is about configuring the container runtime (e.g. #277)?
@DazWorrall the options in networking outlined currently are cluster-wide settings for the API server, but we can add more fields for other networking features.
initial version done in https://github.com/kubernetes-sigs/kind/pull/373
will follow up to add the networking struct
@BenTheElder with the latest changes we are losing the possibility to create a cluster with an external load balancer and a single control-plane node.
This is useful for the use case when you want to test adding control-plane nodes at a later stage.
(There are workarounds, but I wanted to let you know because probably it is quite simple to restore this capability)
@fabriziopandini that is a great point, thanks! We can definitely make this possible again :+1: