I've been thinking for a while now about how to improve the ergonomics of our config and extend it for more use cases. I intend to bring this up at the next meeting. Proposal is as follows:
Rename the Config type to Cluster
This is more precise: it specifically configures a kind cluster, whereas some of the sub-objects configure nodes etc. Additionally, config.Cluster is a better library type name than config.Config.
This is a bikeshed. They both work. Users will need to update (but the same is pretty much true for changing the API version).
Allow specifying extra ports on nodes
Specifically, allow config of the form:
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
nodes:
- role: worker
  nodePorts: [8080]
- role: worker
  nodePorts: [9090, 7070]
In single node mode this will be especially easy to use and allow serving traffic from workloads with Kubernetes nodePorts.
This should solve https://github.com/kubernetes-sigs/kind/issues/99, which is one of the earliest standing requests.
For multi-node we will likely need to label the nodes with the ports or make the node names predictable or configurable. Suggestions are welcome.
This may also not be the most favorable long-term solution versus, say, implementing a form of LoadBalancer / Ingress, but it should be much easier to solve and should unblock usage.
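For reference, a minimal sketch of the kind of workload this would serve, assuming the exposed host port maps straight through to the node's port. Note that NodePorts must fall in the API server's service-node-port-range (30000-32767 by default), so the node would expose e.g. nodePorts: [30080]; the Service name and selector below are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: demo            # hypothetical name
spec:
  type: NodePort
  selector:
    app: demo           # hypothetical selector
  ports:
  - port: 80            # in-cluster service port
    targetPort: 8080    # container port on the pods
    nodePort: 30080     # matches a port listed in the node's nodePorts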
Add a networking struct to Cluster
Specifically:
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
networking:
  # 127.0.0.1 if unset, affects all nodes
  listenAddress: '0.0.0.0'
  apiServer:
    # random if unset, applies to the control plane node or load balancer
    listenPort: 6443
nodes:
....
There's increasing demand for configuring this. Additionally, we need to change the default listen address to 127.0.0.1 for security reasons ... users will need a way to configure this.
These fields are similar to kubeadm's config.
I'm not aware of any drawbacks, aside from a minor complexity increase.
Drop the load-balancer node role
... but keep the node type internally, and mark it as "control-plane-load-balancer" (or something similarly precise).
We only actually support the control plane load balancer, which is implicitly necessary for multiple control-plane clusters. Nothing about the control plane is currently configurable, and if it were it would actually be cluster-wide details (like the api server listen address).
We already intend to implicitly add these to v1alpha2 https://github.com/kubernetes-sigs/kind/pull/274
Users need to stop explicitly specifying load-balancer nodes when they upgrade to v1alpha3.
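To illustrate the upgrade impact, a sketch of the difference (the v1alpha2 role name shown is assumed, and the v1alpha3 shape follows this proposal):

# v1alpha2: the load balancer had to be listed as a node explicitly
apiVersion: kind.sigs.k8s.io/v1alpha2
kind: Config
nodes:
- role: external-load-balancer
- role: control-plane
- role: control-plane
- role: worker

# v1alpha3: the load balancer is implied by having multiple control-plane nodes
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
nodes:
- role: control-plane
- role: control-plane
- role: worker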
Drop the replicas field from nodes
Having nodes internally copied is increasingly confusing as we add host mounts and other configuration knobs to nodes.
Writing one entry per node is more explicit and reduces some internal complexity.
The config will be slightly more verbose; users will need to copy and paste identical nodes.
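As a sketch of what this change means in practice (the replicas field shown is the current v1alpha2 form):

# before (v1alpha2): one entry stamped out N times
apiVersion: kind.sigs.k8s.io/v1alpha2
kind: Config
nodes:
- role: worker
  replicas: 3

# after (v1alpha3): one explicit entry per node
apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
nodes:
- role: worker
- role: worker
- role: worker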
Make kubeadm config patches explicitly cluster-wide
Specifically, move the kubeadmConfigPatches and kubeadmConfigPatchesJson6902 fields off of the nodes in Cluster to the top level.
We already only generate one cluster-wide config, and it's unclear that it would ever make sense to have multiple configs in one cluster. If it does wind up making sense, we can add fields back to nodes later with higher precedence than the cluster-wide fields.
This should clarify the behavior.
No real drawbacks? It will be straightforward to add per-node patches or other improvements if and when the use cases become apparent.
Example Config
This is an example of a config with several nodes and every knob set after the above changes.
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  # 127.0.0.1 if unset, affects all nodes
  listenAddress: '0.0.0.0'
  apiServer:
    # random if unset, applies to the control plane node or load balancer
    listenPort: 6443
# patch the generated kubeadm config with some extra settings
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta1
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: 10.0.0.0/16
# patch it further using a JSON 6902 patch
kubeadmConfigPatchesJson6902:
- group: kubeadm.k8s.io
  version: v1beta1
  kind: ClusterConfiguration
  patch: |
    - op: add
      path: /apiServer/certSANs/-
      value: my-hostname
# 3 control plane nodes and 3 workers
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
  # expose extra ports on this node
  nodePorts: [9090, 7070]
  # add extra bind mounts to this node
  extraMounts:
  # readonly bind mount the kind source code at /kind-source
  - containerPath: /kind-source
    hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
    readOnly: true
    # explicitly set some advanced values to their defaults
    selinuxRelabel: false
    propagation: None
- role: worker
  # add extra bind mounts to this node
  extraMounts:
  # readonly bind mount the kind source code at /kind-source
  - containerPath: /kind-source
    hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
    readOnly: true
- role: worker
  # add extra bind mounts to this node
  extraMounts:
  # readonly bind mount the kind source code at /kind-source
  - containerPath: /kind-source
    hostPath: /Users/bentheelder/go/src/sigs.k8s.io/kind
    readOnly: true
/priority important-soon
/assign
cc @neolit123 @fabriziopandini @munnerz @pablochacin
This is more precise, it specifically configures a kind cluster, whereas some of the sub-objects configure nodes etc. Additionally, config.Cluster is a better library type name than config.Config
support for multi-cluster topology e.g. clusters sitting behind a load balancer would have been fun, but out of scope so the change SGTM.
Drop the load-balancer node role
SGTM, but this drops eventual load-balancer configurations, if we ever want to support such.
probably not anytime soon.
Drop the replicas field from nodes
doesn't matter that much. slightly less friendly UX config wise.
Make kubeadm config patches explicitly cluster-wide
someone might complain about this eventually and file the request to have them per node.
Example Config:
SGTM.
BenTheElder added this to the 0.2 milestone 19 minutes ago
has the timeline of 0.2 shifted towards the 1.14 release and the 1.0 shifted for the future?
seems optimistic, but doable.
Thanks for the comment @neolit123
support for multi-cluster topology e.g. clusters sitting behind a load balancer would have been fun, but out of scope so the change SGTM.
We could probably still do this by having cluster objects set the name metadata and having another object reference them by name?
This does seem out of scope for now though. Kubernetes has no concepts for this AFAIK.
SGTM, but this drops eventual load-balancer configurations, if we ever want to support such.
probably not anytime soon.
Those settings can go under networking at the top level? (and we will be allowing setting eg the port)
doesn't matter that much. slightly less friendly UX config wise.
_very_ slightly I think. it's less confusing in the config format and the code but more verbose. It shouldn't be too unbearable to repeat node specs in your config, most multi-node clusters will just look like:
kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
nodes:
- role: control-plane
- role: worker
- role: worker
someone might complain about this eventually and file the request to have them per node.
If / when they do it will be useful to understand the use cases and why, and we can make per-node config override the cluster-global config at the node level.
This should be less confusing and avoid people repeating it for every node when really they just wanted to change something cluster wide anyhow (like say the API server cert SANs).
has the timeline of 0.2 shifted towards the 1.14 release and the 1.0 shifted for the future?
seems optimistic, but doable.
I shifted 0.2 to the end of next week, optimistically; most of these changes are quite trivial, and many of them, like the ports, can be added in a later revision of v1alpha3 without a breaking change.
1.0 is definitely shifted to the future, this was always an aggressive and overly optimistic timeline to push for core feature design & iteration.
We should discuss these dates in another venue though (perhaps at the meeting)?
I'm also OK to not make this in 0.2, but I think we can at least get the breaking changes done quickly once we've reviewed the proposal.
we might also want to move retain and wait into the config as part of this plan.
We could probably still do this by having cluster objects set the name metadata and having another object reference them by name?
This does seem out of scope for now though. Kubernetes has no concepts for this AFAIK.
there are definitely multi-cluster topologies in products out there, yet this seems out of scope for the project. External pivoting / tooling can be done to support such a scenario (if someone needs this).
Those settings can go under networking at the top level? (and we will be allowing setting eg the port)
yes, they can.
_very_ slightly I think. it's less confusing in the config format and the code but more verbose. It shouldn't be too unbearable to repeat node specs in your config, most multi-node clusters will just look like:
true.
If / when they do it will be useful to understand the use cases and why, and we can make per-node config override the cluster-global config at the node level.
This should be less confusing and avoid people repeating it for every node when really they just wanted to change something cluster wide anyhow (like say the API server cert SANs).
yes, this can work if there is demand.
1.0 is definitely shifted to the future, this was always an aggressive and overly optimistic timeline to push for core feature design & iteration.
We should discuss these dates in another venue though (perhaps at the meeting)?
sounds good.
I'm also OK to not make this in 0.2, but I think we can at least get the breaking changes done quickly once we've reviewed the proposal.
i don't have a strong preference on the versioning.
@BenTheElder thanks for keeping me in this loop!
The list of proposed changes SGTM; some comments / additional questions:
Rename the Config type to Cluster
Great to see moving cluster settings into a separate set of fields; this is consistent with the kubeadm config / Cluster API, and it was also in the original issue for multi-node config in kind (but then punted in the initial prototypes for the sake of simplicity). Now that the number of cluster settings is growing/consolidating, it definitely makes sense to rethink this.
Allow specifying extra ports on nodes
For multi-node we will likely need to label the nodes with the ports or make the node names predictable or configurable.
I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to the order in the config, and this makes this process more predictable / more evident in the logs
Add a networking struct to Cluster
What is the rationale behind the decision to assign random ports as a default?
Using well-known ports for the control plane node or load balancer in the docker internal network makes things more easily predictable; nevertheless, there will always be a port on the host machine...
Drop the load-balancer node role
support for multi-cluster topology e.g. clusters sitting behind a load balancer
is not fully clear to me. Is there an issue to look at?
what about external-etcd?
In the current config we are ready to manage an external-etcd node, but this is not implemented yet.
This is something that we want to test in CI as part of the kubeadm test matrix, so I would like to see this option preserved, but I think that also this should move at cluster level, similarly to the external lb. WDYT?
Drop the replicas field from nodes
Cluster API (last time I checked) was preserving both machines and machine sets, but I think that it is fine for kind to drop the concept of replicas ATM.
what about image settings?
Now image settings are in the CLI (with cluster scope) and in config (with node scope); does it make sense to have an image field in the new Cluster object, to be eventually overridden at node level?
I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to the order in the config, and this makes this process more predictable / more evident in the logs
As a user needing to schedule a workload to a machine with that nodeport available, I'm supposed to predict the node name...?
There needs to be a mechanism to pin workloads to the right node with the exposed port since they can't all bind to the same port.
What is the rational behind the decision to assigning random ports as a default?
Multiple clusters on the host? To be clear this is the port on the host, not the port inside the container. This is the existing behavior + being able to select a specific one if you want (which will fail if you create multiple with the same, but it's been requested and many will only create one cluster).
Using well-known ports for the control plane node or load balancer in the docker internal network makes things more easily predictable; nevertheless, there will always be a port on the host machine...
Yeah this is the host port. Not the internal port. Perhaps we should rename this field.
support for multi-cluster topology e.g. clusters sitting behind a load balancer
I don't think this is a problem for kind to solve. This can be done externally.
what about external-etcd?
In the current config we are ready to manage an external-etcd node, but this is not implemented yet.
This is something that we want to test in CI as part of the kubeadm test matrix, so I would like to see this option preserved, but I think that also this should move at cluster level, similarly to the external lb. WDYT?
SGTM, let's make this a cluster control instead too and keep nodes for Kubernetes nodes.
what about image settings?
Now image settings are in the CLI (with cluster scope) and in config (with node scope); does it make sense to have an image field in the new Cluster object, to be eventually overridden at node level?
I think that makes sense, and we can make the flag simply override this field :+1:
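Something like the following could work; this is just a sketch of the shape being discussed, not a settled API, and the image tags are placeholders:

apiVersion: kind.sigs.k8s.io/v1alpha3
kind: Cluster
# hypothetical cluster-wide default, overridable by the --image flag
image: kindest/node:v1.13.3
nodes:
- role: control-plane
- role: worker
  # hypothetical per-node override of the cluster-wide default
  image: kindest/node:v1.12.5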
I think that making node names configurable can open up a new set of problems. -1
Instead, given that we are dropping the replicas field, it is now possible to name nodes according to the order in the config, and this makes this process more predictable / more evident in the logs
As a user needing to schedule a workload to a machine with that nodeport available
how about requiring a node name only if nodePorts: .... is set?
I'm not sure we should require one ever (we can always autogenerate) or that name should be the mechanism to pin. Extra user supplied node labels could work maybe (that we apply at the end of cluster up).
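As a sketch of the label idea: if kind applied a user-supplied label to the node that exposes the port, a workload could pin itself with a plain nodeSelector. The label key/value and image below are hypothetical, not part of the proposal:

apiVersion: v1
kind: Pod
metadata:
  name: port-pinned-app        # hypothetical
spec:
  # hypothetical label that kind would apply at the end of cluster up
  nodeSelector:
    exposes-ports: "true"
  containers:
  - name: app
    image: nginx               # placeholder image
    ports:
    - containerPort: 8080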
I'm not comfortable with the idea of dropping the replica number. For me, it makes sense to have a role with some common attributes, such as the image, and then easily change the number. I find having to add/remove entries in the node list much more inconvenient.
I've been digging into the code this weekend to understand what it would take to create a new node role and I haven't identified any issue regarding using replicas. Could you please elaborate?
I think it makes sense. Would it be possible to allow specifying an alternative image for this node at the cluster level, in case someone wants to experiment with another implementation?
I'm a little bit confused here. The approach we had been discussing so far, and I expected to be discussed at the next meeting, is different: Instead of exposing the NodePort on a particular node, create a kind of ingress controller which exposes the ports. In that case, we don't need to decide on which node to expose the ports. Exposing ports on particular nodes seems neither natural nor convenient to me. Maybe I'm missing something in the conversation.
Dropping replicas
I'm not comfortable with the idea of dropping the replica number. For me, it makes sense to have a role with some common attributes, such as the image, and then easily change the number. I find having to add/remove entries in the node list much more inconvenient.
How many entries are we talking about? kind doesn't really "scale" beyond a handful of nodes? (at least not without a pretty beefy machine, since they're all on the same host).
If you're only setting the image and the role then copying / removing is only 2 lines per node, that seems like a pretty small burden. The other fields don't work well with repeated nodes and the repeated node logic complicates our code.
If we keep replicas but continue to add useful per-node configuration, we're going to wind up needing some form of user-facing templating for the node configs (eg to make mounts have per-node host paths) and I think that logic should live outside of kind.
Similarly, some other yaml munging tool can generate / manipulate kind configurations and implement repetition, templating over the paths, etc..
I've been digging into the code this weekend to understand what it would take to create a new node role and I haven't identified any issue regarding using replicas. Could you please elaborate?
"roles" are unrelated, this has to do with all other node config, like mounts, ports etc.
I think we may have a terminology mismatch here, "roles" are just the node type, not any of the other fields? Replicas were essentially copying + pasting the rest of the fields, of which only image/role make any sense to copy.
Related: "Roles" are also not a super amazing config UX, we're dropping down to just control-planes and workers. Details like external-etcd and control-plane load-balancer are cluster wide and are not Kubernetes nodes.
Drop the load balancer role
I think it makes sense. Would it be possible to allow specifying an alternative image for this node at the cluster level, in case someone wants to experiment with another implementation?
We definitely _could_ add a cluster-wide field for the image used for this, but I think I'd rather this experimentation be done by modifying kind's source code. In particular I don't think it's reasonable for us to support multiple kinds of load balancers currently, and we need to know what implementation is used to generate the configuration, copy the certs, etc.
Given this, if someone experiments with different load balancers, they are going to need to touch the internals of kind anyhow.
Port naming
I'm a little bit confused here. The approach we had been discussing so far, and I expected to be discussed at the next meeting, is different: Instead of exposing the NodePort on a particular node, create a kind of ingress controller which exposes the ports. In that case, we don't need to decide on which node to expose the ports. Exposing ports on particular nodes seems neither natural nor convenient to me. Maybe I'm missing something in the conversation.
Right. I did some more experimentation, and I think we can unblock simple use cases in a fairly simple way by providing something like this, but it's not ideal.
Let's punt this one until further discussion.
How many entries are we talking about? kind doesn't really "scale" beyond a handful of nodes? (at least not without a pretty beefy machine, since they're all on the same host).
If you're only setting the image and the role then copying / removing is only 2 lines per node, that seems like a pretty small burden.
The point is not the number of nodes, but having to edit the files. There was a proposal for adding command line flags for setting the number of replicas, which would not be possible under this format.
The other fields don't work well with repeated nodes and the repeated node logic complicates our code.
If we keep replicas but continue to add useful per-node configuration, we're going to wind up needing some form of user-facing templating for the node configs (eg to make mounts have per-node host paths) and I think that logic should live outside of kind.
Maybe we should separate the role (template) from the replica. This would still allow generic settings and per-replica settings.
I think that making this decision considering the current situation is possibly not the best perspective. The framework is growing quickly, new features are being added, and more settings are to be expected.
The point is not the number of nodes, but having to edit the files. There was a proposal for adding command line flags for setting the number of replicas, which would not be possible under this format.
That's not what the PR does, it creates flags per role that set a number of _all default value_ nodes. Doing that is totally possible under this, it would just set the node list to contain N nodes of each role, as opposed to setting it to one entry per role with the replicas set. It won't really be affected by this.
I would expect that changing the number of configured nodes involved editing the file ..?
Maybe we should separate the role (template) from the replica. This would still allow generic settings and per-replica settings.
There are plenty of tools that can do this sort of thing to config files. I don't think kind needs to own a templating mechanism for its own nodes.
Template is a better term when discussing this fwiw; "role" already has a specific meaning in kind and kubeadm which is not this.
I don't think we should own a templating DSL.
I think that making this decision considering the current situation is possibly not the best perspective. The framework is growing quickly, new features are being added, and more settings are to be expected.
Except that none of these new settings make much sense in the face of replicas, and I expect more of that. Replicas really only makes sense for default nodes, which are only one line each (- role: worker). The other fields don't really replicate or are cluster wide.
If this turns out to be a real problem for some reason it will be easier to add replicas or something similar back later than it will be to remove it, especially once we try to go beta.
Add a networking struct to Cluster
To clarify, this is about configuring the container runtime (e.g. #277)?
@DazWorrall the options in networking outlined currently are cluster-wide settings for the API server, but we can add more fields for other networking features.
initial version done in https://github.com/kubernetes-sigs/kind/pull/373
will follow up to add the networking struct
@BenTheElder with the latest changes we are losing the possibility to create a cluster with an external load balancer and a single control-plane node.
This is useful for the use case when you want to test adding control-plane nodes at a later stage.
(There are workarounds, but I wanted to let you know because probably it is quite simple to restore this capability)
@fabriziopandini that is a great point, thanks! We can definitely make this possible again :+1: