Cluster-api: Remote Node/Object References

Created on 5 Oct 2018 · 12 comments · Source: kubernetes-sigs/cluster-api

Problem

MachineStatus contains an optional NodeRef field:

https://github.com/kubernetes-sigs/cluster-api/blob/f31486484d5b33c785540eeaffd47fe57832aef5/pkg/apis/cluster/v1alpha1/machine_types.go#L90

There are several places where the ability to refer to the Node corresponding to a Machine is extremely useful, if not essential. Information in Node objects can be used as a surrogate for Exists and Ready conditions, as well as for general node health (cf. https://github.com/kubernetes-sigs/cluster-api/pull/483).

For providers which distinguish between manager and managed clusters, NodeRefs will not be set (they are local to a cluster and optional in the spec). There is a related issue about creating a dynamic link between machines/nodes. This issue is asking for a similar link which will also work in the _managed_ cluster case.

Examples

0) "[The MachineSet controller uses NodeRefs] to determine the ready replicas count for a MachineSet which is in turn required for the MachineDeployment controller." - @alvaroaleman

https://github.com/kubernetes-sigs/cluster-api/blob/f302034cfa525bd57a891532e761f742687e5392/pkg/controller/machineset/status.go#L58

1) The deployer waits for a NodeRef on Machines to determine when a machine is ready. @chuckha

https://github.com/kubernetes-sigs/cluster-api/blob/f31486484d5b33c785540eeaffd47fe57832aef5/clusterctl/clusterdeployer/clusterclient/clusterclient.go#L572

2) The upgrader uses NodeRefs to find the version of kubelet running on a node. This is used to determine if an upgrade is necessary. Some providers have tooling which also uses this information to determine when an upgrade is complete.

https://github.com/kubernetes-sigs/cluster-api/blob/f302034cfa525bd57a891532e761f742687e5392/tools/upgrader/util/upgrade.go#L62

Possible Solution

Add a tuple (APIEndpoint, NodeName) for node objects. More generally (APIEndpoint, Namespace, ObjectName), or maybe (APIEndpoint, UUID), could be used for arbitrary remote object references.

When this has come up in the past @roberthbailey suggested that maybe we should look at the Cluster Registry to see if there is any infrastructure we can share. Looking at it briefly there is an ObjectReference type which might be useful:

https://github.com/kubernetes/cluster-registry/blob/09c490c051fbd24452921a18b366371c221a71d8/pkg/apis/clusterregistry/v1alpha1/types.go#L114

This issue is to explore whether a single solution for these use cases makes sense.

Labels: area/api priority/important-soon

All 12 comments

@detiber @hardikdr @oneilcin

One suggestion: can't we continue to associate the NodeRef with the Machine object for the remote cluster as well? After all, the NodeRef today simply contains the following:

corev1.ObjectReference{
    Kind: "Node",
    Name: node.ObjectMeta.Name,
    UID:  node.UID,
}

Now, whenever a controller wants to use this NodeRef, it simply needs to look up the Cluster this Machine belongs to, obtain the right kubeconfig for that cluster (pointing to either the remote cluster or the local one), and then interact with the Node object.

It's a good point. The generic controllers (e.g. MachineSet) do not currently have a way to do this though.

/assign @davidewatson

Have we decided on what we want to do here for v1alpha1?

I think we should punt this from v1alpha1. We have the controllers we have for this release. The main consequence for this repo is that MachineSet controller health checks will only work when the controller is run within the same cluster it manages.

SGTM.
/milestone Next

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

/area api

Regarding this issue, I'd propose to tackle this in v1alpha1 by creating a new, backward-compatible controller that references a secret in a specific location.

  • Create a nodeRef controller in CAPA that watches Machines
  • Check <cluster-name>-kubeconfig secret is available
  • Create a client from kubeconfig
  • Check each remote node and set NodeRef if a matching Node is found

Happy to start working on it if the above sounds good!

/cc @detiber @ncdc

@vincepri sgtm
