Cluster-api: Remote Node/Object References

Created on 5 Oct 2018 · 12 comments · Source: kubernetes-sigs/cluster-api

Problem

MachineStatus contains an optional NodeRef field:

https://github.com/kubernetes-sigs/cluster-api/blob/f31486484d5b33c785540eeaffd47fe57832aef5/pkg/apis/cluster/v1alpha1/machine_types.go#L90

There are several places where the ability to refer to the Node corresponding to a Machine is extremely useful, if not essential. Information in Node objects can be used as a surrogate for Exists and Ready conditions, as well as for general node health (cf. https://github.com/kubernetes-sigs/cluster-api/pull/483).

For providers which distinguish between manager and managed clusters, NodeRefs will not be set (they are local to a cluster and optional in the spec). There is a related issue about creating a dynamic link between machines/nodes. This issue is asking for a similar link which will also work in the _managed_ cluster case.

Examples

0) "[The MachineSet controller uses NodeRefs] to determine the ready replicas count for a MachineSet which is in turn required for the MachineDeployment controller." - @alvaroaleman

https://github.com/kubernetes-sigs/cluster-api/blob/f302034cfa525bd57a891532e761f742687e5392/pkg/controller/machineset/status.go#L58

1) The deployer waits for a NodeRef on Machines to determine when a machine is ready. @chuckha

https://github.com/kubernetes-sigs/cluster-api/blob/f31486484d5b33c785540eeaffd47fe57832aef5/clusterctl/clusterdeployer/clusterclient/clusterclient.go#L572

2) The upgrader uses NodeRefs to find the version of kubelet running on a node. This is used to determine if an upgrade is necessary. Some providers have tooling which also uses this information to determine when an upgrade is complete.

https://github.com/kubernetes-sigs/cluster-api/blob/f302034cfa525bd57a891532e761f742687e5392/tools/upgrader/util/upgrade.go#L62

Possible Solution

Add a tuple (APIEndpoint, NodeName) for node objects. More generally (APIEndpoint, Namespace, ObjectName), or maybe (APIEndpoint, UUID), could be used for arbitrary remote object references.

When this has come up in the past @roberthbailey suggested that maybe we should look at the Cluster Registry to see if there is any infrastructure we can share. Looking at it briefly there is an ObjectReference type which might be useful:

https://github.com/kubernetes/cluster-registry/blob/09c490c051fbd24452921a18b366371c221a71d8/pkg/apis/clusterregistry/v1alpha1/types.go#L114

This issue is to explore whether a single solution for these use cases makes sense.

Labels: area/api priority/important-soon

All 12 comments

@detiber @hardikdr @oneilcin

One suggestion: can't we continue to associate the NodeRef with the Machine object for the remote cluster as well? After all, the NodeRef today simply contains the following:

corev1.ObjectReference{
    Kind: "Node",
    Name: node.ObjectMeta.Name,
    UID:  node.UID,
}

Now, whenever a controller wants to use this NodeRef, it simply needs to look up the Cluster this Machine belongs to, obtain the right kubeconfig for that cluster (pointing to either the remote cluster or the local one), and then interact with the Node object.

It's a good point. The generic controllers (e.g. MachineSet) do not currently have a way to do this though.

/assign @davidewatson

Have we decided on what we want to do here for v1alpha1?

I think we should punt this from v1alpha1. We have the controllers we have for this release. The main consequence for this repo is that MachineSet controller health checks will only work when the controller is run within the same cluster it manages.

SGTM.
/milestone Next

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

/area api

Regarding this issue, I'd propose to tackle this in v1alpha1 by creating a new, backward-compatible controller that references a secret in a specific location.

  • Create a nodeRef controller in CAPA that watches Machines
  • Check <cluster-name>-kubeconfig secret is available
  • Create a client from kubeconfig
  • Check each remote node and set NodeRef if a matching Node is found

Happy to start working on it if the above sounds good!

/cc @detiber @ncdc

@vincepri sgtm
