Cluster-api: What is a useful product of top-level Cluster API repository?

Created on 6 Feb 2019 · 22 comments · Source: kubernetes-sigs/cluster-api

It's not very clear what this repository delivers at the end of the day.

Right now it's actually confusing to someone who is new to this. This doesn't appear to be a library, nor is it a deployable component or an abstract API spec.

It would help to answer what this project is aimed to deliver, so that anyone willing to contribute to or use this project can see clearly what's here for them, and what exactly its value is.

It'd be nice if whatever the product is, it can be worked on independently of any provider implementations.

help wanted lifecycle/stale priority/important-longterm

All 22 comments

Here's one route that I'd like us to consider:

  • the product is a general Cluster API controller
  • the controller's purpose is to

    • keep track of all clusters running in the context of a single management cluster

    • keep track of all metadata related to objects that each of providers manages separately

  • there exists an API for providers to register with the controller at runtime
  • there exists a common representation of metadata about the cluster and nodes (if any), which would consist of

    • name and labels (which may include provider-specific locators: account, region, zone, rack, etc.)

    • names of nodes associated with the cluster, labels (as above, these may include provider-specific locators)

    • statuses of nodes and cluster

    • basic information required to connect a client to this cluster (e.g. by constructing kubeconfig):



      • API endpoint URL(s)


      • (optional) CA certificate


      • (optional) auth plugin description (e.g. EKS, GKE)



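To make the shape of that common representation concrete, here is a sketch in Go. These are NOT the real Cluster API types; every type and field name below is invented purely to illustrate the metadata listed above.

```go
// A hypothetical, provider-independent cluster representation; names are
// illustrative only, not the actual Cluster API types.
package main

import (
	"errors"
	"fmt"
)

// CommonCluster carries only what every provider can reasonably supply.
type CommonCluster struct {
	Name         string
	Labels       map[string]string // may include provider-specific locators (account, region, zone, rack)
	NodeNames    []string          // names of nodes associated with the cluster
	APIEndpoints []string          // enough to construct a kubeconfig
	CACert       []byte            // optional
	AuthPlugin   string            // optional auth plugin description, e.g. "eks" or "gke"
}

// Validate checks the minimum needed to connect a client to the cluster.
func (c CommonCluster) Validate() error {
	if c.Name == "" {
		return errors.New("cluster name is required")
	}
	if len(c.APIEndpoints) == 0 {
		return errors.New("at least one API endpoint is required")
	}
	return nil
}

func main() {
	c := CommonCluster{
		Name:         "prod-eu-1",
		Labels:       map[string]string{"provider": "aws", "region": "eu-west-1"},
		APIEndpoints: []string{"https://prod-eu-1.example.com:6443"},
		AuthPlugin:   "eks",
	}
	fmt.Println(c.Validate() == nil) // prints "true"
}
```

Note that nothing in this struct mentions any provider-specific machinery; the labels are the only place provider locators leak in, and they stay opaque to the controller.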
So with that in mind, there would be two main products:

  • a deployable component
  • a spec of high-level cross-provider API

A notion of a Cluster API controller and an API structure like the one suggested above:

  • would leave providers free to define their own cluster/node objects/classes that matter to them and their users
  • would let the top-level project maintain only a simple set of common APIs
  • would allow users to answer simple questions about their clusters, independently of their knowledge of any of the providers in use
  • and would enable higher-level tools with a simple and reliable API

I can answer in more detail later today; we have an open proposal which would rebrand cluster-api into a framework.

I like the idea of providing a generic implementation for the reconciliation logic separated from the provider/actuator logic. In particular, I think that having deployable components should be a preferable target (over a library), to prevent in-tree dependencies. I'm however not sure about trying to find some commonality across providers as suggested.

Achieving this decoupling would require, besides the registration API mentioned before:

  • A formal specification of the API to allow controllers and actuators to communicate. This is actually the proposed approach in the Cluster API, but is not formalized.
  • Introducing objects that represent the provider-specific data, instead of using a "blob" inside the Cluster API objects. These objects should basically point to the provider (which in turn allows finding the appropriate actuator) and the corresponding representation of the cluster object (cluster, machineset, machine).
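The registration API plus a formal controller/actuator contract could be sketched roughly as follows. All names here (Actuator, Registry, noopActuator) are invented for illustration; the actual project wires provider actuators in as Go interfaces at compile time rather than through a runtime registration API like this.

```go
// A sketch of a formal controller/actuator contract with runtime
// registration; all names are hypothetical.
package main

import "fmt"

type Cluster struct {
	Name     string
	Provider string
}

// Actuator is the provider-side half of the contract.
type Actuator interface {
	Reconcile(c *Cluster) error
	Delete(c *Cluster) error
}

// Registry lets providers register actuators at runtime, keyed by provider name.
type Registry struct {
	actuators map[string]Actuator
}

func NewRegistry() *Registry {
	return &Registry{actuators: map[string]Actuator{}}
}

func (r *Registry) Register(provider string, a Actuator) {
	r.actuators[provider] = a
}

// For returns the actuator responsible for a given cluster's provider.
func (r *Registry) For(c *Cluster) (Actuator, bool) {
	a, ok := r.actuators[c.Provider]
	return a, ok
}

// noopActuator stands in for a real provider implementation.
type noopActuator struct{}

func (noopActuator) Reconcile(*Cluster) error { return nil }
func (noopActuator) Delete(*Cluster) error    { return nil }

func main() {
	r := NewRegistry()
	r.Register("aws", noopActuator{})
	_, ok := r.For(&Cluster{Name: "c1", Provider: "aws"})
	fmt.Println(ok) // prints "true"
}
```

The point of the registry shape is that the generic controller never imports provider code; it only dispatches on the provider name stored in the common object.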

The idea of commonality in the original comment extends only to cluster commonality, not provider/cloud commonality as we've already decided there is almost nothing shared across all providers.

I like the idea of providing a generic implementation for the reconciliation logic separated from the provider/actuator logic.

It's not really about that, as @chuckha pointed out. At least it's certainly not about reconciliation as such, as that actually falls into the provider's own domain.

  • A formal specification of the API to allow controllers and actuators to communicate.

Sounds about right.

  • Introducing objects that represent the provider-specific data, instead of using a "blob" inside the Cluster API objects.

Yes, although it's secondary to what is being proposed here and may or may not turn out to be the case. What's being suggested above is about separating out the metadata.

I like the idea of providing a generic implementation for the reconciliation logic separated from the provider/actuator logic.

It's not really about that, as @chuckha pointed out. At least it's certainly not about reconciliation as such, as that actually falls into the provider's own domain.

Well, this confirms your initial comment about "this being confusing" :-)

I understood that by

  • keep track of all metadata related to objects that each of providers manages separately

You meant the objects of the cluster-api, and from there my interpretation that

The product is a general Cluster API controller

means a generic controller for such objects.

But considering your further comment

What's being suggested above is about separating out the metadata.

Does it mean meta-metadata? Describing the entities that manage the cluster-api metadata?

What confuses me is how this links to the existing cluster-api objects. Maybe a concrete example could clarify this.

@pablochacin I'll provide an example that is not the suggested implementation as it's just an example to provide clarity on some of the ideas being presented in this issue:

Let's say we have a cluster running, our management cluster. This cluster has some CRDs installed. Those CRDs would be Cluster objects and Provider objects. Cluster objects are smaller in scope than what we have today but are very similar conceptually. They might have a cluster-id of some kind that can be used to link provider components to clusters. They have a provider name so that when we say "kubectl get clusters" we can see a list of all clusters and which providers those clusters live on. We could also say "kubectl get providers" and see which providers we can create clusters on.

The provider object defines what extra CRDs the provider will supply. When you install a provider CRD you also get the provider's custom infra/machine/cluster CRDs. These CRDs will be custom to each provider. At that point a user can write some YAML (not embedded) that creates cluster and provider objects.

Note: all of the above is hypothetical and is only to be used as an example of what I believe is being discussed in the original issue.
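Continuing the hypothetical example, the two object kinds might look something like this in Go. Every name here is invented for clarity; in a real deployment these would be CRDs rather than plain structs.

```go
// Hypothetical shapes for the Cluster and Provider objects described in the
// example above; all names are illustrative.
package main

import "fmt"

type Cluster struct {
	Name      string
	ClusterID string // links provider-owned objects back to this cluster
	Provider  string // what `kubectl get clusters` would show per cluster
}

type Provider struct {
	Name string
	CRDs []string // extra CRDs this provider supplies, e.g. its machine types
}

// ClustersOn answers "which clusters live on this provider?" without any
// provider-specific knowledge.
func ClustersOn(provider string, all []Cluster) []Cluster {
	var out []Cluster
	for _, c := range all {
		if c.Provider == provider {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	all := []Cluster{
		{Name: "a", ClusterID: "id-a", Provider: "aws"},
		{Name: "b", ClusterID: "id-b", Provider: "gcp"},
	}
	fmt.Println(len(ClustersOn("aws", all))) // prints "1"
}
```

This is the sense in which the common layer stays thin: it can answer cross-provider questions (list clusters, group by provider) while everything operational lives behind the provider's own CRDs.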

Cluster objects are smaller in scope than what we have today but are very similar conceptually.

The provider object defines what extra CRDs the provider will supply. When you install a provider CRD you also get the provider's infra/machine/cluster custom CRDs. These CRDs will be custom to each provider. At that point a user can write some YAML (not embedded) that creates a cluster and provider objects.

These two paragraphs seem like you are proposing an even thinner API layer on top of a fully provider-specific one. I'm not quite sure I like this idea, as it clearly conflicts with the idea of having any sort of generic controllers.

I like, however, the idea of having a provider object and a set of provider-specific CRDs which are referenced from the existing generic CRDs (cluster, machine) instead of the current blob-like specs.

I'm quite certain that @chuckha and I are on the same page.

@pablochacin it's possible you are thinking of a few things that probably belong to a different issue/thread.

@errordeveloper I was confused about the scope of your proposal. Initially (when you pitched it at the office hours meeting) I wrongly understood that you proposed to maintain the existing API but replace the provider-specific spec with references to provider-specific CRDs. That was the source of my confusion about the generic cluster controller and the scope of data and metadata.

Now I understand you are proposing a high-level provider API for the cluster controller to communicate with the providers. Also, you propose to move the specs to the provider-specific objects and leave only metadata in the cluster-api objects, as shown in the examples in the gist you shared some time ago (btw, I think it would be valuable to share some of it as a concrete example of your proposal). Is this interpretation correct?

All that said, I still think replacing specs with references to provider-specific CRDs is a more natural evolution from the existing API than creating this high-level API, and it still offers better decoupling.

Interesting, would this look a bit like this:

// cluster api
package v1alpha1
type Machine struct {
    AWSMachineProviderSpec *awsv1alpha1.MachineSpec
    GCPMachineProviderSpec *gcpv1alpha1.MachineSpec
    // ... and so on
}

Then the idea would be to have something check each one for a non-nil value?

@chuckha that would require a code change to add support for new providers, though. I don't think we want to do that.

@ncdc I wasn't suggesting we do this particular change. I want to confirm with an example what @pablochacin is suggesting.

Gotcha. It doesn't need to be a union struct like that. We could use object references instead:

type Machine struct {
  ProviderSpec metav1.ObjectReference
}
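To flesh out how a generic controller might follow such a reference, here is a minimal sketch. ObjectReference below is a local stand-in for the real metav1/corev1 ObjectReference type, and specKey is an invented helper; a real controller would resolve the reference with a dynamic client rather than a string key.

```go
// A sketch of the object-reference approach; ObjectReference is a local
// stand-in for the Kubernetes type, and specKey is a hypothetical helper.
package main

import "fmt"

type ObjectReference struct {
	APIVersion string
	Kind       string
	Namespace  string
	Name       string
}

type Machine struct {
	Name         string
	ProviderSpec ObjectReference
}

// specKey shows how a generic controller might key a lookup of the
// referenced provider-specific object without knowing its schema.
func specKey(m Machine) string {
	r := m.ProviderSpec
	return fmt.Sprintf("%s/%s/%s/%s", r.APIVersion, r.Kind, r.Namespace, r.Name)
}

func main() {
	m := Machine{
		Name: "m1",
		ProviderSpec: ObjectReference{
			APIVersion: "awsprovider.example.com/v1alpha1",
			Kind:       "AWSMachineProviderSpec",
			Namespace:  "default",
			Name:       "m1-spec",
		},
	}
	fmt.Println(specKey(m))
	// prints "awsprovider.example.com/v1alpha1/AWSMachineProviderSpec/default/m1-spec"
}
```

Unlike the union-struct version, adding a new provider needs no change to the Machine type: the reference just points at whatever CRD the provider installed.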

I think owner references would be a great idea to explore more specifically; I alluded to that earlier also.


@ncdc Exactly. Having the specs as separate objects allows them to have their own controllers, instead of being passed from the controller to the actuator via some synchronous call. This idea was briefly discussed in the Slack channel. However, I still haven't explored in depth the implications of this model in terms of the relationship between the generic/specific controllers.

I am not sure what the deliverable is for this issue (i.e. what actions would allow us to close it?). I propose an update to the README.md clarifying these products:

  • a deployable component
  • a spec of high-level cross-provider API

@davidewatson yes, those would be good, if we agree those are the things we want to see as the product :)

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale


Discussed in grooming call today - we can't easily identify an actionable outcome for this particular issue. It is important to keep in mind what the purpose of this repo is, and we'll be opening a new issue to focus on this for v1alpha2.

/close

@ncdc: Closing this issue.

In response to this:

Discussed in grooming call today - we can't easily identify an actionable outcome for this particular issue. It is important to keep in mind what the purpose of this repo is, and we'll be opening a new issue to focus on this for v1alpha2.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
