Cluster-api: CAPD improvement plan

Created on 20 Mar 2020 · 11 Comments · Source: kubernetes-sigs/cluster-api

User Story
CAPD is becoming a very useful part of this project. This issue lays out a plan to improve its robustness, portability, and design for the future.

Detailed Description

Planning for the future

The largest flaw in the design is that, in general, the code was built around exec'ing out to the docker binary directly. This made the dependency graph pretty weird and pushed all the exec calls out to the leaves instead of having a single client for interacting with docker. In order to restructure the code, we need to introduce a docker client to centralize the dependency. This PR adds the Moby client and creates types so that other container runtime client libraries can implement them. This would let us use other runtimes, assuming the hacks built into the kind cluster for docker can be modified to work with other runtimes (namely, mounting the socket into the container).

This will also reduce the size of the CAPD images by removing the need for the docker binary, but this isn't terribly important.

I also suspect this will improve runtime, based on some samples of timing docker exec on prow. Each call takes ~0.5–1.2 seconds for extremely simple commands (cat, chmod, chown). I suspect we will see a decent speed-up once a client is implemented, but that is TBD; there is no data yet to support this hypothesis.

Improving tests to be more realistic

The biggest shortcoming here is the lack of a real cloud provider. In an ideal world we'd have a docker cloud provider that does things like set the providerID on the nodes. Right now we have to exec into a pod and run kubectl against the workload cluster's kube-apiserver to apply the provider ID.

Structure of the tests

The e2e tests are poorly structured as identified by #2654 and need improvement.

Deprecate or improve developer experience

The developer experience is a bit lackluster right now. Folks run into a lot of problems, and those problems take time to fix. CAPD could absolutely be a good developer tool for understanding and playing with Cluster API constructs, but it will require more guides and developer time to really nail down. If that's not something that can happen, I think it makes more sense to completely deprecate the developer experience and use CAPD as an e2e testing tool.

/cc @akutz @randomvariable

/kind feature

help wanted  lifecycle/stale  priority/awaiting-more-evidence

Most helpful comment

Should we break these out into separate issues?

All 11 comments

We're looking for new maintainer(s) for CAPD - please reach out if you're interested!

I like this idea. I'm finding the docker provider to be invaluable for exploring cluster-api in a local/debug manner.

The idea of having some sort of plugin for the backend sounds great to me; I think it would be amazing to see a variety of container runtime providers.

After leaving this project for 8 months and returning to China, I really want to contribute more to this project again. @chuckha, very glad to see you again.

@ncdc I am interested

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/lifecycle frozen

/assign @fabriziopandini
/priority awaiting-more-evidence

To re-triage and possibly close if most of these have been solved

@vincepri
In my opinion, only two of the four topics have been addressed so far:

  • Structure of the tests (CAPD E2E test were fully replaced by the new CAPI E2E tests)
  • Developer experience (CAPD artifacts are now part of the release + parity with other providers in the quick start + other docs)

The other two points still make sense:

  • Having a docker cloud provider, so the CAPD provisioning workflow fully aligns with other providers
  • Centralize all interaction with docker behind one interface, move away from shelling out to docker in favor of the Moby client, and eventually add support for different container runtimes (similar to what happened in kind)

Should we break these out into separate issues?

