Cluster-api: Expand test framework to include upstream k8s testing

Created on 31 Mar 2020 · 29 comments · Source: kubernetes-sigs/cluster-api

⚠️ Cluster API maintainers can ask to turn an issue-proposal into a CAEP when necessary; this is to be expected for large changes that impact multiple components, breaking changes, or large new features.

i.e. use CAPI to test Kubernetes

Goals

  1. Use CAPX as deployer to test upstream k/k changes
  2. Use CAPX as deployer to test k8s-sigs projects such as out-of-tree cloud providers
  3. Run upstream k8s conformance tests against CAPI
  4. Encourage reuse across different infra providers instead of maintaining bash scripts in each provider repo (right now there are scripts in CAPG, CAPA, and CAPZ, with significant overlap). We intend to extend the current test/framework to allow this proposal to be implemented there.

Non-Goals/Future Work

  1. Add a kubetest deployer for CAPI.
  2. Run the tests as PR gates on k/k.

User Story

As a developer I would like to run k/k E2E tests on a CAPI cluster to test changes made to a k8s component.

Detailed Description

NOTE: this is a very rough draft based on a working group meeting (recording here). We will evolve this as we continue the discussion with the wider community and work out implementation details; this issue is just meant to get the discussion started.

  1. Build k/k binaries (e2e.test, kubectl, ginkgo)
  2. (optionally) Build k8s component images from a private SHA (if the images aren't already available on a registry)
  3. Create a cluster with the custom k8s binaries & container images.
    In order to use a custom k8s build (for example k/k master), there are a few different options:

    a. Build a custom image with image-builder as part of CI and use that image in the cluster

      • pros: can reuse the image for multiple nodes

      • cons: time consuming; building a VM image with packer takes ~20 minutes

    b. Use an existing image (possibly with a different k8s version) and, in the KubeadmConfig, pass in a PreKubeadmCommand script to replace the k8s version with the one we want

      • pros: doesn't require building an image, so it's faster

      • cons: we have to do this for every VM; it's hacky (the bash script might be error prone); and it differs from the user experience with CAPI (which uses images)

    c. Modify capi infra providers to take custom k8s component images

      • pros: can be reused more easily by users not familiar with the project and CI; doesn't require the preKubeadm "hack" script or reusing a VM image

      • cons: more work and changes involved

  4. Run test suites: k/k E2E, cloud provider E2E, other k8s-sigs E2E, etc.
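The PreKubeadmCommand approach in option (b) roughly looks like the following. This is a minimal sketch, assuming the standard CI artifact layout under dl.k8s.io; the binary paths and systemd unit name are assumptions about the base image, not part of this proposal.

```shell
# Hypothetical sketch of the "replace the k8s version" PreKubeadmCommand
# script (bash). The dl.k8s.io CI artifact layout and latest.txt marker
# are real; binary paths and unit names vary per base image.

# ci_binary_url <version> <binary> prints the CI download URL for a binary.
ci_binary_url() {
  echo "https://dl.k8s.io/ci/${1}/bin/linux/amd64/${2}"
}

# replace_k8s_binaries <version> swaps the node's kubeadm/kubelet/kubectl
# for the CI build, restarting the kubelet afterwards.
replace_k8s_binaries() {
  local version="$1"
  systemctl stop kubelet || true
  for bin in kubeadm kubelet kubectl; do
    curl -sSL "$(ci_binary_url "${version}" "${bin}")" -o "/usr/bin/${bin}"
    chmod +x "/usr/bin/${bin}"
  done
  systemctl restart kubelet
}

# On a real node this would run before kubeadm, with the version injected
# through the KubeadmConfig template, e.g.:
#   replace_k8s_binaries "$(curl -sSL https://dl.k8s.io/ci/latest.txt)"
```

This is essentially what the existing CAPG/CAPA/CAPZ scripts do today, which is why consolidating it into the shared framework is attractive.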

Related to #2141, which might overlap in implementation details but has different objectives: #2141 aims to test whether capi/capz/capa/capv/capg/etc. passes k8s conformance, whereas this proposal is to use CAPI as a dev tool to test k8s and k8s-sigs changes.

/kind proposal

cc @dims @vincepri @alexeldeib @fabriziopandini @ritazh @chewong @randomvariable @rbitia

area/testing · help wanted · kind/cleanup · kind/feature


All 29 comments

Thanks for the write-up Cecile!

I'd expand the 4th goal, mentioning that we intend to extend the current test/framework to allow this proposal to be implemented there.

Modify capi infra providers to take custom k8s component images

Would this entail changes to image-builder to take some custom scripts that can set up images in a custom fashion? We should probably tackle this separately; it's a really interesting idea and would make the images generic, although I assume folks will probably need internet access, so it might not work in every environment.

Modify capi infra providers to take custom k8s component images
pros: can be reused more easily by users not familiar with the project and CI, doesn't require the preKubeadm "hack" script, or reusing a VM image.

+1 ^ this is generally more useful for testing, but I still see a problem with the kubelet. You can override almost everything else, but the kubelet running on the base OS built by image-builder is not easily replaced unless you combine an rpm/deb update/install via cloud-init.

@vincepri @timothysc what I meant by

Modify capi infra providers to take custom k8s component images

Is that instead of using preKubeadmCommand to pass in the script that overrides the k8s version, we add a new property, maybe under a feature gate, to pass in a "custom" k8s version (what we call CI_VERSION in the script above) or custom k8s component images, and run a script to install that version on the VMs before running the bootstrap script, or as part of the bootstrap script.

A better place for this might actually be the bootstrap provider, not the infra providers, now that I think about it. @vincepri I don't think this entails changes to image-builder, as I'm not talking about building any new images but rather using cloud-init to install k8s components during provisioning. This does require internet access, but so does our current preKubeadmCommand solution. The advantage here is that it would be more reusable, and we could use it in combination with a user-supplied preKubeadmCommand.

@timothysc for the kubelet we'd need to do a systemctl restart kubelet after installing the desired kubelet binary, just like we do in the preKubeadmCommand right now.

The other possibility is to change kubeadm to allow passing in custom component images (if it's not already supported; I don't think it is, from what I've seen). So your kubeadm config would look something like:

kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data["local_hostname"] }}'
        customKubeletVersion: v1.19.0-alpha.1.175+7b1a531976be0d
        kubeletExtraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
    joinConfiguration:
      nodeRegistration:
        name: '{{ ds.meta_data["local_hostname"] }}'
        customKubeletVersion: v1.19.0-alpha.1.175+7b1a531976be0d
        kubeletExtraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
    clusterConfiguration:
      apiServer:
        timeoutForControlPlane: 20m
        customImage: myDockerHubUser/custom-api-server-build:v1.19.0-dirty
        extraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
        extraVolumes:
          - [...]
      controllerManager:
        customImage: myDockerHubUser/custom-controller-manager-build:v1.19.0-dirty
        extraArgs:
          cloud-provider: azure
          cloud-config: /etc/kubernetes/azure.json
          allocate-node-cidrs: "false"
        extraVolumes:
          - [...]

And have kubeadm pull the right images / components before init/join in cloud init. Basically I'm just trying to think of ways we can build a k8s cluster with custom builds of various k8s components installed without having to build VM images in every test.

FYI, in 1.16 kubeadm started supporting Kustomize patches (-k flag) on the static manifests. Might be useful.
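For illustration, such a patch could override a control-plane component image without rebuilding the VM image. A sketch, assuming the experimental 1.16 kustomize support mentioned above; the image name reuses the hypothetical custom build from the config example earlier in the thread.

```shell
# Sketch: write a strategic-merge patch for the kube-apiserver static pod
# and point kubeadm at the patch directory. The patch targets the static
# pod by kind/name; the image is the hypothetical custom build from the
# earlier example, not a real published image.
mkdir -p /tmp/kubeadm-patches
cat > /tmp/kubeadm-patches/kube-apiserver.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: myDockerHubUser/custom-api-server-build:v1.19.0-dirty
EOF

# Then (experimental in 1.16):
#   kubeadm init -k /tmp/kubeadm-patches
```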

I kind of like the idea of adding support to the bootstrap provider (and hiding it behind a feature gate). It would allow us to recreate the existing functionality in a more centralized and re-usable way than exists today.

If nothing else it would provide a good stopgap until we can better define an automated pipeline where we could consume images that are automatically built using image-builder from the latest k8s artifacts.

FYI, in 1.16 kubeadm started supporting Kustomize patches (-k flag) on the static manifests. Might be useful.

We would either need to validate the requested k8s version against the kustomize patches, or wait until we are ready to declare that we are only willing to support workload clusters >= v1.16, if we go down that path.

I'd expand the 4th goal

+1 to this, and I would also like to consider the idea of having Cluster API conformance tests (as a next step for the work started with https://github.com/kubernetes-sigs/cluster-api/issues/2753)

The other possibility is to change kubeadm to allow passing in custom component images

This should already be possible; I can give examples if required.

@fabriziopandini would love examples if you have them
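For reference, kubeadm's ClusterConfiguration already exposes some of this: imageRepository and kubernetesVersion are real kubeadm fields, as are the etcd image overrides. A minimal sketch; the repository value and version string reuse the illustrative names from earlier in this thread.

```shell
# Sketch: a ClusterConfiguration pointing kubeadm at a custom image
# repository and a custom-built version. imageRepository,
# kubernetesVersion, and etcd.local.imageRepository/imageTag are real
# kubeadm fields; the values here are illustrative.
cat > /tmp/kubeadm-config.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.19.0-alpha.1.175+7b1a531976be0d
imageRepository: myDockerHubUser
etcd:
  local:
    imageRepository: k8s.gcr.io
    imageTag: 3.4.3-0
EOF

# Preview which images kubeadm would pull:
#   kubeadm config images list --config /tmp/kubeadm-config.yaml
```

Note this only covers components kubeadm deploys; it does not help with the kubelet binary on the node, which is the gap discussed above.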

just to add an extra layer to this conversation, i am looking at contributing some e2e tests for the kubernetes autoscaler that use cluster-api. although we will start by using the docker provider to help keep the resources low, i think it would not be difficult to have these tests also use cloud providers at some point.

/milestone v0.3.x

@vincepri with the new v1alpha3+ roadmap should this be 0.3.x or 0.4.x?

@CecileRobertMichon This could be added to v0.3.x in a backward compatible way. I'm unclear though if we have folks interested in working on it.

is it okay to mark this with help wanted? I can probably help with some of it but I don't think I have bandwidth to work on it full time right away.

/help

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/kind cleanup

/assign

CAPA conformance was giving me grief so i kind of started doing it.

/lifecycle active

hey @randomvariable, just an update to the comment i made previously in this thread. i have started to hack on an experiment where i have broken out the autoscaler tests from upstream and started to make them into a library.

the general idea is that currently the upstream autoscaler tests are heavily tied to the gce/gke provider. i am working towards rewriting the provider interface so that it could be used generically (i.e. more widely applicable abstractions). the end result would be a series of tests that can be consumed as a library, with the user passing in a provider at compile time, in essence providing a generic suite of tests that can be consumed from the outside (no included providers).

i certainly don't think you should wait for me, but i wanted to let you know what i've been hacking on.

/milestone v0.4.0

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

@CecileRobertMichon might be worth checking if there is still work to be done here now that #3652 is merged

/area testing

@fabriziopandini @randomvariable I believe the last thing remaining here is allowing custom k8s versions to be built, in order to be able to use CAPI to test k8s PRs (something like https://github.com/kubernetes/test-infra/commit/9778b6a1462f27b869832241354249a8207e7004#diff-27a83c428d1eeb41626127495412ac1986b79e109606aacde96ac4c5b3f896a2R840). Not sure if the CAPI framework is the best place to do this or if this should be a test-infra helper.

@CecileRobertMichon IMO this problem has some variants, because not only do we need to build custom k8s versions, we also need to get them into the machine images.
For CAPD, I'm leveraging kind build to get a custom node image before starting all the tests (so I can keep the test phase consistent/working with limited resources).
For the other providers, @randomvariable implemented a solution that downloads CI artifacts onto each machine using a pre-kubeadm script; it might be helpful to link an example of how this can be used from CAPA.

@fabriziopandini I would expect the solution for loading the artifacts onto each machine to be the same. The only difference would be that, instead of getting an existing CI version from https://dl.k8s.io/ci/latest.txt and using the already-built image stored on gcr, we would need to build the image from source and load that onto the machines.
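The build-from-source flow being discussed can be sketched as follows. `kind build node-image` and `make quick-release-images` are real entry points in kind and k/k respectively; the tag helper and registry name are illustrative assumptions.

```shell
# kube_image_tag converts a k8s version string into a valid image tag:
# docker tags cannot contain '+', so k8s release tooling uses '_'
# instead. (bash string substitution)
kube_image_tag() {
  echo "${1//+/_}"
}

# CAPD path: build a kind node image straight from a k/k checkout:
#   cd "$(go env GOPATH)/src/k8s.io/kubernetes" && kind build node-image
#
# Other providers: build the component images from source and push them
# to a registry the machines can pull from (registry name illustrative):
#   make quick-release-images
#   docker tag "k8s.gcr.io/kube-apiserver-amd64:$(kube_image_tag v1.19.0-alpha.1.175+7b1a531976be0d)" \
#     my-registry/kube-apiserver-amd64:latest
```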

@CecileRobertMichon FYI in kubeadm we are using images from CI only, because we determined that having a small delay from the tip of Kubernetes is not a problem, especially given the code freeze near release.
So I personally think of CAPD - with its own build from source - as an exception, not the rule we have to follow.

Trying to understand my options and the roadmap for running upstream k8s e2e tests using CAPI (starting to look at this for Windows). Looks like we have a few options.

This is an attempt to summarize where we are right now:

  • capi kubetest (https://github.com/kubernetes-sigs/cluster-api/pull/3652)

    • meets goals 2 and 4, and partially goal 1 (doesn't support building from tip)

    • used to enable conformance tests

    • This doesn't actually use kubetest but calls k8s e2e.test directly

    • Will this be superseded once kubetest2 support is finished?

  • kubetest2 (https://github.com/kubernetes-sigs/cluster-api/pull/4041)

    • has potential to meet all goals

    • is this the direction moving forward?

    • is support for building k8s on the roadmap?

in CAPZ we also have

  • ci-entrypoint.sh (custom script that builds k8s and runs those binaries)

    • this could be removed once kubetest2 has the ability to support --build?

Is the end goal to be able to support all of this functionality via the kubetest2 deployer?

@jsturtevant IMO #3652 and #4041 are dealing with two different goals:

  1. Ensure Cluster API stability (Test Cluster API itself)
  2. Test Kubernetes using Cluster API (Cluster API as a release blocker in Kubernetes)

I think that for this specific issue, goal 2 / #4041 is the most appropriate answer, but I'm not sure if/how the two things could converge.
WRT this, it might be that using kubetest as the package name in #3652 wasn't a good choice...
