Kind: use kind for upgrade tests (deep dive)

Created on 29 Jan 2019 · 14 comments · Source: kubernetes-sigs/kind

Upgrade testing is a well-known pain point:

  • the current solution implemented in the Kubernetes e2e testgrid is unreliable, hard to debug when a problem occurs, and hard to maintain across releases
  • CI testing for other workloads (e.g. add-ons/applications) against Kubernetes upgrades is even more complex
  • upgrade complexity slows down dev/test iterations by kubeadm maintainers and blocks new contributors in this area

Kind, being self-contained, simple, fast, and already fit both for CI testing and for running locally, is a good candidate for addressing some or all of those problems.

Wdyt?

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews
for collecting additional opinions/ideas on this...


Adding some initial ideas about how to address this

IMO, basically only three things are required to make kind support upgrades:

  1. "build":
    define a way to add in a well know location of the node-image the debs/images of
    the target Kubernetes version

Assuming that the second Kubernetes version will be created out of band, there is already a simple
solution for the "build" part: a Dockerfile with FROM kindest/node and some ADD <src> <dest>.
I know it might seem hacky, but it allows us to immediately unblock the following changes.
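
Concretely, that hack could look like the following sketch; the base-image tag and the artifact layout are assumptions, not part of kind's interface:

```sh
# A sketch of the Dockerfile hack described above: start from an existing
# node image and ADD the debs/images of the target version in a well-known
# location. The base-image tag and the paths are assumptions.
cat > Dockerfile <<'EOF'
FROM kindest/node:v1.13.2
ADD upgrade/ /kind/upgrade/
EOF
docker build -t kindest/node:upgrade-test .
```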

Instead, the target solution for "build" could leverage the existing build bits mechanism, allowing
usage of apt, bazel, make also for the second Kubernetes version (some changes required, but
IMO no big refactors)
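
Purely as an illustration of where that could land, a hypothetical extension of the existing build command; the --upgrade-* flags below do not exist, and --type just mirrors the build mechanisms mentioned above:

```sh
# Hypothetical UX sketch only: the --upgrade-* flags do not exist in kind;
# they illustrate building a second Kubernetes version into the node image
# with the same apt/bazel mechanisms used for the first one.
kind build node-image --type apt \
  --upgrade-type apt --upgrade-version v1.14.0
```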

  1. "action":
    implement a new action automating the upgrade workflow

    A new action is required, but all the necessary framework for allowing the upgrade workflow to
    adapt to current topology is already in place

  2. "UX":
    Define the UX for triggering the upgrade action

    There are several options. my preferred is to trigger this action via the config file initially, but
    let see if a better proposal pops up 馃槈
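
As referenced in item 2, a rough sketch of the per-node workflow such an action would automate, assuming the v1.14.0 artifacts were staged under /kind/upgrade/ in the node image; the version, paths, and default node names are assumptions:

```sh
# Rough sketch of the workflow the upgrade "action" would automate.
# Node container names follow kind's <cluster>-<role> convention for the
# default cluster name "kind"; /kind/upgrade/ is an assumed staging path.
docker exec kind-control-plane /kind/upgrade/kubeadm upgrade apply v1.14.0 --yes
for node in kind-worker kind-worker2; do
  docker exec "$node" /kind/upgrade/kubeadm upgrade node config --kubelet-version v1.14.0
  # swap in the staged kubelet binary, then restart the service
  docker exec "$node" cp /kind/upgrade/kubelet /usr/bin/kubelet
  docker exec "$node" systemctl restart kubelet
done
```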

Wdyt?


Adding a proposal for next steps:

  • Define what is "core" or "non-core" (this depends on https://github.com/kubernetes-sigs/kind/issues/255)
  • Start implementing the "action" and the "UX" parts to get a first MVP using the Dockerfile build
  • Extend the "build" part

Wdyt?

All 14 comments

There are probably multiple ways we can do pieces in an incremental fashion that may be beneficial to kind.

I think a reasonable end user story that kind may want to support would be:

"To allow for the CI upgrade scenario(s) for folks creating custom k8s integrations on such as operators."
As a k8s-integrator, I would like to leverage a tool in CI that allows me to test my integrations piece (App, Operator, etc) across an upgrade cycle.
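
A sketch of that CI story, using kind's existing create/delete UX; the upgrade step in the middle is exactly the missing piece this issue is about, and the file/script names are placeholders:

```sh
# CI sketch: test an integration across an upgrade cycle.
# my-operator.yaml and run-tests.sh are placeholders.
kind create cluster --name ci
export KUBECONFIG="$(kind get kubeconfig-path --name ci)"
kubectl apply -f my-operator.yaml   # deploy the integration under test
./run-tests.sh                      # verify against the original version
# <upgrade the cluster here, by whatever mechanism kind settles on>
./run-tests.sh                      # verify again after the upgrade
kind delete cluster --name ci
```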

If needed, there are other routes through cluster-api that might also be possible.

Either way, let's take one change at a time and go from there.

I want to be very careful about how we define any sort of upgrade support in kind. It will increase the complexity of how we ship kubernetes and support those images.

I think we can support upgrade testing at least for the Kubernetes project without making any changes to the existing build, config, etc.

Additionally, when people see that a cluster tool supports in-place upgrades, that suggests long-lived clusters, which is also not something people should be doing; we need to be clear about that.


  1. Would complicate how we lay out the images, how we identify where k8s is, and make the build CLI more confusing to use. I don't like this and I don't think we need it. Instead we can add / obtain the upgrade version binaries / images at runtime on top of a normal image.

If we ship images with 2 k8s versions in them, they will also be massive. I don't want to sacrifice the benefits we get from how we currently ship k8s to support this.

  2. Doesn't _really_ need to happen. Something just needs to talk to the node containers and run the upgrade logic; that shouldn't need special "Action" support. We just need to generally support tooling talking to the cluster. Upgrades will happen after creating a cluster, and can be scripted by code that doesn't even live in the main CLI, potentially (see the sketch after this list).

  3. Upgrades shouldn't live in the cluster config. An upgrade is something you do to an existing cluster.
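
For example, the scripting mentioned in point 2 could be as small as this; the cluster name and the staged kubeadm path are assumptions:

```sh
# Minimal out-of-tree sketch: talk to the node container directly and run
# the upgrade logic there; no special "action" support in kind required.
cp_node=$(docker ps --filter "name=upgrade-test-control-plane" --format '{{.Names}}' | head -n1)
docker exec "$cp_node" /kind/upgrade/kubeadm upgrade apply v1.14.0 --yes
```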

[we're discussing on slack https://kubernetes.slack.com/archives/CEKK1KTN2/p1548917656593500]

I think this is doable, maybe even in the main tool, but I hesitate to blow the project scope so quickly and I want to have a well thought out design before we go implementing this. We already have a fair bit of technical debt hiding in the code to go pay down.

@BenTheElder Do you have a planning document which outlines what your major goals are and who the stakeholders are?

As mentioned in the grooming docs, SCL projects go through a planning session where we duke out what we consider important for a cycle, what the priorities are, and who plans to work on them. I know you are currently working hard trying to get to a 1.0 release.

But...

  • Who are the stakeholders?
  • What are the priorities?

I'd love to be involved in the next planning session and plan to attend more regularly now. I'd also like to avoid possible fragmentation.

/cc @spiffxp

Thanks, I agree. We're starting to do this at the meetings now that we have that set up properly.

I think we got off on the wrong foot on this one and should do upgrade testing with kind eventually. xref: https://github.com/kubernetes-sigs/kind/issues/255#issuecomment-463737008

The current description is probably not the way to go, though; this needs more thought / design.

  1. Assuming that the second Kubernetes version will be created out of band, there is already a simple
    solution for the "build" part: a Dockerfile with FROM kindest/node and some ADD <src> <dest>.
    I know it might seem hacky, but it allows us to immediately unblock the following changes.

Statements like this do not inspire confidence in the current limited plan. This basically involves hacking and exposing a bunch of organic internals that are not part of the current user / developer interface like "what actually happens inside the Kubernetes build".

I'm actively working to root out and eliminate technical debt like that before we depend on it further. Some of those goals are currently outlined at https://kind.sigs.k8s.io/docs/design/principles/

  • Who are the stakeholders?
  • What are the priorities?

Absolutely needs to be written down in one place. Some of this is in the milestones, roadmap, and design, but not all... Will fix that today. I actually have a rough version of this collected up but I haven't PRed it yet.

@BenTheElder great to hear that!
You can count on my help both for eliminating tech debt and for working on what comes next!
Feel free to assign me to issues or to ping me on slack

Thanks, will do 😀
I plan to go through and file a fresh round of tracking issues soon. Today I added a draft writing down scoping etc. with @munnerz and @neolit123, and looked more at borrowing from CRI to lower coupling.

Expect to see a doc about saner build, up, test, down etc. orchestration (i.e. not using kubetest) tomorrow as well; just finished a draft, I think 😅

@BenTheElder

with the proposal you and @munnerz made that third parties should preferably consume the backend of kind and create a separate CLI, the kubeadm project is convinced that this is the right thing to do short term.

long term we could have integration of phases and more CLI commands in kind that will help with kubeadm testing.

at this point we are leaning towards creating a new CLI tool, "kinder" or "kubeadm-kind" - the name doesn't matter much, as it won't be recommended for users and will have an experimental scope.
we can host such a tool in the kubeadm repo (in a sub-folder somewhere).

but this creates a complication. such a tool would not be able to utilize the kubetest deployer for kind as the deployer is bound to the kind binary and expects the source code from the kind repo to be at the root of the tree (path aliases?).

some options i could think of are:

  • write a new deployer for this special tool.
  • modify the existing kind deployer to support both.
  • use podutils to call bash scripts from a repo - e.g. the same way the e2e tests currently hosted in the kind repo are done.

also i know/hope kubetest2 will facilitate writing a deployer, yet without a roadmap for kubetest2 it could delay our plans for the kubeadm upgrade and x-on-y tests. the plan on our side is to start integrating such tests early next cycle.

thoughts?

@neolit123 can you add this to the agenda for Monday and I'll try to come prepared to discuss.

Definitely want to unblock this ASAP 🙏

Also agree on the long term goal with phases and supporting upgrades built-in...
I'm currently (as in right now) drafting a plan to re-arrange how create is coupled internally to make this easier to implement re-entrantly (i.e. today create is all one-shot; you can't run some phases and then run more, even if we did expose it externally).

FWIW I'm also moderately hopeful that we can stop using kubetest v1 and its image for kind within the next month; kind is less needy than other deployers, hacking on v1 is painful, and that image is messy too. Once we add some critical bits like junit output it should be trivial to implement the kind deployer for kubetest2.

can you add this to the agenda for Monday and I'll try to come prepared to discuss.

ok, i have added it as an agenda item.

Definitely want to unblock this ASAP

we have roughly one month, so there is time.

Also agree on the long term goal with phases and supporting upgrades built-in...
I'm currently (as in right now) drafting a plan to re-arrange how create is coupled internally to make this easier to implement re-entrantly (i.e. today create is all one-shot; you can't run some phases and then run more, even if we did expose it externally).

:+1:

FWIW I'm also moderately hopeful that we can stop using kubetest v1 and its image for kind within the next month; kind is less needy than other deployers, hacking on v1 is painful, and that image is messy too. Once we add some critical bits like junit output it should be trivial to implement the kind deployer for kubetest2.

the way kubetest2 is shaping up, it will be a big improvement over kubetest1.
i'm not sure how it will be integrated into test-infra so that jobs can be run against it, but i can help where i can.

kind vs kinder
My personal concern is to avoid fragmentation in the ecosystem and confusion for users; as a consequence, my personal opinion is that we should agree on a split of responsibilities between kind and kinder, e.g.:

  • kind is for CI (as it is already great for this and it is going to support kubeadm upgrade soon)
  • kinder is for developers (it won't be recommended for users and will have an experimental scope)

And on a long-term convergence roadmap, even if this could be reconsidered in the future, e.g.:

  • As soon as kind consolidates its first set of priorities / the use case for developers consolidates, we should check whether it is possible to backport a few very well-scoped "developer friendly" features into kind; ideally we should get rid of kinder in the long term.

kind, kinder, kubetest
According to ^^^, we should avoid developing a kubetest deployer for kinder, but this depends on the timing for kubetest2/upgrades in kind.
Let me know if there are things I can work on to make this happen soon!

[we discussed this in today's meeting]

TLDR was roughly:

  • we can continue to develop the actual upgrade logic either out of tree or move to something roughly like kind alpha upgrade (kind experimental upgrade?? - both shapes are sketched after this list), punting that decision to @neolit123 & @fabriziopandini

  • we will also need some amount of logic to orchestrate the tests against this, the hope is to do this under kubetest2 soon and this portion may need to be another small program or script depending on the route we go

  • not strictly required for this, but we should enable creating clusters that stop short of invoking kubeadm in the immediate future, to make iterating on kubeadm testing easier
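
For the first bullet, the experimental surface might look something like the following; both subcommands and their flags are purely hypothetical:

```sh
# Entirely hypothetical CLI shapes from the discussion above; neither
# subcommand exists in kind today.
kind alpha upgrade --name upgrade-test --version v1.14.0
# or
kind experimental upgrade --name upgrade-test --version v1.14.0
```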

Regarding kubetest2 specifically, since it came up: I am about to start on the kind deployer after I finish going through some more of my review backlog. It's still extremely early, but the direction seems promising for this kind of work 😉; it should be possible to vendor kubetest2 and build a deployer from another repo even if we need...

Once it matures a bit more I'll consider a formal request to break it out into another small k-sigs repo.

we're doing this in kinder now, tracking there; we should open fresh issues here if/when we're ready to move that over.
