Kubeadm: 1.14-1.15 Test Infra Changes.

Created on 29 Nov 2018 · 23 Comments · Source: kubernetes/kubeadm

There is a large amount of technical debt that needs to get paid down to eliminate issues in CI.

[x] Move Kops to periodic and release blocking
[ ] Make KIND a PR blocking job
edit: there is now a KIND presubmit, but it is not blocking yet. It can be triggered on demand with /test pull-kubernetes-e2e-kind
[ ] Move kubeadm-kind jobs to release blocking
[x] Implement upgrade and skew tests for kubeadm-kind
[ ] Add cluster-api aws and release blocking for kubeadm HA verification

/cc @kubernetes/sig-cluster-lifecycle
/assign @fabriziopandini @timothysc @neolit123

Labels: area/releasing, area/testing, lifecycle/active, priority/important-longterm


timothysc


All 23 comments

@timothysc I'm working on making kind support more use cases, but it would be nice to clarify the expectations/requirements a bit better.

As a short-term goal, I'm targeting having kind ready for testing a few variations of the init > join > test workflow, which is the minimum required for starting the k/a (kubernetes-anywhere) replacement.
Then I have in mind more complex variations (HA, external etcd, upgrades).

fabriziopandini on 29 Nov 2018


Make KIND a PR blocking job

it would be interesting to define what the PR-blocking job would be; some examples:

1 master (conformance)
1 master + 3 workers (non conformance??)
1 master + 3 workers (conformance)

currently @BenTheElder has kind passing conformance on a periodic for single master pretty consistently.
https://k8s-testgrid.appspot.com/sig-testing-kind#conformance,%20master%20(dev)

also serial vs non-serial is an interesting topic:

the non-serial runs take only ~20 minutes, while the serial runs take ~1.5 hours.

but we should probably loop more folks from sig-testing on this topic and/or create an issue in test-infra for the above.

neolit123 on 29 Nov 2018

I think we should have kind as a PR blocking job right now and move kops to periodic to unblock the community.

timothysc on 29 Nov 2018

HA w/kind is a nice-to-have, but not a requirement. TBH I think it would be weird.

timothysc on 29 Nov 2018

it feels like the only good option we have in terms of testing HA without a CP.

neolit123 on 29 Nov 2018

some notes:

- technically multi-node is necessary to fully pass conformance, else one test is "skipped", but otherwise they do generally pass.
- running the bulk of them in parallel is _much_ faster, with ~15 minute CI jobs, but it has recently gotten flakier on the k8s master branch :/
- lots of e2e tests that are not conformance do hacky stuff that may need some work to port / support

I think we should have kind as a PR blocking job right now and move kops to periodic to unblock the community.

I've spoken to @justinsb about this, I think we're both onboard there. I've just finally cut a binary release of the kind CLI yesterday as things are fairly stable, modulo changes from @fabriziopandini related to HA / multi-node etc. We should be able to create jobs that use stable released versions now.

There might be some concerns from others though regarding the presubmits ...

Personally I would like to see more tests be post-submit based and release blocking rather than presubmit, and move cloud providers there as they move out of tree. That requires wider buy-in and better ownership of CI signal though I think ...

BenTheElder on 29 Nov 2018

Remove Bazel generation of .spec and .deb artifacts with standard .in files that can override build variables.

Can you elaborate on what you mean by this?

ixdy on 29 Nov 2018

/watching topic

rdodev on 30 Nov 2018

/assign @liztio @rdodev

timothysc on 30 Nov 2018

@ixdy - Moving build details here https://github.com/kubernetes/kubernetes/issues/71677

timothysc on 4 Dec 2018

Move Kops to periodic and release blocking

ticked this item in the OP.
kops-aws was removed from PR blocking and release blocking due to an AWS account issue.
my understanding is that it might make it back in release blocking.

Make KIND a PR blocking job

there is some chance that this can happen this cycle.
but this depends on sig-testing decisions.

in terms of our dashboard we are going to have kind jobs.

neolit123 on 1 Feb 2019

Guys, can you help me understand where I can check whether all tests pass in k8s.io/kubernetes/cmd/kubeadm/app/util/config?

Currently the following test fails for me:

    --- FAIL: TestConfigFileAndDefaultsToInternalConfig/incompleteYAMLToDefaultedv1beta1 (0.00s)
        initconfiguration_test.go:123: the expected and actual output differs.
                in: testdata/defaulting/master/incomplete.yaml
                out: testdata/defaulting/master/defaulted.yaml
                groupversion: kubeadm.k8s.io/v1beta1
                diff: 
            --- expected
            +++ actual
            @@ -115,7 +115,6 @@
               imagefs.available: 15%!
            (MISSING)   memory.available: 100Mi
               nodefs.available: 10%!
            (MISSING)-  nodefs.inodesFree: 5%!
            (MISSING) evictionPressureTransitionPeriod: 5m0s
             failSwapOn: true
             fileCheckFrequency: 20s

I wonder whether this is a problem with my local environment, or whether I missed something.

miry on 7 Feb 2019

@miry
i tried the latest master branch and this package passes for me.

    $ go test ./cmd/kubeadm/app/util/config/...
    ok      k8s.io/kubernetes/cmd/kubeadm/app/util/config   9.554s
    ok      k8s.io/kubernetes/cmd/kubeadm/app/util/config/strict    0.018s

a couple of points:

make sure you have the latest master and go1.11.x
the diff library that is failing in this test is known to be buggy.
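The "%!" ... "(MISSING)" fragments in the failing diff above are what Go's fmt package prints when a formatting verb has no matching operand. A plausible, but unconfirmed, cause is that the diff text, which contains literal '%' characters such as "15%", reaches a Printf-style function as the format string, so '%' followed by a newline is parsed as the verb '\n'. The sketch below only reproduces that symptom; renderAsFormat and renderAsData are hypothetical helpers, not code from the kubeadm tests or the diff library.

```go
package main

import "fmt"

// yamlLine is a hypothetical line resembling the diff above: YAML text
// that contains a literal '%'.
const yamlLine = "  imagefs.available: 15%\n"

// renderAsFormat reproduces the suspected mistake: the text itself is used
// as the format string, so "%\n" is parsed as the verb '\n' with no operand
// and fmt writes "%!\n(MISSING)" in its place.
func renderAsFormat(text string) string {
	var noArgs []interface{}
	return fmt.Sprintf(text, noArgs...)
}

// renderAsData treats the text as a value, so '%' is printed unchanged.
func renderAsData(text string) string {
	return fmt.Sprint(text)
}

func main() {
	fmt.Printf("%q\n", renderAsFormat(yamlLine)) // "  imagefs.available: 15%!\n(MISSING)"
	fmt.Printf("%q\n", renderAsData(yamlLine))   // "  imagefs.available: 15%\n"
}
```

Passing data through Print/Sprint, or through an explicit "%s" verb, keeps '%' characters in test output intact.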

cc @chuckha

neolit123 on 7 Feb 2019

we have more CI work for kind going on (thanks @neolit123!), kops is still borked everywhere and removed due to billing account issues.

BenTheElder on 7 Feb 2019

@neolit123 Thank you for your help. I found why the tests are not working for me: https://github.com/kubernetes/kubernetes/pull/67709/files . I use macOS, which explains the "- nodefs.inodesFree: 5%" line in the diff.

miry on 7 Feb 2019

this is sort of unrelated to this ticket. @miry, could you please file an issue in k/k and ping the author of that PR so that an implementation for other OSes is added?
thanks.

neolit123 on 7 Feb 2019


Make KIND a PR blocking job

EDIT: my mistake, i thought i was reading release blocking.

~~this will hopefully happen next week.~~
this week we were able to move k-a jobs outside of blocking into release-informing / all dashboards and kind jobs to release-informing / all dashboards too.

https://github.com/kubernetes/test-infra/pull/11562
https://k8s-testgrid.appspot.com/sig-release-master-blocking

neolit123 on 28 Feb 2019

re:

Implement upgrade and skew tests for kubeadm-kind

@fabriziopandini @timothysc
yesterday we briefly discussed the plan.
here is the summary again.

we need upgrade and skew tests in 1.15 as we are no longer going to use kubernetes-anywhere.

summary:

kind out of the box does not support what we need for e2e testing.
the major blocker is that we cannot prevent kind from creating the k8s cluster, either from the config or from the command line of the kind binary. kind also does not cover things like kubeadm reset, which we really want to test.
kind has a kubetest deployer in test infra but it's bound to a kind binary.
kinder is great for our needs but we cannot use it with that kubetest deployer.

we have multiple options (which is better than having none):

1) extend kind to support all our needs

i don't think this will happen in 1.15 (or at least the first half of the cycle) due to:

what we are demanding from kind is out of scope (e.g. kubeadm reset).
the kind alpha (or another scoped) command is ideal for us, but it's invasive and will create noise in the kind repo. it is also a potential blocker for features we want to add.
putting pressure on the kind maintainers, who are busy most of the time.

2) add built-in support for kinder in test-infra

having built-in support for that in kubetest1 is probably a bad idea and i don't want to put effort into that. kubetest1 is beyond hard to maintain at this point. also this is political. kubetest2 (WIP) is more flexible, but kubetest2 needs work and it might take a while before we get it hooked in prow jobs.

this idea is mostly unclear.

3) bypass the kubetest deployer process completely until kubetest2 is ready and use kinder with a custom deployment process.

the whole idea of deployers is to facilitate your testing process via the --up, --test and --down flags, etc.
this is great if you want to test using kind but is very limiting in our quite demanding use case.

current kind jobs for sig-testing still run using a bash script:
https://github.com/kubernetes-sigs/kind/blob/master/hack/ci/e2e.sh

we can use the same mechanics and in such a bash script we can execute kinder or possibly even pre-cook a temporary up,down,test deployer.

at this point my vote goes for 3, because i don't like the risks from 1 and 2 in terms of timing of the 1.15 cycle...

neolit123 on 14 Mar 2019


is a fair option imho :+1:
is possible but also perhaps not as fast as we might want / land it otherwise.
kubetest2 is low priority and not ready. if you want to add kinder to kubetest1 _or_ 2 I don't think there are any political blockers (?) but kubetest is a mess and kubetest2 is still an MVP. :grimacing:

I would recommend not using the bash for too much longer though, we should at least get some of the kind tests over to kubetest(2) soon, if not everything else. It will be easier to do complex testing in Go.

BenTheElder on 14 Mar 2019

ok, 3 it is then.
but i only see it as a temporary option and i will think about a way to bring the bash to a minimum.

neolit123 on 14 Mar 2019

I think it's still a little early on 1.15 to pull the trigger on options.

If we get phases in kind we can either wrap with scripts or build macro commands in kinder.

timothysc on 15 Mar 2019