Test-infra: The cross-build should be branched, not always against master

Created on 6 Sep 2017 · 32 comments · Source: kubernetes/test-infra

In order to have a good CI signal for the release, we should branch the cross-build so that it also runs for the release-1.8 branch.

See this:

lucas@THEGOPHER:~/luxas/k8s/test-infra$ curl -sSL dl.k8s.io/ci-cross/latest.txt
v1.9.0-alpha.0.379+7be29bd9b6913e
lucas@THEGOPHER:~/luxas/k8s/test-infra$ curl -sSL dl.k8s.io/ci-cross/latest-1.8.txt
v1.8.0-alpha.3.673+73326ef01d2d7c
lucas@THEGOPHER:~/luxas/k8s/test-infra$ curl -sSL dl.k8s.io/ci-cross/latest-1.7.txt
v1.7.0-alpha.4.914+b9e8d2aee6d593
lucas@THEGOPHER:~/luxas/k8s/test-infra$ curl -sSL dl.k8s.io/ci-cross/latest-1.6.txt
v1.6.0-alpha.3.352+7d6eba69848528
lucas@THEGOPHER:~/luxas/k8s/test-infra$ curl -sSL dl.k8s.io/ci-cross/latest-1.5.txt
v1.5.0-alpha.2.651+c80acb4cb8c54f
lucas@THEGOPHER:~/luxas/k8s/test-infra$ curl -sSL dl.k8s.io/ci-cross/latest-1.4.txt
<?xml version='1.0' encoding='UTF-8'?><Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message></Error>
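The naming convention visible in the transcript above — latest.txt for master, latest-1.X.txt for release branches, and a GCS NoSuchKey error body when a marker was never published (as for 1.4) — can be sketched with a couple of shell helpers. These are illustrative only; the real markers are written by the build jobs, not by anything like this:

```shell
# Illustrative helpers only; the real marker files are pushed by the
# build jobs themselves.

# Map a branch name to its ci-cross marker file.
marker_for_branch() {
  case "$1" in
    master)    echo "latest.txt" ;;
    release-*) echo "latest-${1#release-}.txt" ;;
  esac
}

# GCS answers with a NoSuchKey XML body when a marker was never pushed
# (as seen for latest-1.4.txt above).
marker_missing() {
  case "$1" in
    *NoSuchKey*) return 0 ;;
    *)           return 1 ;;
  esac
}

marker_for_branch master        # prints latest.txt
marker_for_branch release-1.8   # prints latest-1.8.txt
```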

On every beta cut, a new alpha.0 is created, which means the cross-build jumps to master in order to be usable for the next release. But as soon as master and the release branch diverge (around the rc cut), we have no coverage for PRs targeted at the release branch that break something. The only way to find out is manually, or when things break while we're releasing.

To fix this, we should add a cross-build-1.8 job and include it in the release-1.8-blocking tab.

@jdumars @calebamiles @wojtek-t @vishh @krzyzacy @fejta @ixdy

kind/bug lifecycle/rotten sig/release

All 32 comments

/assign

I can add one

All of the release-branch build jobs (including release-1.8 job) do a full cross-build. See https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/ci-kubernetes-build-1.8/1?log#log.

Though the ci-cross/ vs. ci/ thing is a weird anomaly.

Also, the release-1.8 job doesn't push images to gcr.io. That was new as of 1.8, so we should probably add it.

Aha, good to know.
Would it make sense to branch the cross-build anyway...?
Otherwise one can't get binaries from the cross-build as usual (the images aren't pushed either).

I think it's easiest to have both. The cross-build is "special" and it would be even more confusing to try to convert some of those things to the "normal" ci of the release branch. WDYT?

cc @kad

I guess this is still an issue for folks, and I'm not sure how to resolve it.

Originally all build jobs were cross-builds, and they pushed binary artifacts to gs://kubernetes-release-dev/ci. At some point, we switched the master-branch build job to be a quick-release (amd64 only), and created a new master-branch cross-build job, which pushes its binary artifacts to gs://kubernetes-release-dev/ci-cross, so it wouldn't collide with the quick-release job.

Release-branch build jobs still do a full cross-build, and they still push their binary artifacts to gs://kubernetes-release-dev/ci. Our e2e tests read only the artifacts from gs://kubernetes-release-dev/ci, since we only test amd64.

kubeadm apparently uses only gs://kubernetes-release-dev/ci-cross, which is now causing problems because the release-1.8 build job doesn't push any artifacts there. I'm not sure how to resolve this - pushing to ci-cross instead would break assumptions made by our e2e jobs, along with a number of scripts in kubernetes/kubernetes.

Pushing to both places is not great, either.

I would propose to do following:

  • ci/* should be the place where full builds are published, regardless of the branch.
  • ci/latest*.txt should be updated only once builds have finished successfully for all branches.
  • The quick-release build (amd64 only), which is used for e2e, should be pushed into a separate directory, e.g. ci-e2e/?

This way, ci/* will always be consistent across all branches and all architectures, and would match the official releases that are published in gs://kubernetes-release/release.
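The marker-update rule in the proposal above could be sketched roughly as follows; the architecture list and the build_ok stub are assumptions for illustration, not the real job logic:

```shell
# Sketch of the proposal: only publish the ci/latest marker once every
# build has finished successfully. ARCHES and build_ok are stand-ins.
ARCHES="amd64 arm arm64 ppc64le s390x"
VERSION="v1.8.0-alpha.3.673+73326ef01d2d7c"

build_ok() {
  # Stand-in for "did the cross-build for this arch publish artifacts?"
  case "$1" in
    amd64|arm|arm64|ppc64le|s390x) return 0 ;;
    *)                             return 1 ;;
  esac
}

all_ok=yes
for arch in $ARCHES; do
  build_ok "$arch" || all_ok=no
done

# Only bump the marker when everything succeeded; in reality this would
# be a copy into the gs://kubernetes-release-dev/ci/ bucket.
if [ "$all_ok" = yes ]; then
  echo "$VERSION"
fi
```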

Images from all CI builds can go to the one registry that is currently in use. However, I'm curious how this currently works for the master branch, where two build jobs produce images with the same name/build ID for the amd64 arch: does one get ignored, or overwritten?

Sounds like a reasonable proposal. Currently I believe the build job aborts if it sees preexisting artifacts with the same build version.

I unfortunately don't have bandwidth for this right now.

anyone able to help @kad? @krzyzacy @BenTheElder or others from the community?

Why not just run the cross build on release branches as well? That will populate ci-cross and push images as well (and never push images for ci/)
I'd prefer to touch the minimum amount of moving parts here
@ixdy sounds reasonable?

@luxas from the explanation that @ixdy wrote, my understanding is that release branches are always cross-builds and populate everything, but into the ci/* directories.
Only the master branch is special (quick-build in ci/, cross-build in ci-cross/), and we were "lucky" to step on it with our plan for supporting CI builds in kubeadm.

So, the amount of moving parts for each option:

  • ci/* to be all cross:

    • move master quick to somewhere else

    • teach e2e to use that another place

    • remove "crutch" in kubeadm to use ci-cross instead of ci/ for URLs.

  • another option: ci/* to be incomplete (quick-builds)

    • move all branch full build jobs to be populating ci-cross/

    • e2e will be the only viable user of ci/.

    • kubeadm will forever do behind the scenes URL translation (ci/ -> ci-cross/).

I think I lean towards option two:

another option: ci/* to be incomplete (quick-builds)
move all branch full build jobs to be populating ci-cross/*
e2e will be the only viable user of ci/*.
kubeadm will forever do behind the scenes URL translation (ci/ -> ci-cross/).

With the exception that we run the dedicated cross-build job for release branches as well, even though the CI jobs on release branches already do full release builds.
Would that make sense? That's the most obvious solution IMO.
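The "behind the scenes URL translation" that option two locks kubeadm into could look roughly like the sketch below. The helper name and the exact rewrite are invented for illustration; only the bucket paths come from this thread:

```shell
# Hypothetical sketch of a ci/ -> ci-cross/ URL rewrite. The bucket is
# the real one discussed in this thread; the helper is illustrative.
BASE="https://storage.googleapis.com/kubernetes-release-dev"

resolve_marker_url() {
  # Callers ask for e.g. "ci/latest-1.8"; redirect the read to
  # ci-cross/, where the full cross-build artifacts live.
  label="$1"
  echo "${BASE}/$(echo "$label" | sed 's|^ci/|ci-cross/|').txt"
}

resolve_marker_url ci/latest-1.8
# prints https://storage.googleapis.com/kubernetes-release-dev/ci-cross/latest-1.8.txt
```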

I'm ok with both variants; the more important thing is that it ends up consistent, one way or another.

eh, I did not notice this was reopened - anything we still need?

@krzyzacy IMO, we should run the cross-build as a CI job on the release branches as well. That's the easiest way to get dl.k8s.io/ci-cross/latest-1.X.txt properly updated.
And then make the normal CI build do a cross-build, but not push images.

I may take a look at this after we're done eliminating jenkins and have a better solution to bin-packing these jobs (https://github.com/kubernetes/test-infra/issues/5436). Cross pretty much consumes an entire build node for ~ one hour.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Reading back through this the desired end-state is still unclear, but this is clearly a pain point for kubeadm in particular that we should fix.

/cc @kubernetes/sig-release-bugs @dims @jdumars - The weird machinations around the kubeadm 1.9-to-1.10 upgrade job failures are rooted here. I'm not certain who owns this or what the path forward is...

Could this be a topic for the test standup? @fejta ?

@jdumars This came up at sig-testing weekly today.

One additional note: the cross-building jobs also push images to gcr.io/kubernetes-ci-images (master and release branches). The master branch fastbuild job does not push any images.

This is why the master-branch kubeadm job reads from ci-cross/, rather than ci/.

The master branch fastbuild job is the real anomaly here.

Perhaps we should normalize the master branch CI job to not be a fastbuild then and to push to the same location?

We already have a significantly faster CI quick release alternative with the bazel build+caching now that other jobs can consume.

@ixdy but it looks that gcr.io/kubernetes-ci-images has some images from latest 1.10 builds from ci/ bucket.

@kad right, I believe the ci-kubernetes-build-beta release-1.10 branch job is producing those.
https://k8s-testgrid.appspot.com/sig-release-1.10-all#build-1.10

@ixdy can you comment here the conclusion of the other day's conversation about this?

Is this related to https://k8s-testgrid.appspot.com/sig-release-1.9-blocking#gce-kubeadm-1.7-on-1.9?

This is a release-blocking test suite and has been failing for a while. Should we just remove it from the release-blocking tests?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

