ref: https://github.com/kubernetes/test-infra/pull/8466, https://github.com/kubernetes/test-infra/issues/8852
Currently it is impossible to look up the configuration for a presubmit or postsubmit job based on the job name alone, because job names are not required to be unique. We use this to specify different variants of a job for different branches while still reporting under the same job name. However, reporting under a shared name is exactly what the presubmit context field is for, and postsubmits do not report to GitHub at all.
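For illustration, a minimal sketch of the pattern described above (the org, repo, job names, and images are hypothetical, and most fields are trimmed): two presubmit entries share a name and a context but target different branches, so the name alone does not identify a single config.

```yaml
presubmits:
  example-org/example-repo:
  - name: pull-example-repo-e2e      # same name...
    context: pull-example-repo-e2e   # ...reported under the same GitHub context
    branches:
    - master
    spec:
      containers:
      - image: example.com/e2e-test:new
  - name: pull-example-repo-e2e      # duplicate name: branch variant of the job above
    context: pull-example-repo-e2e
    branches:
    - release-1.0
    spec:
      containers:
      - image: example.com/e2e-test:old   # older branch keeps the older image
```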
Are there any reasons to continue allowing duplicate job names or can we consider enforcing unique job names?
cc @fejta @stevekuznetsov @sebastienvas @munnerz @adrcunha @BenTheElder @krzyzacy
/area prow
> Are there any reasons to continue allowing duplicate job names or can we consider enforcing unique job names?
There might be some slight disconnect for the user when the job link does not match the context name. Other than that, just a _large_ config churn to fix this for prow.k8s.io, which I assume you're signing up for :^)
We'd also want to consider the impact on flakiness data. I think it will be mostly positive to collect flakiness data per branch, but not all jobs are sharded, so this may skew the results a bit.
we should solve the original issue, which is that a job should be able to trigger on different branches with different images, aka https://github.com/kubernetes/test-infra/issues/2547?
I think I'm for this -- we have unique names anyway as we generate our jobs and use the name field as the identifier, context as the user-facing name. Flakiness data should be better after this change as flakes between branches aren't necessarily related.
A non-sharded job will appear to flake more often with naive flake collection, though.
edit: not a reason not to do it! Just something to consider; we might want to improve the flake metrics.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Would it be unique across the whole configuration, or unique per org/repo?
> Would it be unique across the whole configuration, or unique per org/repo?
I think we'd want them to be globally unique. Having to qualify with org/repo would add an extra step to looking up a job and that information should probably be in the job name anyways.
You cannot ask a customer to set a unique job name based on knowledge of other jobs that have nothing to do with theirs.
In practice we already do this today. The only job names that collide are variations of the same job for different branches (which have the same authors/owners).
Global uniqueness would be enforced with presubmit config validation.
+1 for code simplicity. -1 for usability.
How so? The user-focused field is the context, not the job name. The job name is entirely internal to the config and not surfaced to any user of Prow. Why does enforcing this make it less usable?
I think we have multiple users here.
In your case configurations are generated, which in my opinion shows how hard they are to manage. In Istio, we actually decided to manage config per org/repo/branch. That way we don't really care if one job starts using a new image on master: only master will be updated, and older branches will keep using their old version of the image.
I guess what I am trying to say here is that, like @krzyzacy, I would like to see a better story for config management than the one we have today. Users in general (Istio, internal GKE users) are very confused about the configurations, and we end up managing them for them. I don't think we should.
Making job names unique is a breaking change, which is why I think it is not great for usability. It is OK if you have one group managing the configs, but that's another burden for us. That said, I do feel like it will simplify the code and therefore reduce the bug surface area, which is always a good thing.
Thanks for explaining that; what you're saying makes a lot of sense. For us, we actually name jobs by $org-$repo-$branch-$context, so we're effectively using a union of other fields as a unique identifier and making that the name field. Perhaps we should get rid of the name field as a settable thing in general; there's no real use for it.
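For example, under a $org-$repo-$branch-$context naming scheme, the same pair of branch variants might look roughly like this (names and images are hypothetical; whether a branches entry may be a regex depends on the Prow version, see the question below):

```yaml
presubmits:
  example-org/example-repo:
  - name: example-org-example-repo-master-e2e        # unique: org-repo-branch-context
    context: e2e                                      # user-facing name stays stable
    branches:
    - master
    spec:
      containers:
      - image: example.com/e2e-test:new
  - name: example-org-example-repo-release-1.0-e2e
    context: e2e
    branches:
    - release-1.0
    spec:
      containers:
      - image: example.com/e2e-test:old
```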
Thanks for understanding. A branch can be a regex, right?
There is one use case where setting a unique name would make sense: if you can define your job separately from your triggers so that you can refer to it by name. I think that's what @cjwagner is getting at.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotten
Are we ready to do this? Branch-sharded variants with different names should be simple.
I think the first step would be to log a warning message during config load for any job name collisions.
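A minimal sketch of what such a warning could look like at config-load time, using simplified stand-in types rather than Prow's actual config structs:

```go
package main

import "log"

// Presubmit is a simplified stand-in for a Prow presubmit job entry;
// the real config struct has many more fields.
type Presubmit struct {
	Name     string
	Branches []string
}

// warnOnNameCollisions logs a warning for every presubmit name that is
// declared more than once across the loaded configuration.
func warnOnNameCollisions(presubmits map[string][]Presubmit) {
	firstSeen := map[string]string{} // job name -> repo where it was first declared
	for repo, jobs := range presubmits {
		for _, job := range jobs {
			if prev, ok := firstSeen[job.Name]; ok {
				log.Printf("warning: duplicate presubmit name %q in %s (first declared in %s)",
					job.Name, repo, prev)
				continue
			}
			firstSeen[job.Name] = repo
		}
	}
}

func main() {
	// Two branch variants sharing a name, as described in this issue.
	warnOnNameCollisions(map[string][]Presubmit{
		"example-org/example-repo": {
			{Name: "pull-example-repo-e2e", Branches: []string{"master"}},
			{Name: "pull-example-repo-e2e", Branches: []string{"release-1.0"}},
		},
	})
}
```

Once the warning has been in place for a while and existing collisions are cleaned up, the same check could be promoted to a hard validation error.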
/area config/jobs
/sig testing
/kind feature
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.