ref: https://github.com/kubernetes/test-infra/pull/8466, https://github.com/kubernetes/test-infra/issues/8852
Currently it is impossible to look up the configuration for a presubmit or postsubmit job based on the job name alone, because job names are not required to be unique. We use this to specify different variants of a job for different branches while still reporting under the same job name. However, reporting under a shared name is exactly what the presubmit context field is for, and postsubmits do not report to GitHub at all.
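For illustration, a minimal sketch of the pattern described above (the org, repo, job names, and images are hypothetical, and most fields are trimmed): two presubmit entries share a name and a context but target different branches, so the name alone does not identify a single config.

```yaml
presubmits:
  example-org/example-repo:
  - name: pull-example-repo-e2e      # same name...
    context: pull-example-repo-e2e   # ...reported under the same GitHub context
    branches:
    - master
    spec:
      containers:
      - image: example.com/e2e-test:new
  - name: pull-example-repo-e2e      # duplicate name: branch variant of the job above
    context: pull-example-repo-e2e
    branches:
    - release-1.0
    spec:
      containers:
      - image: example.com/e2e-test:old   # older branch keeps the older image
```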
Are there any reasons to continue allowing duplicate job names or can we consider enforcing unique job names?
cc @fejta @stevekuznetsov @sebastienvas @munnerz @adrcunha @BenTheElder @krzyzacy
/area prow
> Are there any reasons to continue allowing duplicate job names or can we consider enforcing unique job names?
There might be some slight disconnect for the user when the job link does not match the context name. Other than that, just a _large_ config churn to fix this for prow.k8s.io, which I assume you're signing up for :^)
We'd also want to consider the impact on flakiness data. I think it will be mostly positive to collect flakiness data per branch, but not all jobs are sharded, so this may skew the results a bit.
we should solve the original issue, which is that a job should be able to trigger on different branches with different images, aka https://github.com/kubernetes/test-infra/issues/2547?
I think I'm for this -- we have unique names anyway as we generate our jobs and use the name field as the identifier, context as the user-facing name. Flakiness data should be better after this change as flakes between branches aren't necessarily related.
A non-sharded job will appear to flake more often with naive flake collection, though.
edit: not a reason not to do it! Just something to consider; we might want to improve the flake metrics.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Would it be unique across the whole configuration, or unique per org/repo?
> Would it be unique across the whole configuration, or unique per org/repo?
I think we'd want them to be globally unique. Having to qualify with org/repo would add an extra step to looking up a job and that information should probably be in the job name anyways.
You cannot ask a customer to set a unique job name based on knowledge of other jobs that have nothing to do with theirs.
In practice we already do this today. The only job names that collide are variations of the same job for different branches (which have the same authors/owners).
Global uniqueness would be enforced with presubmit config validation.
+1 for code simplicity. -1 for usability.
How so? The user-focused field is the context, not the job name. The job name is entirely internal to the config and not surfaced to any user of Prow. Why does enforcing this make it less usable?
I think we have multiple users here.
In your case configurations are generated, which in my opinion shows how hard they are to manage. In Istio, we actually decided to manage config per org/repo/branch. That way we don't really care if one job starts using a new image on master: only master will be updated, and older branches will keep using their old version of the image.
I guess what I am trying to say here is that, like @krzyzacy, I would like to see a better story for config management than the one we have today. Users in general (Istio, internal GKE users) are very confused about the configurations, and we end up managing them for them. I don't think we should.
Making job names unique is a breaking change, which is why I think it is not great for usability. It is OK if you have one group managing the configs, but that's another burden for us. That said, I do feel like it will simplify the code and therefore reduce the bug surface area, which is always a good thing.
Thanks for explaining that; what you're saying makes a lot of sense. For us, we actually name jobs by $org-$repo-$branch-$context, so we're effectively using a union of other fields as a unique identifier and making that the name field. Perhaps we should get rid of the name field as a settable thing in general; there's no real use for it.
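For example, under a $org-$repo-$branch-$context naming scheme, the same pair of branch variants might look roughly like this (names and images are hypothetical; whether a branches entry may be a regex depends on the Prow version, see the question below):

```yaml
presubmits:
  example-org/example-repo:
  - name: example-org-example-repo-master-e2e        # unique: org-repo-branch-context
    context: e2e                                      # user-facing name stays stable
    branches:
    - master
    spec:
      containers:
      - image: example.com/e2e-test:new
  - name: example-org-example-repo-release-1.0-e2e
    context: e2e
    branches:
    - release-1.0
    spec:
      containers:
      - image: example.com/e2e-test:old
```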
Thanks for understanding. A branch can be a regex, right?
There is one use case where setting a unique name would make sense: if you can define your job separately from your triggers so that you can refer to it by name. I think that's what @cjwagner is getting at.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle rotten
Are we ready to do this? Branch-sharded variants with different names should be simple.
I think the first step would be to log a warning message during config load for any job name collisions.
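A minimal sketch of what such a warning could look like at config-load time, using simplified stand-in types rather than Prow's actual config structs:

```go
package main

import "log"

// Presubmit is a simplified stand-in for a Prow presubmit job entry;
// the real config struct has many more fields.
type Presubmit struct {
	Name     string
	Branches []string
}

// warnOnNameCollisions logs a warning for every presubmit name that is
// declared more than once across the loaded configuration.
func warnOnNameCollisions(presubmits map[string][]Presubmit) {
	firstSeen := map[string]string{} // job name -> repo where it was first declared
	for repo, jobs := range presubmits {
		for _, job := range jobs {
			if prev, ok := firstSeen[job.Name]; ok {
				log.Printf("warning: duplicate presubmit name %q in %s (first declared in %s)",
					job.Name, repo, prev)
				continue
			}
			firstSeen[job.Name] = repo
		}
	}
}

func main() {
	// Two branch variants sharing a name, as described in this issue.
	warnOnNameCollisions(map[string][]Presubmit{
		"example-org/example-repo": {
			{Name: "pull-example-repo-e2e", Branches: []string{"master"}},
			{Name: "pull-example-repo-e2e", Branches: []string{"release-1.0"}},
		},
	})
}
```

Once the warning has been in place for a while and existing collisions are cleaned up, the same check could be promoted to a hard validation error.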
/area config/jobs
/sig testing
/kind feature
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
> Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.