What would you like to be added:
Make it possible to use the inrepoconfig feature with the prow.k8s.io instance
Why is this needed:
Today, writing presubmit jobs has two major hurdles:
> Today as an oncall member I can trivially:
How do I do those things with inrepoconfig?
> Audit all prowjobs by inspecting the source of truth in test-infra
One option could be to upload the job definition to GCS, which would also allow us to easily answer the question "What config was this job run with?" after the ProwJob CR is cleaned up.
> File PRs (with approval) to change or remove any prowjob for any sort of misbehavior (due to the OWNERS hierarchy)
Maybe make the approval plugin approve all PRs that touch .prow.yaml from people who are approvers in test-infra?
> Regenerate the config maps easily when they are broken (with the bootstrapper)
This problem does not exist, as jobs from inrepoconfig are never persisted to a ConfigMap; they are fetched from the repo on demand and only exist in memory and as a ProwJob CR.
> Enforce that presubmits are run against prowjob configs for basic sanity checks (via test-infra presubmits)
inrepoconfig already executes jobs from both the central config and the .prow.yaml file. It is strongly suggested to set up checkconfig in the central repo to vet the config in order to avoid issues. One problem remains here, though: there is no way to validate the jobs before they get executed (except for the allowed_clusters setting).
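For reference, a minimal sketch of the central-config side, assuming the standard in_repo_config stanza (exact field names can vary with the Prow version, and the repo key is only an example):

```yaml
# Central Prow config (sketch): opt a repo into inrepoconfig and restrict
# which build clusters its .prow.yaml jobs are allowed to target.
in_repo_config:
  enabled:
    # Keys may be "*", an org, or an org/repo.
    "kubernetes/test-infra": true
  allowed_clusters:
    # Jobs defined in this repo's .prow.yaml may only run in these clusters.
    "kubernetes/test-infra":
      - default
```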
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I am strongly opposed to adopting in-repo config for prow.k8s.io:
It makes it much harder for us to centrally manage test jobs. It is frequently valuable to observe and make sweeping, automated changes to all jobs. This would become much harder without all the jobs being in one place for observation. Losing the ability to make bulk updates, disable/enable batches of poorly-behaved jobs on a whim, or use our batch tooling (e.g. the job fork and rotation tools) is a non-starter for me.
It further lowers the trust bar for running new jobs: now we don't even need jobs to be committed; you can simply create a PR and your arbitrary job will be run. This might make it harder to make future changes like, for instance, ensuring only jobs in certain repos can have a particular service account.
We can try to preemptively handle every case that might come up, but I don't think the result will be understandable or effective.
Additionally, it is not terribly clear to me what the real benefit here is — being able to test your jobs without committing any record of what that job was is potentially nice (if maybe risky), but I'm not sold on the real value of keeping jobs in their own repos. Editing multiple repos with major changes is a fact of life in a multi-repo world regardless.
> Additionally, it is not terribly clear to me what the real benefit here is — being able to test your jobs without committing any record of what that job was is potentially nice (if maybe risky), but I'm not sold on the real value of keeping jobs in their own repos. Editing multiple repos with major changes is a fact of life in a multi-repo world regardless.
It's not just being able to test jobs, it's also:
> It further lowers the trust bar for running new jobs: now we don't even need jobs to be committed; you can simply create a PR and your arbitrary job will be run. This might make it harder to make future changes like, for instance, ensuring only jobs in certain repos can have a particular service account.
If you are a trusted user, yes. Most jobs are already based on executing some script in the target repo, so being able to manipulate the job itself doesn't change much about the trust model. Additionally, the GCS reporter will upload the job to GCS.
Generally, it would be quite easy to extend the validation to include whatever is relevant. If you want to easily add prow-instance-specific validations, the validation could be moved into a webservice that is called prior to creating jobs.
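One possible shape for such a webservice (my illustration, not an existing Prow component): since every job ultimately becomes a ProwJob custom resource, the check could be wired as a validating admission webhook that runs on ProwJob creation. All names below are hypothetical:

```yaml
# Sketch of a validating admission webhook on ProwJob creation
# (hypothetical names; not something Prow ships today).
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: prowjob-policy
webhooks:
  - name: prowjob-policy.example.com
    rules:
      - apiGroups: ["prow.k8s.io"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["prowjobs"]
    clientConfig:
      service:
        namespace: prow
        name: prowjob-policy   # hypothetical validation service
        path: /validate
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail        # reject the job if the check fails or is unreachable
```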
> It makes it much harder for us to centrally manage test jobs. It is frequently valuable to observe and make sweeping, automated changes to all jobs. This would become much harder without all the jobs being in one place for observation. Losing the ability to make bulk updates, disable/enable batches of poorly-behaved jobs on a whim, or use our batch tooling (e.g. the job fork and rotation tools) is a non-starter for me.
Some of the problems here only exist because of the central repo, like the config forking. Others, like sweeping changes, could be solved by using presets. Disabling jobs based on any kind of property could be part of the validation; this would also be a better way to communicate it back than just having the jobs vanish.
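For context on the preset point: presets live in the central config and are merged into any job, central or in-repo, that carries the matching label, so sweeping changes to things like credentials or mounts can still be made in one place. A rough sketch with made-up names:

```yaml
# Central config (sketch): a preset any job can opt into via a label;
# editing it here changes every matching job, including in-repo ones.
presets:
  - labels:
      preset-example-credentials: "true"   # made-up label
    env:
      - name: EXAMPLE_CREDENTIALS          # made-up variable
        value: /etc/example/creds.json
    volumes:
      - name: example-creds
        secret:
          secretName: example-creds        # made-up secret
    volumeMounts:
      - name: example-creds
        mountPath: /etc/example
        readOnly: true
```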
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
> Most jobs are already based on executing some script in the target repo, so being able to manipulate the job itself doesn't change much about the trust model. Additionally, the GCS reporter will upload the job to GCS.
It substantially does; you can completely alter the podspec, etc.
I don't care much about the script being run in most cases (we do in some), but I nearly always care about the definition of the job, including resource requests, mounts, k8s service account, etc. Not-intentionally-malicious resource misuse and abuse is a bigger concern.
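To make that concrete: everything under `spec` in a .prow.yaml presubmit is an ordinary Kubernetes PodSpec, so a PR author could set fields like the ones below (all names are made up for illustration):

```yaml
# Hypothetical .prow.yaml presubmit showing the fields at issue:
# resource requests, mounts, and the service account the pod runs as.
presubmits:
  - name: pull-example-test
    always_run: true
    decorate: true
    spec:
      serviceAccountName: some-privileged-sa   # hypothetical
      containers:
        - image: gcr.io/example/test-image     # hypothetical
          command: ["./hack/run-tests.sh"]
          resources:
            requests:
              cpu: "8"
              memory: 32Gi
          volumeMounts:
            - name: build-secrets
              mountPath: /etc/secrets
      volumes:
        - name: build-secrets
          secret:
            secretName: some-secret            # hypothetical
```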
That said, we do have clusters in which we do NOT allow a single job to be configured with a script from another repo; users will need to continue to use our tools for those clusters for security reasons, since things there are in fact relatively locked down.
That is enforced via the test-infra configuration validation presubmit and review today.
Most projects also rely on images from this repo, for which we control rollout via automated config changes.
In any case, if anyone wants this change, I strongly suggest you contribute to the wg-k8s-infra effort to move prow.k8s.io to community infrastructure, at which time the group of active infrastructure maintainers can decide this.
Currently this work is being done nearly exclusively by @spiffxp, with reviews by myself.
The existing oncall rotation maintaining the existing infrastructure previously discussed this issue and was against it for the time being.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten