/area testgrid
/area prow
/area prow/pod-utilities
prow/pod-utils/gcs makes a large pile of dubious claims about deprecation. We need to update this library (as well as initupload and sidecar) to output the same fields as bootstrap.py
Basically all of this is wrong, and most likely because we didn't understand the previous format when creating this structure.
I was hoping we can simply upload prowjob.yaml and get rid of ALL of the metadata from started/finished.json.
(aka, imho moving towards producing the same fields as bootstrap is a step backwards)
Can we instead, please, update tools to expect a full-featured prowjob.yaml in the artifacts folder? I can work next on making artifact-uploader push those up.
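Roughly, a minimal sketch of what that write could look like (the prowjob.yaml filename and artifacts location here are just what I have in mind, and sigs.k8s.io/yaml is one possible marshaler):

```go
package artifacts

import (
	"io/ioutil"
	"path/filepath"

	"sigs.k8s.io/yaml"
)

// writeProwJob serializes whatever prowjob object the agent already has in
// hand and drops it as prowjob.yaml into the artifacts directory, so the
// uploader (initupload / sidecar / artifact-uploader) pushes it to GCS next
// to started.json and finished.json.
func writeProwJob(prowjob interface{}, artifactsDir string) error {
	raw, err := yaml.Marshal(prowjob)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(filepath.Join(artifactsDir, "prowjob.yaml"), raw, 0644)
}
```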
/assign
https://github.com/kubernetes/test-infra/issues/10544#issuecomment-454562161 would also be nice either way ... 99% of the time when triaging a job the only thing I really need besides the actual job artifacts is the commit(s) and maybe the start / end time.
Right now we lack the commits, and the YAMLs are going to show e.g. "master" as the ref for periodics, etc.
We use an ugly bandaid for a few jobs today to write out a minimal metadata.json with at least the primary repo commit to get around this.
https://github.com/kubernetes/test-infra/blob/ed6a7c4f28a2f87de404f8844a14a512e4c8f3d6/images/bootstrap/runner.sh#L106-L111
https://testgrid.k8s.io/sig-release-master-blocking#verify-master
No one uses artifact-uploader. We run 20,000 builds a day, most of them still use bootstrap, and we have at least a year and a half of this historical data. We should continue uploading the fields that bootstrap.py does that make sense, rather than arbitrarily changing them for no good reason (to revision, for example).
And while it is certainly the most verbose solution, I am unconvinced that uploading the pod and prowjob yaml is correct and/or most useful.
We should resolve https://github.com/kubernetes/test-infra/issues/10544 1000000% - whenever someone has some spare cycles.
We run 20,000 builds a day, most of them still use bootstrap
We will migrate them to podutils :-)
most likely because we didn't understand the previous format when creating this structure.
Yep - that was also the problem when we made podutils, wasn't it? 😏
We should continue uploading the fields that bootstrap.py does that make sense, rather than arbitrarily changing them for no good reason
IMHO version and job-version are bad names, and the bootstrap fields are not very well documented to begin with. (Yes we might not fully understand the meaning of each field, but all of them can be found in prowjob.yaml right?) We control the downstream applications: if we have a clean definition, like the struct we created today, we can easily make things like gubernator/spyglass support extracting more info from prowjob.yaml (while still supporting all the old fields for a while, sure).
Also:
```go
// Node is deprecated.
Node string `json:"node,omitempty"`
// Repos is deprecated.
Repos map[string]string `json:"repos,omitempty"`
// Pull is deprecated.
Pull string `json:"pull,omitempty"`
```
I don't think anyone is actually consuming these fields today?
bootstrap fields are not very well documented
We should do better than this. The code that outputs these fields is open source, in this repo, and created by the same people who work on prow.
It is fine to migrate fields, but we should be more intentional about the reasons why we should do this, rather than just because we failed to read the python file or ask someone how to generate the fields correctly.
version and job-version are bad names
These are different things that we intentionally decoupled:
Originally we waited a long time to upload started.json, and this was the responsibility of the job itself (along with cloning any repos). Eventually we migrated this out of the jobs and into bootstrap, which became more general purpose (for any repo) rather than just k/k. Eventually we migrated this out of the test container entirely thanks to pod-utils.
Once this became a general purpose checkout+metadata utility, computing the kubernetes/kubernetes job version no longer fit: basically this value does not belong in a general purpose utility. However, since it is deeply useful for the k/k community, we kept it for our CI and k/k pull jobs.
We created repo-version for the general purpose bootstrap and pod-utils to upload.
Notice how the job-version is v1.13.3 for https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-gce-conformance-stable-1-13/345,
or, for a dev version, something like v1.14.0-alpha.2.219+30566b990aa139 - https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/34487.
This is the wrong value to add into the revision field.
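To make the split concrete, an illustrative sketch (not the actual pod-utils/gcs structs) of how the two values differ in ownership, with the example job-version values from the links above:

```go
package metadata

// Versions is an illustrative sketch only, not the real pod-utils/gcs types.
type Versions struct {
	// repo-version is owned by the checkout tool (bootstrap / pod-utils):
	// it describes what was actually cloned and works for any repo.
	RepoVersion string `json:"repo-version,omitempty"`
	// job-version is owned by the job internals, which may set it to
	// whatever they want, e.g. the kubernetes/kubernetes build version.
	JobVersion string `json:"job-version,omitempty"`
}

// Example job-version values from the gubernator links above:
//   "v1.13.3"                             (stable conformance job)
//   "v1.14.0-alpha.2.219+30566b990aa139"  (dev build)
```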
Yes we might not fully understand the meaning of each field, but all of them can be found in prowjob.yaml right?
We should be intentional about the metadata we expose and write these values correctly, not just throw up a pod and hope that our tooling can figure it out.
I don't think anyone is actually consuming these fields today?
Pull is the deprecated version of Repos, from when we only supported cloning one repo. Repos was intended to specify which version of which repos were cloned, which pod-utils stuck in a separate file.
Node we should fix to report the physical node on which testing occurred, as this is important debugging information. It reported this value back when we ran on VMs, but jobs started reporting the container name instead as we migrated them to prow.
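If we do fix Node, the physical node is already sitting on the scheduled pod object; a rough sketch, assuming we have the corev1 Pod in hand:

```go
package metadata

import corev1 "k8s.io/api/core/v1"

// nodeName returns the physical node the test pod was scheduled onto, which
// is the debugging information Node was originally meant to carry (rather
// than the container name jobs started reporting after the prow migration).
func nodeName(pod *corev1.Pod) string {
	return pod.Spec.NodeName
}
```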
Semi-related note: for TestGrid at least, other tools produce some of this metadata too, including kubetest and the conformance uploader used to federate results. For the latter, uploading a prowjob spec would be ... odd.
I also don't see the specs covering things like finish time...?
Also note that public testgrid has its own variant of these fields:
Google's internal testgrid has its own interpretation of things too
Also note that public testgrid has its own variant of these fields:
afaik the OSS updater is not being used?
Google's internal testgrid has its own interpretation of things too
I don't think that uses these fields from started.json either?
I'd still want to keep the existing started.json and finished.json - but I also want to have the additional prowjob.yaml for more debug information.
Fields like Node - which is computed from https://github.com/kubernetes/test-infra/blob/master/jenkins/bootstrap.py#L577-L593 and has not been working for a while... - can be found in pod.yaml.
Pull and Repos can be found from Refs in prowjob.yaml.
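To sketch what I mean - with minimal stand-ins for prow's Refs type rather than the real API package, and an illustrative encoding for the legacy value, not necessarily the exact string bootstrap writes:

```go
package metadata

import "fmt"

// Minimal stand-ins for the Refs that appear in prowjob.yaml; the real types
// live in prow's API package.
type Pull struct {
	Number int
	SHA    string
}

type Refs struct {
	Org     string
	Repo    string
	BaseRef string
	BaseSHA string
	Pulls   []Pull
}

// legacyRepos derives a Repos-style map from the prowjob Refs. The
// "base:sha,pr:sha" encoding is illustrative, not necessarily the exact
// string bootstrap.py writes.
func legacyRepos(refs []Refs) map[string]string {
	repos := map[string]string{}
	for _, r := range refs {
		value := fmt.Sprintf("%s:%s", r.BaseRef, r.BaseSHA)
		for _, p := range r.Pulls {
			value += fmt.Sprintf(",%d:%s", p.Number, p.SHA)
		}
		repos[r.Org+"/"+r.Repo] = value
	}
	return repos
}
```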
Aka - I agree that we should provide as much information as possible for info & debugging purposes... If you have strong feelings that these fields in started.json should not be deprecated, feel free to PR, even re-design, and maintain them :-)
In the meantime I also want to see us start to upload pod.yaml and prowjob.yaml, which is orthogonal.
Sure, sidecar and initupload (or artifact-uploader) should absolutely read these fields from the pod and prowjob to populate them, but pod and prowjob are the wrong abstraction for a build result.
This does not mean that we should require testgrid, resultstore, gubernator, etc to understand the internals of prow and kubernetes in order to do their work. There should be a (Result) layer of abstraction between the job runner (whether prow, circleci, travis, etc) and what displays these results to the end user.
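Purely to illustrate that (Result) layer - the field names below are invented, not an existing prow type - it could look something like:

```go
package result

import "time"

// Result is an illustrative, runner-agnostic build result: enough for
// testgrid/gubernator/resultstore to display a run without understanding
// prow or kubernetes internals.
type Result struct {
	Job      string            // job name
	Started  time.Time         // start time
	Finished time.Time         // finish time
	Passed   bool              // overall outcome
	Repos    map[string]string // which repos/commits were checked out
	Metadata map[string]string // job-owned extras, e.g. job-version
}
```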
I'm mostly sad that every time I feel like I understand the fields from bootstrap, it turns out I don't :-) (for example, repo-version obviously only worked for k/k)
Having a unified result object from a job sounds reasonable - but I feel that's different from trying to back-fill the old fields in started.json? (For example, currently Pull only reveals the SHA from the main repo; for a proper result object it needs to show all repos?) - Maybe there are other fields we are interested in?
(I'm just not convinced these bespoke fields suddenly become the source of truth for all our jobs :-) Especially since they have gone through the Jenkins -> Bootstrap -> Podutil migrations)
for a proper result object it needs to show all repos
Repos is where we put info about all the repos we checked out. Pull specifies which PR it is testing:
https://github.com/kubernetes/test-infra/blob/ed6a7c4f28a2f87de404f8844a14a512e4c8f3d6/jenkins/bootstrap.py#L327-L328
these bespoke fields suddenly become the source of truth for all our jobs :-) Especially since they have gone through the Jenkins -> Bootstrap -> Podutil migrations
We went through the jenkins -> containerized jenkins -> bootstrap containerized jenkins -> bootstrap prow while maintaining backwards compatibility.
We should do the same for the ongoing podutil migration.
So my request is that:
See also:
https://github.com/istio/istio/blob/master/bin/ci2gubernator.sh
https://github.com/istio/test-infra/tree/master/toolbox/ci2gubernator
Which appear to be used to populate gubernator + testgrid from CircleCI, alongside Prow-based results.
I think Kubeflow is also populating GCS:
https://github.com/kubeflow/kubeflow/blob/76107ff3aa7f9fde6fd7f15f2ec0a69a35874efa/testing/run.sh
Kubeflow has a homebrewed bash script which ends up cloning test-infra and running bootstrap.py, and I believe it's the same case for istio... (aka they are not hand-crafting started/finished.json)
See above: istio is not using bootstrap to do this on CircleCI; they have their own Go program to write our metadata format.
hummmmm I see.. https://github.com/istio/test-infra/blob/8feb3634980f971ed0e027e0f6a8d5e672d6bf91/sisyphus/ci.go#L52-L76 makes me 😢
ok, I'll agree with the issue as long as all the fields are well documented, we know what each field means, and they will be maintained in the future.
Why shouldn't tools just read the ProwJob or Pod? I guess for computed fields like duration, if you don't want to do ProwJob.Status.FinishedTimestamp - ProwJob.Status.StartedTimestamp on every client, maybe, but Refs should be presented identically IMO - no "job version" or otherwise.
The actual k8s objects are stable and well documented, they've got helpers and serialization targets that you can import in go, it's clear what fields may or may not be set...
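For the duration case, a tiny sketch of the kind of computed helper a shared library could own; it takes plain times since the exact ProwJobStatus field names aren't the point here:

```go
package result

import "time"

// duration is the sort of computed field a shared helper could provide, so
// every client doesn't re-derive it from the ProwJob status timestamps.
func duration(started, finished time.Time) time.Duration {
	if finished.IsZero() {
		// Job still running (or completion time never recorded).
		return time.Since(started)
	}
	return finished.Sub(started)
}
```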
Should everything uploading to testgrid be coupled to prow jobs...? We have e.g. OpenStack's CI uploading CI signal for their cloud provider integration against Kubernetes.
Right now it's easy to target the minimal metadata fields without even depending on test-infra.
Hmm okay, that's valid, I didn't know we had non-prow uploads. Let's create a reference implementation of the metadata digestion from ProwJobs and Pods and use that to keep us documented, etc.
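As a hypothetical shape for that reference implementation (names invented, just to show the boundary):

```go
package result

// Digester is a hypothetical boundary for that reference implementation: one
// shared, documented library owns the translation from runner-specific
// objects (ProwJob, Pod) into the metadata files, and anything that isn't
// prow (CircleCI, the conformance uploader, ...) only needs to target the
// same output format.
type Digester interface {
	// Started returns the contents of started.json for this run.
	Started() ([]byte, error)
	// Finished returns the contents of finished.json once the run is over.
	Finished(passed bool) ([]byte, error)
}
```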
Refs should be presented identically IMO - no "job version" or otherwise.
The reason we split version into job-version and repo-version is so that we can do exactly this for repo-version and repos, which are owned by the checkout tool, not something the job (aka test container) has the right to control.
On the other hand job-version is owned by the job internals, which can set it to whatever it wants. Our k/k jobs have some complicated shenanigans that involve bash, reading extracted files and/or git tags to produce v1.14.0-alpha.2.219+30566b990aa139, and we expect to find this value after the job completes in testgrid, gubernator, etc.
Sure, but repo-version should be the Refs struct and _only_ the Refs struct as derived from the ProwJobSpec, yeah?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.