I'm trying to understand the GCS data layout and content and am having some trouble understanding the started.json and finished.json files. For instance, I'm looking at this job which has:
started.json:
```json
{
  "timestamp": 1499200104,
  "node": "agent-pr-35",
  "jenkins-node": "agent-pr-35",
  "pull": "master:5d2139056128c86660e20edfb3c458517e02bb2d,48470:ea8c6004c6d1320f3f4c255fecb44a3b9afba77b",
  "version": "v1.8.0-alpha.1.743+77b0ad83c8f012",
  "repos": {
    "k8s.io/kubernetes": "master:5d2139056128c86660e20edfb3c458517e02bb2d,48470:ea8c6004c6d1320f3f4c255fecb44a3b9afba77b"
  },
  "repo-version": "v1.8.0-alpha.1.743+77b0ad83c8f012"
}
```
finished.json:
```json
{
  "timestamp": 1499201081,
  "version": "v1.8.0-alpha.1.743+77b0ad83c8f012",
  "result": "FAILURE",
  "passed": false,
  "job-version": "v1.8.0-alpha.1.743+77b0ad83c8f012",
  "metadata": {
    "repo": "k8s.io/kubernetes",
    "repos": {
      "k8s.io/kubernetes": "master:5d2139056128c86660e20edfb3c458517e02bb2d,48470:ea8c6004c6d1320f3f4c255fecb44a3b9afba77b"
    },
    "repo-commit": "77b0ad83c8f0124d02151fdbdd72674522d82ecc"
  }
}
```
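For reference, a minimal Python sketch of reading these two files and deriving the run duration. It assumes both `timestamp` fields are Unix epoch seconds, which the values above suggest but which is not stated anywhere in the files themselves; `run_duration_seconds` is a hypothetical helper.

```python
import json

# Trimmed copies of the two files above; only the fields used here are kept.
STARTED = '{"timestamp": 1499200104, "node": "agent-pr-35"}'
FINISHED = '{"timestamp": 1499201081, "result": "FAILURE", "passed": false}'


def run_duration_seconds(started_json: str, finished_json: str) -> int:
    """Return finished.timestamp - started.timestamp.

    Assumes both timestamps are Unix epoch seconds.
    """
    started = json.loads(started_json)
    finished = json.loads(finished_json)
    return finished["timestamp"] - started["timestamp"]


print(run_duration_seconds(STARTED, FINISHED))  # 977
```

For this job that works out to roughly 16 minutes of wall-clock time.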
I have a number of questions here:
- What are the differences between the version, repo-version and job-version fields?
- When the pull and/or repos.k8s.io/kubernetes fields fully encode the input to the git state in the job, why are the above version fields necessary?
- Is it significant to keep track of the repo-commit, which I assume is the commit generated by merging the two parents as configured by Prow?
- Why is there a difference between result and passed in finished.json?
- What are the differences between node and jenkins-node? Is this the node on which the job is scheduled versus the master that owns the job?
- Why is version info also in finished.json? Would it ever change from started.json?

/cc @kargakis
@fejta @spxtr @ixdy who can help out with these questions?
Still trying to dig through this -- the "jenkins-node" key seems to be only used by a utility in k8s.io/contrib and nothing in that repo actually calls that function. The "repo_commit" key seems entirely unused as well -- would love some historical insight into what these keys were for and if we can safely ignore them, etc.
jenkins-node was useful when we had misbehaving Jenkins agents and wanted to track failures down to a bad machine. It's less relevant now.
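As an illustration of that use, a hypothetical Python sketch that tallies failures per agent from (started, finished) pairs. `failures_by_node` is an illustrative name, not an existing test-infra helper; it reads `node` first and falls back to `jenkins-node`, since the example above shows both keys carrying the same value.

```python
from collections import Counter
from typing import Iterable, Tuple


def failures_by_node(runs: Iterable[Tuple[dict, dict]]) -> Counter:
    """Count failed runs per node.

    Each run is a (started, finished) pair of already-parsed JSON dicts.
    A run counts as failed when finished["passed"] is False.
    """
    counts: Counter = Counter()
    for started, finished in runs:
        if finished.get("passed") is False:
            # Use node if present, otherwise jenkins-node, otherwise a placeholder.
            node = started.get("node") or started.get("jenkins-node") or "unknown"
            counts[node] += 1
    return counts


runs = [
    ({"node": "agent-pr-35"}, {"passed": False}),
    ({"node": "agent-pr-35"}, {"passed": False}),
    ({"node": "agent-pr-12"}, {"passed": True}),
    ({"jenkins-node": "agent-pr-7"}, {"passed": False}),
]
print(failures_by_node(runs).most_common(1))  # [('agent-pr-35', 2)]
```

A node that shows up far more often than its share of runs is the "bad machine" candidate.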
Ah, I missed the issue here: repo_commit is basically the output of running git rev-parse HEAD in the current working directory, if we checked out k8s.
Looking back at these, I really feel like we should have:
The first three items fully identify a job, its configuration and run-time data. Anything else should be optional.
I feel like this and the GCS layout documented in https://github.com/kubernetes/test-infra/pull/4767 belong together, and together they would come very close to specifying the data produced/consumed by the test-infra stack. There is more than gubernator involved, e.g. testgrid and kettle.
What I would like to see is how each component depends on other components, if any. There are subtle dependencies today, e.g. we require the gubernator and prow configs to be synchronized, testgrid is doing some sort of validation on the prow config IIRC, etc.
Hardening the API will only make it easier for things other than Gubernator to touch it. Right now this API -- and I think we should consider the GCS layout as a public API -- is very ad-hoc and frustrating to deal with.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
/kind documentation
/remove-lifecycle stale
/remove-lifecycle stale
See also #7364
/remove-lifecycle stal
/lifecycle frozen
maybe, since this keeps going stale? (or maybe we should just fix it. :)
One year anniversary! :confetti_ball:
/milestone v1.14
IIRC we're trying to make sure podutils provide the appropriate metadata; can we maybe document it while we're doing so?
So, we've agreed that https://github.com/kubernetes/test-infra/blob/master/testgrid/metadata/job.go#L19-L73 is the source of truth going forward, though it can still be improved, e.g. by making the metadata fields more detailed.
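For readers outside Go, a rough Python approximation of the shape these files take, built only from the fields in the example at the top of this issue. The class and helper names are hypothetical; the authoritative definitions are the Go types in the linked job.go, which may include fields not shown here.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Started:
    timestamp: int                      # assumed Unix epoch seconds
    node: str = ""                      # machine that ran the job
    pull: str = ""                      # e.g. "master:<sha>,<pr>:<sha>"
    repos: Dict[str, str] = field(default_factory=dict)
    repo_version: str = ""              # JSON key "repo-version"


@dataclass
class Finished:
    timestamp: int
    passed: Optional[bool] = None
    result: str = ""
    job_version: str = ""               # JSON key "job-version"
    metadata: Dict[str, object] = field(default_factory=dict)


def parse_started(raw: str) -> Started:
    d = json.loads(raw)
    return Started(
        timestamp=d["timestamp"],
        node=d.get("node", ""),
        pull=d.get("pull", ""),
        repos=d.get("repos", {}),
        repo_version=d.get("repo-version", ""),
    )


def parse_finished(raw: str) -> Finished:
    d = json.loads(raw)
    return Finished(
        timestamp=d["timestamp"],
        passed=d.get("passed"),
        result=d.get("result", ""),
        job_version=d.get("job-version", ""),
        metadata=d.get("metadata", {}),
    )


f = parse_finished('{"timestamp": 1499201081, "result": "FAILURE", "passed": false, '
                   '"metadata": {"repo-commit": "77b0ad83c8f0124d02151fdbdd72674522d82ecc"}}')
print(f.passed, f.metadata["repo-commit"][:7])  # False 77b0ad8
```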
@stevekuznetsov shall we close this?
External documentation would be the last bit to close this one, I think
/milestone v1.15
/priority important-soon
/milestone v1.16
/assign @fejta
Based on the TODO in https://github.com/kubernetes/test-infra/blob/master/testgrid/metadata/job.go#L19-L73
That said, I'm happy to write docs for this if nobody else wants to.
/sig testing
... two year anniversary?
way to let the three year anniversary go by without commemorating it
/cc @MushuEE
since you've been working on kettle lately
In https://github.com/kubernetes/test-infra/issues/14643 I uncovered some of these assumptions while trying to explain why flake data doesn't show up for decorated jobs.
I wrote up a document covering a lot of these questions but would like more input. @fejta, I see that you left several TODOs; I would like to take over the cleanup if we decide there is some action we can take here. I am not sure how to go about finding everything that consumes these .json files.
I know this is outdated, but here are my answers to the original questions:
what are the differences between version, repo-version and job-version fields?
A: None; they are all the same for Bootstrap, but all are deprecated in favor of RepoCommit.
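Reading that answer as "the three version keys are interchangeable and superseded by repo-commit", a consumer could resolve a single revision as below. `resolve_revision` and the key order are hypothetical choices, not behaviour of any existing tool.

```python
from typing import Optional

# Deprecated keys, checked in this order. Per the answer above they all
# carry the same value for Bootstrap jobs.
DEPRECATED_VERSION_KEYS = ("version", "repo-version", "job-version")


def resolve_revision(started: dict, finished: dict) -> Optional[str]:
    """Prefer finished.metadata["repo-commit"]; else fall back to a deprecated key."""
    commit = finished.get("metadata", {}).get("repo-commit")
    if commit:
        return commit
    for doc in (started, finished):
        for key in DEPRECATED_VERSION_KEYS:
            if doc.get(key):
                return doc[key]
    return None
```

In the example job both sources agree: the `+77b0ad83c8f012` suffix of the version string is a prefix of the full repo-commit `77b0ad83c8f0124d02151fdbdd72674522d82ecc`.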
when the pull and/or repos.k8s.io/kubernetes fields fully encode the input to the git state in the job, why are the above version fields necessary?
A: OPEN
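To make the question concrete, a sketch that decodes the `pull` string from the example. The format assumed here, `<base-ref>:<base-sha>` followed by comma-separated `<pr-number>:<head-sha>` entries, is inferred from this one example rather than from a spec, and `parse_pull` is a hypothetical helper.

```python
from typing import List, NamedTuple, Tuple


class PullSpec(NamedTuple):
    base_ref: str
    base_sha: str
    pulls: List[Tuple[int, str]]  # (PR number, PR head SHA)


def parse_pull(spec: str) -> PullSpec:
    """Split 'base:sha,pr:sha,...' into its parts (format assumed from the example)."""
    base, *rest = spec.split(",")
    base_ref, base_sha = base.split(":", 1)
    pulls = []
    for entry in rest:
        number, sha = entry.split(":", 1)
        pulls.append((int(number), sha))
    return PullSpec(base_ref, base_sha, pulls)


spec = parse_pull(
    "master:5d2139056128c86660e20edfb3c458517e02bb2d,"
    "48470:ea8c6004c6d1320f3f4c255fecb44a3b9afba77b"
)
print(spec.base_ref, spec.pulls[0][0])  # master 48470
```

The string names the base and PR head commits, but not the SHA of the commit produced by merging them.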
is it significant to keep track of the repo-commit which is I assume the commit generated by merging the two parents as configured by Prow?
A: I believe this is the SHA of the main k8s REF HEAD
why is there a difference between result and passed in finished.json?
A: Result has more states than passed, but I don't think anything uses those states. However, there is an ABORTED state in pod-utils.
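A sketch of how the two fields relate. It assumes `result` takes values such as SUCCESS, FAILURE and ABORTED (FAILURE appears in the example and ABORTED in the answer above; SUCCESS is an assumption), and treats any other value as unknown. `passed_from_result` is a hypothetical helper.

```python
from typing import Optional


def passed_from_result(result: str) -> Optional[bool]:
    """Collapse the multi-state result string into the boolean passed field.

    Assumed values: SUCCESS -> True; FAILURE and ABORTED -> False.
    Anything else is unknown and returns None.
    """
    mapping = {"SUCCESS": True, "FAILURE": False, "ABORTED": False}
    return mapping.get(result.upper())


print(passed_from_result("FAILURE"))  # False
```

The mapping is lossy by design: FAILURE and ABORTED both collapse to False, so a consumer that reads only passed cannot tell a failed run from an aborted one.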
what are the differences between node and jenkins-node? Is this the node on which the job is scheduled versus the master that is owning the job?
A: jenkins-node is gone along with Jenkins, but node is the machine executing the job.
why is version info also in finished.json? would it ever change from started.json?
A: It should not need to be there, and it will be the same.
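If the version info really is duplicated, a consumer can cheaply verify that. A hypothetical check, assuming only the three version-like keys from the example matter:

```python
def versions_agree(started: dict, finished: dict) -> bool:
    """True if every non-empty version-like field in either file has the same value."""
    keys = ("version", "repo-version", "job-version")
    values = {doc[k] for doc in (started, finished) for k in keys if doc.get(k)}
    return len(values) <= 1


v = "v1.8.0-alpha.1.743+77b0ad83c8f012"
print(versions_agree({"version": v, "repo-version": v},
                     {"version": v, "job-version": v}))  # True
```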
@MushuEE can you move your doc to markdown in the repo?
@BenTheElder Sure let me consolidate, where would the best home be for that doc?
IMO it'd be worth putting in /docs...