Test-infra: go.k8s.io/triage cluster data is much larger than usual

Created on 1 Sep 2020 · 10 comments · Source: kubernetes/test-infra

What happened:
I visited go.k8s.io/triage and it said it was downloading hundreds of MB of data.

What you expected to happen:
I'm used to seeing it claim to download ~20-40MB of data (e.g. https://storage.googleapis.com/k8s-gubernator/triage/index.html?date=2020-08-23)

Please provide links to example occurrences, if any:

Anything else we need to know?:
We had a brief outage from 2020-08-23 to 2020-08-25 for unrelated reasons (ref: https://github.com/kubernetes/test-infra/issues/17625#issuecomment-681108961), so we're not quite sure what happened within that window.

I suspect this is pathological data / clustering and not the Go rewrite, but I don't know for sure.

$ gsutil ls -lh gs://k8s-gubernator/triage/history | tail -n20
  5.22 MiB  2020-08-10T23:31:25Z  gs://k8s-gubernator/triage/history/20200810.json
  4.94 MiB  2020-08-11T23:33:57Z  gs://k8s-gubernator/triage/history/20200811.json
  4.91 MiB  2020-08-12T23:42:54Z  gs://k8s-gubernator/triage/history/20200812.json
  4.89 MiB  2020-08-14T00:00:06Z  gs://k8s-gubernator/triage/history/20200813.json
  4.44 MiB  2020-08-14T19:59:20Z  gs://k8s-gubernator/triage/history/20200814.json
  4.39 MiB  2020-08-17T23:51:08Z  gs://k8s-gubernator/triage/history/20200817.json
  4.36 MiB  2020-08-18T23:39:30Z  gs://k8s-gubernator/triage/history/20200818.json
  4.23 MiB  2020-08-19T23:41:14Z  gs://k8s-gubernator/triage/history/20200819.json
   4.2 MiB  2020-08-20T23:57:37Z  gs://k8s-gubernator/triage/history/20200820.json
  4.15 MiB  2020-08-21T23:56:54Z  gs://k8s-gubernator/triage/history/20200821.json
  4.04 MiB  2020-08-22T23:57:30Z  gs://k8s-gubernator/triage/history/20200822.json
  3.85 MiB  2020-08-23T18:39:04Z  gs://k8s-gubernator/triage/history/20200823.json
 32.41 MiB  2020-08-25T22:43:55Z  gs://k8s-gubernator/triage/history/20200825.json
 35.09 MiB  2020-08-26T19:53:01Z  gs://k8s-gubernator/triage/history/20200826.json
 14.75 MiB  2020-08-27T22:51:06Z  gs://k8s-gubernator/triage/history/20200827.json
 27.63 MiB  2020-08-28T23:06:57Z  gs://k8s-gubernator/triage/history/20200828.json
 25.94 MiB  2020-08-29T22:50:59Z  gs://k8s-gubernator/triage/history/20200829.json
 18.95 MiB  2020-08-30T23:20:06Z  gs://k8s-gubernator/triage/history/20200830.json
 13.17 MiB  2020-08-31T23:36:15Z  gs://k8s-gubernator/triage/history/20200831.json
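
For catching this earlier, a small check over the history bucket can flag days whose output size jumps well above the recent norm. A minimal sketch in Go, not part of triage itself; the bucket and prefix come from the listing above, and the 14-day window and 3x threshold are arbitrary choices:

```go
// sizecheck flags triage history files whose size jumped well above the
// recent median. Monitoring sketch only, not part of triage.
package main

import (
	"context"
	"fmt"
	"log"
	"sort"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()
	// The bucket is world-readable, so skip credentials.
	client, err := storage.NewClient(ctx, option.WithoutAuthentication())
	if err != nil {
		log.Fatalf("storage.NewClient: %v", err)
	}
	defer client.Close()

	// List the daily history objects (bucket and prefix from the issue).
	it := client.Bucket("k8s-gubernator").Objects(ctx, &storage.Query{Prefix: "triage/history/"})
	var names []string
	sizes := map[string]int64{}
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatalf("listing objects: %v", err)
		}
		names = append(names, attrs.Name)
		sizes[attrs.Name] = attrs.Size
	}
	sort.Strings(names) // YYYYMMDD filenames sort chronologically

	// Flag any file larger than 3x the median of the preceding 14 files.
	const window, factor = 14, 3
	for i := window; i < len(names); i++ {
		prev := make([]int64, window)
		for j := range prev {
			prev[j] = sizes[names[i-window+j]]
		}
		sort.Slice(prev, func(a, b int) bool { return prev[a] < prev[b] })
		median := prev[window/2]
		if s := sizes[names[i]]; s > factor*median {
			fmt.Printf("%s: %d bytes (median of prior %d: %d)\n", names[i], s, window, median)
		}
	}
}
```

Against the sizes above, the 2020-08-25 jump would trip this threshold immediately.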

/area triage

area/triage kind/bug lifecycle/rotten priority/important-soon

All 10 comments

but… how will we triage the issue…

/me wanders away, lost

/priority critical-urgent
Well, now it's frozen at "Loading... parsing 0MB." after climbing to somewhere above 520 MB.

So, as of now, jobs are taking ~11 minutes and producing ~50MB of data, which is normal.

Something happened at midnight between 9/07 and 9/08, with runtimes dropping from ~40 minutes to the current ~11 and output size dropping from ~250MB to the current ~50. A similar drop happened between 9/06 and 9/07, and between 9/05 and 9/06.

Since Triage runs over data from the past 2 weeks starting from midnight, I would guess this was some issue with the input data. I'd note that global clustering time did not scale up proportionally with local clustering time, which suggests local clustering may be more susceptible to bad inputs, whatever they may be.

/remove-priority critical-urgent
/priority important-soon
We should figure out what the culprit is and filter for it, but I'll drop priority since the tool is more usable now.
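
If the culprit does turn out to be pathological input (for example, a handful of enormous failure texts), one way to filter for it is to cap text size before clustering. A minimal sketch, assuming failure texts arrive as plain strings; truncateFailures and the 10 KB cap are hypothetical names, not triage's actual code:

```go
// Hypothetical pre-clustering filter: truncate abnormally large failure
// texts so a single pathological log can't dominate clustering.
// Illustrative sketch, not code from kubernetes/test-infra.
package main

import "fmt"

// maxFailureText is an arbitrary cap on failure-text size, in bytes.
const maxFailureText = 10000

// truncateFailures caps each failure text, marking truncation so the
// clustered output still shows that the text was cut.
func truncateFailures(failures []string) []string {
	out := make([]string, len(failures))
	for i, f := range failures {
		if len(f) > maxFailureText {
			f = f[:maxFailureText] + "\n...[truncated]"
		}
		out[i] = f
	}
	return out
}

func main() {
	failures := []string{"short failure", string(make([]byte, 20000))}
	for i, f := range truncateFailures(failures) {
		fmt.Printf("failure %d: %d bytes\n", i, len(f))
	}
}
```

Capping input size bounds the per-failure clustering cost, which lines up with the observation above that local clustering time blew up while global clustering did not.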

The drops, for posterity:
[screenshot: Screen Shot 2020-09-09 at 4:37:49 PM]

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/close
It hasn't climbed to similar levels since August, going to defer investigating.

@spiffxp: Closing this issue.

In response to this:

/close
It hasn't climbed to similar levels since August, going to defer investigating.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
