Test-infra: k8s-ci-robot result comment grid has stale results while new jobs are running

Created on 18 Sep 2017 · 44 Comments · Source: kubernetes/test-infra

When someone issues a /retest, the result matrix has a link to failed results from the past even when a new build of that job is running.

/area prow
/kind bug
/cc @kargakis

area/prow kind/bug lifecycle/rotten

Most helpful comment

On Mon, Dec 4, 2017 at 3:34 PM, Cole Wagner notifications@github.com wrote:

Furthermore, the user's /test or /retest comment (or new commit push) would have to appear after the bot notification on the PR so it should be obvious that the results are not new.

However the comment is re-issued after just one test has failed, while other retests are still running and so their results in the table are stale. The comment makes it look like all of the tests have run again and failed again, when actually only one did.

All 44 comments

Sometimes the stale results are useful to me. Say I want to check more details for the failed jobs while the new jobs are running. It's hard to find the link if there is no result comment.

You should always be able to see historical results in the Gubernator link. Right now we have a lot of new users getting very confused by issuing /retest and immediately seeing failures while the new jobs are still running.

Hmm...I'm on the fence... :smile:

I don't know. The comment, as a statement of "this is the current state of testing for your PR," is incorrect today. New users are constantly being confused by this. We already have a mechanism to show historical data. At the very least we need to be clear that the test failed but we are still re-running it. Developers are not getting the right feedback for their /retest action today.

Agreed that it's confusing for some new contributors to see the failure still showing when a re-test is running. But it's not easy to find historical data for the tests if there isn't a link shown somewhere. Most people can't remember the link. Fortunately, most people don't care about the historical failures either. :)

The link to Gubernator should be in the same comment, just a couple lines lower.

If all failed tests are re-running, as this issue proposes, the comment will be removed, right? So I still can't find the link at that point.

I'm on the fence about this. The current state is shown in the status lines at the bottom of the PR.

So I still can't find the link at that point.

You can if you know how to get there from the details link.

You can if you know how to get there from the details link.

So I can't if I don't know :) and maybe this is one of the reasons why we want that comment :smile:

I think it's exactly the fact that the comment and the status are incongruous that is the problem @spxtr ... if the test is running, it has not failed.

I guess I should re-frame this. Right now the dual messaging is:

  • confusing new users (and the fact that old users are used to it and not confused is not a good thing either)
  • not giving developers the feedback they need when they push /retest

We should find a way to address that.

Maybe we can modify the comment to tell contributors which results are stale and which are fresh.

I was thinking about that, but we already have a place to put stale results, and that's the historical grid in Gubernator. I don't see a compelling reason to invent a new place for it but we could add a column I guess.

SGTM :)
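To make that proposal concrete, here is a minimal sketch in Go of how the comment grid rows could carry a freshness marker. The `Result` type, its fields, `markStale`, and the job names are purely illustrative assumptions; this is not the actual Prow report code.

```go
// Hypothetical sketch: annotate each row of the result comment with a
// freshness note instead of showing only the last finished state.
package main

import "fmt"

type Result struct {
	Context string // status context, e.g. "pull-kubernetes-e2e-gce"
	State   string // last finished state, e.g. "failure"
	URL     string // link to the finished run
}

// markStale renders one table row per result and appends a note when a
// newer run of the same context is still in flight.
func markStale(results []Result, running map[string]bool) []string {
	rows := make([]string, 0, len(results))
	for _, r := range results {
		note := ""
		if running[r.Context] {
			note = " (re-run in progress, result is from a previous run)"
		}
		rows = append(rows, fmt.Sprintf("| %s | %s%s | %s |", r.Context, r.State, note, r.URL))
	}
	return rows
}

func main() {
	results := []Result{
		{Context: "pull-kubernetes-e2e-gce", State: "failure", URL: "https://example.invalid/1"},
		{Context: "pull-kubernetes-unit", State: "failure", URL: "https://example.invalid/2"},
	}
	running := map[string]bool{"pull-kubernetes-e2e-gce": true} // currently being retested
	for _, row := range markStale(results, running) {
		fmt.Println(row)
	}
}
```

Rendered into the comment, a row like that would read "failure (re-run in progress, result is from a previous run)", which keeps the historical link available while making staleness explicit.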

/cc @bparees

In case we need it... more devs confused by this behavior: https://github.com/openshift/openshift-ansible/pull/6328#issuecomment-349006756

/cc @sosiouxme

@cjwagner do you have any thoughts on this one?

I think it's exactly the fact that the comment and the status are incongruous that is the problem

The only case in which these are not the same is when the status contexts say that tests are running and the comment shows previous test results. This doesn't seem like a confusing state to me: the comment shows test results, so it seems apparent that any results displayed while tests are running refer to the previous round of tests. Furthermore, the user's /test or /retest comment (or new commit push) would have to appear after the bot notification on the PR so it should be obvious that the results are not new.

I don't feel strongly about this though. Adding a column to the comment to show the job start or end times might be the best way to minimize confusion.

I don't feel strongly about this though. Adding a column to the comment to show the job start or end times might be the best way to minimize confusion.

I would note that the start times are already somewhat confusing since some of our tooling considers start to be when the job was scheduled (deck at least) and other parts consider whatever started.json claims, which can often have large skew when we're at capacity.
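As an aside, a rough sketch of that skew, assuming started.json carries a Unix `timestamp` field for when the job process actually began; the payload values and the 25-minute gap below are made up for illustration, not taken from a real run.

```go
// Illustrative only: comparing a scheduling-time notion of "start" with
// the time recorded in started.json, which is where the skew comes from.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type started struct {
	Timestamp int64 `json:"timestamp"` // seconds since epoch when the job process started
}

func main() {
	// Example started.json payload (values are made up).
	raw := []byte(`{"timestamp": 1512430440}`)

	var s started
	if err := json.Unmarshal(raw, &s); err != nil {
		panic(err)
	}
	actualStart := time.Unix(s.Timestamp, 0)

	// Hypothetical time at which the job was scheduled (what deck might show).
	scheduled := actualStart.Add(-25 * time.Minute)

	fmt.Printf("scheduled at %s, started at %s, skew %s\n",
		scheduled.UTC(), actualStart.UTC(), actualStart.Sub(scheduled))
}
```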

The only case in which these are not the same is when the status contexts say that tests are running and the comment shows previous test results. This doesn't seem like a confusing state to me: the comment shows test results, so it seems apparent that any results displayed while tests are running refer to the previous round of tests.

It's not so obvious when you get an email (GitHub notification) from the bot telling you which tests failed and you just click the links from the email instead of going to the PR (because you're on your phone).

This doesn't seem like a confusing state to me: the comment shows test results, so it seems apparent that any results displayed while tests are running refer to the previous round of tests.

I think it's a lot easier for us, as people pretty intimate with the code and the system in general, to understand this. We've pretty consistently seen developers get confused about this exact distinction ever since we moved to using Prow. I think it's pretty clear that there is a messaging gap we should bridge here for developers that are more concerned about tests getting run and being green than how Prow works or the intricacies of the comment.

It seems like, to some extent, we're re-creating the status signal in this comment, as it's a way to ping everyone involved when the state changes (as opposed to the real status API, which is silent). If there is not a strong NACK to simply not having out-of-date rows in the table, I will open a PR to do that. If someone wants to see historical results, they can always view Gubernator.
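A minimal sketch, with hypothetical types and names, of what dropping out-of-date rows could look like; this is not the actual Prow report code, just the shape of the proposal.

```go
// Hypothetical sketch of the "drop out-of-date rows" idea: only keep
// results whose context is not currently being re-run.
package main

import "fmt"

type Result struct {
	Context string // status context name
	State   string // last finished state, e.g. "failure"
}

// freshOnly removes any result that has a newer run in flight, so the
// comment grid never shows a failure that is already being retested.
func freshOnly(results []Result, running map[string]bool) []Result {
	var kept []Result
	for _, r := range results {
		if !running[r.Context] {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	results := []Result{
		{Context: "pull-kubernetes-e2e-gce", State: "failure"},
		{Context: "pull-kubernetes-verify", State: "failure"},
	}
	running := map[string]bool{"pull-kubernetes-e2e-gce": true} // being retested
	fmt.Println(freshOnly(results, running))
}
```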

On Mon, Dec 4, 2017 at 3:34 PM, Cole Wagner notifications@github.com wrote:

Furthermore, the user's /test or /retest comment (or new commit push) would have to appear after the bot notification on the PR so it should be obvious that the results are not new.

However the comment is re-issued after just one test has failed, while other retests are still running and so their results in the table are stale. The comment makes it look like all of the tests have run again and failed again, when actually only one did.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

I for one would still like this to be fixed. Very confusing.
/remove-lifecycle stale

+1, I think we can do better. I'd also love to see a direct link to the historical test runs included in the comment. Right now I always struggle to navigate there.

I'd also love to see a direct link to the historical test runs included in the comment.

We already include the history of the PR in the comment, linked as "Full PR test history". Are you thinking of something else?

We already include the history of the PR in the comment, linked as "Full PR test history". Are you thinking of something else?

Whoops, sorry, no, that's exactly what I was looking for.

Seems like the messaging could be made clearer, though!

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

/remove-lifecycle rotten
/reopen
this is still true and still confuses developers

@BenTheElder: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen
this is still true and still confuses developers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/remove-lifecycle rotten
/reopen

@BenTheElder: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
