Test-infra: k8s-ci-robot result comment grid has stale results while new jobs are running

Created on 18 Sep 2017 · 44 Comments · Source: kubernetes/test-infra

When someone issues a /retest, the result matrix has a link to failed results from the past even when a new build of that job is running.

/area prow
/kind bug
/cc @kargakis

area/prow kind/bug lifecycle/rotten

Most helpful comment

On Mon, Dec 4, 2017 at 3:34 PM, Cole Wagner notifications@github.com wrote:

Furthermore, the user's /test or /retest comment (or new commit push) would have to appear after the bot notification on the PR so it should be obvious that the results are not new.

However the comment is re-issued after just one test has failed, while other retests are still running and so their results in the table are stale. The comment makes it look like all of the tests have run again and failed again, when actually only one did.

All 44 comments

Sometimes the stale results are useful to me. Say I want to check more details for the failed jobs while the new jobs are running. It's hard to find the link if there is no result comment.

You should always be able to see historical results in the Gubernator link. Right now we have a lot of new users getting very confused by issuing /retest and immediately seeing failures while the new jobs are still running.

Hmm...I'm on the fence... :smile:

I don't know. The comment, as a statement of "this is the current state of testing for your PR," is incorrect today. New users are constantly being confused by this. We already have a mechanism to show historical data. At the very least we need to be clear that the test failed but we are still re-running it. Developers are not getting the right feedback for their /retest action today.

Agreed that it's confusing for some new contributors to see the failure still showing when a re-test is running. But it's not easy to find historical data for the tests if there isn't a link shown somewhere. Most people can't remember the link. Fortunately, most people don't care about the historical failures either. :)

The link to Gubernator should be in the same comment, just a couple lines lower.

If all failed tests are re-running, as this issue proposes, the comment will be removed, right? So I still can't find the link at that point.

I'm on the fence about this. The current state is shown in the status lines at the bottom of the PR.

So I still can't find the link at that point.

You can if you know how to get there from the details link.

You can if you know how to get there from the details link.

So I can't if I don't know :) and maybe this is one of the reasons why we want that comment :smile:

I think it's exactly the fact that the comment and the status are incongruous that is the problem @spxtr ... if the test is running, it has not failed.

I guess I should re-frame this. Right now the dual messaging is:

  • confusing new users (and the fact that old users are used to it and not confused is not a good thing either)
  • not giving developers the feedback they need when they push /retest

We should find a way to address that.

Maybe we can modify the comment to tell contributors which results are stale and which are fresh.

I was thinking about that, but we already have a place to put stale results, and that's the historical grid in Gubernator. I don't see a compelling reason to invent a new place for it but we could add a column I guess.

SGTM :)
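To make that proposal concrete, here is a minimal sketch in Go of how the comment grid rows could carry a freshness marker. The `Result` type, its fields, `markStale`, and the job names are purely illustrative assumptions; this is not the actual Prow report code.

```go
// Hypothetical sketch: annotate each row of the result comment with a
// freshness note instead of showing only the last finished state.
package main

import "fmt"

type Result struct {
	Context string // status context, e.g. "pull-kubernetes-e2e-gce"
	State   string // last finished state, e.g. "failure"
	URL     string // link to the finished run
}

// markStale renders one table row per result and appends a note when a
// newer run of the same context is still in flight.
func markStale(results []Result, running map[string]bool) []string {
	rows := make([]string, 0, len(results))
	for _, r := range results {
		note := ""
		if running[r.Context] {
			note = " (re-run in progress, result is from a previous run)"
		}
		rows = append(rows, fmt.Sprintf("| %s | %s%s | %s |", r.Context, r.State, note, r.URL))
	}
	return rows
}

func main() {
	results := []Result{
		{Context: "pull-kubernetes-e2e-gce", State: "failure", URL: "https://example.invalid/1"},
		{Context: "pull-kubernetes-unit", State: "failure", URL: "https://example.invalid/2"},
	}
	running := map[string]bool{"pull-kubernetes-e2e-gce": true} // currently being retested
	for _, row := range markStale(results, running) {
		fmt.Println(row)
	}
}
```

Rendered into the comment, a row like that would read "failure (re-run in progress, result is from a previous run)", which keeps the historical link available while making staleness explicit.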

/cc @bparees

In case we need it... more devs confused by this behavior: https://github.com/openshift/openshift-ansible/pull/6328#issuecomment-349006756

/cc @sosiouxme

@cjwagner do you have any thoughts on this one?

I think it's exactly the fact that the comment and the status are incongruous that is the problem

The only case in which these are not the same is when the status contexts say that tests are running and the comment shows previous test results. This doesn't seem like a confusing state to me: the comment shows test results, so it seems apparent that any results displayed while tests are running refer to the previous round of tests. Furthermore, the user's /test or /retest comment (or new commit push) would have to appear after the bot notification on the PR so it should be obvious that the results are not new.

I don't feel strongly about this though. Adding a column to the comment to show the job start or end times might be the best way to minimize confusion.

I don't feel strongly about this though. Adding a column to the comment to show the job start or end times might be the best way to minimize confusion.

I would note that the start times are already somewhat confusing since some of our tooling considers start to be when the job was scheduled (deck at least) and other parts consider whatever started.json claims, which can often have large skew when we're at capacity.
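As an aside, a rough sketch of that skew, assuming started.json carries a Unix `timestamp` field for when the job process actually began; the payload values and the 25-minute gap below are made up for illustration, not taken from a real run.

```go
// Illustrative only: comparing a scheduling-time notion of "start" with
// the time recorded in started.json, which is where the skew comes from.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type started struct {
	Timestamp int64 `json:"timestamp"` // seconds since epoch when the job process started
}

func main() {
	// Example started.json payload (values are made up).
	raw := []byte(`{"timestamp": 1512430440}`)

	var s started
	if err := json.Unmarshal(raw, &s); err != nil {
		panic(err)
	}
	actualStart := time.Unix(s.Timestamp, 0)

	// Hypothetical time at which the job was scheduled (what deck might show).
	scheduled := actualStart.Add(-25 * time.Minute)

	fmt.Printf("scheduled at %s, started at %s, skew %s\n",
		scheduled.UTC(), actualStart.UTC(), actualStart.Sub(scheduled))
}
```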

The only case in which these are not the same is when the status contexts say that tests are running and the comment shows previous test results. This doesn't seem like a confusing state to me: the comment shows test results, so it seems apparent that any results displayed while tests are running refer to the previous round of tests.

It's not so obvious when you get an email (GitHub notification) from the bot telling you which tests failed and you just click the links from the email instead of going to the PR (because you're on your phone).

This doesn't seem like a confusing state to me: the comment shows test results, so it seems apparent that any results displayed while tests are running refer to the previous round of tests.

I think it's a lot easier for us, as people pretty intimate with the code and the system in general, to understand this. We've pretty consistently seen developers get confused about this exact distinction ever since we moved to using Prow. I think it's pretty clear that there is a messaging gap we should bridge here for developers that are more concerned about tests getting run and being green than how Prow works or the intricacies of the comment.

It seems like, to some extent, we're re-creating the status signal in this comment, as it's a way to ping everyone involved when the state changes (as opposed to the real status API, which is silent). If there is not a strong NACK to simply not having out-of-date rows in the table, I will open a PR to do that. If someone wants to see historical results, they can always view Gubernator.
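A minimal sketch, with hypothetical types and names, of what dropping out-of-date rows could look like; this is not the actual Prow report code, just the shape of the proposal.

```go
// Hypothetical sketch of the "drop out-of-date rows" idea: only keep
// results whose context is not currently being re-run.
package main

import "fmt"

type Result struct {
	Context string // status context name
	State   string // last finished state, e.g. "failure"
}

// freshOnly removes any result that has a newer run in flight, so the
// comment grid never shows a failure that is already being retested.
func freshOnly(results []Result, running map[string]bool) []Result {
	var kept []Result
	for _, r := range results {
		if !running[r.Context] {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	results := []Result{
		{Context: "pull-kubernetes-e2e-gce", State: "failure"},
		{Context: "pull-kubernetes-verify", State: "failure"},
	}
	running := map[string]bool{"pull-kubernetes-e2e-gce": true} // being retested
	fmt.Println(freshOnly(results, running))
}
```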

On Mon, Dec 4, 2017 at 3:34 PM, Cole Wagner notifications@github.com wrote:

Furthermore, the user's /test or /retest comment (or new commit push) would have to appear after the bot notification on the PR so it should be obvious that the results are not new.

However the comment is re-issued after just one test has failed, while other retests are still running and so their results in the table are stale. The comment makes it look like all of the tests have run again and failed again, when actually only one did.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

I for one would still like this to be fixed. Very confusing.
/remove-lifecycle stale

+1, I think we can do better. I'd also love to see a direct link to the historical test runs included in the comment. Right now I always struggle to navigate there.

I'd also love to see a direct link to the historical test runs included in the comment.

We already include the history of the PR in the comment, linked as "Full PR test history". Are you thinking of something else?

We already include the history of the PR in the comment, linked as "Full PR test history". Are you thinking of something else?

Whoops, sorry, no, that's exactly what I was looking for.

Seems like the messaging could be made clearer, though!

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

/remove-lifecycle rotten
/reopen
this is still true and still confuses developers

@BenTheElder: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen
this is still true and still confuses developers

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/remove-lifecycle rotten
/reopen

@BenTheElder: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
