Odo: Improve the test run time on PRs

Created on 18 Jun 2020  ·  39Comments  ·  Source: openshift/odo

  • we are considering running all tests on later stages of the PRs
  • we are considering running the travis CI and latest version openshift tests in the PR and rest in the periodic job
  • we are considering running the e2e tests ( not integration tests ) in the periodic jobs and omit them from the PR tests altogether - this was done in https://github.com/openshift/odo/issues/2931
aretesting lifecyclrotten prioritLow

Most helpful comment

IMO we should run the pr test on latest and upcoming releases of openshift because in case we hit any test failure on the upcoming release then it will be easy for us to fix it in PR level without waiting for periodic job to verify it. And ofcourse the PR should not break the latest release. So, These two cluster version we should consider in pr level.

Not sure if we should care if odo is failing on OpenShift that is not released yet. It might be failing due to some issues in OpenShift, after all it is not stable OpenShift release, we won't be able to fix it and it will block our PR queue.

Testing against pre-released version nightly is ok

All 39 comments

cc @metacosm

This should be worked on as quickly as possible, in my opinion as it would help speed up the rest of the development.

we are considering running all tests on later stages of the PRs

@metacosm This deserves a long description, so that we can analyze the requirement and ofcourse we need to check it (if complicated) with test platform team as our pr run is controlled by the prow bot.

we are considering running the travis CI and latest version openshift tests in the PR and rest in the periodic job

Just want to understand, what is the motive here ? Test time optimization ? it won't help much because CI takes almost the same time for spinning up and running test against each version of cluster. If we are looking for AWS resource optimization then yes, number of AWS resource will be reduced.

we are considering running the e2e tests ( not integration tests ) in the periodic jobs and omit them from the PR tests altogether

+1, but we will keep both test type until we make sure that our test does not hit flakes.

The motivation is to move faster PRs faster from submitted to merged as well as reduce resources usage.

Right now, lots of time/resources is spent on running tests that are not needed: integration tests should only be run when a PR is approved and considered ready to be merged by reviewers. There is no point in running all the tests while the review process is not finished and only unit tests / smoke tests should be run during that period.

This would have the following benefits:

  • reduce resource usage by not running tests that are not needed
  • free resources for PRs that require these tests to be run (i.e. the ones that we attempt to merge)
  • reduce the noise in the PR review loop by removing test output that no one cares about
  • hopefully keep people engaged in PRs instead of letting them unattended for too long because some tests are failing

Another aspect of this is obviously the unbelievably high occurrence of flakes, which should also be addressed. If a test cannot reliably run then it is useless because if it fails, you don't know if it's positive signal of an issue with the code and if it passes, you're not sure it somehow worked when maybe it shouldn't have…

@metacosm So basically we are looking for flow like

Screen Shot 2020-07-09 at 4 20 49 PM

I need to check with platform team if this is possible with pre submit test type that we are using, otherwise we explore other possibilities.

It is highly unlikely that such a workflow will work. Better to confirm with test platform team
/cc @openshift/openshift-team-developer-productivity-test-platform

Assuming it does not, we could also look to split minimum tests (ut + int) into presubmit and have e2e or maybe even all run as post submit. Doing so might also allow us to reliably increase the delay in periodic tests as well

Also how about removing all but the latest and latest-1 version of cluster tests from pr tests and have the other versions in periodic

No it is not possible to have tests that only get triggered after approval and are a merge requirement. I personally also don't think it makes sense, because you are wasting your reviewers time by having them review something for which CI can tell you that its not going to work.

Well I guess we are going with what I recommended then, splitting the tests across presubmit, postsubmit and periodic.

wdyt @alvaroaleman

Yeah, sounds like a good idea, you just need to make sure that someone looks at them and acts if there are failures, otherwise its pretty much useless

@alvaroaleman a review should not be contingent on test results so I don't see that as a waste of the reviewer's time. The converse is also true: it doesn't make sense to test something that will be changed because of reviews that's why, imo, only unit tests should always be run before a PR is approved.

Tests have little associated costs, humans have very high associated cost so running a couple more tests if that saves human time is always worth it. Happy for you if you have the capacity to do so many reviews, I rarely look at PRs that have failing tests.

It would be conceptionally very hard to add something like this to Prow and it opens up a nice set of failure modes (how to handle test failures on approved PRs? We currently assume they are flakes and will retest forever).

While I agree with you in principle, that's just not true for odo. The integration tests take several hours to complete and are highly unreliable (lots of flakes) so if you're waiting for tests to pass to perform a review, you will never review anything… :)

I'm not saying that this should be added to Prow, just saying that the odo tests should be set up in such a way that only unit tests (which should cover enough to get a good idea of the soundness of the PR) are run until the PR is approved. If integration tests fail, then the onus is on the author to get them fixed, possibly asking for another review afterwards if needed.

Finally, failing tests pollute the review with lots of useless messages making the review process harder than it needs to be.

Nice little discussion there @metacosm @alvaroaleman. I will need to think about this before putting my points :)

However, there is another way we of splitting testing that could improve testing time while avoiding invisibility to an extent.

First, we should probably remove all but latest and latest-1 openshift clusters from pre submit with complete testing relegated to post submit and periodic.

One we have done that we will free up a bunch of clusters we use which could then be used to split tests.

For eg we can do int-1 and int-2 which will run half and half of the integration tests for both versions. So we can end up with 1 test that does unit and then 2 for integration and 1 for e2e per cluster. We can split more of course depending on size of tests and resource availability. Wdyt?

I think the cluster provision time is a constant that will happen once across the board.

That aside the fact does remain that the cleanest way to increase test speed here is to - eliminate provision unless you are testing something that needs it, but that will likely need cluster pooling of some sort which comes with its own associated costs.

The fact remains that the biggest time consuming part is the cluster provision witch takes atleast 40-45 mins on good days.

However, there is another way we of splitting testing that could improve testing time while avoiding invisibility to an extent.
First, we should probably remove all but latest and latest-1 openshift clusters from pre submit with complete testing relegated to post submit and periodic.

+1, this same ideas has been discussed in one of the weekly standup call for reducing the number of cluster and will use the same test type as it is.

@mohammedzee1000 But for pr testing post submit job is not the best fit IMO because it will unnecessarily increase the effort. For example all the time there will be a fear of broken master code and it will increase the consistent monitoring cost on master after each pr we merge through periodic job.

One we have done that we will free up a bunch of clusters we use which could then be used to split tests.
For eg we can do int-1 and int-2 which will run half and half of the integration tests for both versions. So we can end up with 1 test that does unit and then 2 for integration and 1 for e2e per cluster. We can split more of course depending on size of tests and resource availability. Wdyt?

yes, we can go with this. It will reduce the test time upto 50%.

That aside the fact does remain that the cleanest way to increase test speed here is to - eliminate provision unless you are testing something that needs it, but that will likely need cluster pooling of some sort which comes with its own associated costs.

It's a long term goal, may be we need to create an separate jira to track this.

I am creating the task list

  • [ ] Use only two cluster version (latest release and upcoming release) for pr validation
  • [ ] Configure the rest supported cluster version for periodic run
  • [ ] Split the test for pr validation. Run half of test on one cluster version and other half on other cluster.
  • [ ] Use only two cluster version (latest release and upcoming release) for pr validation

+1 I would even consider just using the latest release

  • [ ] Split the test for pr validation. Run half of test on one cluster version and other half on other cluster.

-1 this makes me a little bit worried I would rather run a full test suite on each cluster.

There should be one additional point:

  • [ ] Clean up integrations tests - remove duplicated tests, optimize tests, minimalize external dependencies

I would also add: investigate and fix or remove unstable tests (flakes) :)

  • [ ] Use only two cluster version (latest release and upcoming release) for pr validation

+1 I would even consider just using the latest release

  • [ ] Split the test for pr validation. Run half of test on one cluster version and other half on other cluster.

@amitkrout @kadel I did not mean that. I meant doing all the tests on every version that we test but splitting them as instances like test1-4.5 runs 1 half and test2-4.5 runs second half for example

-1 this makes me a little bit worried I would rather run a full test suite on each cluster.

There should be one additional point:

  • [ ] Clean up integrations tests - remove duplicated tests, optimize tests, minimalize external dependencies

test1-4.5 runs 1 half and test2-4.5 runs second half for example

@mohammedzee1000 do you mean we need two 4.5 cluster to in CI, right ?

Yes one running one half and other running second half.

Don't worry we will get some clusters back by not testing a version or 2.

While this may not balance things out, it will still be an overall win

We could do for eg 4.5 and 4.4 or even just 4.5 in regular pr ci

But split tests for what we test, thereby achieving a semblance of parallelism. Remember cluster provision will be constant time

I still think that integration tests should only be run after the PR is approved by default, though it should be possibly to manually trigger such a run if needed on a case by case basis.

IMO we should run the pr test on latest and upcoming releases of openshift because in case we hit any test failure on the upcoming release then it will be easy for us to fix it in PR level without waiting for periodic job to verify it. And ofcourse the PR should not break the latest release. So, These two cluster version we should consider in pr level.

As part of CI time is concerned I feel we should not compromise the pr test verification in the name of reducing the CI time. Yes, we are optimising the resources while running the pr. May be less resource utilisation speeds up our CI timing to some extent.

So the task associated with this issue I am finalising with considering the suggestion and concerns discussed above

Clean up integrations tests Individually - remove duplicated tests, minimize external dependencies

AFAIU test optimization is a kind of clean up thing, organizing the test in a proper way and ofcourse removing the duplicate steps. Anything else we are looking for optimization then?

NOTE: As part of adoc decision is concerned we can pick corresponding task in the same issue once it is finalised.

@prietyc123 Note that I'm not asking for integration tests to not run at all on PRs. I just think that there is no point in running said tests before a PR is approved because the code might (and, more often than not, does) change during the review process. What I'm asking is for integration tests to only run by default once the PR is approved (i.e. it has gone through the review process). This has not much to do with CI time, more about not wasting resources running tests which results are going to be ignored when some other PRs (the approved ones) might be waiting on these resources to be merged. This would also help the review process by not polluting the stream with useless CI information that will be ignored.

@prietyc123 Note that I'm not asking for integration tests to not run at all on PRs. I just think that there is no point in running said tests before a PR is approved because the code might (and, more often than not, does) change during the review process. What I'm asking is for integration tests to only run by default once the PR is approved (i.e. it has gone through the review process). This has not much to do with CI time, more about not wasting resources running tests which results are going to be ignored when some other PRs (the approved ones) might be waiting on these resources to be merged. This would also help the review process by not polluting the stream with useless CI information that will be ignored.

@metacosm Yes, I understand. The concern/flow https://github.com/openshift/odo/issues/3385#issuecomment-656058081 you were talking is one of the task of this issue and its resolution is not possible AFAIU from the above thread, however I will file pr for rest of the task as mentioned in the issue description.

I can find a thread in testplatform by @amitkrout asking the same question as per your latest comment
Copy paste from the channel

I want each new pr will trigger the CI run only if it has /approve label. May be I can review the code and
apply /lgtm label but the pr will be triggered only when it finds approve label. Is it possible through any
configuration file ?

May be later he will record the conversation here in this issue.

IMO we should run the pr test on latest and upcoming releases of openshift because in case we hit any test failure on the upcoming release then it will be easy for us to fix it in PR level without waiting for periodic job to verify it. And ofcourse the PR should not break the latest release. So, These two cluster version we should consider in pr level.

Not sure if we should care if odo is failing on OpenShift that is not released yet. It might be failing due to some issues in OpenShift, after all it is not stable OpenShift release, we won't be able to fix it and it will block our PR queue.

Testing against pre-released version nightly is ok

@metacosm Yes, I understand. The concern/flow #3385 (comment) you were talking is one of the task of this issue and its resolution is not possible AFAIU from the above thread, however I will file pr for rest of the task as mentioned in the issue description.

I don't know the test infrastructure at all so maybe what I'm suggesting is indeed not feasible if, for example, the bot that handles lgtm/approve label is tied to the testing infrastructure. Otherwise, another option would be to have a bot specifically used to trigger tests based on different conditions. I have no idea how complex that would be to put in place, though…

Otherwise, another option would be to have a bot specifically used to trigger tests based on different conditions. I have no idea how complex that would be to put in place, though…

This needs good amount of research. Lets plan it after v2

The task associated with this issue:

Clean up integrations tests Individually for regular and devfile - remove duplicated tests, minimize external dependencies

I don't know the test infrastructure at all so maybe what I'm suggesting is indeed not feasible if, for example, the bot that handles lgtm/approve label is tied to the testing infrastructure. Otherwise, another option would be to have a bot specifically used to trigger tests based on different conditions. I have no idea how complex that would be to put in place, though…

As far as I know prow doesn't support that. We would have to write our own plugin to achieve that.

Writing a custom Prow plugin should be pretty simple, you can listen to the events on PRs you care about and comment with /test <name> to trigger things. The PRs will sit around, requiring tests that have not yet run, until that happens, though.

I don't know the test infrastructure at all so maybe what I'm suggesting is indeed not feasible if, for example, the bot that handles lgtm/approve label is tied to the testing infrastructure. Otherwise, another option would be to have a bot specifically used to trigger tests based on different conditions. I have no idea how complex that would be to put in place, though…

As far as I know prow doesn't support that. We would have to write our own plugin to achieve that.

Writing a custom Prow plugin should be pretty simple, you can listen to the events on PRs you care about and comment with /test to trigger things. The PRs will sit around, requiring tests that have not yet run, until that happens, though.

I am lost on what was the ask and what are we heading to

@stevekuznetsov The initial ask was we don't want to run our CI after the pr is triggered but CI can run only if /approve label is applied. I understand that through plugin you can control the job run but does it help us to hold the CI run when the the pr is triggered for first time ? Does it has side effect like after each commit i need to apply the label /test <name> to trigger the job.

I don't know the test infrastructure at all so maybe what I'm suggesting is indeed not feasible if, for example, the bot that handles lgtm/approve label is tied to the testing infrastructure. Otherwise, another option would be to have a bot specifically used to trigger tests based on different conditions. I have no idea how complex that would be to put in place, though…

As far as I know prow doesn't support that. We would have to write our own plugin to achieve that.

Writing a custom Prow plugin should be pretty simple, you can listen to the events on PRs you care about and comment with /test to trigger things. The PRs will sit around, requiring tests that have not yet run, until that happens, though.

I am lost on what was the ask and what are we heading to

@stevekuznetsov The initial ask was we don't want to run our CI after the pr is triggered but CI can run only if /approve label is applied. I understand that through plugin you can control the job run but does it help us to hold the CI run when the the pr is triggered for first time ? Does it has side effect like after each commit i need to apply the label /test <name> to trigger the job.

@stevekuznetsov @amitkrout @mohammedzee1000 Can you please share the Prow plugin documentation.

we are considering running the travis CI and latest version openshift tests in the PR and rest in the periodic job
 we are considering running the e2e tests ( not integration tests ) in the periodic jobs and omit them from the PR tests altogether

This was done in #2931 👍

  • we are considering running all tests on later stages of the PRs

To be honest I don't think that this part is that important right now.
Our biggest problem is test stability.
We need to invest some time in fixing flaky tests!
I have a feeling that we are just figuring out new ways how to avoid properly fixing our tests.

/priority low
/remove-priority high

we are considering running the travis CI and latest version openshift tests in the PR and rest in the periodic job - done in #2931

Executing only the latest version of openshift CI in the PR and rest in periodic jobs are done with this issue itself PR - openshift/release#10312 and openshift/release#10346

Test clean up still in progress and working on it individually. Tests covered till now

- create test: test-cmd-devfile-create #3618
- debug test: #3636
- catalog test: #3642
- push test: #3645

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Was this page helpful?
0 / 5 - 0 ratings