Amphtml: Greenify / speed up E2E tests on Travis

Created on 25 Apr 2019 · 15Comments · Source: ampproject/amphtml

Currently, the E2E tests job on Travis takes ~20 minutes, with several flaky / failing tests that take a long time to time out.

https://travis-ci.org/ampproject/amphtml/jobs/524602478#L1541

Two questions:

From what we can tell, is the list of failing tests deterministic?
While this is worked on, can we skip broken tests to speed up Travis?

Even though the E2E tests job is non-blocking when it fails, it still delays PR jobs by about 10 minutes each because Travis doesn't return a status to Github until all jobs complete (whether or not they are successful).

Bug

Source

rsimha

All 15 comments

/to @cvializ @estherkim

rsimha on 25 Apr 2019

Happy to take a look. With regards to the job delaying the overall PR build status, Travis's fast_finish: true setting should have been preventing that. I'm tracking an issue about this on Travis's side, but it could also be something on our side with how the GitHub check is implemented. Is there any way I could look at that?

estherkim on 29 Apr 2019

Interesting. I'm sure you're right about the fast_finish mechanism not kicking in correctly. This likely has nothing to do with GitHub integration, since Travis has full control over when it changes a check's status from "in progress" to "complete".

Meanwhile, I think we should prevent failing E2E tests from running for too long. I chatted with @cvializ about changing the timeout from 20 seconds to 10, but I'll leave it to you to decide if that's the best way forward.

rsimha on 29 Apr 2019

IMO the best way to speed up E2E tests is to change the way tests fail. Currently we use implicit waits with our custom await expect() so that all tests that are bound to fail have to timeout. I'd like to take that out and use explicit waits where needed, but it may require a bit of refactoring.

estherkim on 1 May 2019

👍1

By explicit waits, do you mean something like await sleep(5000) // dialog takes at most this long to appear? I would caution against explicit waits of that kind, because it will always wait for 5000ms even if the value it waits for changes faster, and if the test fails it still takes exactly that long to fail.

The E2E tests currently have two levels of timeouts: 5000ms for finding elements, and an overall test timeout that I believe is currently 20000ms. @rsimha and I discussed decreasing to 10000ms. I would say we can add default timeouts or optional timeout length parameters to FunctionalTestController methods to allow them to timeout individually, or to the expect function. We should prefer those over adding explicit waits as described above.

cvializ on 1 May 2019

I don't think we need the overall test timeout at all because for most cases, an element condition won't change long after user interaction occurs. It doesn't make sense for expect to poll an element if it won't change.

There's the case where element conditions do change, like in carousel autoadvance, which is where I think we should explicitly wait. Not await sleep(5000) but something like...

driver.wait(until.element-condition-that-signifies-end-of-automation, 5000);
expect(await driver.findElement(element-condition-that-you-expect-at-end-of-automation));

which uses the optional timeout parameter like you described for individual tests. The explicit wait is more verbose but I think it's easier to read/debug/maintain, and then we get fast failures and faster test jobs.

estherkim on 1 May 2019

Interesting, I could get behind not relying on an overall test timeout, since the individual timeouts can be effective on their own and could fail after their short timeout instead of one long one.

The current implementation of implicit waits already calls driver.wait, and this is how expect "polls". It's not creating any setIntervals and polling in the usual sense, it's polling how Selenium polls with driver.wait. And there isn't anything special about until, since until.js creates Condition instances to pass to driver.wait, which we do the same in expectCondition and waitFor.

Selenium until creating a Condition instance:
https://github.com/SeleniumHQ/selenium/blob/2cba710ab06c88af8d4a64db6c44e339e42135ba/javascript/node/selenium-webdriver/lib/until.js#L131-L139

SeleniumWebdriverController creating a Condition instance
https://github.com/ampproject/amphtml/blob/22928a784372db988f238e08ea7e3e2e7f5f9a64/build-system/tasks/e2e/selenium-webdriver-controller.js#L41-L48

SeleniumWebdriverController calling driver.wait
https://github.com/ampproject/amphtml/blob/22928a784372db988f238e08ea7e3e2e7f5f9a64/build-system/tasks/e2e/selenium-webdriver-controller.js#L60-L68

Your example above duplicates the condition element-condition-that-you-expect-at-end-of-automation, when the expectation already contains it. The SeleniumWebdriverController code passes the condition from the expectation to the waitFor function without duplicating it.

I can understand wanting to remove the indirection from the framework, but my concern is that we are going to end up with lots of custom wait for x to happen logic in each test like we currently have in our integration tests, making them difficult to read debug and maintain. The goal is to trap more complexity in the framework so that the tests themselves are as simple and consistent as possible.

cvializ on 2 May 2019

The two conditions in my example above are different and I think it's important to keep them separate. The first determines that animation is complete. The second is an assertion specific to your test. The second condition being true does not mean the first is true as well, and I don't think we should make that assumption.

We benefit by keeping them separate - you can see if your test fails in the animation or in the assertion. It's more helpful than a generic timeout error.

Also by taking away the asynchronous part of our assertions, the tests themselves become modular, and an async error from a previous test doesn't affect the next one. Currently I see examples of bleed over where a passing test's beforeEach will time out if the test before it fails.

Agreed that we don't want a lot of custom waits in our tests. Thoughts on above? Maybe others can chime in? @sparhami @cathyxz @rsimha

edit - animation, not automation!

estherkim on 2 May 2019

I think I disagree with the statement that the two conditions are actually separate. From my perspective, waiting for the end of the animation is a proxy for the expectation being true. Let me make sure I understand the example correctly:

1. the hypothetical test we're considering is an animation or other ongoing action that needs to complete, and where a separate value needs to be verified afterward.
1. the value to be verified may change over the course of the animation where it matches and does not match the expectation one or more times, and the end value is the only one that's important

For 1 I feel that the animation is a proxy, and for 2 I feel that if the value changes between the expected and not-the-expected, the value being tested is not a good value to test against, and a different one should be chosen. If one is not naturally available, a classname for testing should be added to verify the behavior is correct. This could be resolved through documentation for test writing.

I agree, Promises failing in a test that already passed shouldn't fail new tests. Bleed-over between tests seems more like a bug with how e2e/describes.js is implemented and not a problem with the controller framework. Let's sync on that when you have time @estherkim so we can walk through the examples you've seen.

Chime-ins welcome! Let's make a strong test framework!! ✅🔍🐛👞

cvializ on 3 May 2019

Revisiting this, I'd like to add that I would support the FunctionalTestController API having a waitFor or similarly named method that is framework-agnostic, and writing a small-to-medium set of tests with it and monitoring its flakiness compared to existing tests. That way we can evaluate the performance of separate-vs-combined waiting mechanisms and make an educated choice.

cvializ on 7 May 2019

I think waitFor would be okay, I just don't want to give up the readability / power of the current assertions we have. For example, a lot of the carousel tests do something like:

    await expect(controller.getElementRect(lastSlide)).to.include({
      'x': 0,
    });

If this times out, I currently get a very clear message on what failed and why (what the value of x was). I don't want to write custom logic like:

driver.wait(() => {
  // some custom JS to check the rect
});

I think the current pattern is very readable and easy to be consistent with, so I would like to maintain parity with that. I don't see a huge difference if we had:

await driver.wait(until(controller.getElementRect(lastSlide)).includes({
  'x': 0,
});

and it printed out an informative error message with what failed and why on timeout. I feel like we would end up re-implementing a lot of what Chai does (maybe we could leverage it?).

One minor concern would be that while people already know Chai, and it is documented, doing our own thing would probably lead to tests that are less well written and more ad-hoc, with a higher learning curve.

But I think the gains from changing things will be minimal (unless I am misunderstanding something). You would only save at most 5 seconds when a test times out (assuming the tests did the equivalent thing, where you wait for all the same async things). In any case, tests shouldn't be failing or timing out, so if they are, we should fix that.

sparhami on 7 May 2019

Piling on to this discussion with my 2¢:

I think at a high level, a testing library of this sort should try to ensure three things:
1) No test step should wait longer than it needs to. Based on the discussion that ended with https://github.com/ampproject/amphtml/issues/21986#issuecomment-490163013, it seems like the plan is to write a bunch of waitFor* functions that implicitly wait for different conditions to become true, trivially succeed if the condition is already true, and return a success or failure, correct?
2) When a test fails, the reason should be clear from the Travis logs. This could be done in the waitFor* function by re-asserting on the condition when a timeout is detected, so that the reason for failure returned is the actual condition. Seems like this is covered by https://github.com/ampproject/amphtml/issues/21986#issuecomment-490206787.
3) Avoid the need for copy-pasting code across lots of tests. This could be done through a combination of two things:
a) If a waitFor* function is generic enough to be used across multiple test suites, add it to the library.
b) If it is restricted to a single suite, then define it as a helper at the top of the suite, and reuse it across different tests.
With this, tests will be easy to write, debug, and maintain.

Thanks for working on this!

rsimha on 7 May 2019

@estherkim Bumping this with a request for two quick fixes while the deeper issues discussed above are worked on:

Fix the bug in push builds where the E2E tests are downloading minified build output, but serving unminified code. (Minified code is generally faster. If the E2E tests don't work on minified code, use unminified code for now.)
I'm now seeing the same ~6 failures during PR builds. Let's skip these tests and make the E2E tests a blocking check. (If needed, add a skipOnTravis() config, so the failing tests can be run locally.)

rsimha on 15 May 2019

@rsimha Ok to close?

estherkim on 24 May 2019