Currently, the E2E tests job on Travis takes ~20 minutes, with several flaky / failing tests that take a long time to time out.
https://travis-ci.org/ampproject/amphtml/jobs/524602478#L1541
Two questions:
Even though the E2E tests job is non-blocking when it fails, it still delays each PR's overall build status by about 10 minutes because Travis doesn't report a status to GitHub until all jobs complete (whether or not they are successful).
/to @cvializ @estherkim
Happy to take a look. With regards to the job delaying the overall PR build status, Travis's fast_finish: true setting should have been preventing that. I'm tracking an issue about this on Travis's side, but it could also be something on our side with how the GitHub check is implemented. Is there any way I could look at that?
Interesting. I'm sure you're right about the fast_finish mechanism not kicking in correctly. This likely has nothing to do with GitHub integration, since Travis has full control over when it changes a check's status from "in progress" to "complete".
Meanwhile, I think we should prevent failing E2E tests from running for too long. I chatted with @cvializ about changing the timeout from 20 seconds to 10, but I'll leave it to you to decide if that's the best way forward.
IMO the best way to speed up E2E tests is to change the way tests fail. Currently we use implicit waits with our custom await expect(), so every test that is bound to fail has to time out. I'd like to take that out and use explicit waits where needed, but it may require a bit of refactoring.
By explicit waits, do you mean something like await sleep(5000) // dialog takes at most this long to appear? I would caution against explicit waits of that kind, because they always wait the full 5000ms even if the value they wait for changes sooner, and if the test fails it still takes exactly that long to fail.
The E2E tests currently have two levels of timeouts: 5000ms for finding elements, and an overall test timeout that I believe is currently 20000ms. @rsimha and I discussed decreasing that to 10000ms. I would say we can add default timeouts or optional timeout-length parameters to FunctionalTestController methods (or to the expect function) so they can time out individually. We should prefer those over adding explicit waits as described above.
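To make the contrast concrete, here is a minimal, framework-free sketch (plain Node; the waitFor and sleep helpers are illustrative, not part of the amphtml framework): a polling wait with an optional timeout resolves as soon as the condition holds, and a failing condition only costs its own short timeout rather than a fixed sleep's full duration.

```js
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `conditionFn` until it returns a truthy value or `timeout` ms elapse.
async function waitFor(conditionFn, timeout = 5000, interval = 50) {
  const deadline = Date.now() + timeout;
  for (;;) {
    const value = await conditionFn();
    if (value) {
      return value; // resolves as soon as the condition is met
    }
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeout}ms`);
    }
    await sleep(interval);
  }
}

// Example: resolves after ~200ms instead of a fixed 5000ms sleep.
(async () => {
  const start = Date.now();
  await waitFor(() => Date.now() - start > 200);
  console.log(`waited ${Date.now() - start}ms`);
})();
```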
I don't think we need the overall test timeout at all, because in most cases an element's condition won't change long after the user interaction occurs. It doesn't make sense for expect to keep polling an element that won't change.
There's the case where element conditions do change, like in carousel autoadvance, which is where I think we should explicitly wait. Not await sleep(5000) but something like...
```
driver.wait(until.element-condition-that-signifies-end-of-automation, 5000);
expect(await driver.findElement(element-condition-that-you-expect-at-end-of-automation));
```
which uses the optional timeout parameter like you described for individual tests. The explicit wait is more verbose but I think it's easier to read/debug/maintain, and then we get fast failures and faster test jobs.
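For illustration, here is a rough sketch of that two-step pattern using real selenium-webdriver and chai APIs; the page URL, selectors, and the autoadvance scenario are made up for this example, not taken from the actual carousel tests.

```js
const {Builder, By, until} = require('selenium-webdriver');
const {expect} = require('chai');

async function checkAutoadvance() {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('http://localhost:8000/examples/carousel.html');
    // 1) Explicit wait: the animation is considered done once the last slide
    //    is marked as visible (hypothetical selector).
    await driver.wait(
      until.elementLocated(By.css('.slide:last-child[aria-hidden="false"]')),
      5000
    );
    // 2) Assertion: the separate condition you expect to hold once the
    //    animation has finished.
    const lastSlide = await driver.findElement(By.css('.slide:last-child'));
    expect((await lastSlide.getRect()).x).to.equal(0);
  } finally {
    await driver.quit();
  }
}
```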
Interesting, I could get behind not relying on an overall test timeout, since the individual timeouts can be effective on their own and could fail after their short timeout instead of one long one.
The current implementation of implicit waits already calls driver.wait, and that is how expect "polls". It isn't creating any setInterval calls and polling in the usual sense; it polls the way Selenium polls, via driver.wait. And there isn't anything special about until: until.js just creates Condition instances to pass to driver.wait, which is exactly what we do in expectCondition and waitFor.
Selenium until creating a Condition instance:
https://github.com/SeleniumHQ/selenium/blob/2cba710ab06c88af8d4a64db6c44e339e42135ba/javascript/node/selenium-webdriver/lib/until.js#L131-L139
SeleniumWebdriverController creating a Condition instance:
https://github.com/ampproject/amphtml/blob/22928a784372db988f238e08ea7e3e2e7f5f9a64/build-system/tasks/e2e/selenium-webdriver-controller.js#L41-L48
SeleniumWebdriverController calling driver.wait:
https://github.com/ampproject/amphtml/blob/22928a784372db988f238e08ea7e3e2e7f5f9a64/build-system/tasks/e2e/selenium-webdriver-controller.js#L60-L68
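In other words, both until.js and the controller's expectCondition/waitFor reduce to roughly the following shape (the textEquals helper here is hypothetical, just to show the idea):

```js
const {Condition} = require('selenium-webdriver');

// driver.wait polls the wrapped function until it returns a truthy value or
// the timeout elapses; returning null means "keep polling".
function textEquals(element, expected) {
  return new Condition(`until element text equals "${expected}"`, async () => {
    const text = await element.getText();
    return text === expected ? text : null;
  });
}

// Usage, given an existing `driver` and `element`:
// await driver.wait(textEquals(element, 'Done'), 5000);
```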
Your example above duplicates the condition element-condition-that-you-expect-at-end-of-automation, when the expectation already contains it. The SeleniumWebdriverController code passes the condition from the expectation to the waitFor function without duplicating it.
I can understand wanting to remove the indirection from the framework, but my concern is that we are going to end up with lots of custom "wait for x to happen" logic in each test, like we currently have in our integration tests, making them difficult to read, debug, and maintain. The goal is to trap more complexity in the framework so that the tests themselves are as simple and consistent as possible.
The two conditions in my example above are different and I think it's important to keep them separate. The first determines that animation is complete. The second is an assertion specific to your test. The second condition being true does not mean the first is true as well, and I don't think we should make that assumption.
We benefit by keeping them separate - you can see if your test fails in the animation or in the assertion. It's more helpful than a generic timeout error.
Also, by taking away the asynchronous part of our assertions, the tests themselves become modular, and an async error from a previous test doesn't affect the next one. Currently I see examples of bleed-over where a passing test's beforeEach will time out if the test before it fails.
Agreed that we don't want a lot of custom waits in our tests. Thoughts on above? Maybe others can chime in? @sparhami @cathyxz @rsimha
edit - animation, not automation!
I think I disagree with the statement that the two conditions are actually separate. From my perspective, waiting for the end of the animation is a proxy for the expectation being true. Let me make sure I understand the example correctly:
For the first condition (waiting for the end of the animation), I feel that the animation is a proxy; for the second (the assertion), I feel that if the value being tested can change back and forth between the expected and an unexpected state, it is not a good value to test against, and a different one should be chosen. If one is not naturally available, a classname for testing should be added to verify the behavior is correct. This could be addressed through documentation on test writing.
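As a small sketch of that "classname for testing" idea (the class and event names here are assumptions, not existing amphtml conventions): the component marks the end of its animation with a class, which gives the test a stable condition to wait on and assert against instead of a value that may change again.

```js
// Component-side: expose a stable, test-visible signal when the animation ends.
function markAnimationEndForTests(element) {
  element.addEventListener(
    'transitionend',
    () => element.classList.add('i-test-animation-done'),
    {once: true}
  );
}
```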
I agree, Promises failing in a test that already passed shouldn't fail new tests. Bleed-over between tests seems more like a bug with how e2e/describes.js is implemented and not a problem with the controller framework. Let's sync on that when you have time @estherkim so we can walk through the examples you've seen.
Chime-ins welcome! Let's make a strong test framework!! ✅🔍🐛👞
Revisiting this, I'd like to add that I would support the FunctionalTestController API having a waitFor or similarly named method that is framework-agnostic, and writing a small-to-medium set of tests with it and monitoring its flakiness compared to existing tests. That way we can evaluate the performance of separate-vs-combined waiting mechanisms and make an educated choice.
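To make that concrete, here is one possible framework-agnostic shape for such a method; the name, signature, and defaults below are assumptions for discussion, not the actual FunctionalTestController API.

```js
// A possible framework-agnostic shape; names, signature, and defaults are
// assumptions for discussion, not the actual amphtml API.
class FunctionalTestController {
  /**
   * Resolves once `conditionFn` returns a truthy value, or rejects after
   * `opts.timeout` milliseconds. Each backend supplies its own implementation:
   * a Selenium-backed controller could delegate to driver.wait, while other
   * backends poll however their framework prefers.
   * @param {function(): (Promise<*>|*)} conditionFn
   * @param {{timeout: number}=} opts
   * @return {!Promise<*>}
   */
  async waitFor(conditionFn, opts = {timeout: 5000}) {
    throw new Error('waitFor must be implemented by a controller backend.');
  }
}
```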
I think waitFor would be okay, I just don't want to give up the readability / power of the current assertions we have. For example, a lot of the carousel tests do something like:
```js
await expect(controller.getElementRect(lastSlide)).to.include({
  'x': 0,
});
```
If this times out, I currently get a very clear message on what failed and why (what the value of x was). I don't want to write custom logic like:
```js
driver.wait(() => {
  // some custom JS to check the rect
});
```
I think the current pattern is very readable and easy to be consistent with, so I would like to maintain parity with that. I don't see a huge difference if we had:
```js
await driver.wait(until(controller.getElementRect(lastSlide)).includes({
  'x': 0,
}));
```
and it printed out an informative error message with what failed and why on timeout. I feel like we would end up re-implementing a lot of what Chai does (maybe we could leverage it?).
One minor concern is that people already know Chai and it is well documented; doing our own thing would probably lead to tests that are less well written and more ad hoc, with a higher learning curve.
But I think the gains from changing things will be minimal (unless I am misunderstanding something). You would only save at most 5 seconds when a test times out (assuming the tests did the equivalent thing, where you wait for all the same async things). In any case, tests shouldn't be failing or timing out, so if they are, we should fix that.
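On the "maybe we could leverage it?" point, here is a sketch (not the actual framework code) of reusing Chai inside a polling wrapper: retry the assertion until it passes, and on timeout rethrow the last Chai error so the failure message still says exactly what value was seen. The expectEventually helper and its name are hypothetical.

```js
// Retries `assertFn(await valueFn())` until it passes or `timeout` ms elapse;
// on timeout, the last Chai assertion error is rethrown as-is, so the failure
// message keeps the actual observed value.
async function expectEventually(valueFn, assertFn, timeout = 5000, interval = 100) {
  const deadline = Date.now() + timeout;
  let lastError;
  for (;;) {
    try {
      assertFn(await valueFn());
      return;
    } catch (e) {
      lastError = e;
    }
    if (Date.now() >= deadline) {
      throw lastError;
    }
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Hypothetical usage, mirroring the carousel example above:
// const {expect} = require('chai');
// await expectEventually(
//   () => controller.getElementRect(lastSlide),
//   (rect) => expect(rect).to.include({x: 0})
// );
```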
Piling on to this discussion with my 2¢:
I think at a high level, a testing library of this sort should try to ensure three things:
1) No test step should wait longer than it needs to. Based on the discussion that ended with https://github.com/ampproject/amphtml/issues/21986#issuecomment-490163013, it seems like the plan is to write a bunch of waitFor* functions that implicitly wait for different conditions to become true, trivially succeed if the condition is already true, and return a success or failure, correct?
2) When a test fails, the reason should be clear from the Travis logs. This could be done in the waitFor* function by re-asserting on the condition when a timeout is detected, so that the failure reason reported is the actual condition that wasn't met. Seems like this is covered by https://github.com/ampproject/amphtml/issues/21986#issuecomment-490206787.
3) Avoid the need for copy-pasting code across lots of tests. This could be done through a combination of two things:
a) If a waitFor* function is generic enough to be used across multiple test suites, add it to the library.
b) If it is restricted to a single suite, then define it as a helper at the top of the suite, and reuse it across different tests (sketched at the end of this comment).
With this, tests will be easy to write, debug, and maintain.
Thanks for working on this!
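Riffing on point 3b, here is a sketch of what a suite-local helper could look like; the suite name, helper, and stubbed values below are illustrative, not real amphtml tests.

```js
const {expect} = require('chai');

describe('amp-carousel (sketch)', () => {
  // Suite-local helper, defined once and reused by the tests below.
  async function expectSlideAtOrigin(getRect) {
    expect(await getRect()).to.include({x: 0});
  }

  it('positions the active slide at x = 0', async () => {
    // Stand-in for something like controller.getElementRect(activeSlide).
    const getRect = async () => ({x: 0, y: 0, width: 300, height: 200});
    await expectSlideAtOrigin(getRect);
  });

  it('positions the next slide at x = 0 after advancing', async () => {
    // Another stand-in; in a real test this would follow a user interaction.
    const getRect = async () => ({x: 0, y: 0, width: 300, height: 200});
    await expectSlideAtOrigin(getRect);
  });
});
```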
@estherkim Bumping this with a request for two quick fixes while the deeper issues discussed above are worked on:
1. Lower the per-test timeout so failing tests fail faster.
2. Skip the E2E tests that are currently failing on Travis. (Via a skipOnTravis() config, so the failing tests can still be run locally.)
@rsimha Ok to close?
Yep! #22477 can close out this one.