Following the initiative in #1865 to move our CI jobs to GitHub Actions, the next biggest gain we can make is by moving our E2E tests over.
_Do not alter or remove anything below. The following sections will be managed by moderators only._
- […] .travis.yml
- […] cancel.yml workflow by adding the workflow file names of each E2E workflow to the list provided to workflow_id
- […] .github/workflows for each E2E job
  - e2e-test-wp-latest.yml
  - e2e-test-wp-oldest.yml
  - e2e-test-wp-4-9-gutenberg
  - e2e-test-wp-nightly
- […] .github/workflows/zips.yml for the closest to what we use in E2E) based on the current .travis.yml
  - npm ci
  - npm run build:test
  - npm run env:start
  - npm run test:e2e
- […] workflow_id
- […] test:e2e:ci command from package.json
  - _[…] test:e2e:ci but this command actually isn't necessary because wp-scripts test-e2e runs them with runInBand already so this command is redundant_
- WordPress nightly test is allowed to fail
- Rename .github/workflows/visual-regression.yml to tests.yml
> Add a new set of E2E jobs which all depend on build-e2e ...

I would suggest keeping all these workflows in separate files so that each one can be re-run separately. So, instead of having one giant YAML file, we will have five files: vrt, wp-4.7, wp-4.9, wp-latest, and wp-nightly.
I know it's a kind of burden to maintain the same setup across all these files, but it's a one-time job: once configured correctly, we won't need to get back to it. On the other side, it will give us additional flexibility to skip unnecessary steps or configure the environment differently.
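
To make the per-file idea concrete, here is a minimal sketch of what one of the five files might look like. The file name and the four npm commands come from the issue description; the trigger, the action versions, and the Node version are assumptions for illustration, not what the project necessarily uses.

```yaml
# .github/workflows/e2e-test-wp-latest.yml (one file per WordPress version)
name: E2E Tests (WP latest)

on:
  pull_request:            # assumed trigger

jobs:
  e2e-test-wp-latest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 14 # assumed; use whatever the project targets
      # The commands below are the ones listed in the issue description.
      - run: npm ci
      - run: npm run build:test
      - run: npm run env:start
      - run: npm run test:e2e
```

The other files would repeat the same steps with a different environment setup, which is the duplication cost mentioned above.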
> Note: this may be possible to extract to a reusable partial(s) using YAML features like anchors and map merging to avoid duplicating in many jobs

As far as I know, it is not supported by GitHub yet. So, let's remove this from IB.
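
For context, this is roughly what the anchors and map merging from the note look like in plain YAML. It is a sketch of the YAML feature only: as said above, GitHub did not appear to accept it in workflow files at the time of this discussion, and `WP_VERSION` is a hypothetical variable name.

```yaml
# Plain YAML illustration of anchors (&) and merge keys (<<).
# Not a working workflow; shown only to explain the note.
base_job: &base_job
  runs-on: ubuntu-latest
  steps:
    - run: npm ci
    - run: npm run build:test
    - run: npm run env:start
    - run: npm run test:e2e

jobs:
  e2e-wp-latest:
    <<: *base_job            # copies runs-on and steps from base_job
    env:
      WP_VERSION: latest     # hypothetical variable
  e2e-wp-nightly:
    <<: *base_job
    env:
      WP_VERSION: nightly    # hypothetical variable
```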
> run the E2E commands:
> npm run env:start

I think we should use the services feature of GitHub actions to set up MySQL and WordPress containers for e2e tests. Once again it seems like additional stuff to maintain, but it just needs to be configured once and we can call it a day. Using services is faster than re-creating environments from scratch for every e2e job.
> I would suggest keeping all these workflows in separate files so that each one can be re-run separately

That's something I've been thinking about too and is one of the big features that GHA currently lacks. My thought was that many of the tests could share artifacts/cache between jobs, or depend on faster running tests to avoid running long jobs on a commit which already failed other workflows. I'm not sure we should have them split into individual workflows though. That would offer the most flexibility in re-running specific runs, but I'm not sure if that's something we should optimize for. More often than not we don't have to rerun any (ideally we'd fix the instability since it's only a test or two which require us to re-run them at all), so I think a single workflow with all the jobs would be ideal for the average case.
> As far as I know, it is not supported by GitHub yet. So, let's remove this from IB.

Unless we can use a matrix for the differences between jobs, I think it would be well worth using this (if possible of course). Regarding the capability, I can test this pretty easily with a separate repo and follow up here.
> I think we should use the services feature of GitHub actions to set up MySQL and WordPress containers for e2e tests. Once again it seems like additional stuff to maintain, but it just needs to be configured once and we can call it a day. Using services is faster than re-creating environments from scratch for every e2e job.

I think this is a good idea, particularly for mysql since this is the one we have to wait for more than the others. However, this only exposes the service via the network so using things like wp-cli in our env setup script wouldn't be possible on the wordpress service container. Since the E2E setup involves things like installing the AMP plugin or Gutenberg with different versions, we can't simply use a database dump to speed things up if that's what you were thinking. As most of the time spent in starting the environment is in the provisioning script, how were you thinking of using services to improve this?
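
For reference, a service container is declared on the job itself, roughly as in the sketch below. The image tag, credentials, and health-check options are placeholders picked for illustration, not values from this project. As noted above, the steps can only reach the service over the network (here via the mapped port).

```yaml
jobs:
  e2e:
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:5.7                  # assumed image tag
        env:
          MYSQL_ROOT_PASSWORD: example    # placeholder credentials
          MYSQL_DATABASE: wordpress       # placeholder database name
        ports:
          - 3306:3306                     # reachable from steps at 127.0.0.1:3306
        # Wait until MySQL answers pings before the steps start.
        options: >-
          --health-cmd="mysqladmin ping --silent"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
```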
> Note: this may be possible to extract to a reusable partial(s) using YAML features like anchors and map merging to avoid duplicating in many jobs

As far as I know, it is not supported by GitHub yet.
> how were you thinking of using services to improve this?

I'll create a POC to show it.
I think we _should_ be able to use a matrix to define differing variables between each E2E test. If we go with the "all jobs in one file" we should use a matrix for parallelism anyway.
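
A rough sketch of the matrix idea, assuming the only difference between the E2E jobs is the WordPress version. `WP_VERSION` and the `experimental` key are hypothetical names; the version list mirrors the ones mentioned earlier in the thread. The `include` entry adds the nightly run as an extra combination that is allowed to fail.

```yaml
jobs:
  e2e:
    runs-on: ubuntu-latest
    # Only the nightly combination may fail without failing the workflow.
    continue-on-error: ${{ matrix.experimental }}
    strategy:
      fail-fast: false          # do not cancel the other versions on a failure
      matrix:
        wp-version: ['4.7', '4.9', 'latest']
        experimental: [false]
        include:
          - wp-version: nightly
            experimental: true
    env:
      WP_VERSION: ${{ matrix.wp-version }}   # hypothetical variable read by env setup
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run build:test
      - run: npm run env:start
      - run: npm run test:e2e
```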
I'm torn on the "each file for each type of E2E" test approach. I like the idea that everything is in one "E2E Test" job/section we could easily see from PRs. Being able to click on it for more info (when you want to see which one failed) is fine if they're consistent, but we don't need those in every PR list by default.
But _if_ things remain as inconsistent as they've been on Travis we should move to the per-file approach, to ease our pain of doing restarts.
The eternal optimist, I'm desperately hopeful that moving away from Travis will fix some performance issues causing our inconsistent test results. Is it tough to try for the single-file approach first but move to multiple ones later if it's still bad? 🤷🏻
(I think IB is largely ✅ from me, but I'll let others weigh in, especially after Eugene's proof-of-concept.)
> If we go with the "all jobs in one file" we should use a matrix for parallelism anyway.

GHA runs them in parallel based on their job dependencies already, only steps are run in order. So if there was a common build step that they shared they would all wait for that like they do in the storybook or zip workflows but after that they would all run in parallel.
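
As a sketch of that dependency model: the job name `build-e2e` is taken from the quoted IB, while the artifact name and the `dist` path are assumptions. Both E2E jobs wait for the build job and then run in parallel with each other.

```yaml
jobs:
  build-e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run build:test
      - uses: actions/upload-artifact@v4
        with:
          name: e2e-build       # assumed artifact name
          path: dist            # assumed build output directory

  e2e-wp-latest:
    needs: build-e2e            # starts only after build-e2e succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: e2e-build
          path: dist
      - run: npm ci
      - run: npm run env:start
      - run: npm run test:e2e

  e2e-wp-nightly:
    needs: build-e2e            # runs in parallel with e2e-wp-latest
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: e2e-build
          path: dist
      - run: npm ci
      - run: npm run env:start
      - run: npm run test:e2e
```

Note that `needs` only works between jobs in the same workflow file.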
> Being able to click on it for more info (when you want to see which one failed) is fine if they're consistent, but we don't need those in every PR list by default.

Each job will show up in the PR list with the status, same as it does now. In the checks section you'll be able to see the output from each job separately, even if they're all defined in the same workflow. (Easier to switch between as well)
> The eternal optimist, I'm desperately hopeful that moving away from Travis will fix some performance issues causing our inconsistent test results.

I don't think that migrating to GHA will fix the instability we're seeing now. I think that has more to do with missing waits and race conditions which are possible as a result when scripting browser interactions 100x faster than a human can do them.
FWIW the instability is limited to a few specific tests, so in that regard it is consistent. For that reason, I'm hopeful we can fix the root cause.
> Is it tough to try for the single-file approach first but move to multiple ones later if it's still bad?

It's not tough, it's just less ideal IMO. It may be a bit more verbose as much of the workflow definition will be the same which isn't great, but mostly because we wouldn't be able to leverage a shared build step because jobs can only depend on other jobs in the same workflow. Even if the overall job length isn't faster, I think we should leverage shared steps where possible to reduce concurrent jobs and total build time.
> > how were you thinking of using services to improve this?
>
> I'll create a POC to show it.

Unfortunately, I haven't been able to create a solid PoC. :(
@aaemnnosttv @eugene-manuilov @tofumatt Regarding the file structure, I think for the additional flexibility and the ability to re-run tests individually it makes sense to go with separate files per test. With only one file I'm pretty sure day-to-day usage would be more painful because of the occasional failures. Today, if a test job fails, we run that one again. If we go with a single GH workflow for all tests, we would be required to run all of them again, which has a higher chance of randomly failing on something that previously already passed. So I'd suggest using separate files, even if this means compromising on some performance benefits like reusing artifacts etc.
For the usage of services, I'd prefer to revisit this separately as a follow-up. Sticking to how our e2e tests currently work probably makes the transition smoother. Let's look at whether and how usage of services can improve this once we've migrated.
One thing that I think we need to add to the IB is to make sure we run e2e tests when a PR is ready for review (not in the draft status):
if: github.event.pull_request.draft == false
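
For illustration, the condition would sit at the job level, roughly as below. The list of event types is an assumption: `ready_for_review` is included so the job also starts when a draft PR is marked ready, rather than only on the next push.

```yaml
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]  # assumed event types

jobs:
  e2e:
    # Skip the whole job while the PR is still a draft.
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:e2e
```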
IB ✅
@aaemnnosttv could you please take a look at this ticket? I can't make e2e tests pass in GHA :(
In the beginning, I tried to use our docker-compose approach, but all tests failed with timeout errors (see results here). Then I tried to install a new WP instance on the virtual machine itself using wp-cli commands but kept failing... You can see my changes in #2372.
Assigning back to you, maybe you can come up with a better idea.
QA ✅
E2E tests running on actions, with nightly not required:
