Site-kit-wp: Move E2E tests to GitHub Actions

Created on 30 Oct 2020 · 12 Comments · Source: google/site-kit-wp

Feature Description

Following the initiative in #1865 to move our CI jobs to GitHub Actions, the next biggest gain we can make is by moving our E2E tests over.

_Do not alter or remove anything below. The following sections will be managed by moderators only._

Acceptance criteria

All E2E jobs should be created in GitHub Actions using the same criteria for PHP version, WP version, and additional plugins used
- Each E2E job should have its own dedicated workflow so that each one can be restarted independently
All E2E jobs, E2E-only variables, and E2E script steps should be removed from .travis.yml
Previous runs of the workflow should be cancelled automatically by newer runs using our cancel.yml workflow by adding the workflow file names of each E2E workflow to the list provided to workflow_id

Implementation Brief

Add new workflow files to .github/workflows for each E2E job
- See the GitHub Actions documentation on setting environment variables for jobs or steps, as the Travis env format below needs to be converted to YAML key/value format
- e2e-test-wp-latest.yml
  - name: E2E Tests (WordPress latest)
- e2e-test-wp-oldest.yml
  - name: E2E Tests (WordPress 4.7)
    env: E2E=1 WP_VERSION=4.7.19 AMP_VERSION=1.5.5
- e2e-test-wp-4-9-gutenberg
  - name: E2E Tests (WordPress 4.9, Gutenberg 4.9)
    env: E2E=1 WP_VERSION=4.9.16 GUTENBERG_VERSION=4.9.0
- e2e-test-wp-nightly
  - name: E2E Tests (WordPress nightly)
    env: E2E=1 WP_VERSION=nightly
  _This job should be allowed to fail, as it was on Travis. Since it will be in its own workflow, nothing special needs to happen for this; it only needs to not be marked as a required check in GitHub._
- Note: WordPress versions for 4.7 and 4.9 jobs above have been updated to reference the latest patch versions available
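As a rough illustration of the env conversion mentioned above, the Travis line for the WordPress 4.7 job might become a YAML mapping like the one below. The job id `e2e-tests`, the `pull_request` trigger, and placing `env` at the job level (rather than on individual steps) are assumptions, not something this brief specifies.

```yaml
# Hypothetical fragment of .github/workflows/e2e-test-wp-oldest.yml
name: E2E Tests (WordPress 4.7)

on: [pull_request] # assumed trigger

jobs:
  e2e-tests: # assumed job id
    runs-on: ubuntu-latest
    # Travis: env: E2E=1 WP_VERSION=4.7.19 AMP_VERSION=1.5.5
    # Values are quoted so they are always passed as strings.
    env:
      E2E: '1'
      WP_VERSION: '4.7.19'
      AMP_VERSION: '1.5.5'
    steps:
      - uses: actions/checkout@v4
      # Every step in this job can read the variables defined above.
      - run: echo "Testing against WordPress $WP_VERSION with AMP $AMP_VERSION"
```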
Use a common template for setting up each test (see .github/workflows/zips.yml for the closest to what we use in E2E) based on the current .travis.yml
Basic outline of steps
- checkout
- setup composer cache
- composer install
- setup node using version defined in nvm
- setup npm cache
- npm ci
- npm run build:test
Run the E2E tests
- npm run env:start
- npm run test:e2e
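The outline above could be sketched as a single workflow along these lines. The action versions, cache keys, the presence of a `composer.lock` file, and reading the Node version from an `.nvmrc` file are assumptions made for illustration; the actual template should follow whatever `.github/workflows/zips.yml` and `.travis.yml` already do.

```yaml
# Hypothetical sketch of .github/workflows/e2e-test-wp-latest.yml
name: E2E Tests (WordPress latest)

on: [pull_request] # assumed trigger

jobs:
  e2e-tests: # assumed job id
    runs-on: ubuntu-latest
    steps:
      # checkout
      - uses: actions/checkout@v4

      # setup composer cache
      - name: Get Composer cache directory
        id: composer-cache
        run: echo "dir=$(composer config cache-files-dir)" >> "$GITHUB_OUTPUT"
      - name: Set up Composer cache
        uses: actions/cache@v4
        with:
          path: ${{ steps.composer-cache.outputs.dir }}
          key: ${{ runner.os }}-composer-${{ hashFiles('**/composer.lock') }}
          restore-keys: ${{ runner.os }}-composer-

      # composer install
      - run: composer install --no-interaction --no-progress

      # setup node using the version defined for nvm, plus the npm cache
      - uses: actions/setup-node@v4
        with:
          node-version-file: .nvmrc # assumes the repo has an .nvmrc file
          cache: npm

      # npm ci and test build
      - run: npm ci
      - run: npm run build:test

      # run the E2E tests
      - run: npm run env:start
      - run: npm run test:e2e
```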
Add the workflow file names of each E2E workflow to the list provided to workflow_id
https://github.com/google/site-kit-wp/blob/07d95eddab386c3c5b9f8d1dad1f123d7cb33dae/.github/workflows/cancel.yml#L23
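For illustration only, a cancel workflow with the E2E workflows added might look like the sketch below. It assumes the workflow is built on `styfle/cancel-workflow-action` (this brief does not name the action), and it assumes all four workflow files end in `.yml`; check both against the linked `cancel.yml` before copying anything.

```yaml
# Hypothetical sketch of .github/workflows/cancel.yml
name: Cancel Previous Runs

on: [push, pull_request] # assumed triggers

jobs:
  cancel:
    runs-on: ubuntu-latest
    timeout-minutes: 3
    steps:
      - uses: styfle/cancel-workflow-action@0.12.1 # assumed action and version
        with:
          # Comma-separated list of workflows whose older runs should be cancelled.
          # Existing entries would stay; the E2E file names are appended.
          workflow_id: e2e-test-wp-latest.yml,e2e-test-wp-oldest.yml,e2e-test-wp-4-9-gutenberg.yml,e2e-test-wp-nightly.yml
          access_token: ${{ github.token }}
```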
Remove the test:e2e:ci command from package.json
_Travis used test:e2e:ci, but this command is redundant because wp-scripts test-e2e already runs the tests with runInBand._

Test Coverage

The new E2E jobs should be marked as required checks that must pass for a PR to be merged

Visual Regression Changes

QA Brief

E2E Tests should run successfully on GitHub Actions
- The WordPress nightly test is allowed to fail

Changelog entry

P2 Eng Rollover Enhancement

Source

aaemnnosttv

All 12 comments

Rename .github/workflows/visual-regression.yml to tests.yml

Add a new set of E2E jobs which all depend on build-e2e ...

I would suggest keeping all these workflows in separate files to be able to re-run it separately. So, instead of having one giant YAML file, we will have five files: vrt, wp-4.7, wp-4.9, wp-latest, and wp-nightly.

I know it's a kind of burden to maintain the same setup across all these files, but it's a one-time job: once configured correctly, we won't need to get back to it. On the other side, it will give us additional flexibility to skip unnecessary steps or configure the environment differently.

Note: this may be possible to extract to a reusable partial(s) using YAML features like anchors and map merging to avoid duplicating in many jobs

As far as I know, it is not supported by GitHub yet. So, let's remove this from IB.

run the E2E commands: npm run env:start

I think we should use the services feature of GitHub actions to set up MySQL and WordPress containers for e2e tests. Once again it seems like additional stuff to maintain, but it just needs to be configured once and we can call it a day. Using services is faster than re-creating environments from scratch for every e2e job.
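A minimal sketch of what a MySQL service container could look like in a job, purely to illustrate the services feature. The image tag, credentials, database name, and health check values are placeholders, and nothing here shows how the WordPress side would be provisioned.

```yaml
# Hypothetical job using a MySQL service container
name: E2E Tests (services sketch)

on: [pull_request] # assumed trigger

jobs:
  e2e-tests: # assumed job id
    runs-on: ubuntu-latest
    services:
      mysql:
        image: mysql:5.7 # placeholder image tag
        env:
          MYSQL_ROOT_PASSWORD: example # placeholder credentials
          MYSQL_DATABASE: wordpress # placeholder database name
        ports:
          - 3306:3306 # reachable from the runner at 127.0.0.1:3306
        # Wait until MySQL answers pings before the steps start.
        options: >-
          --health-cmd="mysqladmin ping --silent"
          --health-interval=10s
          --health-timeout=5s
          --health-retries=5
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:e2e
```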

eugene-manuilov on 30 Oct 2020

I would suggest keeping all these workflows in separate files to be able to re-run it separately

That's something I've been thinking about too and is one of the big features that GHA currently lacks. My thought was that many of the tests could share artifacts/cache between jobs, or depend on faster running tests to avoid running long jobs on a commit which already failed other workflows. I'm not sure we should have them split into individual workflows though 🤔 That would offer the most flexibility in re-running specific runs, but I'm not sure if that's something we should optimize for. More often than not we don't have to rerun any (ideally we'd fix the instability since it's only a test or two which require us to re-run them at all 😄 ), so I think a single workflow with all the jobs would be ideal for the average case.

As far as I know, it is not supported by GitHub yet. So, let's remove this from IB.

Unless we can use a matrix for the differences between jobs, I think it would be well worth using this (if possible of course). Regarding the capability, I can test this pretty easily with a separate repo and follow-up here.

I think we should use the services feature of GitHub actions to set up MySQL and WordPress containers for e2e tests. Once again it seems like additional stuff to maintain, but it just needs to be configured once and we can call it a day. Using services is faster than re-creating environments from scratch for every e2e job.

I think this is a good idea, particularly for mysql since this is the one we have to wait for more than the others. However, this only exposes the service via the network so using things like wp-cli in our env setup script wouldn't be possible on the wordpress service container. Since the E2E setup involves things like installing the AMP plugin or Gutenberg with different versions, we can't simply use a database dump to speed things up if that's what you were thinking. As most of the time spent in starting the environment is in the provisioning script, how were you thinking of using services to improve this?

aaemnnosttv on 30 Oct 2020

Note: this may be possible to extract to a reusable partial(s) using YAML features like anchors and map merging to avoid duplicating in many jobs

As far as I know, it is not supported by GitHub yet.

It isn't 😞

aaemnnosttv on 31 Oct 2020

how were you thinking of using services to improve this?

I'll create a POC to show it.

eugene-manuilov on 2 Nov 2020

I think we _should_ be able to use a matrix to define differing variables between each E2E test. If we go with the "all jobs in one file" we should use a matrix for parallelism anyway.
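A minimal sketch of what such a matrix could look like, using the versions listed in the IB. The key names (`label`, `wp_version`, and so on), the `latest` value for the first entry, and the assumption that the environment scripts treat an empty variable like an unset one are all hypothetical.

```yaml
# Hypothetical single-workflow variant using a matrix
name: E2E Tests

on: [pull_request] # assumed trigger

jobs:
  e2e-tests: # assumed job id
    name: E2E Tests (${{ matrix.label }})
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false # one failing combination should not cancel the others
      matrix:
        include:
          - label: WordPress latest
            wp_version: latest # assumed value; the IB lists no env for this job
          - label: WordPress 4.7
            wp_version: 4.7.19
            amp_version: 1.5.5
          - label: WordPress 4.9, Gutenberg 4.9
            wp_version: 4.9.16
            gutenberg_version: 4.9.0
          - label: WordPress nightly
            wp_version: nightly
    env:
      E2E: '1'
      WP_VERSION: ${{ matrix.wp_version }}
      # Keys missing from a matrix entry expand to an empty string.
      AMP_VERSION: ${{ matrix.amp_version }}
      GUTENBERG_VERSION: ${{ matrix.gutenberg_version }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run build:test
      - run: npm run env:start
      - run: npm run test:e2e
```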

I'm torn on the "each file for each type of E2E" test approach. I like the idea that everything is in one "E2E Test" job/section we could easily see from PRs. Being able to click on it for more info (when you want to see which one failed) is fine if they're consistent, but we don't need those in every PR list by default.

But _if_ things remain as inconsistent as they've been on Travis we should move to the per-file approach, to ease our pain of doing restarts.

The eternal optimist, I'm desperately hopeful that moving away from Travis will fix some performance issues causing our inconsistent test results. Is it tough to try for the single-file approach first but move to multiple ones later if it's still bad? 🤷🏻

(I think IB is largely ✅ from me, but I'll let others weigh in, especially after Eugene's proof-of-concept. 👍🏻 )

tofumatt on 3 Nov 2020

If we go with the "all jobs in one file" we should use a matrix for parallelism anyway.

GHA runs them in parallel based on their job dependencies already, only steps are run in order. So if there was a common build step that they shared they would all wait for that like they do in the storybook or zip workflows but after that they would all run in parallel 🙂
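To illustrate, here is a rough sketch of two E2E jobs that wait on a shared build job through `needs` and then run in parallel. The job ids other than `build-e2e` are made up, and passing the build output between jobs (for example via artifacts) is left out.

```yaml
# Hypothetical single-workflow layout with a shared build job
name: Tests

on: [pull_request] # assumed trigger

jobs:
  build-e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run build:test
      # Sharing the build output with the jobs below would need an extra
      # upload/download step, which is omitted from this sketch.

  e2e-wp-latest: # hypothetical job id
    needs: build-e2e # starts only after build-e2e succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run test:e2e

  e2e-wp-nightly: # hypothetical job id
    needs: build-e2e # runs in parallel with e2e-wp-latest
    runs-on: ubuntu-latest
    env:
      WP_VERSION: nightly
    steps:
      - uses: actions/checkout@v4
      - run: npm run test:e2e
```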

Being able to click on it for more info (when you want to see which one failed) is fine if they're consistent, but we don't need those in every PR list by default.

Each job will show up in the PR list with the status, same as it does now. In the checks section you'll be able to see the output from each job separately, even if they're all defined in the same workflow. (Easier to switch between as well)

The eternal optimist, I'm desperately hopeful that moving away from Travis will fix some performance issues causing our inconsistent test results.

I don't think that migrating to GHA will fix the instability we're seeing now. I think that has more to do with missing waits and race conditions which are possible as a result when scripting browser interactions 100x faster than a human can do them 😆
FWIW the instability is limited to a few specific tests, so in that regard it is consistent. For that reason, I'm hopeful we can fix the root cause.

Is it tough to try for the single-file approach first but move to multiple ones later if it's still bad?

It's not tough, it's just less ideal IMO. It may be a bit more verbose as much of the workflow definition will be the same which isn't great, but mostly because we wouldn't be able to leverage a shared build step because jobs can only depend on other jobs in the same workflow. Even if the overall job length isn't faster, I think we should leverage shared steps where possible to reduce concurrent jobs and total build time.

aaemnnosttv on 3 Nov 2020

how were you thinking of using services to improve this?

I'll create a POC to show it.

Unfortunately, I haven't been able to create a solid PoC. :(

eugene-manuilov on 9 Nov 2020

@aaemnnosttv @eugene-manuilov @tofumatt Regarding the file structure, I think for the additional flexibility and the ability to re-run tests individually it makes sense to go with separate files per test. With only one file I'm pretty sure this would make day-to-day usage more painful because of the occasional failures. Today, if a test job fails, we run that one again. If we go with a single GH workflow for all tests, we would be required to run them all again, which has a higher chance of randomly failing on something that previously passed. So I'd suggest using separate files, even if this means compromising on some performance benefits like reusing artifacts etc.

For the usage of services, I'd prefer to revisit this separately as a follow-up. Sticking to how our e2e tests currently work probably makes the transition smoother. Let's look at whether and how usage of services can improve this once we've migrated.

felixarntz on 9 Nov 2020

One thing that I think we need to add to the IB is to make sure we run e2e tests when a PR is ready for review (not in the draft status):

if: github.event.pull_request.draft == false
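In context, that condition might sit on the job like this. The `ready_for_review` type is listed because it is not among the default `pull_request` activity types, so without it the workflow would not start when a draft is marked ready; the job id and steps are placeholders.

```yaml
# Hypothetical fragment showing the draft check on an E2E job
name: E2E Tests (WordPress latest)

on:
  pull_request:
    # Defaults are opened, synchronize, and reopened; ready_for_review is added
    # so the tests start when a draft PR is marked ready for review.
    types: [opened, synchronize, reopened, ready_for_review]

jobs:
  e2e-tests: # assumed job id
    # Skip the whole job while the PR is still a draft.
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:e2e
```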

eugene-manuilov on 10 Nov 2020

IB ✅

felixarntz on 13 Nov 2020

@aaemnnosttv could you please take a look at this ticket? I can't make e2e tests pass in GHA :(

In the beginning, I tried to use our docker-compose approach, but all tests failed with timeout errors (see results here). Then I tried to install a new WP instance on the virtual machine itself using wp-cli commands but kept failing... You can see my changes in #2372.

Assigning back to you, maybe you can come up with a better idea.

eugene-manuilov on 7 Dec 2020

QA ✅

E2E tests running on actions, with nightly not required:

Screenshot 2020-12-22 at 20 53 54

tofumatt on 22 Dec 2020
