Spinalcordtoolbox: Use Github Actions for continuous integration

Created on 2 Dec 2020 · 25 Comments · Source: spinalcordtoolbox/spinalcordtoolbox

Motivated by this https://github.com/neuropoly/spinalcordtoolbox/issues/3057#issuecomment-735281988 and as discussed during the weekly meeting we will migrate the continuous integration infrastructure to Github Actions.

I am creating this issue to document the process and centralize discussions on the topic.

CI HIGH

All 25 comments

Working on this! I got motivated today because @joshuacwnewton and shimming-toolbox were hitting pointless CI bugs with trying to use sct as part of other projects.

So far https://github.com/neuropoly/spinalcordtoolbox/runs/1602878596 has gotten to running the tests in 3 minutes. Travis always took at least 6 to get to the same place.

I hate to say it but vive la GithubCI.

I've ported the platform matrix over (almost); I sidestepped my util/dockerize.sh because Actions supports docker via its API: just say: container: debian:10, so that's nice.

However I've hit a stupid growing-pains snag: in travis we blocked off platforms into always/nightly, e.g.

always:

https://github.com/neuropoly/spinalcordtoolbox/blob/42f5a2ec1857a145f2de05e2b7fb6da627b5c7e1/.travis.yml#L94-L97

nightly:

https://github.com/neuropoly/spinalcordtoolbox/blob/42f5a2ec1857a145f2de05e2b7fb6da627b5c7e1/.travis.yml#L37-L40

Actions doesn't let you use if: within matrix: the way Travis does, so the most obvious equivalent on Actions I could think of is to add a nightly flag to the matrix and then read it:

https://github.com/neuropoly/spinalcordtoolbox/blob/514179be769410e1366cfc6a535344bc194ffd86/.github/workflows/tests.yml#L57

but this fails with

The workflow is not valid. .github/workflows/tests.yml (Line: 57, Col: 9): Unrecognized named-value: 'matrix'. Located at position 3 within expression: !(matrix.nightly) || (matrix.nightly && (github.event_name == 'schedule' || (github.event_name == 'push' && github.ref == 'refs/heads/release')))

Others have already run into this bug, though Github hasn't addressed it, so maybe it's not a known one? https://github.community/t/how-to-conditionally-include-exclude-items-in-matrix-eg-based-on-branch/16853/5. It sounds like if: on a job: got added only as an afterthought, and I guess not done very well. Indeed, it works if I move the if: a level down into step:. That'll work but leave us with a bunch of vacuous test runs cluttering up the UI and I don't want that.

The suggested fix there is to compute a run/don't run flag in the build matrix based on whatever condition we want -- just like we do with Travis -- but then add an exclude: block to cross off cases based on that. I don't really understand how exclude:/include: interact though and I haven't got this working yet.

EDIT: A couple of suggestions in https://github.community/t/conditional-matrices/17206/2; one unanswered SO post at https://stackoverflow.com/questions/65384420/how-to-make-a-github-action-matrix-element-conditional.

This person has nested a list inside the build matrix, which is already a list, but apparently that helped? https://github.com/rotators/ReDefine/blob/636a6676e48402489c88ea42a4874cdc5b98aa03/.github/workflows/Build.yml#L13-L28
This person is integrating information from github.* in their matrix https://github.community/t/how-to-conditionally-include-exclude-items-in-matrix-eg-based-on-branch/16853/6 but I don't yet see if/how that lets them add or remove cases.

This one https://github.community/t/conditional-matrices/17206/2 suggests splitting the jobs in two; that would work, I think, though inelegant: we'd have one 'nightlies' job with a build matrix covering all the nightly builds, and one 'always' job with everything else, and the nightlies job could use if: in it because it wouldn't have to look into matrix from there, which is currently broken/unsupported in Actions (even though the other lines like runs-on and container can read it!); in the end it would all expand into one large pool of jobs running in parallel.

I get the sense GH Actions expects people to manage triggers by splitting their test suites into multiple workflows (i.e. multiple *.yml files), since event triggers are applied on a per-workflow basis (the on: keyword applies to the entire workflow file).

With Travis, we had one big .travis.yml file, but with GH Actions it seems they want us to think more in terms of a workflow directory (containing multiple workflows) instead.

So, we could have these two .yml files:

  • core_platforms.yml: Run on: branch push/pull request/nightly, with nightly timing being handled by the schedule: syntax.
  • extra_platforms.yml: Run on: nightly only.

Or, slightly different organization:

  • push-pr.yml: Core platforms only.
  • nightly.yml: Core platforms + extra platforms.

Thanks for the perspective @joshuacwnewton. I think you're probably right. I also think this is going to turn out to be an oversight and they'll come around to allowing this sort of thing because it forces code duplication for nothing.

Update: that SO thread now has an answer, which says to use two jobs: one to generate and filter the jobs -- using jq of course -- and the second to actually take that list of jobs and expand it: https://stackoverflow.com/a/65434401

I'm pretty sure this is just a stupid bug though. Working around it with jq is a good idea but hard to maintain because you can't use the standard yaml build matrix syntax for it -- you have to feed in the same data as json as a string or from a file instead which is annoying.
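For what it's worth, the generate-and-filter step doesn't have to be jq. Here is a minimal sketch of the same idea in Python. The platform list, the nightly flag layout and the helper names select_jobs and matrix_json are all hypothetical; the condition is the one from the failing expression above.

```python
import json

# Hypothetical platform list; the real matrix lives in .github/workflows/tests.yml.
MATRIX = [
    {"os": "ubuntu-18.04", "nightly": False},
    {"os": "macos-10.15", "nightly": False},
    {"os": "debian:10", "nightly": True},
]


def select_jobs(matrix, event_name, ref):
    """Keep always-on entries, plus nightly entries on schedule or release pushes."""
    run_nightlies = event_name == "schedule" or (
        event_name == "push" and ref == "refs/heads/release"
    )
    return [job for job in matrix if not job["nightly"] or run_nightlies]


def matrix_json(matrix, event_name, ref):
    """Serialize the selection as an {"include": [...]} object for a second job."""
    return json.dumps({"include": select_jobs(matrix, event_name, ref)})


if __name__ == "__main__":
    # On a pull request only the non-nightly platforms survive.
    print(matrix_json(MATRIX, "pull_request", "refs/pull/3125/merge"))
```

The second job would still have to turn that string back into its matrix (Actions has a fromJSON expression function for this), so the maintenance complaint stands: the platform list ends up living in a script or a JSON file rather than in the normal yaml matrix.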

In other exciting adventures: Actions is tripping some kind of locale bug in sct_download_data: https://github.com/neuropoly/spinalcordtoolbox/runs/1606746772?check_suite_focus=true

Trying URL: https://github.com/sct-data/PAM50/releases/download/r20201104/PAM50-r20201104.zip
Downloading: PAM50-r20201104.zip
--- Logging error ---
Traceback (most recent call last):
  File "/github/home/sct_dev/python/envs/venv_sct/lib/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u201c' in position 42: ordinal not in range(128)
Call stack:
  File "/github/home/sct_dev/spinalcordtoolbox/scripts/sct_download_data.py", line 168, in <module>
    res = main()
  File "/github/home/sct_dev/spinalcordtoolbox/scripts/sct_download_data.py", line 160, in main
    install_data(url, dest_folder, keep=arguments.k)
  File "/github/home/sct_dev/spinalcordtoolbox/download.py", line 141, in install_data
    logger.warning("Removing existing destination folder \u201c%s\u201d", dest_folder)
Message: 'Removing existing destination folder \u201c%s\u201d'
Arguments: ('/github/home/sct_dev/data/PAM50',)

so that's on the pile to fix.

EDIT: probably because they set LC_ALL=LC_CTYPE=en_US.UTF-8. But this bug is on our end: we should be able to handle different locales; or else we should use locale.setlocale(locale.LC_ALL, 'C') (i.e. american english ascii) explicitly at the start of all our programs.
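A third option, sketched below, is to make the log stream itself tolerant: wrap the output so that characters the encoding can't represent are escaped instead of raising. This is only an illustration, not what sct does; ascii_safe_handler and the sct_demo logger name are made up for the example, and the BytesIO stands in for an ASCII-only stdout.

```python
import io
import logging


def ascii_safe_handler(raw_stream):
    """Return a logging handler that cannot raise UnicodeEncodeError.

    Characters outside ASCII are written as backslash escapes instead.
    """
    text = io.TextIOWrapper(
        raw_stream,
        encoding="ascii",
        errors="backslashreplace",
        newline="\n",
        write_through=True,
    )
    return logging.StreamHandler(text)


if __name__ == "__main__":
    raw = io.BytesIO()  # stands in for a stdout whose locale only allows ASCII
    logger = logging.getLogger("sct_demo")  # hypothetical logger name
    logger.propagate = False
    logger.addHandler(ascii_safe_handler(raw))
    # Same message as in the traceback above, curly quotes included.
    logger.warning("Removing existing destination folder \u201c%s\u201d", "/tmp/PAM50")
    print(raw.getvalue())
```

In a real program the raw stream would be something like sys.stdout.buffer. The curly quotes come out as literal \u201c and \u201d, which is ugly but doesn't lose the message.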

Also this: actions (specifically actions/checkout@v2) is quietly not using git behind our backs:

Run actions/checkout@v2
/usr/bin/docker exec  dcc30894dde171c7043d2c7308e176d7558eb5104d548c7ea9d9bf454d5ea0c3 sh -c "cat /etc/*release | grep ^ID"
Syncing repository: neuropoly/spinalcordtoolbox
Getting Git version info
  Working directory is '/__w/spinalcordtoolbox/spinalcordtoolbox'
Deleting the contents of '/__w/spinalcordtoolbox/spinalcordtoolbox'
The repository will be downloaded using the GitHub REST API
To create a local Git repository instead, add Git 2.18 or higher to the PATH
Downloading the archive
Writing archive to disk
Extracting the archive
/usr/bin/tar xz -C /__w/spinalcordtoolbox/spinalcordtoolbox/91e91672-e878-4bc4-880e-5aed13971109 -f /__w/spinalcordtoolbox/spinalcordtoolbox/91e91672-e878-4bc4-880e-5aed13971109.tar.gz
Resolved version neuropoly-spinalcordtoolbox-ae131aa

that's probably not good. that's going to break..stuff.

At a minimum it breaks https://github.com/neuropoly/spinalcordtoolbox/blob/6c18169f2d2ea8f7b43438c5f505a7fb0eea83d7/.ci.sh#L28

More adventures: the macOS test is hanging at conda activate: https://github.com/neuropoly/spinalcordtoolbox/runs/1606814539

@joshuacwnewton and I have seen this before but I'm unclear what causes it. ???

Got WSL passing. Now I need to figure out why macOS is hanging.

On test run https://github.com/neuropoly/spinalcordtoolbox/tree/0fdc21c54d6ce49816e7e8702b444e06e603fcdd/ here are outputs from travis vs github:

  • https://travis-ci.com/github/neuropoly/spinalcordtoolbox/jobs/464757046 -> travis-logs.txt
  • https://github.com/neuropoly/spinalcordtoolbox/runs/1607492573 -> gh-logs.txt

Their formats aren't quite the same; github has timestamps on each line while travis has them as special 'travis_time' lines at the start and end of blocks. Travis uses CR-LF lines. In set -x, Github wrote "+ command" and travis wrote "+command". Both have cruft at the top and bottom I don't care about. This cleans it up:

$ cat /tmp/travis-logs.txt | tr -d '\r' | awk 'BEGIN { P=0 } /Installing SCT/ { P = 1 } P==1 { print }' | head
+echo Installing SCT
Installing SCT
+yes
+ASK_REPORT_QUESTION=false
+PIP_PROGRESS_BAR=off
+./install_sct


*******************************
* Welcome to SCT installation *
$ cat /tmp/gh-logs.txt | cut -f 2- -d ' ' | awk 'BEGIN { P=0 } /Installing SCT/ { P = 1 } P==1 { print }' | head
+ echo Installing SCT
Installing SCT
+ yes
+ ASK_REPORT_QUESTION=false
+ PIP_PROGRESS_BAR=off
+ ./install_sct


*******************************
* Welcome to SCT installation *

So now I can diff them:

$ diff -u <(cat /tmp/travis-logs.txt | tr -d '\r' | awk 'BEGIN { P=0 } /Installing SCT/ { P = 1 } P==1 { print }') <(cat /tmp/gh-logs.txt | cut -f 2- -d ' ' | sed 's/^\+ /\+/' | awk 'BEGIN { P=0 } /Installing SCT/ { P = 1 } P==1 { print }') | head -n 50
--- /dev/fd/63  2020-12-25 00:32:08.535441609 -0500
+++ /dev/fd/62  2020-12-25 00:32:08.535441609 -0500
@@ -14,7 +14,7 @@

 Checking OS type and version...

-Darwin Traviss-Mac.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
+Darwin Mac-1608870864276.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64
 ProductVersion:    10.15.7

 Checking requirements...
@@ -26,12 +26,12 @@
 SCT version ......... dev
 Installation type ... in-place
 Operating system .... osx (10.15.7)
-Shell config ........ /Users/travis/.bashrc
+Shell config ........ /Users/runner/.bashrc

 --> Crash reports will not be sent.


-SCT will be installed here: [/Users/travis/build/neuropoly/spinalcordtoolbox]
+SCT will be installed here: [/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox]

 Do you agree? [y]es/[n]o: 
 Skipping copy of source files (source and destination folders are the same)
@@ -40,45 +40,1203 @@
 Installing conda...


-rm -rf /Users/travis/build/neuropoly/spinalcordtoolbox/python
+rm -rf /Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python


-mkdir -p /Users/travis/build/neuropoly/spinalcordtoolbox/python
+mkdir -p /Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python


-wget -O /var/folders/z3/_825pg0s3jvf0hb_q8kzmg5h0000gn/T/tmp.NIkTWTnG/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
+wget -O /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmp.3oleVKUV/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

---2020-12-25 04:48:52--  https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
+--2020-12-25 04:41:34--  https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
 Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3
 Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 57112343 (54M) [application/x-sh]
-Saving to: ‘/var/folders/z3/_825pg0s3jvf0hb_q8kzmg5h0000gn/T/tmp.NIkTWTnG/miniconda.sh’
+Saving to: ‘/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmp.3oleVKUV/miniconda.sh’

Full output: diff.txt
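If anyone wants to redo this comparison later without the shell one-liners, the same cleanup can be sketched in Python. It mirrors the tr / cut / sed / awk steps above; the function names are made up, and it assumes (as in these logs) that the Github timestamp is the first space-separated field on each line.

```python
import difflib
import re


def normalize(log_text, strip_timestamps=False, marker="Installing SCT"):
    """Clean a CI log so two providers' outputs can be compared line by line."""
    # tr -d '\r': Travis logs use CR-LF line endings.
    lines = log_text.replace("\r", "").splitlines()
    if strip_timestamps:
        # cut -f 2- -d ' ': drop the leading timestamp field; like cut,
        # leave lines without a space untouched.
        lines = [line.split(" ", 1)[1] if " " in line else line for line in lines]
    # sed 's/^\+ /\+/': make "+ command" look like "+command".
    lines = [re.sub(r"^\+ ", "+", line) for line in lines]
    # awk: print everything from the first line containing the marker.
    for i, line in enumerate(lines):
        if marker in line:
            return lines[i:]
    return []


def diff_logs(travis_text, github_text):
    """Unified diff of a Travis log against a Github Actions log."""
    a = normalize(travis_text)
    b = normalize(github_text, strip_timestamps=True)
    return list(difflib.unified_diff(a, b, "travis", "github", lineterm=""))


if __name__ == "__main__":
    with open("/tmp/travis-logs.txt") as t, open("/tmp/gh-logs.txt") as g:
        print("\n".join(diff_logs(t.read(), g.read())))
```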

They're both running on almost the same macOS, builds done just a couple days apart:

 ESC[0;32mChecking OS type and version...ESC[0m

-Darwin Traviss-Mac.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
+Darwin Mac-1608870864276.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64

They have different pwds of course: /Users/travis/build/neuropoly/ vs /Users/runner/work/spinalcordtoolbox/ but that's not a big deal.

Interestingly, only Github loaded my polyfill (#3123 ); Travis must preinstall the gnu userland tools?

+realpath () 
+{ 
+    python3 -c 'import sys, os; [print(os.path.realpath(f)) for f in sys.argv[1:]]' "$@"
+}

The interesting parts are probably somewhere in the environment vars, so here:

 +uname -a
-Darwin Traviss-Mac.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
+Darwin Mac-1608870864276.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64
 +set
-ANSI_CLEAR='\033[0K'
-ANSI_GREEN='\033[32;1m'
-ANSI_RED='\033[31;1m'
-ANSI_RESET='\033[0m'
-ANSI_YELLOW='\033[33;1m'
+AGENT_TOOLSDIRECTORY=/Users/runner/hostedtoolcache
+ANDROID_HOME=/Users/runner/Library/Android/sdk
+ANDROID_NDK_18R_PATH=/Users/runner/Library/Android/sdk/ndk/18.1.5063045
+ANDROID_NDK_HOME=/Users/runner/Library/Android/sdk/ndk-bundle
+ANDROID_SDK_ROOT=/Users/runner/Library/Android/sdk
 ASK_REPORT_QUESTION=false
 BASH=/bin/bash
 BASH_ARGC=()
@@ -177,136 +1336,135 @@
 BASH_REMATCH=([0]="y")
 BASH_SOURCE=([0]="./install_sct")
 BASH_VERSINFO=([0]="3" [1]="2" [2]="57" [3]="1" [4]="release" [5]="x86_64-apple-darwin19")
++tty
 BASH_VERSION='3.2.57(1)-release'
 BIN_DIR=bin
+BOOTSTRAP_HASKELL_NONINTERACTIVE=1
+CHROMEWEBDRIVER=/usr/local/Caskroom/chromedriver/87.0.4280.20
 CI=true
-CONTINUOUS_INTEGRATION=true
+CONDA=/usr/local/miniconda
 DATA_DIR=data
-DEBIAN_FRONTEND=noninteractive
 DIRSTACK=()
-DISPLAY=/private/tmp/com.apple.launchd.tLNuxrYBPd/org.macosforge.xquartz:0
-DISPLAY_UPDATE_PATH='export PATH="/Users/travis/build/neuropoly/spinalcordtoolbox/bin:$PATH"'
+DISPLAY_UPDATE_PATH='export PATH="/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/bin:$PATH"'
+DOTNET_MULTILEVEL_LOOKUP=0
+DOTNET_ROOT=/Users/runner/.dotnet
+EDGEWEBDRIVER=/usr/local/share/edge_driver
 EUID=501
-GEM_HOME=/Users/travis/.rvm/gems/ruby-2.6.6
-GEM_PATH=/Users/travis/.rvm/gems/ruby-2.6.6:/Users/travis/.rvm/gems/ruby-2.6.6@global
-GIT_ASKPASS=echo
+GECKOWEBDRIVER=/usr/local/opt/geckodriver/bin
+GITHUB_ACTION=run2
+GITHUB_ACTIONS=true
+GITHUB_ACTION_REF=
+GITHUB_ACTION_REPOSITORY=
+GITHUB_ACTOR=kousu
+GITHUB_API_URL=https://api.github.com
+GITHUB_BASE_REF=master
+GITHUB_ENV=/Users/runner/work/_temp/_runner_file_commands/set_env_ff0378c9-c61c-4899-84a2-f6d5739f6c5b
+GITHUB_EVENT_NAME=pull_request
+GITHUB_EVENT_PATH=/Users/runner/work/_temp/_github_workflow/event.json
+GITHUB_GRAPHQL_URL=https://api.github.com/graphql
+GITHUB_HEAD_REF=ng/ci-gh-actions
+GITHUB_JOB=test
+GITHUB_PATH=/Users/runner/work/_temp/_runner_file_commands/add_path_ff0378c9-c61c-4899-84a2-f6d5739f6c5b
+GITHUB_REF=refs/pull/3125/merge
+GITHUB_REPOSITORY=neuropoly/spinalcordtoolbox
+GITHUB_REPOSITORY_OWNER=neuropoly
+GITHUB_RETENTION_DAYS=90
+GITHUB_RUN_ID=443499730
+GITHUB_RUN_NUMBER=62
+GITHUB_SERVER_URL=https://github.com
+GITHUB_SHA=65218e7625ed0a4128aa435aedc27339f6a952f0
+GITHUB_WORKFLOW=Tests
+GITHUB_WORKSPACE=/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox
 GROUPS=()
-HAS_JOSH_K_SEAL_OF_APPROVAL=true
-HOME=/Users/travis
-HOMEBREW_NO_INSTALL_CLEANUP=1
-HOSTNAME=Traviss-Mac.local
+HOME=/Users/runner
+HOMEBREW_CASK_OPTS=--no-quarantine
+HOMEBREW_NO_AUTO_UPDATE=1
+HOSTNAME=Mac-1608870864276.local
 HOSTTYPE=x86_64
 IFS=$' \t\n'
-IRBRC=/Users/travis/.rvm/rubies/ruby-2.6.6/.irbrc
+ImageOS=macos1015
+ImageVersion=20201212.1
+JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
+JAVA_HOME_11_X64=/Library/Java/JavaVirtualMachines/adoptopenjdk-11.jdk/Contents/Home
+JAVA_HOME_12_X64=/Library/Java/JavaVirtualMachines/adoptopenjdk-12.jdk/Contents/Home
+JAVA_HOME_13_X64=/Library/Java/JavaVirtualMachines/adoptopenjdk-13.jdk/Contents/Home
+JAVA_HOME_14_X64=/Library/Java/JavaVirtualMachines/adoptopenjdk-14.jdk/Contents/Home
+JAVA_HOME_7_X64=/Library/Java/JavaVirtualMachines/zulu-7.jdk/Contents/Home
+JAVA_HOME_8_X64=/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home
 LANG=en_US.UTF-8
 LC_ALL=en_US.UTF-8
-LOGNAME=travis
+LC_CTYPE=en_US.UTF-8
+LOGNAME=runner
 MACHTYPE=x86_64-apple-darwin19
 MACOSSUPPORTED=13
 MPLBACKEND=Agg
-MY_RUBY_HOME=/Users/travis/.rvm/rubies/ruby-2.6.6
-NVM_BIN=/Users/travis/.nvm/versions/node/v15.1.0/bin
+NUNIT3_PATH=/Library/Developer/nunit/3.6.0
+NUNIT_BASE_PATH=/Library/Developer/nunit
 NVM_CD_FLAGS=
-NVM_DIR=/Users/travis/.nvm
-NVM_INC=/Users/travis/.nvm/versions/node/v15.1.0/include/node
-OLDPWD=/Users/travis/build/neuropoly/spinalcordtoolbox
+NVM_DIR=/Users/runner/.nvm
+OLDPWD=/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox
 OPTERR=1
 OPTIND=1
 OS=osx
 OSTYPE=darwin19
 OSver=10.15.7
-PAGER=cat
-PATH=/Users/travis/.rvm/gems/ruby-2.6.6/bin:/Users/travis/.rvm/gems/ruby-2.6.6@global/bin:/Users/travis/.rvm/rubies/ruby-2.6.6/bin:/Users/travis/.rvm/bin:/Users/travis/bin:/Users/travis/.local/bin:/Users/travis/.nvm/versions/node/v15.1.0/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Library/Apple/usr/bin
+PATH=/usr/local/opt/pipx_bin:/Users/runner/.cargo/bin:/usr/local/lib/ruby/gems/2.7.0/bin:/usr/local/opt/ruby/bin:/usr/local/opt/curl/bin:/usr/local/bin:/usr/local/sbin:/Users/runner/bin:/Users/runner/.yarn/bin:/usr/local/go/bin:/Users/runner/Library/Android/sdk/tools:/Users/runner/Library/Android/sdk/platform-tools:/Users/runner/Library/Android/sdk/ndk-bundle:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/usr/bin:/bin:/usr/sbin:/sbin:/Users/runner/.dotnet/tools:/Users/runner/.ghcup/bin:/Users/runner/hostedtoolcache/stack/2.5.1/x64
+PERFLOG_LOCATION_SETTING=RUNNER_PERFLOG
 PIPESTATUS=([0]="0")
+PIPX_BIN_DIR=/usr/local/opt/pipx_bin
+PIPX_HOME=/usr/local/opt/pipx
 PIP_PROGRESS_BAR=off
-PPID=2500
-PS4=+
-PWD=/Users/travis/build/neuropoly/spinalcordtoolbox
+POWERSHELL_DISTRIBUTION_CHANNEL=GitHub-Actions-macos1015
+PPID=1011
+PS4='+ '
+***
 PYTHONNOUSERSITE=1
 PYTHON_DIR=python
-RC_FILE_PATH=/Users/travis/.bashrc
+RCT_NO_LAUNCH_PACKAGER=1
+RC_FILE_PATH=/Users/runner/.bashrc
 REPORT_STATS=no
-RUBY_VERSION=ruby-2.6.6
+RUNNER_OS=macOS
+RUNNER_PERFLOG=/usr/local/opt/runner/perflog
+RUNNER_TEMP=/Users/runner/work/_temp
+RUNNER_TOOL_CACHE=/Users/runner/hostedtoolcache
+RUNNER_TRACKING_ID=github_0c1419f8-feb6-47f6-8b2d-8099980a8b52
+RUNNER_WORKSPACE=/Users/runner/work/spinalcordtoolbox
 SCRIPT_DIR=scripts
-SCT_DIR=/Users/travis/build/neuropoly/spinalcordtoolbox
+SCT_DIR=/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox
 SCT_INSTALL_TYPE=in-place
-SCT_SOURCE=/Users/travis/build/neuropoly/spinalcordtoolbox
+SCT_SOURCE=/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox
 SCT_VERSION=dev
 SHELL=/bin/bash
 SHELLOPTS=braceexpand:hashall:interactive-comments:xtrace
 SHLVL=4
-SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.2AtCNsNnZs/Listeners
-TERM=xterm
+SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.sFBrBcEC3k/Listeners
+TERM=dumb
 THE_RC=bash
-TMPDIR=/var/folders/z3/_825pg0s3jvf0hb_q8kzmg5h0000gn/T/
-TMP_DIR=/var/folders/z3/_825pg0s3jvf0hb_q8kzmg5h0000gn/T/tmp.NIkTWTnG
-TRAVIS=true
-TRAVIS_ALLOW_FAILURE=false
-TRAVIS_APP_HOST=build.travis-ci.com
-TRAVIS_APT_PROXY=http://build-cache.travisci.net
-TRAVIS_ARCH=amd64
-TRAVIS_BRANCH=master
-TRAVIS_BUILD_DIR=/Users/travis/build/neuropoly/spinalcordtoolbox
-TRAVIS_BUILD_ID=210558155
-TRAVIS_BUILD_NUMBER=14694
-TRAVIS_BUILD_STAGE_NAME=
-TRAVIS_BUILD_WEB_URL=https://travis-ci.com/neuropoly/spinalcordtoolbox/builds/210558155
-TRAVIS_CMD=./.travis.sh
-TRAVIS_COMMIT=65218e7625ed0a4128aa435aedc27339f6a952f0
-TRAVIS_COMMIT_MESSAGE='Merge ebc644d2d42928cf55ba7f12b348f662724d4ecf into 6c18169f2d2ea8f7b43438c5f505a7fb0eea83d7'
-TRAVIS_COMMIT_RANGE=42f5a2ec1857a145f2de05e2b7fb6da627b5c7e1...ebc644d2d42928cf55ba7f12b348f662724d4ecf
-TRAVIS_CPU_ARCH=amd64
-TRAVIS_DIST=notset
-TRAVIS_ENABLE_INFRA_DETECTION=true
-TRAVIS_EVENT_TYPE=pull_request
-TRAVIS_HOME=/Users/travis
-TRAVIS_INFRA=macstadium
-TRAVIS_INIT=notset
-TRAVIS_INTERNAL_RUBY_REGEX='^ruby-(2\.[0-4]\.[0-9]|1\.9\.3)'
-TRAVIS_JOB_ID=464757046
-TRAVIS_JOB_NAME='OSX 10.15 (Catalina)'
-TRAVIS_JOB_NUMBER=14694.3
-TRAVIS_JOB_WEB_URL=https://travis-ci.com/neuropoly/spinalcordtoolbox/jobs/464757046
-TRAVIS_LANGUAGE=ruby
-TRAVIS_OSX_IMAGE=xcode12.2
-TRAVIS_OS_NAME=osx
-TRAVIS_PULL_REQUEST=3125
-TRAVIS_PULL_REQUEST_BRANCH=ng/ci-gh-actions
-TRAVIS_PULL_REQUEST_SHA=ebc644d2d42928cf55ba7f12b348f662724d4ecf
-TRAVIS_PULL_REQUEST_SLUG=neuropoly/spinalcordtoolbox
-TRAVIS_REPO_SLUG=neuropoly/spinalcordtoolbox
-TRAVIS_ROOT=/
-TRAVIS_RUBY_VERSION=default
-TRAVIS_SECURE_ENV_VARS=false
-TRAVIS_SUDO=true
-TRAVIS_TAG=
-TRAVIS_TEST_RESULT=
-TRAVIS_TIMER_ID=1c567080
-TRAVIS_TIMER_START_TIME=1608871730959253000
-TRAVIS_TMPDIR=/var/folders/z3/_825pg0s3jvf0hb_q8kzmg5h0000gn/T/tmp.lM8NZWbk
+TMPDIR=/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/
+TMP_DIR=/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmp.3oleVKUV
 UID=501
-USER=travis
+USER=runner
+VCPKG_INSTALLATION_ROOT=/usr/local/share/vcpkg
+XCODE_10_DEVELOPER_DIR=/Applications/Xcode_10.3.app/Contents/Developer
+XCODE_11_DEVELOPER_DIR=/Applications/Xcode_11.7.app/Contents/Developer
+XCODE_12_DEVELOPER_DIR=/Applications/Xcode_12.3.app/Contents/Developer
 XPC_FLAGS=0x0
 XPC_SERVICE_NAME=0
 _=-a
-__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
+__CF_USER_TEXT_ENCODING=0x1F5:0:0
 bidon=0
 change_default_path=y
-cmd='bash /var/folders/z3/_825pg0s3jvf0hb_q8kzmg5h0000gn/T/tmp.NIkTWTnG/miniconda.sh -p /Users/travis/build/neuropoly/spinalcordtoolbox/python -b -f'
+cmd='bash /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmp.3oleVKUV/miniconda.sh -p /Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python -b -f'
 e_status=0
 macOSmajor=10
 macOSminor=15
 opt='?'
-profiles=/Users/travis/.bash_profile
-rvm_bin_path=/Users/travis/.rvm/bin
-rvm_path=/Users/travis/.rvm
-rvm_prefix=/Users/travis
-rvm_version='1.29.9 (latest)'
+profiles=/Users/runner/.bash_profile
 sourceblock=$'\nif [[ -n "$BASH_VERSION" ]]; then\n    # include .bashrc if it exists\n    if [[ -f "$HOME/.bashrc" ]]; then\n    . "$HOME/.bashrc"\n    fi\nfi'
 sw_vers_output=$'ProductVersion:\t10.15.7'
-txt='bash /var/folders/z3/_825pg0s3jvf0hb_q8kzmg5h0000gn/T/tmp.NIkTWTnG/miniconda.sh -p /Users/travis/build/neuropoly/spinalcordtoolbox/python -b -f'
+txt='bash /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmp.3oleVKUV/miniconda.sh -p /Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python -b -f'
 type=code
-uname_output='Darwin Traviss-Mac.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64'
+uname_output='Darwin Mac-1608870864276.local 19.6.0 Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64 x86_64'

Combing through this, a few things jump out at me:

  1. Both of them say "not a tty", so that's ruled out as the issue.
  2. Travis has DISPLAY set and is running xquartz and Github is not.
  3. Github has TERM=dumb set:

        -SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.2AtCNsNnZs/Listeners
        -TERM=xterm
        +SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.sFBrBcEC3k/Listeners
        +TERM=dumb

  4. Travis has set several vars that force noninteractive mode in common apps:

    • PAGER=cat set, i.e. make less a no-op.

    • DEBIAN_FRONTEND=noninteractive

    • GIT_ASKPASS=echo

    • CONTINUOUS_INTEGRATION=true

TERM=dumb could definitely choke something like conda create, if conda create expects to be able to prompt. Or any of the others, really.

Anyway I did a quick test and applied my yes |-is-bad patch from #3102 and it..worked: https://github.com/neuropoly/spinalcordtoolbox/runs/1607610135 (I cancelled that run but if you scroll down you can see it getting past conda create and onto "sourcing conda.sh").

Then I went back and added

export TERM=xterm
export PAGER=cat

but it hung that time. Then I added the other three; still a hang.

Okay I am giving up on figuring this out. This is ...quite a mystery. Maybe something to do with Github Actions doing shenanigans with file descriptors? conda create -y works fine, and I wanted to get that in anyway, so that's what I'm going to do.

It's also hanging at

https://github.com/neuropoly/spinalcordtoolbox/blob/91b04bda04fd2fd6d0c02e3308a0bd9c19ecc54a/install_sct#L571-L572

and at

https://github.com/neuropoly/spinalcordtoolbox/blob/e6cd4a750544adbb4d0ce3801516c5deed844c1f/.ci.sh#L9

once the install finishes.

It seems that yes is refusing to die when its stdout is closed.

I can reproduce it with just

https://github.com/neuropoly/spinalcordtoolbox/blob/b58bff117eebd992ae91dc51a514a7d9c72e303c/.ci.sh#L9-L10

Travis gives

+command -v yes
/usr/bin/yes
+yes
+head -n 10
y
y
y
y
y
y
y
y
y
y
+echo 'That was 10 yeses'
That was 10 yeses

Github gives

+ command -v yes
/usr/bin/yes
+ yes
+ head -n 10
y
y
y
y
y
y
y
y
y
y

and then a hang.
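The same check can be scripted so that it reports the hang instead of blocking on it. This is a sketch only: yes_head is a made-up helper, the 10 second timeout is arbitrary, and it assumes yes and head are on the PATH. On a healthy system it should behave like the Travis output: ten lines, and yes exits on its own.

```python
import subprocess


def yes_head(n=10, timeout=10):
    """Run the equivalent of `yes | head -n N` and report whether `yes` exits.

    Returns (lines printed by head, True if `yes` had to be killed).
    """
    yes = subprocess.Popen(["yes"], stdout=subprocess.PIPE)
    head = subprocess.Popen(
        ["head", "-n", str(n)], stdin=yes.stdout, stdout=subprocess.PIPE
    )
    # Drop the parent's copy of the pipe so `head` holds the only read end;
    # once `head` exits, the next write by `yes` should fail and end it.
    yes.stdout.close()
    out, _ = head.communicate(timeout=timeout)
    try:
        yes.wait(timeout=timeout)
        hung = False
    except subprocess.TimeoutExpired:
        yes.kill()  # `yes` refused to die: this is the hang described above
        yes.wait()
        hung = True
    return out.decode().splitlines(), hung


if __name__ == "__main__":
    lines, hung = yes_head()
    print(f"That was {len(lines)} yeses; yes hung: {hung}")
```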

@kousu I also ran into this, see minimum repro: https://github.com/Drulex/conda-test-actions.

Also my previous WIP: https://github.com/Drulex/spinalcordtoolbox (see aj-basic-github-actions branch).

From my experimentation I only saw the macOS hang on Github hosted instances, not on self-hosted runners.

Oh cool! Just finished mine at https://github.com/kousu/hanging-actions/. conda turned out to be a red herring; it's something to do with yes, or maybe with how pipes work on those runners (which would be very bizarre).

Whatever. I'm kicking it upstream over to https://github.com/actions/virtual-environments/discussions/2352; if that doesn't get any answers hopefully at least they can point us to the right project to file a bug against.

When you say "self-hosted runners" do you mean you tested self-hosting macOS runners?

This seems to be the master script that generates their (probably buggy?) runners: https://github.com/actions/virtual-environments/blob/main/images/macos/templates/macOS-10.15.json, calling out to many of the scripts in https://github.com/actions/virtual-environments/tree/main/images/macos/provision/core. I'm not going to dig into that until someone else gives me a tip.

In the meantime I'm going to solve this by rebasing on my (growing by the day) patch #3102.

Weird thing about Actions: if the platform you're on doesn't happen to have git installed, actions/checkout shrugs and downloads a .zip via the Github web API instead.

This bugs out our installer: it makes the installer think it's installing a release because

https://github.com/neuropoly/spinalcordtoolbox/blob/6c18169f2d2ea8f7b43438c5f505a7fb0eea83d7/install_sct#L377-L384

I'm not sure how much of a problem this will be. We can head-off problems by running SCT_INSTALL_TYPE=in-place ./install_sct? Is that a good idea?

Update on the macOS hang: one of Github's employees worked on this on Christmas. They reproduced it by installing https://github.com/actions/runner in self-hosted mode, so I'm pretty sure it's a bug and I'm kicking it over to https://github.com/actions/runner/issues/884 and now I'm going to try to forget about it.

Maybe, maybe, the bug is in the combination of their macOS images plus their macOS runner, but I suspect it's just that they're subtly mishandling dup() in their runner.

I realized just now that if we want to do this well, we should have these transition periods:

  1. Only Travis
  2. Travis + Actions
  3. Travis [non-mandatory] + Actions
  4. Only Actions

That way there won't be too many surprises. I didn't know how to actually implement 2, but @joshuacwnewton figured it out: just go to https://github.com/neuropoly/spinalcordtoolbox/settings/branch_protection_rules/12845205 and turn off/on the different checks; hopefully disabling the Travis check still means Travis runs, just that it doesn't block:

2020-12-28-141200_911x219_scrot

Here, the labels are mostly checks added by #3125 (it picked them up even though they're not on master yet), and Travis means all the platforms checked by Travis.

I think I mentioned this above but: actions/checkout@v2 will fall back on downloading a .zip over HTTP if it doesn't find git on the platform it's on. Actions' ubuntu-16.04 doesn't come pre-installed with git, but ubuntu-18.04 does. For us, this means we fall into our release install path (#3140).

Should we try to fix this? So that every platform is running the same thing?


I tried to fix this by adding

https://github.com/neuropoly/spinalcordtoolbox/blob/594ffac9e8514f04ec4f779e0024261bcdd06e43/.ci.sh#L19

and using SCT_DIR explicitly

https://github.com/neuropoly/spinalcordtoolbox/blob/594ffac9e8514f04ec4f779e0024261bcdd06e43/.ci.sh#L99

buttttt this fails under CI (both Travis and Actions) because, contrary to the advice we give:

https://travis-ci.com/github/neuropoly/spinalcordtoolbox/jobs/466159959#L987

Open a new Terminal window to load environment variables, or run:
source /home/travis/.bashrc

Actually doing that hits

# If not running interactively, don't do anything
[[ $- != *i* ]] && return

(most distros come with a line like that prepackaged in their bashrc's).

This is a stumbling block:

What we should be doing is installing to ~/.local/bin/ or /usr/bin/ or whatever other standard system paths are out there; if we were using a pure-pip install and skipped conda entirely we would have this already in place. But because we don't, we need install_sct to inform us where it decided to install, instead of us being able to tell it where to install. And the only way it currently shares that information is via .bashrc, which I can't source under CI on most distros.


I worked around it by changing ./install_sct to use ~/.bash_profile instead. That's an API change and I'm not confident in it because even after all this time using unix I still don't know all the corners of shell thoroughly, but I think it's right: ~/.bashrc is specifically for interactive sessions; for more account-wide things, you use /etc/profile and its subsidiaries.

BUG: I'm seeing intermittent crashes -- seems like a concurrency bug -- but so far only on macOS on Actions:

e.g. https://github.com/neuropoly/spinalcordtoolbox/runs/1625704027?check_suite_focus=true#step:4:2097

INTERNALERROR> E           AssertionError: Traceback (most recent call last):
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/site-packages/_pytest/main.py", line 267, in wrap_session
INTERNALERROR> E                 config.hook.pytest_sessionstart(session=session)
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
INTERNALERROR> E                 return self._hookexec(self, self.get_hookimpls(), kwargs)
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
INTERNALERROR> E                 return self._inner_hookexec(hook, methods, kwargs)
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
INTERNALERROR> E                 firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/site-packages/pluggy/callers.py", line 208, in _multicall
INTERNALERROR> E                 return outcome.get_result()
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/site-packages/pluggy/callers.py", line 80, in get_result
INTERNALERROR> E                 raise ex[1].with_traceback(ex[2])
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
INTERNALERROR> E                 res = hook_impl.function(*args)
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/conftest.py", line 31, in pytest_sessionstart
INTERNALERROR> E                 downloader.main(['-d', 'sct_testing_data', '-o', sct_test_path()])
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/spinalcordtoolbox/scripts/sct_download_data.py", line 160, in main
INTERNALERROR> E                 install_data(url, dest_folder, keep=arguments.k)
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/spinalcordtoolbox/download.py", line 199, in install_data
INTERNALERROR> E                 shutil.copy(srcpath, dstpath)
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/shutil.py", line 246, in copy
INTERNALERROR> E                 copymode(src, dst, follow_symlinks=follow_symlinks)
INTERNALERROR> E               File "/Users/runner/work/spinalcordtoolbox/spinalcordtoolbox/python/envs/venv_sct/lib/python3.6/shutil.py", line 144, in copymode
INTERNALERROR> E                 chmod_func(dst, stat.S_IMODE(st.st_mode))
INTERNALERROR> E             FileNotFoundError: [Errno 2] No such file or directory: 'sct_testing_data/template/template/PAM50_small_t2.nii.gz'
INTERNALERROR> E           assert False
INTERNALERROR> 
INTERNALERROR> python/envs/venv_sct/lib/python3.6/site-packages/xdist/dsession.py:187: AssertionError
[gw0] node down: Not properly terminated

but the previous run https://github.com/neuropoly/spinalcordtoolbox/runs/1625475620?check_suite_focus=true passed.

The diff between them was simply making an in-place install happen

https://github.com/neuropoly/spinalcordtoolbox/compare/89f2ae1a1a92f9b74872c181ff4e0a673fd57385..b7a2f38b88f6411308938479622207239828debf

which shouldn't make a difference, because the macOS machines have git installed, which leads to both being in-place anyway.

So, it looks like pytest is spawning parallel testers here? Why haven't we noticed that before? Is it new? It seems likely that if, say, three tests all start at the same time then there's a race to get to https://github.com/neuropoly/spinalcordtoolbox/blob/6172cd22feeb0f64ac1bd68bce3f89f7109e68b8/conftest.py#L28-L31 first.
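If the race theory is right, one way to make the download step tolerate concurrent callers is to fetch into a private temporary directory and then rename it into place, so nobody ever sees a half-populated or freshly wiped folder. This is only a sketch under that assumption: `install_once` and `fetch` are hypothetical names, not SCT's actual download API, and it relies on both directories being on the same filesystem.

```python
import os
import shutil
import tempfile
from pathlib import Path


def install_once(dest, fetch):
    """Populate `dest` via fetch(tmp_dir) without ever exposing a partial copy.

    Returns True if this caller installed the data, False if a copy was
    already there (or another caller won the race).
    """
    dest = Path(dest)
    if dest.is_dir():
        return False  # data already installed, leave it alone
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Private scratch directory next to the destination (same filesystem).
    tmp = Path(tempfile.mkdtemp(prefix=dest.name + ".", dir=str(dest.parent)))
    try:
        fetch(tmp)
        try:
            # Renaming onto an existing non-empty directory fails,
            # so only one caller can win.
            os.rename(tmp, dest)
            return True
        except OSError:
            return False  # someone else installed it first
    finally:
        # No-op for the winner (tmp was renamed away), cleanup for losers.
        shutil.rmtree(tmp, ignore_errors=True)


if __name__ == "__main__":
    def fake_fetch(tmp_dir):
        # Stand-in for the real download and unpack step.
        (tmp_dir / "PAM50_small_t2.nii.gz").write_bytes(b"not real data")

    target = Path(tempfile.mkdtemp()) / "sct_testing_data"
    print(install_once(target, fake_fetch))  # first call installs
    print(install_once(target, fake_fetch))  # second call is a no-op
```

The trade-off is that losers of the race still pay for a redundant download; they just can't break anyone else's copy.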


Here's a re-run job: https://github.com/neuropoly/spinalcordtoolbox/runs/1625782954: it passed.

Found some UI bugs in Actions:

  • to rerun, you have to rerun all jobs in a Workflow; Travis lets you pick individual jobs to rerun.
  • if you rerun a job, there's no way to see the history of the previous run. The links to the logs of the individual jobs still work, but the overall build itself hides them and replaces them with new links. Which is... unfortunate, because then you can't compare history over time. I'll probably end up working around this by making no-op commits just to retrigger the build almost all the time.
  • sometimes you don't get any output, the UI just says "Starting your Workflow Run" even if it's running in the background.
  • you don't get any output in the live view until new output comes out, so you need to load the page before the script starts if you're going to catch it all; otherwise you have to wait until the end, when you can download the logs.

More issues from Actions:

BUG: I'm seeing intermittent crashes -- seems like a concurrency bug -- but so far only on macOS on Actions:

So, it looks like pytest is spawning parallel testers here? Why haven't we noticed that before? Is it new? It seems likely that if, say, three tests all start at the same time then there's a race to get to

Aha! I've experienced this exact same issue in https://github.com/neuropoly/spinalcordtoolbox/pull/3116#discussion_r549001772. From that comment:

Sometimes it works, but sometimes redownloading the data causes a FileNotFound error, because the data folder will be wiped for each worker. (See attached log.txt.) This is related to #2957 and #2959.

I can avoid this locally by setting -n 1 as recommended in #2957 (comment), but I'm not sure what the behavior will be like in the CI.

Possibly a sign that we should address #2957/#2959 sooner rather than later.
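As an alternative to forcing -n 1, a conftest.py hook could skip the download inside xdist workers and do it only in the controller process. pytest-xdist documents `hasattr(config, "workerinput")` as the way to tell a worker from the controller; everything else below is a sketch. `download_testing_data` is a hypothetical stand-in for the real download call, and the approach assumes the controller finishes its own sessionstart hook before the workers begin, which would need checking against the xdist version in use.

```python
# Records calls so the behaviour is easy to inspect; illustration only.
DOWNLOADS = []


def is_xdist_worker(config):
    """pytest-xdist sets `workerinput` on the config of worker processes only."""
    return hasattr(config, "workerinput")


def download_testing_data():
    """Hypothetical stand-in for the real download step in conftest.py."""
    DOWNLOADS.append("controller")


def pytest_sessionstart(session):
    """Fetch the test data once, in the controller, never in a worker."""
    if is_xdist_worker(session.config):
        return  # workers reuse whatever the controller already installed
    download_testing_data()
```

Without xdist there is no `workerinput` attribute at all, so a plain `pytest` run still downloads exactly once.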

Whoops, wrong button. :sweat_smile:

BUG: after a while (after a cache times out? or something?) GitHub Actions won't load logs properly in the main UI. The logs are still there if you do "Download Raw Log", but you can't link to specific lines like you can with Travis.

e.g. a week ago this link took you to the line mentioned: https://github.com/neuropoly/spinalcordtoolbox/runs/1629734360#step:3:2105, but now it just says "Error".

[Screenshot: Screenshot_2021-01-07_00-43-36]

@kousu Strange... When I click that link, I get taken to the right line. I wonder what happened with that screenshot. :worried:

[Screenshot: Screenshot from 2021-01-07 11-15-06]

Now that we've started to use up credits for TravisCI (see #3273), I've added the high priority label here. PR #3125 is ready, but it just needs a reviewer at this point.
