Node: Tracking Issue: commit queue issues and feedback

Created on 14 Aug 2020  路  34Comments  路  Source: nodejs/node

commit-queue can be used to land pull requests. It is still experimental, so when using please remember to check if the Pull Request landed successfully. Please share any failed landings here so we can keep track of issues, our landing process is quite complex and full of edge cases, we'll need to tweak the commit-queue and node-core-utils before we can consider this feature stable.

I'll keep this issue updated with all known issues.

  • [x] ~Commits pushed since last review (https://github.com/nodejs/node/pull/34729#issuecomment-673846409)~

    • IMO it's fine to request the lander to approve the PR before adding the label

  • [x] Missing CI on doc-only changes with successful Actions run (https://github.com/nodejs/node/pull/34729#issuecomment-673846409)
  • [x] Doesn't work with multiple commits in a PR
  • [x] Doesn't work with fixup and squash commits
  • [ ] Don't land PRs that should be landed manually (manual-land label)
  • [ ] Can't land DEPXXX (https://github.com/nodejs/node/pull/34940)
  • [ ] If the newest Jenkins job run is a run that was deleted on Jenkins, the commit queue skips instead of failing

    • [ ] But if CI ran on Jenkins and other job was deleted (like CITGM or benchmark), it should succeed

  • [ ] Should skip if GitHub Actions still running
  • [ ] Action fails without setting label if local is not sync'ed with remote (which should only happen as a race condition). Should retry instead.

    • [ ] Land purple

    • [ ] Don't try to land commits with changes to .github/workflows

    • [ ] Rewrite in JS

    • [ ] Try/catch functionality (ensure -failed label is added when something fails)

meta

All 34 comments

Tried two pull requests, both failed:

  • Commits pushed since last review (https://github.com/nodejs/node/pull/34729#issuecomment-673846409)
  • Missing CI on doc-only changes with successful Actions run (https://github.com/nodejs/node/pull/34729#issuecomment-673846409)

Simple cases of two last points may be fixed with
https://github.com/nodejs/node-core-utils/pull/473

:thinking: Anyone knows why but these PRs failed to apply patches:
https://github.com/nodejs/node/pull/34997#issuecomment-687832961
https://github.com/nodejs/node/pull/34682#issuecomment-691138681

even though locally it applied successfully with manual git node land. Is three-way-merge somehow disabled for commit-queue?

We should add a link to the Action in the bot comment: so we don't have to go chasing it every time it fails: https://github.com/nodejs/node/runs/1102351422?check_suite_focus=true

As for the failures, I'm not entirely sure, it should've worked 馃

One thing to do when this happens is to run git node land with --yes locally, since that's how it runs on the commit queue. This will ensure we're running the same command. _Maybe_ ncu doesn't go to 3-way-merge with --yes? It should, but there might be a bug.

Should wait for GitHub Actions to finish running as well: https://github.com/nodejs/node/pull/35160#issuecomment-691369784

Any idea why #34951 is skipped?

We have quite a backlog of scheduled Actions for the past hour, so something is up with GitHub apparently:

image

@aduh95 ncu thinks the CI is still running. Since the label was added four days ago, it's unrelated to the security release lockdown, but I can't check the CI status now because of the lockdown. If I forget to look at it tomorrow after the lockdown is lifted, feel free to ping me.

In https://github.com/nodejs/node/pull/33770#issuecomment-693200192, a PR with 3 commits, the bot commented only with the last commit hash, instead of a range.

Nice catch, we added the autorebase flag but didn't update the comment generation

Found another issue: if a jenkins run was deleted but that's the newest one in the PR, it'll show as "pending" by ncu, which will cause the commit queue to skip (it should fail instead). Also, having a successful CI run should take precedence over having a non-CI already-deleted Jenkins run (like a CITGM or benchmark).

Screenshot might make it easier to understand the issue:

image

NCU considers CITGM runs? They're hardly ever green 馃槥. (It's probably still worth checking one was started, particular for semver-majors.)

It checks all Jenkins links it knows how to parse

Commit queue for https://github.com/nodejs/node/pull/35423 failed but didn't post the reason to the PR (but did update the labels).

Commit queue for #35423 failed but didn't post the reason to the PR (but did update the labels).

The logs for that run are here: https://github.com/nodejs/node/runs/1187952829?check_suite_focus=true

It didn't post the reason because the output was apparently too long for jq.
It contains thousands of lines like this:

image

i never thought I would see the day jq would be bested

Commit queue for #35423 failed but didn't post the reason to the PR (but did update the labels).

The logs for that run are here: https://github.com/nodejs/node/runs/1187952829?check_suite_focus=true

It didn't post the reason because the output was apparently too long for jq.
It contains thousands of lines like this:

image

Looks like that run tried to land more than one PR -- perhaps an edge case where the first failure left things in a bad state?

From https://github.com/nodejs/node/issues/34770#issuecomment-701459197:

Commit queue for #35423 failed but didn't post the reason to the PR (but did update the labels).

The same appears to have happened with https://github.com/nodejs/node/pull/35332#event-3825920665

Yeah I think 486 is definitely related

Just throwing an idea here: we could use the pull_request_target event to run a lean commit queue check before actually queuing. The lean check would run everything but wouldn't push (to avoid having two jobs running in parallel pushing, which could create a racing condition), so collaborators will get quicker feedback.

There is a race condition when the cron job runs moments after the commit-queue label is added, because adding labels on PR triggers the staleAction GH Action: https://github.com/nodejs/node/pull/35410#issuecomment-705861333

The commit queue should skip if Actions is still running, when it does that race condition should be considerably mitigated (it won't be completely gone but the window for the race condition to happen will be a lot smaller).

Something ~weird~ happened on https://github.com/nodejs/node/pull/34055#issuecomment-712935338: the CQ "failed", ~but the PR was pushed to master nonetheless.~ EDIT: Nevermind, I was confused by the GitHub UI.

Is it intended that bot fails to integrate PRs which have a failed github CI but a green jenkins CI?
see e.g. https://github.com/nodejs/node/pull/35777#issuecomment-717100774

@Flarna yes, because some tests (ASAN for example) will only run on Actions.

It looks like commit queue is skipping PRs if they have the commit-queue-failed label set (e.g. from a previous run).
Maybe the commit-queue task should remove the label instead skipping the PR.
refs: #35968

IMO this is expected behavior to prevent cases where -failed is added but the commit-queue label is not removed. What we could do is add a pull_request_target Action that removes the -failed label when commit-queue is added.

https://github.com/nodejs/node/runs/1375922879?check_suite_focus=true

When the commit queue doesn't have an up-to-date master branch and the push fails, it crashes without adding the commit-queue-failed nor commenting on the PR (https://github.com/nodejs/node/pull/36027):

error: failed to push some refs to 'https://github.com/nodejs/node'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
Error: Process completed with exit code 1.

The intention was for it to retry later, but it removes the commit-queue label 馃. Might need to retry in the action then.

https://github.com/nodejs/node/runs/1425783467?check_suite_focus=true

The CQ is unable to land commits that contain changes to .github/workflows/ files. That causes the CQ script to crash without adding the commit-queue-failed label nor commenting on the PR (https://github.com/nodejs/node/pull/36118).

+ rm output
++ git rev-parse origin/master
++ git rev-parse HEAD
+ commits=d7bfa589420d089d48ae4e685bdbde446962a6d8...735d864131523c0199bc3e5ea3535d9df42cc258
+ git push origin master
2. Post "Landed in 735d86413152" in https://github.com/nodejs/node/pull/36118
To https://github.com/nodejs/node
 ! [remote rejected]       master -> master (refusing to allow a Personal Access Token to create or update workflow `.github/workflows/linters.yml` without `workflow` scope)
error: failed to push some refs to 'https://github.com/nodejs/node'
Error: Process completed with exit code 1.

Wow that's an interesting edge case. There's probably nothing we can do about CQ being unable to land PRs changing an Action, but it should add the -failed label.

Actually, after reading the error again, it should be possible to push changes to .github/workflows assuming that's something we want. We'll need to request an extra scope for the PAT on nodejs/admin though.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

danialkhansari picture danialkhansari  路  3Comments

srl295 picture srl295  路  3Comments

cong88 picture cong88  路  3Comments

fanjunzhi picture fanjunzhi  路  3Comments

jmichae3 picture jmichae3  路  3Comments