Node: Investigate flaky sequential/test-performance

Created on 6 Oct 2018 · 25Comments · Source: nodejs/node

https://ci.nodejs.org/job/node-test-binary-arm/3710/RUN_SUBSET=4,label=pi1-docker/console

00:40:41 not ok 330 sequential/test-performance
00:40:41   ---
00:40:41   duration_ms: 5.141
00:40:41   severity: fail
00:40:41   exitcode: 1
00:40:41   stack: |-
00:40:41     { name: 'node',
00:40:41       entryType: 'node',
00:40:41       startTime: 0,
00:40:41       duration: { around: 4609.206676006317 },
00:40:41       nodeStart: { around: 0 },
00:40:41       v8Start: { around: 0 },
00:40:41       bootstrapComplete: { around: 4517.247398018837 },
00:40:41       environment: { around: 0 },
00:40:41       loopStart: -1,
00:40:41       loopExit: -1 }
00:40:41     assert.js:349
00:40:41         throw err;
00:40:41         ^
00:40:41     
00:40:41     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1233.9948890209198 >= 1111
00:40:41         at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
00:40:41         at Object.<anonymous> (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:104:1)
00:40:41         at Module._compile (internal/modules/cjs/loader.js:703:30)
00:40:41         at Object.Module._extensions..js (internal/modules/cjs/loader.js:714:10)
00:40:41         at Module.load (internal/modules/cjs/loader.js:613:32)
00:40:41         at tryModuleLoad (internal/modules/cjs/loader.js:552:12)
00:40:41         at Function.Module._load (internal/modules/cjs/loader.js:544:3)
00:40:41         at Function.Module.runMain (internal/modules/cjs/loader.js:756:12)
00:40:41         at startup (internal/bootstrap/node.js:303:19)
00:40:41         at bootstrapNodeJSCore (internal/bootstrap/node.js:865:3)
00:40:41   ...

Looks like a case of testing elapsed time (as a proxy for performance?). Is this a case of the test being unreliable or is it a case of the test flagging something we care about? Test doesn't have a comment explaining itself but it was added in 67269fd7f33 / https://github.com/nodejs/node/pull/14680 and there's context there.

/ping @jasnell

CI / flaky test performance test

Source

Trott

All 25 comments

@nodejs/testing @nodejs/performance

Trott on 6 Oct 2018

(The performance team is for changes related to /benchmark, not the perf hooks)

joyeecheung on 6 Oct 2018

https://ci.nodejs.org/job/node-test-commit-smartos/21373/nodes=smartos17-64/console

00:52:03 not ok 2418 sequential/test-performance
00:52:03   ---
00:52:03   duration_ms: 2.224
00:52:03   severity: fail
00:52:03   exitcode: 1
00:52:03   stack: |-
00:52:03     { name: 'node',
00:52:03       entryType: 'node',
00:52:03       startTime: 0,
00:52:03       duration: { around: 1893.748501777649 },
00:52:03       nodeStart: { around: 0 },
00:52:03       v8Start: { around: 0 },
00:52:03       bootstrapComplete: { around: 1887.9314036369324 },
00:52:03       environment: { around: 0 },
00:52:03       loopStart: -1,
00:52:03       loopExit: -1 }
00:52:03     assert.js:351
00:52:03         throw err;
00:52:03         ^
00:52:03     
00:52:03     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1796.7190556526184 >= 1111
00:52:03         at checkNodeTiming (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:92:7)
00:52:03         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos17-64/test/sequential/test-performance.js:104:1)
00:52:03         at Module._compile (internal/modules/cjs/loader.js:722:30)
00:52:03         at Object.Module._extensions..js (internal/modules/cjs/loader.js:733:10)
00:52:03         at Module.load (internal/modules/cjs/loader.js:620:32)
00:52:03         at tryModuleLoad (internal/modules/cjs/loader.js:560:12)
00:52:03         at Function.Module._load (internal/modules/cjs/loader.js:552:3)
00:52:03         at Function.Module.runMain (internal/modules/cjs/loader.js:775:12)
00:52:03         at startup (internal/bootstrap/node.js:308:19)
00:52:03         at bootstrapNodeJSCore (internal/bootstrap/node.js:878:3)
00:52:03   ...

Trott on 4 Nov 2018

I want to run a stress test on smartos 17 but it doesn't look like it's in a choice in our stress test job on CI. Can someone add it? @nodejs/build

Trott on 16 Nov 2018

Happened again in https://github.com/nodejs/node/pull/25656
https://ci.nodejs.org/job/node-test-binary-arm/5663/RUN_SUBSET=2,label=pi1-docker/testReport/(root)/test/sequential_test_performance/

Failed on a Raspberry PI 1 I think:

{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 2265.783347129822 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: -1,
  loopExit: -1 }
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 3541.9893350601196 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: { around: 2176.2094011306763 },
  loopExit: -1 }
{ name: 'node',
  entryType: 'node',
  startTime: 0,
  duration: { around: 3574.7279500961304 },
  nodeStart: { around: 0 },
  v8Start: { around: 0 },
  bootstrapComplete: { around: 2176.2094011306763 },
  environment: { around: 0 },
  loopStart: { around: 2176.2094011306763 },
  loopExit: { around: 3575.0439462661743 } }
assert.js:351
    throw err;
    ^

AssertionError [ERR_ASSERTION]: bootstrapComplete: 1085.6422290802002 >= 1068
    at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
    at Timeout.setTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:119:5)
    at listOnTimeout (timers.js:324:15)
    at processTimers (timers.js:268:5)

Fishrock123 on 23 Jan 2019

I want to run a stress test on smartos 17 but it doesn't look like it's in a choice in our stress test job on CI. Can someone add it? @nodejs/build

@Trott Not sure if still required, since this was from a few months back, but I've just added it (seemed like a reasonable thing to have there): https://ci.nodejs.org/view/Stress/job/node-stress-single-test/jobConfigHistory/showDiffFiles?timestamp1=2018-12-07_11-55-34&timestamp2=2019-01-23_14-21-08

richardlau on 23 Jan 2019

🚀1

For some reason this has affected all three runs of https://github.com/nodejs/node/pull/25656 - does this just always happen on master right now?

Fishrock123 on 23 Jan 2019

Failing on my other rebased PR too.

Fishrock123 on 24 Jan 2019

Looks like it's barely-but-relatively-consistently failing due to arbitrary-ish timing constraints. The test lacks a comment explaining its purpose which is unfortunate, given it's bland name open to multiple interpretations. I guess it's testing perf_hooks generally?

What is this delta and delay stuff? Should this use common.platformTimeout() to adjust some values on slower machines? I mean, that's a band-aid and not a fix, but I'll certainly take it. Another workaround-but-not-a-solution-but-whatever-close-enough might be "change 1000 to 2000" or "change 1000 to 500". ¯\_(ツ)_/¯

pinging folks who have authored lines in the test: @BridgeAR @jasnell @TimothyGu @apapirovski @addaleax

Trott on 24 Jan 2019

Happened for the first time on node-daily-master today so something seems to have happened in the last 24 hours to cause this to be perma-red (or perma-ish-red) on Pi 1, it would seem.

@nodejs/testing

https://ci.nodejs.org/job/node-test-binary-arm/5673/RUN_SUBSET=3,label=pi1-docker/console

test-requireio_continuationlabs-debian9-armv6l_pi1p-1

00:44:32 not ok 342 sequential/test-performance
00:44:32   ---
00:44:32   duration_ms: 3.785
00:44:32   severity: fail
00:44:32   exitcode: 1
00:44:32   stack: |-
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 2250.0830249786377 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: -1,
00:44:32       loopExit: -1 }
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 3522.481032371521 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: { around: 2159.4409494400024 },
00:44:32       loopExit: -1 }
00:44:32     { name: 'node',
00:44:32       entryType: 'node',
00:44:32       startTime: 0,
00:44:32       duration: { around: 3554.94070148468 },
00:44:32       nodeStart: { around: 0 },
00:44:32       v8Start: { around: 0 },
00:44:32       bootstrapComplete: { around: 2159.4409494400024 },
00:44:32       environment: { around: 0 },
00:44:32       loopStart: { around: 2159.4409494400024 },
00:44:32       loopExit: { around: 3555.3136978149414 } }
00:44:32     assert.js:351
00:44:32         throw err;
00:44:32         ^
00:44:32     
00:44:32     AssertionError [ERR_ASSERTION]: bootstrapComplete: 1075.962013244629 >= 1073
00:44:32         at checkNodeTiming (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:92:7)
00:44:32         at Timeout.setTimeout [as _onTimeout] (/home/iojs/build/workspace/node-test-binary-arm/test/sequential/test-performance.js:119:5)
00:44:32         at listOnTimeout (timers.js:324:15)
00:44:32         at processTimers (timers.js:268:5)
00:44:32   ...

Failed by just 2ms or so?

Trott on 24 Jan 2019

This gets weird. https://ci.nodejs.org/job/node-test-binary-arm/5610/ was green. Hooray? But looking at the fanned jobs for Pi1, nowhere was sequential/test-performance run. And the PR (https://github.com/nodejs/node/pull/25612) doesn't remove it or anything. The test is still in master and that PR is against master. ?????!!!!!! @nodejs/build

Trott on 24 Jan 2019

I am not entirely sure about what the test is actually doing but I guess it was failing because:

https://github.com/nodejs/node/blob/d40ed71c1136c4a1c93ed5fe16bf6e3f9ca7631d/test/sequential/test-performance.js#L17

Is too far away from the bootstrap complete milestone? Then there could be two remedies:

Get this value even earlier, do not get it only after common is required as common is not exactly small
Make the milestone somewhat more accurate: https://github.com/nodejs/node/pull/25667 handles part of that, but it's hard to say at what point we should consider the bootstrap as complete, unless we take care of every branches involved during the whole bootstrap process and place the milestones at the precise points for every one of the possible conditions.

joyeecheung on 24 Jan 2019

👍1

FWIW, the test seems to be passing again, at least for now, on Pi 1. See https://ci.nodejs.org/job/node-test-binary-arm/5677/RUN_SUBSET=3,label=pi1-docker/console for example. Not sure for how long, but I guess it's encouraging that it's not a big block fo red for the moment...

Trott on 24 Jan 2019

👍1

my take after reviewing the test case is:

though the value in this line
https://github.com/nodejs/node/blob/d40ed71c1136c4a1c93ed5fe16bf6e3f9ca7631d/test/sequential/test-performance.js#L59
seems arbitrary to me, as this test has passed over time make it an empirically proven value that represents an acceptable limit to the bootstrap latency.
if that is agreed, then it is worthwhile to see recent commits that increased the boot time.
if that commit (or commits) holds merit on its own, then accept that, and move the baseline to accommodate that by computing a new value for timeoutDelay

gireeshpunathil on 25 Jan 2019

this is what I did:

ran the test in a linux box (test does not fail here).
measured the actual time for it to reach the assertion point.
so for the bootstrapcomplete milestone I got 45 millis as the latency.
reduced the timeoutDelay until it failed.
deemed it as bad, and did biect.
and embarked on d3806f9f3cded6bce9831f5d8ff88372ba7e5861, beyond which the latency is only 35 millis.

these are small numbers in my fast linux box, and able to accommodate in the very conservative number of 1111 millis in the test. However, in the failing arm I guess the baseline as well as the increase in bootup time due to the said commit could be significantly large I guess.

So:

does the bootup time increase due to d3806f9f3cded6bce9831f5d8ff88372ba7e5861 justifiable? if not fix it,
if yes, how much it increases in arm, and is that acceptable? if not fix it,
if yes, accommodate this in the test

I guess the test is doing its intented work, and we should only be changing the heuristics, not the test logic itself.

/cc @joyeecheung

gireeshpunathil on 25 Jan 2019

@gireeshpunathil In theory https://github.com/nodejs/node/commit/d3806f9f3cded6bce9831f5d8ff88372ba7e5861 should’ve decreased the boot time because it delays the console extension until those are actually needed when the inspector console is accessed (and no longer done unconditionally during bootstrap), maybe a benchmark run. of misc/startup would help identifying that.

joyeecheung on 25 Jan 2019

👍1

@joyeecheung - yes,reading the commit message I also got the same impression - this commit as a change to boost the boot time.

unfortunately I don't know how to run a benchmark test!

gireeshpunathil on 25 Jan 2019

@gireeshpunathil In theory d3806f9 should’ve decreased the boot time because it delays the console extension until those are actually needed when the inspector console is accessed (and no longer done unconditionally during bootstrap), maybe a benchmark run. of misc/startup would help identifying that.

I suppose it's possible that the increase might be a peculiarity/limitation of perf_hooks rather than a real increase? @jasnell @TimothyGu

It's not clear to me that this test is actually trying to flag bootstrap performance issues as much as it is trying to confirm that perf_hooks module basically works. In that case, the fix might be https://github.com/nodejs/node/pull/25695.

Speaking of perf_hooks, I guess the module should theoretically pass the tests at https://github.com/web-platform-tests/wpt/tree/master/performance-timeline. Those are the tests for the API spec that perf_hooks implements. It may be possible that another solution is to remove this test and replace with those. (I don't actually know.)

Trott on 25 Jan 2019

I guess I also don't feel like using Pi 1 as our performance canary is a great approach. I'm inclined to treat this as a "bare functionality of perf_hooks" test rather than a "watch performance in Node.js" test.

This is supported IMO by the fact that the test was added with the initial implementation of perf_hooks in https://github.com/nodejs/node/commit/67269fd7f33 and not added after implementation as a performance test. We should probably rename it test-perf-hooks or something like that to avoid this confusion. @jasnell

Trott on 25 Jan 2019

@Trott I don't know about it passing as its failed twice in attempts for me today.

mhdawson on 25 Jan 2019

Fwiw, I also got 2 failures in a row for https://github.com/nodejs/node/pull/25688 as well, and going through the build history, @ofrobots and @danbev have also gotten failures on it today.

sam-github on 25 Jan 2019

Fixed by d1d357d

Trott on 26 Jan 2019

ok, now that #25695 is landed and this one closed it makes little difference to raise my opinion; but raising it anyways.

these points:

failing Pi are slow machines with little ram and a slow chip
the test is not supposed to validate performance

would have been considerations at the time when the test was introduced IMO. Now that it showed us something which is indicative of a change in timing in terms of latency in the boot sequences. So the question is:

should we ignore what we saw and amend the test to do what it is designed to test,
or
should we first make an effort to investigate what this failure means and implies?

we have many instances where unrelated tests threw light on serious issues in the code; issues associated in #25007 are latest examples - none of those were meant to catch race conditions; yet those tests were used to reproduce issues, troubleshoot, fix and finally validate the correctness of the fix.

my summary is that the test threw something to us - by design or by chance; amending the code block that suppresses the issue without looking at the root cause of the failure may be like silently ignoring an error.

gireeshpunathil on 26 Jan 2019

should we ignore what we saw and amend the test to do what it is designed to test,
or
should we first make an effort to investigate what this failure means and implies?

I guess ideally we'd do both. I don't want to leave it broken, causing people to either not be able to land valid changes or else get into the habit of ignoring failing test suites.

But as you are suggesting, I also want to understand why the test started failing so a determination can be made if it is a sign of a larger problem.

I think the issue this brought up is probably more appropriate for a benchmark than a test. (And maybe there already is a benchmark in benchmark/misc/startup.js?) Perhaps we can, at some point, introduce nightly performance benchmarks that will fail if things fall below a certain threshold or if there is a drop in performance larger than a certain percentage. (I'd like to see something like that for coverage stats too.)

Trott on 26 Jan 2019

👍1

But as you are suggesting, I also want to understand why the test started failing so a determination can be made if it is a sign of a larger problem.

Perhaps the thing to do is to open a new issue about the slower startup. Because I guess we are talking about two different things anyway. One issue is that a test is failing when it should not be failing. That's this issue. It's about fixing the test. But the second issue, which this one uncovered, is why is startup slower than it used to be? These questions are related, but they can (and probably should?) be considered separately? Like, it's possible (as we've done here) to fix the first issue without successfully addressing the second.

Trott on 26 Jan 2019

❤1

Was this page helpful?

0 / 5 - 0 ratings