Node: pummel/test-crypto-timing-safe-equal-benchmarks failing

Created on 26 Jun 2020 · 15 comments · Source: nodejs/node

  • Test: pummel/test-crypto-timing-safe-equal-benchmarks
  • Platform: Linux
  • Console Output:
08:05:52 not ok 66 pummel/test-crypto-timing-safe-equal-benchmarks
08:05:52   ---
08:05:52   duration_ms: 6.629
08:05:52   severity: fail
08:05:52   exitcode: 1
08:05:52   stack: |-
08:05:52     assert.js:385
08:05:52         throw err;
08:05:52         ^
08:05:52     
08:05:52     AssertionError [ERR_ASSERTION]: timingSafeEqual should not leak information from its execution time (t=7.814360810767208)
08:05:52         at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-custom-suites-freestyle/test/pummel/test-crypto-timing-safe-equal-benchmarks.js:109:1)
08:05:52         at Module._compile (internal/modules/cjs/loader.js:1217:30)
08:05:52         at Object.Module._extensions..js (internal/modules/cjs/loader.js:1238:10)
08:05:52         at Module.load (internal/modules/cjs/loader.js:1066:32)
08:05:52         at Function.Module._load (internal/modules/cjs/loader.js:954:14)
08:05:52         at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
08:05:52         at internal/main/run_main_module.js:17:47 {
08:05:52       generatedMessage: false,
08:05:52       code: 'ERR_ASSERTION',
08:05:52       actual: false,
08:05:52       expected: true,
08:05:52       operator: '=='
08:05:52     }
08:05:52   ...

Daily master builds appear to have been consistently broken by this test for at least as long as we have history (since 27 May; we only keep a month's worth of results).
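For context on the failure message: the test compares two timing distributions statistically, and the `t=…` value in the assertion is the t statistic. A minimal sketch of a Welch's t-test over two timing samples (an illustration of the technique only; the real test's sampling and threshold differ):

```javascript
// Sketch of the statistical check behind the failure message (an
// illustration of Welch's t-test, not the actual test code; the real
// test's sample sizes and pass/fail threshold differ).
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((acc, x) => acc + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t statistic for two independent samples with unequal variances.
function welchT(a, b) {
  return (mean(a) - mean(b)) /
    Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
}

// Indistinguishable timing samples give a small |t|; a timing leak shows
// up as a large |t|, which is what the test's assertion rejects.
console.log(welchT([10, 11, 9, 10, 10], [10, 10, 11, 9, 10])); // → 0
```

A large |t| means the "equal buffers" and "unequal buffers" timing distributions are distinguishable, i.e. `timingSafeEqual` would be leaking information through its execution time.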

Previous issues with this test:

CI / flaky test

All 15 comments

Fwiw, this has been discussed in the private repo for a bit since this is potentially security-relevant.

I bisected and it seems to be a result of a V8 update. Since we can't compile/test without most of the cherry-picks required for V8 updates, I was not able to identify a single commit. The first bad commit could be any of:

a48928836a930ecdc894e9c684a675bde1508109
c3866a1b3e9de7b19753d822b313509f43266c42
db0ed118d8f39e93063425076dd7d95351d799eb
308900faef8b2e5edab5f9737495d2d1b3b54ffe
6b5ea2efa23d5e6206ea8db2bba54995c20c6597
37abad4340d4602ced23125b5a409a2aa4a7f57d
55f7ae6386eafa10012e1a7ccf3149becb3b5288
1d6adf7432defeb39b751a19c68335e8afb0d8ee
9dfaf49b84ff13f83dbe9755ac2eb934719792df
32be677f49b6415ce08842fe78288637f300041d

Some work done by @sam-github revealed that adding the patch below makes the test pass even though there's no reason it should change execution, in theory. Sam commented:

> I suspect v8 of doing some kind of caching of strings passed to eval. It might even be a bug.

diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js 
index b649b071e1e..bbe93d238b9 100644
--- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -23,7 +23,7 @@ function runOneBenchmark(compareFunc, firstBufFill, secondBufFill, bufSize)
-    `);
+    `.replace(';', '#').replace('#', ';'));
}
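To illustrate the hypothesis behind that patch: if eval's compilation cache were keyed on string identity rather than string contents, the no-op double replace would defeat it by producing an equal but freshly allocated string. A sketch of that idea (hypothetical behavior, not confirmed V8 internals):

```javascript
// Hypothetical illustration of the caching theory behind the patch: the
// double replace is a no-op on the contents, but it builds a new string
// object, which a cache keyed on string identity would treat as new source.
const src = 'let n = 0; for (let i = 0; i < 10; i++) n += i; n';

const fresh = src.replace(';', '#').replace('#', ';');
console.log(fresh === src); // true: identical contents...
// ...but `fresh` is a newly allocated string, so a cache keyed on object
// identity (rather than contents) would not find a hit for it.

console.log(eval(src) === eval(fresh)); // true: both evaluate to 45
```

If such a cache existed, the first comparison function compiled via eval could be reused for later benchmark rounds, which would explain why an otherwise meaningless string transformation changes the timing behavior.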

> Fwiw, this has been discussed in the private repo for a bit since this is potentially security-relevant.

Ah okay, I have no visibility of that. Given the state of daily master CI was discussed at the collaborator summit yesterday I was surprised no issue was open for it.

Yeah, I think it’s fine to discuss it openly since the CI failures are available publicly as well.

> Fwiw, this has been discussed in the private repo for a bit since this is potentially security-relevant.

> Ah okay, I have no visibility of that. Given the state of daily master CI was discussed at the collaborator summit yesterday I was surprised no issue was open for it.

We'd been talking about moving it to a public repo anyway because it does not seem to be security-relevant once people dug into it a bit. So, this is probably as good a way for that to happen as any. Plus, I think the results are public anyway. (I'm not sure if node-daily-master results is viewable by anyone with a GitHub account. I think it is.)

> Some work done by @sam-github revealed that adding the patch below makes the test pass even though there's no reason it should change execution, in theory. Sam commented:
>
> I suspect v8 of doing some kind of caching of strings passed to eval. It might even be a bug.
>
> diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
> index b649b071e1e..bbe93d238b9 100644
> --- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
> +++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
> @@ -23,7 +23,7 @@ function runOneBenchmark(compareFunc, firstBufFill, secondBufFill, bufSize)
> -    `);
> +    `.replace(';', '#').replace('#', ';'));
> }

FWIW I can reproduce this on a Linux system inside IBM, where it's flaky:

-bash-4.2$ tools/test.py --repeat 10 test/pummel/test-crypto-timing-safe-equal-benchmarks.js
=== release test-crypto-timing-safe-equal-benchmarks ===
Path: pummel/test-crypto-timing-safe-equal-benchmarks
assert.js:385
    throw err;
    ^

AssertionError [ERR_ASSERTION]: timingSafeEqual should not leak information from its execution time (t=-14.106192455658816)
    at Object.<anonymous> (/home/users/riclau/sandbox/github/nodejs/test/pummel/test-crypto-timing-safe-equal-benchmarks.js:109:1)
    at Module._compile (internal/modules/cjs/loader.js:1217:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1238:10)
    at Module.load (internal/modules/cjs/loader.js:1066:32)
    at Function.Module._load (internal/modules/cjs/loader.js:954:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47 {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: false,
  expected: true,
  operator: '=='
}
Command: out/Release/node /home/users/riclau/sandbox/github/nodejs/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
...
[00:40|% 100|+   2|-   8]: Done
-bash-4.2$

Applying the patch from @sam-github doesn't appear to help. Running --jitless does, but maybe that's just because it makes everything slower? Or maybe there is a JIT/optimization bug?

-bash-4.2$ git diff
diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
index b649b07..7930213 100644
--- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -1,3 +1,4 @@
+// Flags: --jitless
 'use strict';
 const common = require('../common');
 if (!common.hasCrypto)
-bash-4.2$ tools/test.py --repeat 10 test/pummel/test-crypto-timing-safe-equal-benchmarks.js
[01:40|% 100|+  10|-   0]: Done
-bash-4.2$

Or this:

-bash-4.2$ git diff
diff --git a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
index b649b07..bc831db 100644
--- a/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
+++ b/test/pummel/test-crypto-timing-safe-equal-benchmarks.js
@@ -1,3 +1,4 @@
+// Flags: --always_opt
 'use strict';
 const common = require('../common');
 if (!common.hasCrypto)
-bash-4.2$ tools/test.py --repeat 10 test/pummel/test-crypto-timing-safe-equal-benchmarks.js
[01:20|% 100|+  10|-   0]: Done
-bash-4.2$

(There are lots of --v8-options, and I'll readily admit to not knowing what most of them do.)

We can try to rule V8 changes in or out. As far as I can tell, this was bisected to the V8 8.3 upgrade (from 8.1?). Using those version numbers, here's the git log between the two versions: https://chromium.googlesource.com/v8/v8/+log/8.1.307.31..8.3.110.9/?pretty=fuller&n=10000

One thing that would be helpful to know from someone who can reproduce this: why does the T-test fail? Is it that the first execution is slow, and then the rest are very fast? That would support the 'something gets cached incorrectly' theory.

I dug in a bit using V8 canary builds, since they provide a good way to bisect the combination of Node.js and V8 changes.

Fake git bisect log

canary20200222115d5495c2 [8.2.152-node.8]: good
canary2020032013d7e86e8c [8.3.35-node.8]: bad
canary20200303535dc9b07d [8.2.237-node.8]: bad
canary202002261c95df47d3 [8.2.184-node.8]: good
canary2020030169bd6793e5 [8.2.222-node.8]: bad
canary20200228c250e83364 [8.2.211-node.8]: good
canary20200229bc7ca2b453 [8.2.220-node.8]: good

That means the causing commit is one of https://chromium.googlesource.com/v8/v8/+log/8.2.220..8.2.222/?pretty=fuller&n=10000. https://github.com/v8/v8/commit/a4c14089b0852d21ec83e8b6fedf28e305ec7008 is the only one that sticks out to me. Unfortunately, reverting that patch does not apply cleanly on top of our 8.3 upgrade PR, so I’m not able to confirm.

However! This appears to be something that has been fixed in one of the later V8 versions:

Fake git bisect log

canary2020032013d7e86e8c [8.3.35-node.8]: bad
canary202004229479c2ddd5 [8.4.157-node.8]: good
canary20200405a0ba79f8ed [8.4.0-node.8 (candidate)]: bad
canary20200413e68a631861 [8.4.49-node.8]: bad
canary20200418c06b589ced [8.4.131-node.8]: bad
canary202004192943fd49c0 [8.4.133-node.8]: bad
canary20200420e3ff8005c4 [8.4.134-node.8]: bad

IIUC, that means the fix is one of https://chromium.googlesource.com/v8/v8/+log/8.4.134..8.4.157/?pretty=fuller&n=10000. Unfortunately, that’s a lot of commits, and none of them particularly stands out to me as possibly related to this test, or to one of the commits that may have caused this.

(This is all based on repeated runs of the test with up to 30 iterations. That’s not a lot, so there’s always a chance that I went wrong here somewhere.)

> One thing that would be helpful to know from someone who can reproduce this: why does the T-test fail? Is it that the first execution is slow, and then the rest are very fast? That would support the 'something gets cached incorrectly' theory.

@psmarshall My understanding is that the code in the test is supposed to remove outliers, so that’s probably not it.
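For reference, one common way such a benchmark strips outliers is median-based filtering, which is robust against a single very slow (e.g. first, unoptimized) run. A sketch of that approach (an assumption for illustration; the test's actual filtering may differ):

```javascript
// Sketch of median/MAD-based outlier removal (hypothetical helper, not the
// test's actual code): samples further than k median-absolute-deviations
// from the median are dropped before computing the t statistic.
function filterOutliers(samples, k = 5) {
  const sorted = [...samples].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  const deviations =
    samples.map((x) => Math.abs(x - median)).sort((a, b) => a - b);
  const mad = deviations[Math.floor(deviations.length / 2)];
  // Guard against mad === 0 when most samples are identical.
  return samples.filter((x) => Math.abs(x - median) <= k * (mad || 1));
}

console.log(filterOutliers([10, 11, 9, 10, 500])); // drops the 500 sample
```

With this kind of filtering in place, a single slow warm-up iteration would be discarded, which is why the "first run is slow" theory alone would not explain a failing t-test.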

The fix might have been https://github.com/v8/v8/commit/8c68536601fe0e536026f8b10f9aa7e83646c317 which is in the fix range @addaleax provided. That would suggest the issue is sensitive to inlining. You could try configuring v8 with --no-turbo-inlining to turn it off and see if that fixes it even in the broken range?

@psmarshall Yes, --no-turbo-inlining makes the test pass consistently.

My understanding is that https://github.com/v8/v8/commit/8c68536601fe0e536026f8b10f9aa7e83646c317 is an optimization, so even if it might do the right thing for us here, it still means that something about the test isn’t quite right and should be fixed on our side?

So … I’m still trying to understand this. The test is very carefully written to avoid the influence of JS engine modifications, so I still don’t fully understand why they might influence this test one way or another.

One thing I’ve noticed is that this flakiness goes away when testing the binding function directly instead of the JS function, so one thing I will try is to move the typechecking that we do into C++ and see if that helps. It’s probably a good idea anyway for this particular case (and imo other cases as well but that’s way out of scope here).

Running with --trace-opt --trace-deopt --trace-turbo-inlining should give a full picture of what got optimized when and how the inlining happened too. Maybe that gives more precise clues about what's going wrong.

The crypto.timingSafeEqual function is a bound C++ function, so inlining shouldn't matter… but it seems something with inlining is going on.

Oh just saw your PR @addaleax - that JS wrapper before the C++ function would do it for sure
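To illustrate the shape being discussed: a JS wrapper that type-checks its arguments before calling into the native comparison is itself visible to TurboFan, so it can be inlined into hot callers and its overhead can change mid-benchmark, even though the native call it forwards to cannot be inlined. A sketch with hypothetical names (the stand-in below is plain JS, not constant-time native code):

```javascript
// Stand-in for the C++ binding (hypothetical name); the real binding is a
// native function that is opaque to TurboFan's inliner.
function nativeTimingSafeEqual(a, b) {
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a[i] ^ b[i];
  return diff === 0;
}

// The JS wrapper: this layer IS visible to TurboFan, so once hot it can be
// inlined and optimized, changing the measured cost of a call during the
// benchmark even though the underlying comparison is unchanged.
function timingSafeEqual(a, b) {
  if (!ArrayBuffer.isView(a) || !ArrayBuffer.isView(b))
    throw new TypeError('arguments must be buffers or typed arrays');
  if (a.byteLength !== b.byteLength)
    throw new RangeError('input buffers must have the same byte length');
  return nativeTimingSafeEqual(a, b);
}

console.log(timingSafeEqual(Buffer.from('abc'), Buffer.from('abc'))); // true
console.log(timingSafeEqual(Buffer.from('abc'), Buffer.from('abd'))); // false
```

Moving the type checks into C++, as the PR does, removes this JIT-visible layer from the measured path, which is consistent with the observation that calling the binding directly makes the flakiness go away.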
