Node: investigate flaky buffer-free-callback addon test on AIX in CI

Created on 22 Dec 2019  Â·  21Comments  Â·  Source: nodejs/node

https://ci.nodejs.org/job/node-test-commit-aix/27470/nodes=aix61-ppc64/console

test-osuosl-aix61-ppc64_be-2

00:10:52 not ok 2527 addons/buffer-free-callback/test
00:10:52   ---
00:10:52   duration_ms: 0.169
00:10:52   severity: crashed
00:10:52   exitcode: -5
00:10:52   stack: |-
00:10:52     
00:10:52     
00:10:52     #
00:10:52     # Fatal error in , line 0
00:10:52     # Check failed: result.second.
00:10:52     #
00:10:52     #
00:10:52     #
00:10:52     #FailureMessage Object: fffffffffffd0a0
00:10:52   ...

This was on a node-daily-master run, not a CI for a PR or anything like that.

@nodejs/platform-aix

CI / flaky test buffer

Most helpful comment

@richardlau It’s definitely breaking addons in the wild, not just potentially. :confused:

I can try to make sure we include it in the 14.0.0 changelog.

All 21 comments

00:10:52 # Fatal error in , line 0

this is weird, even the C++ debug variables (filename and line number) seem to have been corrupted!

#find . | xargs grep "CHECK" 2> /dev/null | grep "result" | grep "second"
#
#find . | xargs grep "CHECK" 2> /dev/null | grep "second"
./node_binding.cc:    CHECK_GE(it->second.refcount, 1);
./env-inl.h:  CHECK_EQ(insertion_info.second, true);
./api/environment.cc:    CHECK_EQ(it->second, size);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(it->second.second);
./node_platform.cc:  CHECK(data.second);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
#

Is the result a variable name, or a typename?

#find . | xargs grep "CHECK" 2> /dev/null | grep "result" | grep "second"
#
#find . | xargs grep "CHECK" 2> /dev/null | grep "second"
./node_binding.cc:    CHECK_GE(it->second.refcount, 1);
./env-inl.h:  CHECK_EQ(insertion_info.second, true);
./api/environment.cc:    CHECK_EQ(it->second, size);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(insertion.second);
./node_platform.cc:  CHECK(it->second.second);
./node_platform.cc:  CHECK(data.second);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
./node_options-inl.h:  CHECK_EQ(it->second.type, kBoolean);
#

Is the result a variable name, or a typename?

@gireeshpunathil Strange the find/grep didn’t find https://github.com/nodejs/node/blob/5707ed21a253b2ed1e2f5944f622c20092b866d6/deps/v8/src/objects/backing-store.cc#L581 (not sure if others — I’m on a tablet and found that one through the GitHub web UI).

Ah looking again you ran find in src.

Also seen on a test build for the new AIX machines in the CI (https://github.com/nodejs/build/issues/2110#issuecomment-567731188): https://ci.nodejs.org/job/node-test-commit-aix/27426/nodes=aix71-ppc64/testReport/junit/(root)/test/addons_buffer_free_callback_test/
test-ibm-aix71-ppc64_be-1

thanks @richardlau !

so this would mean that we try to insert duplicate entries (of backing_stores) into the map. Don't know what that means.

There is a trace point in the neighborhood that might give more clues and that gets enabled with --trace_backing_store (FLAG_trace_backing_store) but the CLI parser does not accept such an option!

cc @nodejs/v8

Cc @addaleax and @thangktran as they’ve been doing some work recently to adapt Node.js to the changes V8 have made to backing store management.

@richardlau i submitted https://github.com/nodejs/node/pull/31053 recently that might fix this problem.

@gireeshpunathil it's due to a behaviour change in V8 ArrayBuffer. Previously, when a static buffer (or one that outlives the ArrayBuffer) is used, the BackingStore will not be registered. Now that it is registered, problem might occur when new ArrayBuffer is allocated at the same place of previous ArrayBuffer that is still being tracked in BackingStore table.

Stress run with #31053: https://ci.nodejs.org/view/Stress/job/node-stress-single-test/nodes=aix61-ppc64/31/

85 failures in 1000 runs so it doesn’t look like https://github.com/nodejs/node/pull/31053 fixes this 😞.
cc @thangktran

Attempting to bisect.

1000 runs against 2dff8ddafb3cd8ca6555d9e3731a1541c6a1bb2f (commit before 4f523c2c1a1c4cef33a1f925cb9102d5b8a51dab) had no failures: https://ci.nodejs.org/job/node-stress-single-test/32/nodes=aix61-ppc64/

Started 1000 runs against 4f523c2c1a1c4cef33a1f925cb9102d5b8a51dab: https://ci.nodejs.org/job/node-stress-single-test/33/nodes=aix61-ppc64/

Attempting to bisect.

1000 runs against 2dff8dd (commit before 4f523c2) had no failures: https://ci.nodejs.org/job/node-stress-single-test/32/nodes=aix61-ppc64/

Started 1000 runs against 4f523c2: https://ci.nodejs.org/job/node-stress-single-test/33/nodes=aix61-ppc64/

Still running but 15 failures out of 100 runs so it looks like 4f523c2c1a1c4cef33a1f925cb9102d5b8a51dab is the cause and is not fixed by any of the follow up fixes (https://github.com/nodejs/node/pull/30946, https://github.com/nodejs/node/pull/31053).

There is a trace point in the neighborhood that might give more clues and that gets enabled with --trace_backing_store (FLAG_trace_backing_store) but the CLI parser does not accept such an option!

I think the flag is only available in debug builds: https://github.com/nodejs/node/blob/2bdeb88c27b4d8de3a8f6b7a438cf0bcb88fa927/deps/v8/src/flags/flag-definitions.h#L893

This can easily be made reproducible by removing the global.gc() calls in the test.

The reason that this fails is that the test allocates multiple Buffers with the same base address, which, as @thangktran explained, is no longer allowed.

For fixing this, it should be enough to either move away from a static buffer for the underlying memory, or to modify the offset argument in the JS test file so that the Buffers never have the same base address.

@thangktran Are you interested in opening a PR for this?

This can easily be made reproducible by removing the global.gc() calls in the test.

The reason that this fails is that the test allocates multiple Buffers with the same base address, which, as @thangktran explained, is no longer allowed.

For fixing this, it should be enough to either move away from a static buffer for the underlying memory, or to modify the offset argument in the JS test file so that the Buffers never have the same base address.

@thangktran Are you interested in opening a PR for this?

@addaleax sure i'd love to. I'm on it.

fwiw, consistent reproduce - on any platform:

$ cat test/addons/buffer-free-callback/foo.js

const common = require('../../common')
const binding = require(`./build/${common.buildType}/binding`)
let buf1 = binding.alloc(64, 1, 0)
let buf2 = binding.alloc(64, 1, 0)
#tools/test.py addons/buffer-free-callback/foo
=== release foo ===                   
Path: addons/buffer-free-callback/foo
#
# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0x7ffeefbfe5c0

the only missing puzzle now is, why the C++ assertion check is failing to get the file name and line number. @addaleax - do you know?

the only missing puzzle now is, why the C++ assertion check is failing to get the file name and line number. @addaleax - do you know?

V8 is intentionally omitting these from non-debug builds, see

https://github.com/nodejs/node/blob/db109e85d678faf581433250bb1442f5eb24de61/deps/v8/src/base/logging.cc#L152-L158

That being said, here’s a PR that would enable stack trace printing for V8 check failures, which should already help a ton: https://github.com/nodejs/node/pull/31079

@gireeshpunathil it's due to a behaviour change in V8 ArrayBuffer. Previously, when a static buffer (or one that outlives the ArrayBuffer) is used, the BackingStore will not be registered. Now that it is registered, problem might occur when new ArrayBuffer is allocated at the same place of previous ArrayBuffer that is still being tracked in BackingStore table.

Are we calling this out anywhere as a notable change? If the behaviour change is breaking for one of our addon tests it’s potentially breaking for addons out in the wild.

@richardlau It’s definitely breaking addons in the wild, not just potentially. :confused:

I can try to make sure we include it in the 14.0.0 changelog.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ctavan picture ctavan  Â·  87Comments

AkashaThorne picture AkashaThorne  Â·  207Comments

aduh95 picture aduh95  Â·  104Comments

TazmanianDI picture TazmanianDI  Â·  127Comments

nicolo-ribaudo picture nicolo-ribaudo  Â·  147Comments