Hello
While fiddling with a toy JS parser I have made, I came across weird exit-code-139 errors (that is, segmentation faults) which are driving me nuts. I've pruned every unnecessary part of the parser, and it looks like I have been able to isolate the error-causing case. I have very little experience using debugging tools to dig into this issue all by myself, so I would be glad if someone could help me find out what is going on.
The faulty case: https://github.com/icefapper/lubejs
meminfo: https://gist.github.com/icefapper/3081ce0e4f1e8bb17314
My system: Linux 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux (specifically, Linux Mint 17.1)
cpuinfo: https://gist.github.com/icefapper/5dc357ab7d751a7ceb86
I have been experiencing it on 5.6.0 and up (i.e., even with the latest stable 5.9.*, which I built from source).
Thanks a lot for reading this far, and I hope you can reproduce the error. Simply run "run.sh" 40 times or so, and then read "lubelean.log". Please note, though, that "lubelean.log" already contains the logs I got by running run.sh, which means that if you want your own logs, you must delete it and _then_ run run.sh.
I can reproduce a segfault with just `node lubelean.js`, with this file: https://raw.githubusercontent.com/icefapper/lubejs/fd592cd28796cf3f557853b5011abcade6ea29fd/lubelean.js
Requires several runs, though.
It doesn't use any modules or Node.js API, except for the console. Most probably this is a v8 issue.
@icefapper could you try to produce a smaller testcase?
Thanks for such a quick reply!
Actually, the reason lubelean.js is so big is that I thought someone might want to try other cases (that is, other JS code snippets than the default one I have provided). I'd be glad to make it smaller if necessary.
Regards,
Neni
@icefapper I've done some initial cleanup in https://github.com/ChALkeR/lubejs/blob/master/lubelean.js.
Just removing methods that were not being called cut the line count to about a third.
Wow! I just did some leaning myself; I confess, though, it is by no means leaner than yours :\
https://github.com/icefapper/lubejs/raw/master/lubelean.js
Stack trace:

```
* frame #0: 0x0000000100377534 node`v8::internal::StoreBuffer::IteratePointersToNewSpace(void (*)(v8::internal::HeapObject**, v8::internal::HeapObject*)) + 1972
  frame #1: 0x0000000100313709 node`v8::internal::Heap::Scavenge() + 1273
  frame #2: 0x000000010031203f node`v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 879
  frame #3: 0x0000000100311a01 node`v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, char const*, char const*, v8::GCCallbackFlags) + 689
  frame #4: 0x00000001002ca72c node`v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) + 108
  frame #5: 0x000000010051618e node`v8::internal::Runtime_AllocateInTargetSpace(int, v8::internal::Object**, v8::internal::Isolate*) + 110
  frame #6: 0x00000ec7dcd062d5
  frame #7: 0x00000ec7dce9ba92
  frame #8: 0x00000ec7dce9b513
  frame #9: 0x00000ec7dce9b513
```
@ofrobots sorry for asking a newbieish thing, but, could i ask what it means and how you acquird it? thanks a lot
@icefapper I got this stack trace by running node inside a debugger (e.g. `lldb -- node lubelean.js`). It shows that the segfault occurs while the GC is iterating over some objects. I am continuing to look into this further.
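Roughly, the workflow looks like this (a sketch; addresses and frame numbers will vary per run):

```
$ lldb -- node lubelean.js
(lldb) run
# lldb stops the process when the SIGSEGV is raised
(lldb) bt        # print the backtrace, as shown above
(lldb) up        # select the caller's frame
(lldb) p this    # inspect values in the selected frame
```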
```
frame #1: 0x000000010021fc93 node_g`v8::internal::Map::instance_size(this=0xdeadbeedbeadbeed) + 35 at objects-inl.h:4294
   4291
   4292
   4293 int Map::instance_size() {
-> 4294   return NOBARRIER_READ_BYTE_FIELD(
   4295       this, kInstanceSizeOffset) << kPointerSizeLog2;
   4296 }
   4297
(lldb) p this
(v8::internal::Map *) $1 = 0xdeadbeedbeadbeed
(lldb) up
frame #2: 0x000000010021faa1 node_g`v8::internal::HeapObject::SizeFromMap(this=0x00001d2ea2ae9929, map=0xdeadbeedbeadbeed) + 33 at objects-inl.h:4353
   4350
   4351
   4352 int HeapObject::SizeFromMap(Map* map) {
-> 4353   int instance_size = map->instance_size();
   4354   if (instance_size != kVariableSizeSentinel) return instance_size;
   4355   // Only inline the most frequent cases.
   4356   InstanceType instance_type = map->instance_type();
(lldb) p this
(v8::internal::HeapObject *) $2 = 0x00001d2ea2ae9929
(lldb) v8 print 0x00001d2ea2ae9928
<Smi: 7470>
```
EDIT: `v8 print` is not to be trusted.
I performed a bit more cleanup; the result is at https://github.com/ChALkeR/lubejs
@ChALkeR thanks! I can reproduce the crash with vanilla d8 by replacing `console.log` with `print`.
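For anyone who wants to try the d8 route, one possible way to do the swap (a sketch, not the exact edit) is a shim that keeps the file runnable under both d8 and node:

```js
// Hypothetical shim: d8 exposes a global print(), node does not,
// so fall back to console.log when print is missing.
if (typeof print === 'undefined') {
  var print = console.log.bind(console);
}
print('still alive'); // then use print() everywhere instead of console.log()
```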
@ChALkeR funny, I thought the SIGSEGV was due to the stack getting repeatedly pushed and popped as the precedences fluctuated (in prseNonSeqExpr); you've proven me wrong :)
Laugh at me, but aside from the prevalent 139 I frequently got, there was also an occasional 132 exit code (i.e., 128 + SIGILL, "illegal instruction"), this time coupled with a core dump, and "slice" was always the first frame that appeared in the dump. Now, I'm not saying slice is the culprit, but it makes me quite suspicious of it.
Very curious. I see a bunch of `deadbeef`, and kin, in the area of the heap around where we die while doing a GC walk. This is the only area on this heap page where I found `deadbeef`: https://gist.github.com/ofrobots/2e1c02541bfc0ebb675c
@icefapper I haven't been able to reproduce the illegal-instruction crashes. That is interesting. These kinds of symptoms indicate a memory-corruption type failure to me. If you still have those core dumps available, I would be interested in knowing what is in memory around the instruction the processor was trying to execute. I can give you the commands to run to get this data.
On a debug build, the GC zaps garbage memory blocks with `deadbee*`. This is more reproducible with a debug build, or if you add `--verify-heap`. This looks like a GC issue to me.
I can reproduce this with V8 4.5 (Node 4) as well.
@ofrobots, I reproduced the invalid instruction crash. But it happened only once out of many runs.
@ChALkeR If you still have a debugger active, or have a core dump, could you print some instructions around the crash point? Running `x/20i $pc` in lldb or gdb should work.
Opened V8 issue here: https://bugs.chromium.org/p/v8/issues/detail?id=4871
Will try my best to reproduce the illegal-instruction error.
Also, thanks for confirming it's not a bug on my side. I was literally on the verge of a nervous breakdown.
@ofrobots Sorry, I don't have that now. Perhaps running the original testcase overnight would help?
The testcase is under 100 lines now.
@ofrobots @ChALkeR
I have made a bit of progress:
After inlining (i.e., substituting with their bodies) the calls to `this.loc`, `this.locBegin`, and `this.locOn` in both the lean and the original testcase, the SIGSEGV disappeared.
At first I thought it might be that V8 does not like a function returning an object (silly, I concede, but experience has taught me that silly does not mean impossible). I guess I was wrong there; why? Take a look at this excellently leaned testcase. Try fiddling with the object literals that contain a 'loc:' by changing its value to {}, or by eliminating it altogether (there are only four of them, I think). I will try to share some of the variations that make the SIGSEGV disappear; I was so giddy about finding this that I forgot to save my findings :\ My take is that it has something to do with the convoluted way V8 might be handling the so-called "hidden classes". As to why only 'loc' gets affected, I have no idea.
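Roughly, the kind of pattern I mean (a made-up distillation, not the actual testcase; only the 'loc' shape-juggling follows what I described above):

```js
// Hypothetical sketch: methods returning fresh object literals carrying a
// 'loc' property whose value takes different shapes, exercised in a loop so
// that young-generation GCs (scavenges) run while the literals are still live.
function Parser() { this.c0 = 0; this.c = 0; }
Parser.prototype.locBegin = function () { return { line: 1, column: this.c0 }; };
Parser.prototype.loc = function () {
  return { start: this.locBegin(), end: { line: 1, column: this.c } };
};

var p = new Parser();
var keep = [];
for (var i = 0; i < 100000; i++) {
  p.c = i;
  // alternate shapes, like changing 'loc:' to {} or dropping it altogether
  keep.push(i % 2 ? { type: 'Expr', loc: p.loc() } : { type: 'Expr', loc: {} });
  if (keep.length > 64) keep.shift(); // keep some objects alive across GCs
}
```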
> After inlining (i.e., substituting with their bodies) the calls to `this.loc`, `this.locBegin`, and `this.locOn` in both the lean and the original testcase, the SIGSEGV disappeared.
Yes, I noticed that, too. It's not as trivial, though. Only one place counts (take a look at my testcase — it has the call to `.loc()` in a single place).
Also notice how my testcase redefines `loc()` several times — removing that keeps the segfault, but it looks like it makes it happen less often for some reason. I'm not 100% sure about this, though.
The same goes for re-assigning `tok` — it looks like its presence increases the chances of getting a segfault.
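Roughly, the two aggravating patterns I mean (the shapes here are made up; only the redefinition and re-assignment follow the testcase):

```js
// Hypothetical sketch, not the actual testcase code.
var parser = {
  tok: null,
  loc: function () { return { start: 0, end: 0 }; }
};
// loc() redefined later, as the testcase does several times:
parser.loc = function () { return { start: 1, end: 2 }; };
// tok re-assigned with objects of differing shape, which seemed to
// raise the crash rate:
parser.tok = { type: 'ID' };
parser.tok = { type: 'NUM', value: 12 };
```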
@ofrobots Please update the testcase. It looks like the new one fails more often (and doesn't require the `tok` re-assignment anymore).
Looks like the VM is GC'ing `undef`s; please give it a go.
Further evidence for it: https://gist.github.com/icefapper/ee8e346c5eead603c855
@icefapper Change it to `head.type` in the first example, and it will still crash. Note that `head.type` is defined =).
Thanks; could I ask for the precise location to apply the change?
@icefapper both of the lines you changed =)
Edit: that would be in the second example, in fact =)
@ChALkeR
You are completely right; funnily enough, it happens even with `({l: 12}).l` instead of `head.type`.
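In other words (a trivial sketch; both expressions are well-defined, which undercuts my "GC'ing undefs" theory):

```js
var head = { type: 'Program' };
console.log(head.type);    // 'Program' -- defined, yet the crash persisted
console.log(({l: 12}).l);  // 12 -- an inline literal behaves (or rather, crashes) just the same
```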
The V8 issue is fixed; is this still reproducible?
The v8 issue is fixed, and afaik it landed in Node.js.
I can't reproduce on Node.js v6.2.1.
Closing. Feel free to comment and/or reopen =).
The problem still exists on `v4.x` and `v5.x`.
@ofrobots Is there anything actionable here (i.e. is it feasible to backport the fix)?
Edit: ah, I see the PR for 4.x.
Yes, I opened the PR for `v4.x` here: https://github.com/nodejs/node/pull/7303. I am not sure there is enough runway left on `v5.x` for it to be worth fixing on that branch.
Can we close this? The fix has landed in `v4.x-staging`, and I don't think we will be doing any new releases of v5.
v5 is EOL. Closing.
In case anyone was wondering what the fixed versions were, I believe this is it:

```
dalvizu:~/git/dalvizu/node$ git tag --contains 1164f542
v4.5.0
v4.6.0
v4.6.1
v4.6.2
v4.7.0
v4.7.1
v4.7.2
```