Hello
While fiddling with a toy JS parser I have made, I came across weird exit-code-139 errors (that is, segmentation faults) which are driving me nuts. I've pruned every unnecessary part of the parser, and it looks like I have been able to isolate the error-causing case. I have very little experience using debugging tools to dig into this issue all by myself, so I would be glad if someone could help me find out what is going on.
The faulty case: https://github.com/icefapper/lubejs
meminfo: https://gist.github.com/icefapper/3081ce0e4f1e8bb17314
My system: Linux 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux (specifically, Linux Mint 17.1)
cpuinfo: https://gist.github.com/icefapper/5dc357ab7d751a7ceb86
I have been experiencing it on 5.6.0 and up (i.e., even with the latest stable 5.9.*, which I built from source).
Thanks a lot for reading this far, and I hope you can reproduce the error. Simply run "run.sh" 40 times or so, and then read "lubelean.log". Please note, though, that "lubelean.log" already contains the logs I got by running run.sh, which means that if you want your own logs, you must delete it and _then_ run run.sh.
I can reproduce a segfault with just `node lubelean.js`, with this file: https://raw.githubusercontent.com/icefapper/lubejs/fd592cd28796cf3f557853b5011abcade6ea29fd/lubelean.js
Requires several runs, though.
It doesn't use any modules or Node.js API, except for the console. Most probably this is a v8 issue.
@icefapper could you try to produce a smaller testcase?
Thanks for such a quick reply!
Actually, the reason lubelean.js is so big is that I thought someone might want to try other cases (that is, other JS code snippets than the default one I have provided). I'd be glad to make it smaller if necessary.
Regards,
Neni
@icefapper I've done some initial cleanup in https://github.com/ChALkeR/lubejs/blob/master/lubelean.js.
Just removing methods that were not being called cut the line count to about a third.
Wow! I just did some leaning myself; I confess, though, it is by no means leaner than yours :\
https://github.com/icefapper/lubejs/raw/master/lubelean.js
Stack trace:

```
* frame #0: 0x0000000100377534 node`v8::internal::StoreBuffer::IteratePointersToNewSpace(void (*)(v8::internal::HeapObject**, v8::internal::HeapObject*)) + 1972
  frame #1: 0x0000000100313709 node`v8::internal::Heap::Scavenge() + 1273
  frame #2: 0x000000010031203f node`v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) + 879
  frame #3: 0x0000000100311a01 node`v8::internal::Heap::CollectGarbage(v8::internal::GarbageCollector, char const*, char const*, v8::GCCallbackFlags) + 689
  frame #4: 0x00000001002ca72c node`v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) + 108
  frame #5: 0x000000010051618e node`v8::internal::Runtime_AllocateInTargetSpace(int, v8::internal::Object**, v8::internal::Isolate*) + 110
  frame #6: 0x00000ec7dcd062d5
  frame #7: 0x00000ec7dce9ba92
  frame #8: 0x00000ec7dce9b513
  frame #9: 0x00000ec7dce9b513
```
@ofrobots sorry for asking a newbieish thing, but, could i ask what it means and how you acquird it? thanks a lot
@icefapper I got this stack trace by running node inside a debugger (e.g. `lldb -- node lubelean.js`). It shows that the segfault occurs while the GC is iterating over some objects. I am continuing to look into this further.
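Roughly, the workflow looks like this (a sketch; addresses and frame numbers will vary per run):

```
$ lldb -- node lubelean.js
(lldb) run
# lldb stops the process when the SIGSEGV is raised
(lldb) bt        # print the backtrace, as shown above
(lldb) up        # select the caller's frame
(lldb) p this    # inspect values in the selected frame
```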
```
frame #1: 0x000000010021fc93 node_g`v8::internal::Map::instance_size(this=0xdeadbeedbeadbeed) + 35 at objects-inl.h:4294
   4291
   4292
   4293 int Map::instance_size() {
-> 4294   return NOBARRIER_READ_BYTE_FIELD(
   4295       this, kInstanceSizeOffset) << kPointerSizeLog2;
   4296 }
   4297
(lldb) p this
(v8::internal::Map *) $1 = 0xdeadbeedbeadbeed
(lldb) up
frame #2: 0x000000010021faa1 node_g`v8::internal::HeapObject::SizeFromMap(this=0x00001d2ea2ae9929, map=0xdeadbeedbeadbeed) + 33 at objects-inl.h:4353
   4350
   4351
   4352 int HeapObject::SizeFromMap(Map* map) {
-> 4353   int instance_size = map->instance_size();
   4354   if (instance_size != kVariableSizeSentinel) return instance_size;
   4355   // Only inline the most frequent cases.
   4356   InstanceType instance_type = map->instance_type();
(lldb) p this
(v8::internal::HeapObject *) $2 = 0x00001d2ea2ae9929
(lldb) v8 print 0x00001d2ea2ae9928
<Smi: 7470>
```
EDIT: `v8 print` is not to be trusted.
I performed a bit more cleanup; the result is at https://github.com/ChALkeR/lubejs
@ChALkeR thanks! I can reproduce the crash with vanilla d8 by replacing `console.log` with `print`.
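For anyone who wants to try the d8 route, one possible way to do the swap (a sketch, not the exact edit) is a shim that keeps the file runnable under both d8 and node:

```js
// Hypothetical shim: d8 exposes a global print(), node does not,
// so fall back to console.log when print is missing.
if (typeof print === 'undefined') {
  var print = console.log.bind(console);
}
print('still alive'); // then use print() everywhere instead of console.log()
```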
@ChALkeR funny, I thought the SIGSEGV was due to the stack getting repeatedly pushed and popped as the precedences fluctuated (in prseNonSeqExpr); you've proven me wrong :)
Laugh at me, but aside from the prevalent 139 I frequently got, there was also an occasional 132 exit code (i.e., 128 + SIGILL, "illegal instruction"), this time coupled with a core dump, and "slice" was always the first frame that appeared in the dump. Now, I'm not saying slice is the culprit, but it makes me quite suspicious of it.
Very curious. I see a bunch of `deadbeef`, and kin, in the area of the heap around where we die while doing a GC walk. This is the only area on this heap page where I found `deadbeef`: https://gist.github.com/ofrobots/2e1c02541bfc0ebb675c
@icefapper I haven't been able to reproduce the illegal-instruction crashes. That is interesting. These kinds of symptoms indicate a memory-corruption type failure to me. If you still have those core dumps available, I would be interested in knowing what is in memory around the instruction the processor was trying to execute. I can give you the commands to run to get this data.
On a debug build, the GC zaps garbage memory blocks with `deadbee*`. This is more reproducible with a debug build, or if you add `--verify-heap`. This looks like a GC issue to me.
I can reproduce this with V8 4.5 (Node 4) as well.
@ofrobots, I reproduced the invalid instruction crash. But it happened only once out of many runs.
@ChALkeR If you still have a debugger active, or have a core dump, could you print some instructions around the crash point? Running `x/20i $pc` in lldb or gdb should work.
Opened V8 issue here: https://bugs.chromium.org/p/v8/issues/detail?id=4871
Will try my best to reproduce the illegal-instruction error.
Also, thanks for confirming it's not a bug on my side. I was literally on the verge of a nervous breakdown.
@ofrobots Sorry, I don't have that now. Perhaps running the original testcase overnight would help?
The testcase is under 100 lines now.
@ofrobots @ChALkeR
I have made a bit of progress:
After inlining (i.e., substituting with their bodies) the calls to `this.loc`, `this.locBegin`, and `this.locOn` in both the lean and the original testcase, the SIGSEGV disappeared.
At first I thought it might be that V8 does not like a function returning an object (silly, I concede, but experience has taught me that silly does not mean impossible). I guess I was wrong there; why? Take a look at this excellently leaned testcase. Try fiddling with the object literals that contain a 'loc:' by changing its value to {}, or by eliminating it altogether (there are only four of them, I think). I will try to share some of the variations that make the SIGSEGV disappear; I was so giddy about finding this that I forgot to save my findings :\ My take is that it has something to do with the convoluted way V8 might be handling the so-called "hidden classes". As to why only 'loc' gets affected, I have no idea.
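Roughly, the kind of pattern I mean (a made-up distillation, not the actual testcase; only the 'loc' shape-juggling follows what I described above):

```js
// Hypothetical sketch: methods returning fresh object literals carrying a
// 'loc' property whose value takes different shapes, exercised in a loop so
// that young-generation GCs (scavenges) run while the literals are still live.
function Parser() { this.c0 = 0; this.c = 0; }
Parser.prototype.locBegin = function () { return { line: 1, column: this.c0 }; };
Parser.prototype.loc = function () {
  return { start: this.locBegin(), end: { line: 1, column: this.c } };
};

var p = new Parser();
var keep = [];
for (var i = 0; i < 100000; i++) {
  p.c = i;
  // alternate shapes, like changing 'loc:' to {} or dropping it altogether
  keep.push(i % 2 ? { type: 'Expr', loc: p.loc() } : { type: 'Expr', loc: {} });
  if (keep.length > 64) keep.shift(); // keep some objects alive across GCs
}
```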
> After inlining (i.e., substituting with their bodies) the calls to `this.loc`, `this.locBegin`, and `this.locOn` in both the lean and the original testcase, the SIGSEGV disappeared.
Yes, I noticed that, too. It's not as trivial, though. Only one place counts (take a look at my testcase — it has the call to `.loc()` in a single place).
Also notice how my testcase redefines `loc()` several times — removing that keeps the segfault, but it looks like it makes it happen less often for some reason. I'm not 100% sure about this, though.
The same goes for re-assigning `tok` — it looks like its presence increases the chances of getting a segfault.
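Roughly, the two aggravating patterns I mean (the shapes here are made up; only the redefinition and re-assignment follow the testcase):

```js
// Hypothetical sketch, not the actual testcase code.
var parser = {
  tok: null,
  loc: function () { return { start: 0, end: 0 }; }
};
// loc() redefined later, as the testcase does several times:
parser.loc = function () { return { start: 1, end: 2 }; };
// tok re-assigned with objects of differing shape, which seemed to
// raise the crash rate:
parser.tok = { type: 'ID' };
parser.tok = { type: 'NUM', value: 12 };
```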
@ofrobots Please update the testcase. It looks like the new one fails more often (and doesn't require the `tok` re-assignment anymore).
Looks like the VM is GC'ing `undef`s; please give it a go.
Further evidence for it: https://gist.github.com/icefapper/ee8e346c5eead603c855
@icefapper Change it to `head.type` in the first example, and it will still crash. Note that `head.type` is defined =).
Thanks; could I ask for the precise location to apply the change?
@icefapper both of the lines you changed =)
Edit: that would be in the second example, in fact =)
@ChALkeR
You are completely right; funnily enough, it happens even with `({l: 12}).l` instead of `head.type`.
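In other words (a trivial sketch; both expressions are well-defined, which undercuts my "GC'ing undefs" theory):

```js
var head = { type: 'Program' };
console.log(head.type);    // 'Program' -- defined, yet the crash persisted
console.log(({l: 12}).l);  // 12 -- an inline literal behaves (or rather, crashes) just the same
```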
The V8 issue is fixed; is this still reproducible?
The v8 issue is fixed, and afaik it landed in Node.js.
I can't reproduce on Node.js v6.2.1.
Closing. Feel free to comment and/or reopen =).
The problem still exists on `v4.x` and `v5.x`.
@ofrobots Is there anything actionable here (i.e. is it feasible to backport the fix)?
Edit: ah, I see the PR for 4.x.
Yes, I opened the PR for `v4.x` here: https://github.com/nodejs/node/pull/7303. I am not sure there is enough runway left on `v5.x` for it to be worth fixing on that branch.
Can we close this? The fix has landed in `v4.x-staging`, and I don't think we will be doing any new releases of v5.
v5 is EOL. Closing.
In case anyone was wondering what the fixed versions were, I believe this is it:

```
dalvizu:~/git/dalvizu/node$ git tag --contains 1164f542
v4.5.0
v4.6.0
v4.6.1
v4.6.2
v4.7.0
v4.7.1
v4.7.2
```