Using iTerm2 on OS X, whenever I resize my terminal while using the CLI, the thing crashes. It spews a ton of `fatal: morestack on g0` and, after a few pathetic seconds, `Illegal instruction: 4`.
Assigning to @knz in the hope that his libedit transition will magically solve this.
Happens for me too. I never resize iterm2 (always full screen) so there's a
chance this has been happening for a long time.
On Mon, Oct 9, 2017 at 1:41 PM Andrei Matei wrote:

> Using iterm2 on OSX, whenever I resize my terminal while using the CLI, the thing crashes. It spews a ton of fatal: morestack on g0, and, after a few pathetic seconds, Illegal instruction: 4.
>
> Assigning to @knz in the hope that his libedit transition will magically solve this.
-- Tobias
Thanks, if you could launch this in gdb and get me the stack trace too that would be swell.
@benesch provides:

    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
        frame #0: 0x000000000405d1b2 cockroach`runtime.morestack + 34
    cockroach`runtime.morestack:
    ->  0x405d1b2 <+34>: movq   0x50(%rbx), %rsi
        0x405d1b6 <+38>: cmpq   %rsi, %gs:0x8a0
        0x405d1bf <+47>: jne    0x405d1c8 ; <+56>
        0x405d1c1 <+49>: callq  0x4031b00 ; runtime.badmorestackgsignal

which points me to: https://golang.org/pkg/os/signal/#hdr-Go_programs_that_use_cgo_or_SWIG
> If the non-Go code installs any signal handlers, it must use the SA_ONSTACK flag with sigaction. Failing to do so is likely to cause the program to crash if the signal is received.
Indeed libedit does not do this. Thankfully libedit also supports a mode where the caller deals with signals itself. Will use that.
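A rough illustration of the "caller deals with signals itself" idea, written in Python for brevity rather than in the C/cgo code this thread is about. It is not go-libedit's actual fix: the names `on_winch`, `install_resize_handler` and `poll_resize` are made up for the example. The handler only records that a resize happened, and the main loop picks it up later:

```python
import shutil
import signal

# Set by the signal handler, consumed by the main loop.
resize_pending = False


def on_winch(signum, frame):
    """Record the resize; do no real work inside the handler."""
    global resize_pending
    resize_pending = True


def install_resize_handler():
    """Take ownership of SIGWINCH and return the handler it replaced."""
    return signal.signal(signal.SIGWINCH, on_winch)


def poll_resize():
    """Return the current terminal size if a resize arrived, else None."""
    global resize_pending
    if not resize_pending:
        return None
    resize_pending = False
    # Falls back to a default size when no terminal is attached.
    return shutil.get_terminal_size()
```

A read loop would call `install_resize_handler()` once and `poll_resize()` between key presses, redrawing when it returns a size. Keeping the handler this small means the editing library never has to install a handler of its own.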
Worked on this all weekend :sob:
So the core of the issue is that macOS provides the POSIX API for signal handling but does not follow the POSIX semantics for signal delivery, at least not from the perspective of the C code called from cgo. [Technical detail: when a signal is delivered and a custom handler is called, if the custom handler calls sigaction to restore the original handler and then raises the signal again, macOS incorrectly invokes the custom handler instead of the original handler, and apparently doesn't properly restore the stack afterwards. This doesn't happen on other OSes.] This may be a problem either in the macOS implementation of signals (it's not a pure BSD after all -- signals are implemented on top of Mach), or in the way cgo is "preparing" the C context (it does play with signal handling in a weird way; I haven't yet fully understood the Go runtime code that does this on darwin).
I think the problem is in macOS and not in cgo, however, because a simple program using the original libedit code on macOS seems to suffer from the same problem. While investigating this I discovered a bug in the way libedit handles signals, which shows up on non-pure-BSD systems, including macOS and Linux. That bug makes libedit appear to work on these systems (it doesn't run into the crash described above on window resizes), at the cost of failing to support some other things (e.g. you can't Ctrl+Z two times in a row and expect it to work properly).
It is because I am also trying to fix this bug in libedit in my go-libedit code (by improving the way it sets up the signal handler) that I am running into the weird macOS behavior and ultimately this crash (or, in the alternative new version upstream, an infinite loop).
I have not yet succeeded in reverse-engineering what macOS is doing precisely. For this I would need a working gdb, but I was unsuccessful in getting gdb to work due to the silly binary protection scheme in OS X 10.12+. "printf-based debugging" doesn't really work with signal code because entering the libc in a signal handler changes the scheduling of the delivery somehow, so I really need a debugger for that.
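For readers unfamiliar with the reset-and-re-raise sequence from the technical detail above, here is a small sketch of the ordering one would expect. It is in Python (3.8+ for `signal.raise_signal`), which dispatches handlers from the interpreter rather than directly in signal context, so it does not reproduce the macOS behavior; it only shows the intended hand-off. `original_handler` is a stand-in for whatever handler was installed first, and all names are hypothetical:

```python
import signal


def reset_and_reraise_demo(signum=signal.SIGUSR1):
    """Return the order in which the two handlers ran for one raised signal."""
    calls = []

    def original_handler(sig, frame):
        # Stand-in for the handler that owned the signal before ours.
        calls.append("original")

    def custom_handler(sig, frame):
        calls.append("custom")
        # Put the previous handler back, then hand the signal over to it.
        signal.signal(sig, original_handler)
        signal.raise_signal(sig)

    before = signal.signal(signum, original_handler)
    try:
        signal.signal(signum, custom_handler)
        signal.raise_signal(signum)
    finally:
        # Leave the process with the disposition it started with.
        signal.signal(signum, before)
    return calls
```

The expectation is that each handler runs exactly once, custom first. The symptom described above corresponds to the custom handler being entered again instead of the original one.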
There are several ways forward I intend to explore:
I might go for option 4 as a stop-gap, then work on a combination of 1 and 2 until either I find the problem or I give up, and only then go for option 3.
cc @benesch for suggestions
I _think_ this is a related problem, since I believe I never had it until the resizing bug also came up (and I use a tiling window manager, so my windows resize very frequently).
After getting the error, I'd close the terminal, which I think does not force quit cockroach sql. When this bug happens, I believe the program never catches the quit signal (or whatever signal it gets) and just continues to run in the background, becoming a huge battery hog (and making everything sluggish :sob:).

    joey 94263 120.2 0.1 556700824 18892 ?? R 1:08pm 31:22.16 cockroach sql --insecure -d test
    joey 64719 61.3 0.0 556686296 300 ?? S 12Oct17 3408:26.14 cockroach sql --insecure -d test

Edit: Oh yeah, this is the same issue: you mentioned the newer version gets into an infinite loop, so this seems like it, phew. Is there anything we Mac users can do to help?
Yep, same bug. The poor thing doesn't register SIGHUP properly either.
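For context, this is roughly what "registering SIGHUP properly" buys you: when the terminal goes away, the loop notices and stops instead of spinning in the background. A minimal Python sketch with made-up names (`HangupWatcher`, `run_until_hangup`), not the actual CLI code:

```python
import signal


class HangupWatcher:
    """Remember that SIGHUP arrived so the main loop can exit."""

    def __init__(self):
        self.hung_up = False
        # Keep the previous handler so close() can restore it.
        self._previous = signal.signal(signal.SIGHUP, self._on_hangup)

    def _on_hangup(self, signum, frame):
        self.hung_up = True

    def close(self):
        signal.signal(signal.SIGHUP, self._previous)


def run_until_hangup(watcher, step, max_steps=1000):
    """Call step() until SIGHUP is seen or max_steps is reached; return the count."""
    steps = 0
    while not watcher.hung_up and steps < max_steps:
        step()
        steps += 1
    return steps
```

Here `step` stands for one iteration of the read/execute loop; the `max_steps` bound exists only to keep the sketch from looping forever.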
As a workaround: use a release-1.1 client (it will work with master servers too).
I plan to work on this tomorrow / this weekend.
Found it, fixed it. 😎
:)
Sorry to revive this :sob:. This still seems to happen if the terminal is resized while a query is being executed. Fortunately, that's far less common and annoying.
Thanks for finding this out. Don't be sorry :)
As far as I can see this is again macOS-specific.
I requested some solution on the Go side here:
I made a typo in my fix...