Cockroach: cli: cli crashes when I resize my terminal

Created on 9 Oct 2017 · 13 Comments · Source: cockroachdb/cockroach

Using iterm2 on OSX, whenever I resize my terminal while using the CLI, the thing crashes. It spews a ton of fatal: morestack on g0, and, after a few pathetic seconds, Illegal instruction: 4.

Assigning to @knz in the hope that his libedit transition will magically solve this.

C-bug

All 13 comments

Happens for me too. I never resize iterm2 (always full screen) so there's a
chance this has been happening for a long time.

-- Tobias

Thanks, if you could launch this in gdb and get me the stack trace too that would be swell.

@benesch provides:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=EXC_I386_BPT, subcode=0x0)
   frame #0: 0x000000000405d1b2 cockroach`runtime.morestack + 34
cockroach`runtime.morestack:
->  0x405d1b2 <+34>: movq   0x50(%rbx), %rsi
   0x405d1b6 <+38>: cmpq   %rsi, %gs:0x8a0
   0x405d1bf <+47>: jne    0x405d1c8                 ; <+56>
   0x405d1c1 <+49>: callq  0x4031b00                 ; runtime.badmorestackgsignal

which points me to: https://golang.org/pkg/os/signal/#hdr-Go_programs_that_use_cgo_or_SWIG

If the non-Go code installs any signal handlers, it must use the SA_ONSTACK flag with sigaction. Failing to do so is likely to cause the program to crash if the signal is received.

Indeed libedit does not do this. Thankfully libedit also supports a mode where the caller deals with signals itself. Will use that.

Worked on this all weekend :sob:

So the core of the issue is that OS X/macOS provides the POSIX API for signal handling but does not follow the POSIX semantics for signal delivery, at least not from the perspective of the C code called from cgo. [Technical detail: when a signal is delivered and a custom handler is called, if the custom handler calls sigaction to restore the original handler and then raises the signal again, macOS incorrectly invokes the custom handler instead of the original handler, and apparently doesn't properly restore the stack afterwards. This doesn't happen on other OSes.] This may be either a problem in the macOS implementation of signals (it's not a pure BSD after all: signals are implemented on top of Mach), or a problem in the way cgo is "preparing" the C context (it does play with signal handling in a weird way; I haven't yet fully understood the Go runtime code that does this on darwin).

I think the problem is in macOS and not in cgo, however, because a simple program using the original libedit code on OS X seems to suffer from this problem too. While investigating this I discovered a bug in the way libedit handles signals, which shows up on non-pure-BSD systems, including macOS and Linux. That bug makes libedit appear to work on these systems (it doesn't run into the crash described above on window resizes) at the cost of failing to support some other things (e.g. you can't press Ctrl+Z twice in a row and expect it to work properly).

Because I am also trying to fix this libedit bug in my go-libedit code (by improving the way it sets up the signal handler), I cause it to run into the weird macOS behavior and ultimately this crash (or, in the alternative new version upstream, an infinite loop).

I have not yet succeeded in reverse-engineering what macos is precisely doing; for this I would need a working gdb but I was unsuccessful in getting gdb to work due to the silly binary protection scheme in osx 10.12+. "printf-based debugging" doesn't really work with signal code because entering the libc in a signal handler changes the scheduling of the delivery somehow, so I really need a debugger for that.

There are several ways forward I intend to explore:

  1. finding a working gdb on osx to troubleshoot this further; any suggestions would be welcome.
  2. finding out if lldb can also help. So far I haven't succeeded in asking lldb to set a breakpoint on signal delivery, so more study is needed.
  3. reverse-engineering the signalling code in GNU readline, which doesn't suffer from the specific bug I found in libedit and does appear to work despite the weird macOS semantics, and doing the same in go-libedit.
  4. un-patching my libedit bug fix when building on OS X, and hoping the users don't see the few problems this implies.

I might go for option 4 as a stop-gap, then work on a combination of 1 and 2 until either I find the problem or I give up, and only then go for option 3.

cc @benesch for suggestions

I _think_ this is related, since I believe I never had this problem until the resizing bug also came up (and I use a tiling window manager, so my windows resize very frequently).

After getting the error, I'd close the terminal, which I think does not force-quit cockroach sql. When this bug happens, I believe the program never catches the quit signal (or whatever signal it is) and just continues to run in the background, becoming a huge battery hog (and making everything sluggish :sob:).

joey             94263 120.2  0.1 556700824  18892   ??  R     1:08pm  31:22.16 cockroach sql --insecure -d test
joey             64719  61.3  0.0 556686296    300   ??  S    12Oct17 3408:26.14 cockroach sql --insecure -d test

Edit: Oh yeah, it's the same issue: as you mentioned, the newer version gets into an infinite loop, so this seems to be it, phew. Is there anything we Mac users can do to help?

Yep, same bug. The poor thing doesn't register SIGHUP properly either.

As a workaround: use a release-1.1 client (it will work with master servers too).
I plan to work on this tomorrow / this weekend.

Found it, fixed it. 😎

:)

Sorry to revive this :sob:. This still seems to happen if the terminal is resized while a query is being executed. Fortunately, that's far less common and annoying.

Thanks for finding this out. Don't be sorry :)

As far as I can see this is again macOS-specific.

I requested some solution on the Go side here:

https://github.com/golang/go/issues/22805

I made a typo in my fix...
