Go: cmd/compile/internal/ssa: test TestNexting failing

Created on 5 Feb 2020 · 15 Comments · Source: golang/go

What version of Go are you using (go version)?

$ go version
go version release-1_14-work.mailed linux/amd64

What operating system and processor architecture are you using?

Debian GNU/Linux rodete
amd64

What did you do?

Ran test from the src directory.
$ go/src> go test cmd

What did you expect to see?

Passing tests

What did you see instead?

```
--- FAIL: TestNexting (7.71s)
--- FAIL: TestNexting/dlv-dbg-hist (7.71s)
panic: There was an error writing 'b main.test
', write |1: broken pipe
[recovered]
panic: There was an error writing 'b main.test
', write |1: broken pipe

goroutine 249 [running]:
testing.tRunner.func1.1(0xc84b60, 0xc0004aa3b0)
/home/chuck/Code/go/src/testing/testing.go:942 +0x3d0
testing.tRunner.func1(0xc00051d680)
/home/chuck/Code/go/src/testing/testing.go:945 +0x3f9
panic(0xc84b60, 0xc0004aa3b0)
/home/chuck/Code/go/src/runtime/panic.go:967 +0x15d
cmd/compile/internal/ssa_test.(*ioState).writeReadExpect(0xc0002f7030, 0xcf606d, 0xc, 0xcf42fc, 0xa, 0x0, 0x0, 0xc000318000, 0x87)
/home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:861 +0x299
cmd/compile/internal/ssa_test.(*delveState).start(0xc00002a1e0)
/home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:562 +0x216
cmd/compile/internal/ssa_test.runDbgr(0xed5180, 0xc00002a1e0, 0x3e8, 0x2a)
/home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:279 +0x35
cmd/compile/internal/ssa_test.testNexting(0xc00051d680, 0xcec462, 0x4, 0xc0000974d0, 0x7, 0xcec5f0, 0x5, 0x3e8, 0x1423490, 0x0, ...)
/home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:246 +0x5dd
cmd/compile/internal/ssa_test.subTest.func1(0xc00051d680)
/home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:180 +0xae
testing.tRunner(0xc00051d680, 0xc000076690)
/home/chuck/Code/go/src/testing/testing.go:993 +0xdc
created by testing.(*T).Run
/home/chuck/Code/go/src/testing/testing.go:1044 +0x357
FAIL cmd/compile/internal/ssa 133.862s
```

NeedsInvestigation

All 15 comments

@dr2chase

Works for me.
I removed my built copy of dlv and discovered "/usr/bin/dlv"; that worked okay.
I updated dlv to their tip; that also worked okay.
Tried it on Mac with dlv completely removed, got expected skip.

I'd assume a flake unless you can repeat this (this test can flake; I worked pretty hard to deflake it, but the flake rate is not zero). Maybe dlv has some environment-dependent behavior that I don't know about; it's hard to say.

How many (if any) of the builders have dlv installed?

Looking at the test body, it appears that we skip the test entirely otherwise (rather than, say, falling back to GDB). Is that intentional?

Given that Delve is a Go program, perhaps the test should instead download and build (a specific version of) the dlv binary locally if testenv.Builder() is non-empty and testing.Short() is false.

https://github.com/golang/go/blob/0efbd1015774a2d894138519f1efcf7704bb2d95/src/cmd/compile/internal/ssa/debug_test.go#L114
https://github.com/golang/go/blob/0efbd1015774a2d894138519f1efcf7704bb2d95/src/cmd/compile/internal/ssa/debug_test.go#L130-L136
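A minimal sketch of that suggestion, for illustration only. It assumes the builder name comes from the GO_BUILDER_NAME environment variable (which is what testenv.Builder() reads; internal/testenv is only importable from inside the Go tree), takes the -short setting as a parameter instead of calling testing.Short() outside a test, and uses `go install pkg@version`, which needs Go 1.16 or later. The helper names and the pinned version are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// dlvModule pins one Delve release so every builder tests the same debugger.
// The version here is a hypothetical choice for illustration.
const dlvModule = "github.com/go-delve/delve/cmd/dlv@v1.4.1"

// shouldBuildDlv reports whether the test should build its own dlv:
// only on a builder (non-empty builder name) and only when -short is off.
func shouldBuildDlv(builder string, short bool) bool {
	return builder != "" && !short
}

// dlvInstallCmd returns a command that installs the pinned dlv into gobin.
// `go install pkg@version` requires Go 1.16 or later.
func dlvInstallCmd(gobin string) *exec.Cmd {
	cmd := exec.Command("go", "install", dlvModule)
	cmd.Env = append(os.Environ(), "GOBIN="+gobin)
	return cmd
}

func main() {
	builder := os.Getenv("GO_BUILDER_NAME") // what testenv.Builder() returns
	short := false                          // in a real test: testing.Short()
	if !shouldBuildDlv(builder, short) {
		fmt.Println("not on a builder; using dlv from PATH if present")
		return
	}
	gobin := filepath.Join(os.TempDir(), "dlv-bin")
	cmd := dlvInstallCmd(gobin)
	// The sketch only prints the command; a real test would run it and then
	// prepend gobin to PATH.
	fmt.Println("would run:", cmd.Args)
}
```

Gating on both conditions keeps developer runs and -short runs free of network access while still giving builders a known Delve version.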

We were running this on our workstations as part of the testing in the release process.

@bcmills I added dlv to the containers that run longtests, so it should be there.

Using gdb is a recipe for flaky awfulness: different versions behave differently, it can be sensitive to your Python installation, and it's not an option on Macs anymore (as in, I have been unable to build and install it correctly despite multiple tries and many searches for how-to recipes; I follow them, and I still do not end up able to debug code as a non-root user).

This seems to be happening on linux-386-longtest builder specifically more often. It has happened on the post-submit builder on release-branch.go1.14 here, and in the SlowBot run here.

I'll do some more investigation to confirm how reproducible it is. It's worth keeping in mind that the linux-386-longtest builder was misconfigured to test linux/amd64 until it was resolved in CL 234520 recently, so past data may be misleading.

It is also happening on the release-branch.go1.13 branch. See the post-submit failure here, and the SlowBot run here.

From looking at the recent results (post-CL 234520) in the linux-386-longtest column at https://build.golang.org/?branch=release-branch.go1.14 and https://build.golang.org/?branch=release-branch.go1.13, it is very reproducible on linux/386.

I've been playing with gdb-on-Darwin, now that I figured out the truly secret handshake for code signing (use certificate name for email address; only Trust-Always for codesigning, leave the rest alone). It works by hand but fails when driven by the program, and I have no idea why. The new version of Delve continues to work just fine.

The version of dlv installed on the linux-386-longtest builder is:

root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.2.0
Build: $Id: 068e2451004e95d0b042e5257e34f0f08ce01466 $

The latest stable release at https://github.com/go-delve/delve/releases seems to be v1.4.1.
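A version check like the one above can be automated. This is a minimal sketch that assumes the `dlv version` output format shown in this transcript (a `Version: X.Y.Z` line) and treats v1.4.1 as a minimum purely for illustration; the helper names are hypothetical. On its own, such a check would not catch a dlv built for the wrong architecture.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseDlvVersion extracts [major, minor, patch] from `dlv version` output,
// assuming it contains a line of the form "Version: X.Y.Z".
func parseDlvVersion(out string) ([3]int, error) {
	var v [3]int
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "Version:") {
			continue
		}
		num := strings.TrimSpace(strings.TrimPrefix(line, "Version:"))
		parts := strings.Split(num, ".")
		if len(parts) != 3 {
			return v, fmt.Errorf("unexpected version %q", num)
		}
		for i, p := range parts {
			n, err := strconv.Atoi(p)
			if err != nil {
				return v, fmt.Errorf("bad version component %q: %v", p, err)
			}
			v[i] = n
		}
		return v, nil
	}
	return v, fmt.Errorf("no Version line in dlv output")
}

// atLeast reports whether version v is greater than or equal to want.
func atLeast(v, want [3]int) bool {
	for i := range v {
		if v[i] != want[i] {
			return v[i] > want[i]
		}
	}
	return true
}

func main() {
	// Output copied from the linux-386-longtest builder above.
	out := "Delve Debugger\nVersion: 1.2.0\nBuild: $Id: 068e2451004e95d0b042e5257e34f0f08ce01466 $\n"
	v, err := parseDlvVersion(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v, atLeast(v, [3]int{1, 4, 1})) // [1 2 0] false
}
```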

The new version of Delve continues to work just fine.

@dr2chase Which exact version did you try? Do you think a good next step would be to update dlv on the linux-386-longtest builder?

I just remembered @bcmills's earlier comment at https://github.com/golang/go/issues/37050#issuecomment-582914424. That seems like another good option to consider, as it would make the test depend less on the environment configuration and let it run in more environments.

I can reproduce the TestNexting/dlv-dbg-hist failure on linux/386 (not linux/amd64) consistently.


root@buildlet-linux-stretch-morecpu-rn5eea3af:/workdir/go/src# go version
go version devel gomote.XXXXX linux/386
root@buildlet-linux-stretch-morecpu-rn5eea3af:/workdir/go/src# go test -count=1 -run='TestNexting/dlv-dbg-hist' -v cmd/compile/internal/ssa 
=== RUN   TestNexting
=== RUN   TestNexting/dlv-dbg-hist
--- FAIL: TestNexting (5.30s)
    --- FAIL: TestNexting/dlv-dbg-hist (5.30s)
panic: There was an error writing 'b main.test
', write |1: broken pipe
 [recovered]
    panic: There was an error writing 'b main.test
', write |1: broken pipe


goroutine 7 [running]:
testing.tRunner.func1.1(0x883aa40, 0xb216010)
    /workdir/go/src/testing/testing.go:940 +0x27c
testing.tRunner.func1(0xb136140)
    /workdir/go/src/testing/testing.go:943 +0x349
panic(0x883aa40, 0xb216010)
    /workdir/go/src/runtime/panic.go:969 +0x122
cmd/compile/internal/ssa_test.(*ioState).writeReadExpect(0xb18e900, 0x8884b7d, 0xc, 0x8882dea, 0xa, 0x0, 0x0, 0xb1b6000, 0x63)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:861 +0x206
cmd/compile/internal/ssa_test.(*delveState).start(0xb194000)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:562 +0x1cc
cmd/compile/internal/ssa_test.runDbgr(0x8a6ada0, 0xb194000, 0x3e8, 0x2a)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:279 +0x29
cmd/compile/internal/ssa_test.testNexting(0xb136140, 0x887af68, 0x4, 0xb0182b0, 0x7, 0x887b0f1, 0x5, 0x3e8, 0x8f4f148, 0x0, ...)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:246 +0x54e
cmd/compile/internal/ssa_test.subTest.func1(0xb136140)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:180 +0x93
testing.tRunner(0xb136140, 0xb0104e0)
    /workdir/go/src/testing/testing.go:991 +0xb4
created by testing.(*T).Run
    /workdir/go/src/testing/testing.go:1042 +0x2ad
FAIL    cmd/compile/internal/ssa    5.303s
FAIL

Updating to dlv v1.4.1 makes the test pass. Edit: See https://github.com/golang/go/issues/37050#issuecomment-634981536.


root@buildlet-linux-stretch-morecpu-rn5eea3af:~# go version
go version devel gomote.XXXXX linux/386
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.2.0
Build: $Id: 068e2451004e95d0b042e5257e34f0f08ce01466 $
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# go test -count=1 -run='TestNexting/dlv-dbg-hist' -v cmd/compile/internal/ssa 
=== RUN   TestNexting
=== RUN   TestNexting/dlv-dbg-hist
--- FAIL: TestNexting (5.32s)
    --- FAIL: TestNexting/dlv-dbg-hist (5.32s)
panic: There was an error writing 'b main.test
', write |1: broken pipe
 [recovered]
    panic: There was an error writing 'b main.test
', write |1: broken pipe


goroutine 7 [running]:
testing.tRunner.func1.1(0x883aa40, 0x998c0c0)
    /workdir/go/src/testing/testing.go:940 +0x27c
testing.tRunner.func1(0x9936140)
    /workdir/go/src/testing/testing.go:943 +0x349
panic(0x883aa40, 0x998c0c0)
    /workdir/go/src/runtime/panic.go:969 +0x122
cmd/compile/internal/ssa_test.(*ioState).writeReadExpect(0x9990900, 0x8884b7d, 0xc, 0x8882dea, 0xa, 0x0, 0x0, 0x99b6000, 0x63)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:861 +0x206
cmd/compile/internal/ssa_test.(*delveState).start(0x9996000)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:562 +0x1cc
cmd/compile/internal/ssa_test.runDbgr(0x8a6ada0, 0x9996000, 0x3e8, 0x2a)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:279 +0x29
cmd/compile/internal/ssa_test.testNexting(0x9936140, 0x887af68, 0x4, 0x9816270, 0x7, 0x887b0f1, 0x5, 0x3e8, 0x8f4f148, 0x0, ...)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:246 +0x54e
cmd/compile/internal/ssa_test.subTest.func1(0x9936140)
    /workdir/go/src/cmd/compile/internal/ssa/debug_test.go:180 +0x93
testing.tRunner(0x9936140, 0x98104e0)
    /workdir/go/src/testing/testing.go:991 +0xb4
created by testing.(*T).Run
    /workdir/go/src/testing/testing.go:1042 +0x2ad
FAIL    cmd/compile/internal/ssa    5.321s
FAIL
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.2.0
Build: $Id: 068e2451004e95d0b042e5257e34f0f08ce01466 $
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# export PATH="/workdir/gopath/bin:$PATH"
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.4.1
Build: $Id: bda606147ff48b58bde39e20b9e11378eaa4db46 $
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# go test -count=1 -run='TestNexting/dlv-dbg-hist' -v cmd/compile/internal/ssa
=== RUN   TestNexting
=== RUN   TestNexting/dlv-dbg-hist
--- PASS: TestNexting (1.44s)
    --- PASS: TestNexting/dlv-dbg-hist (1.44s)
PASS
ok      cmd/compile/internal/ssa    1.442s
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# 

I tried Delve 1.4.1.

I just rebooted after a Catalina security or dot upgrade, and one of my copies of gdb now hangs when used from the keyboard; the other (same binary, different signing) does not. All of them still hang when running under the control of TestNexting.

Upon further testing, it seems the problem was not the version of dlv, but the fact that the installed dlv binary was built with GOOS=linux GOARCH=amd64 and that same binary was being used for GOHOSTARCH=386 testing.

I can get the test to fail the same way (in the linux/386 configuration) if I install and use a dlv binary built with GO111MODULE=on GOARCH=amd64 go get github.com/go-delve/delve/cmd/dlv@v1.4.1 instead of GO111MODULE=on GOARCH=386 go get github.com/go-delve/delve/cmd/dlv@v1.4.1.

So, using an amd64 delve to debug a 386 Go binary leads to flakiness?
When this fails with gdb, is gdb also built to be amd64 instead of 386?

Delve says it doesn't support 386:

drchase@drchase1:~/work/go/src/cmd/compile/internal/ssa$ TERM=dumb dlv exec testdata/test-hist.dlv-dbg
unsupported architecture - only linux/amd64 and linux/arm64 are supported

and then it panics all over the place.

I still haven't gotten gdb to work consistently and properly on either Darwin or Linux; I fear that success depends not just on the gdb version, but also on the version of Python that is compiled into gdb when it is built.

I've looked into how the linux-386-longtest builder is passing on tip, and found a similar (or duplicate) issue #37404. It's passing on tip because CL 227587 has added a skip for TestNexting.
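A skip of this kind can be expressed as a small architecture check. This is a minimal sketch, assuming (from the Delve error message quoted above) that only amd64 and arm64 are supported by the Delve versions discussed here; it is not the code from CL 227587, and the helper name is hypothetical. In a real test, a non-empty reason would be passed to t.Skip.

```go
package main

import (
	"fmt"
	"runtime"
)

// dlvSkipReason returns a non-empty reason if the Delve subtest should be
// skipped for the given target architecture. The supported set mirrors the
// Delve error message in this thread ("only linux/amd64 and linux/arm64").
func dlvSkipReason(goarch string) string {
	switch goarch {
	case "amd64", "arm64":
		return ""
	}
	return fmt.Sprintf("skipping dlv subtest: Delve does not support GOARCH=%s", goarch)
}

func main() {
	if reason := dlvSkipReason(runtime.GOARCH); reason != "" {
		fmt.Println(reason) // in a test: t.Skip(reason)
		return
	}
	fmt.Println("running dlv subtest on", runtime.GOARCH)
}
```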

So, using an amd64 delve to debug a 386 Go binary leads to flakiness?

Yes. It's not just flakiness; it's a reproducible failure.

I think there are two distinct issues between this issue and #37404.

  1. The linux-386-longtest builder is failing reproducibly now. It's only visible after May 19 because before CL 234520 the builder was testing linux/amd64, not linux/386.

    I've opened a new issue #39309 for this problem specifically.

  2. The linux-amd64-longtest builder has apparent deadlocks that happen infrequently.

    I think #37404 is better suited to track problem 2, so let's use that issue, and close this as duplicate.
