The "fatal only for main goroutine" nature of T.FailNow is a source of bugs when writing tests. We should either remove it or improve it's usability for Go 2. See issues #24678, #15758.
Option 0:
Make misuse complain loudly. Rejected in #15758.
Option 1:
Just remove it (along with its friends Fatal and Fatalf). Whether SkipNow, Skip, Skipf are removed is possible too. With them gone, there is no temptation to misuse it.
Option 2:
Expand the functionality of FailNow to be callable from any goroutine. The behavior would be such that it tears down only the goroutine it is called form and also cancels a Context. Other goroutines can synchronize of the context for their own termination. This presumes that Contexts are added to testing. See https://github.com/golang/go/issues/18368#issuecomment-341496336. This option can be done in Go 1.
\cc @neild @bcmills @ash2k
I vote for option 0.
How would (1) work, in practice? Every t.Fatal() becomes t.Fatal(); return or something?
It sounds like (2) would require action from users to opt into the correct thing, so it wouldn't directly prevent the same misuse that happens today.
I still think we should do #15758. AFAICT that thread doesn't actually contain any reasons why it's infeasible. I don't see why the testing package can't use horrible hacks as long as they're hidden in its implementation.
I've seen this mistake many, many times while reviewing Go code.
How would (1) work, in practice?
Personally, I would probably use panic instead. In a top-level test function it has the same effect, and in a background goroutine it more obviously terminates the entire binary.
It sounds like (2) would require action from users to opt into the correct thing, so it wouldn't directly prevent the same misuse that happens today.
Agreed.
Option (2) is still missing some sort of synchronization. If we terminate the test, how do we prevent it from interfering with other tests? In particular, how do we distinguish goroutines started by lazy library initialization from goroutines associated with the test function itself?
We should revisit #15758 and complain loudly when FailNow is called from the wrong goroutine.
If we're willing to go further then we should just do os.Exit(1) on such an illegal invocation. The test is basically unsalvageable at this point and the whole thing should just die. If that's too far for Go 1 then it should be considered for Go 2.
Either way we should still make FailNow complain in Go 1. It'll save a lot of pain.
In #15758 @rsc stated that there's no way to tell if t.Fatal is called from the "right" goroutine, but I don't think that's true. Fatal could examine the stack to see if it has an appropriate function from the testing package in its ancestry.
Exactly, that doesn't even require exposing goroutine IDs, which is what stalled the original discussion.
I believe there are two independent questions that this issue conflates:
T.FailNow is documented as being required to be called from the main test goroutine. Nothing enforces this, and users frequently ignore this requirement. The requirement should either be enforced or removed.
Many, many tests suffer from race conditions caused by test goroutines outliving the test. (e.g., @dsnet's example in #15758.) Perhaps something can be done to make these errors less common. If so, the solution might address tests in particular or concurrency in general.
These questions can be addressed independently. We could just permit T.FailNow to be called from any goroutine with a simple documentation change, acknowledging the current state of affairs.
Such a documentation change won't fix any of the problems caused by calling FailNow outside of the test. This seems like a bad option.
Let's just do #15758.
Most helpful comment
In #15758 @rsc stated that there's no way to tell if t.Fatal is called from the "right" goroutine, but I don't think that's true. Fatal could examine the stack to see if it has an appropriate function from the testing package in its ancestry.