A PR CI job failed due to a preexisting race: https://travis-ci.org/golang/dep/jobs/278159807
It appears to be cmd.CombinedOutput related.
I can circle back to this, but pinging @tamird since he touched that code recently via: #1180
Ugh, restarting that job ~gave me a #2270.2 but~ replaced 278159807, so now I can't get back to the ~#2270.1~ log.
I'll try to reproduce locally.
On Sep 21, 2017 08:34, "Jordan Krage" notifications@github.com wrote:
Ugh, restarting that job gave me a #2270.2 but replaced 278159807, so now
I can't get back to the #2270.1 log.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/golang/dep/issues/1194#issuecomment-331142851, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABdsPMyviYtOIy3Ml22nmbCa5Bi7U4Orks5sklfFgaJpZM4PfPz5
.
So this is a real bummer - without at least knowing which test failed, it's hard to stress this in a reasonable way. Some of these tests manipulate the file system, and that's just slow and generally no fun.
Phooey. From what I can remember, it was regarding the CombinedOutput goroutine that uses time.AfterFunc. Could it have just been racing to set t? What do you think about this simplification?
-var t *time.Timer
-defer func() {
- if t != nil {
- t.Stop()
- }
-}()
// Adapted from (*os/exec.Cmd).Start
waitDone := make(chan struct{})
defer close(waitDone)
go func() {
select {
case <-c.ctx.Done():
if err := c.Cmd.Process.Signal(os.Interrupt); err != nil {
// If an error comes back from attempting to signal, proceed
// immediately to hard kill.
c.cancel()
} else {
- t = time.AfterFunc(time.Minute, c.cancel)
+ t := time.NewTimer(time.Minute)
+ select {
+ case <-t.C:
+ c.cancel()
+ case <-waitDone:
+ t.Stop()
+ }
}
case <-waitDone:
}
}()
if err := c.Cmd.Wait(); err != nil {
return nil, err
}
return b.Bytes(), nil
Ah, that's definitely a race! That simplification is close to what I'd write, but I'd still use AfterFunc; I'll send a PR.
The only other time.AfterFunc is in (sm *SourceMgr) HandleSignals. This seems more likely to be the problem. IIRC the stack indicated that the triggered func body was racing, and that first one is only calling a cancelFunc.
Most helpful comment
I'll try to reproduce locally.
On Sep 21, 2017 08:34, "Jordan Krage" notifications@github.com wrote: