Dep: cmd.CombinedOutput race condition

Created on 21 Sep 2017  Â·  6Comments  Â·  Source: golang/dep

A PR CI job failed due to a preexisting race: https://travis-ci.org/golang/dep/jobs/278159807

It appears to be cmd.CombinedOutput related.

I can circle back to this, but pinging @tamird since he touched that code recently via: #1180

Most helpful comment

I'll try to reproduce locally.

On Sep 21, 2017 08:34, "Jordan Krage" notifications@github.com wrote:

Ugh, restarting that job gave me a #2270.2 but replaced 278159807, so now
I can't get back to the #2270.1 log.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/golang/dep/issues/1194#issuecomment-331142851, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABdsPMyviYtOIy3Ml22nmbCa5Bi7U4Orks5sklfFgaJpZM4PfPz5
.

All 6 comments

Ugh, restarting that job ~gave me a #2270.2 but~ replaced 278159807, so now I can't get back to the ~#2270.1~ log.

I'll try to reproduce locally.

On Sep 21, 2017 08:34, "Jordan Krage" notifications@github.com wrote:

Ugh, restarting that job gave me a #2270.2 but replaced 278159807, so now
I can't get back to the #2270.1 log.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/golang/dep/issues/1194#issuecomment-331142851, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABdsPMyviYtOIy3Ml22nmbCa5Bi7U4Orks5sklfFgaJpZM4PfPz5
.

So this is a real bummer - without at least knowing which test failed, it's hard to stress this in a reasonable way. Some of these tests manipulate the file system, and that's just slow and generally no fun.

Phooey. From what I can remember, it was regarding the CombinedOutput goroutine that uses time.AfterFunc. Could it have just been racing to set t? What do you think about this simplification?

-var t *time.Timer
-defer func() {
-   if t != nil {
-       t.Stop()
-   }
-}()
// Adapted from (*os/exec.Cmd).Start
waitDone := make(chan struct{})
defer close(waitDone)
go func() {
    select {
    case <-c.ctx.Done():
        if err := c.Cmd.Process.Signal(os.Interrupt); err != nil {
            // If an error comes back from attempting to signal, proceed
            // immediately to hard kill.
            c.cancel()
        } else {
-           t = time.AfterFunc(time.Minute, c.cancel)
+           t := time.NewTimer(time.Minute)
+           select {
+           case <-t.C:
+               c.cancel()
+           case <-waitDone:
+               t.Stop()
+           }
        }
    case <-waitDone:
    }
}()

if err := c.Cmd.Wait(); err != nil {
    return nil, err
}
return b.Bytes(), nil

Ah, that's definitely a race! That simplification is close to what I'd write, but I'd still use AfterFunc; I'll send a PR.

The only other time.AfterFunc is in (sm *SourceMgr) HandleSignals. This seems more likely to be the problem. IIRC the stack indicated that the triggered func body was racing, and that first one is only calling a cancelFunc.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

rogpeppe picture rogpeppe  Â·  4Comments

jeffwillette picture jeffwillette  Â·  3Comments

sdboyer picture sdboyer  Â·  3Comments

cemremengu picture cemremengu  Â·  3Comments

jiharal picture jiharal  Â·  3Comments