I'm pretty confident we've encountered a race in the Pod Utilities between entrypoint and sidecar.
Specifically, we saw a job whose test binary exited with status code 0 but was marked as a failure in finished.json, because the sidecar failed to parse the marker file contents into an int (the marker file existed, but was empty). I think this is because entrypoint doesn't write the marker file atomically, so sidecar can read the file after it is created but before its contents are written.
Here is a demo:
https://play.golang.org/p/_XfPQLp42O-
The easiest way to get atomicity is probably to have entrypoint write to a temp file and then rename it to the marker file location.
/area prow
/kind bug
/cc @stevekuznetsov @kargakis
cc @steuhs
Using a temp file and renaming is really the only way. Though this only works if the temp file and the marker file are on the same filesystem.
/assign
Replying by email to Cole Wagner's report of Thu, Jun 28, 2018, 21:23 (https://github.com/kubernetes/test-infra/issues/8509).
Also this is the best bug title so:

should be fixed in #8510
/close