Today and yesterday, I've observed instances where a playground execution seems to be successful (there's no error), but the output is truncated rather than complete.
Filing this issue for tracking purposes, will fill in more information later.
Edit: A workaround that is available is to modify the program (e.g., add whitespace) and re-run it until it works.
/cc @toothrot @bradfitz
Can be reproduced fairly reliably with https://play.golang.org/p/ctu9bUY9ufb.
See the constant comment there:
// Change this number to control amount of output.
//
// A value of 100 works okay always.
//
// Values between 1000-2000 sometimes work okay (a complete PNG image is shown),
// but result in truncated output other times (displayed as a broken PNG image).
size = 100
While investigating, to remove the browser as a variable, I wrote the following script:
curl 'https://play.golang.org/compile' --data 'version=3&body=package+main%0A%0Aimport+(%0A%09%22image%22%0A%09%22image%2Fcolor%22%0A%09%22image%2Fdraw%22%0A%09%22math%2Frand%22%0A%0A%09%22golang.org%2Fx%2Ftour%2Fpic%22%0A)%0A%0Aconst+(%0A%09%2F%2F+Change+this+number+to+control+amount+of+output.%0A%09%2F%2F%0A%09%2F%2F+A+value+of+100+works+okay+always.%0A%09%2F%2F%0A%09%2F%2F+Values+between+1000-2000+sometimes+work+okay+(a+complete+PNG+image+is+shown)%2C%0A%09%2F%2F+but+result+in+truncated+output+other+times+(displayed+as+a+broken+PNG+image).%0A%09size+%3D+1200%0A%0A%09randomSeed+%3D+1+%2F%2F+Change+this+number+to+force+new+executions.%0A)%0A%0Afunc+main()+%7B%0A%09m+%3A%3D+image.NewNRGBA(image.Rect(0%2C+0%2C+size%2C+size))%0A%09draw.Draw(m%2C+m.Bounds()%2C+Noise%7Brand.New(rand.NewSource(randomSeed))%7D%2C+image.Point%7B%7D%2C+draw.Src)%0A%09pic.ShowImage(m)%0A%7D%0A%0Atype+Noise+struct%7B+r+*rand.Rand+%7D%0A%0Afunc+(Noise)+ColorModel()+color.Model+%7B+return+color.Gray16Model+%7D%0Afunc+(Noise)+Bounds()+image.Rectangle+%7B%0A%09return+image.Rectangle%7Bimage.Point%7B-1e9%2C+-1e9%7D%2C+image.Point%7B1e9%2C+1e9%7D%7D%0A%7D%0Afunc+(n+Noise)+At(x%2C+y+int)+color.Color+%7B%0A%09return+color.Gray16%7Buint16(n.r.Intn(0xFFFF))%7D%0A%7D%0A&withVet=true' --compressed > raw.json
cat raw.json | jq .Events[0].Message > raw.image
cat raw.image | sed -e s/\"IMAGE:// | sed -e s/=.*$/=/ > whatever.base64
base64 -d whatever.base64 > foo.png
The final few characters of the base64 looks like this, in a bad example:
BgYVFVVGxsbgfr2fwEAAP//+vzpnB150HsAAAAASUVORK5CYII=\n"
The trailing \n makes it look like we've read and returned everything the program had to offer, and that looks a lot like a base64 encoded string to me.
It does fail base64 decoding though, so I am wondering if there's a bug in the base64 encoding that we're hitting?
It's also possible that we could be corrupting output read from the container somehow, but we might need to create a separate test to verify that.
I've verified that this can't be reproduced locally, so it's almost certain that output is getting corrupted somewhere in the chain.
Race in gvisor sandbox reading from stdout + Process.Wait or something? Haven't looked, but that sort of bug is common enough.
/cc @bcmills and I spent a lot of time making the Process.Wait pretty bulletproof here. I'd first thought that was an issue when the issue title was that the output was truncated, but the fact that we're reading the full base64 string, but with invalid characters in it makes me think it's something else.
In a failure scenario, the middle of the base64 string looks like this:
DhKSkq+f/+upKRkYmKiqqr6/v\u0000\u0000PB\u0011t���\u0018`\u0000\u0000FЛ17GRmZ6upqZ2fn0NDQR48
So we're either having trouble reading something from the container, or something else is writing concurrently to stdout and junking up our output.
I could reproduce it (sometimes, not 100%) by re-running the same curl command (size 1500, some random seed) two times and getting different output:
$ curl -s 'https://play.golang.org/compile' --data 'version=3&body=package+main%0A%0Aimport+(%0A%09%22image%22%0A%09%22image%2Fcolor%22%0A%09%22image%2Fdraw%22%0A%09%22math%2Frand%22%0A%0A%09%22golang.org%2Fx%2Ftour%2Fpic%22%0A)%0A%0Aconst+(%0A%09%2F%2F+Change+this+number+to+control+amount+of+output.%0A%09%2F%2F%0A%09%2F%2F+A+value+of+100+works+okay+always.%0A%09%2F%2F%0A%09%2F%2F+Values+between+1000-2000+sometimes+work+okay+(a+complete+PNG+image+is+shown)%2C%0A%09%2F%2F+but+result+in+truncated+output+other+times+(displayed+as+a+broken+PNG+image).%0A%09size+%3D+1500%0A%0A%09randomSeed+%3D+175432523+%2F%2F+Change+this+number+to+force+new+executions.%0A)%0A%0Afunc+main()+%7B%0A%09m+%3A%3D+image.NewNRGBA(image.Rect(0%2C+0%2C+size%2C+size))%0A%09draw.Draw(m%2C+m.Bounds()%2C+Noise%7Brand.New(rand.NewSource(randomSeed))%7D%2C+image.Point%7B%7D%2C+draw.Src)%0A%09pic.ShowImage(m)%0A%7D%0A%0Atype+Noise+struct%7B+r+*rand.Rand+%7D%0A%0Afunc+(Noise)+ColorModel()+color.Model+%7B+return+color.Gray16Model+%7D%0Afunc+(Noise)+Bounds()+image.Rectangle+%7B%0A%09return+image.Rectangle%7Bimage.Point%7B-1e9%2C+-1e9%7D%2C+image.Point%7B1e9%2C+1e9%7D%7D%0A%7D%0Afunc+(n+Noise)+At(x%2C+y+int)+color.Color+%7B%0A%09return+color.Gray16%7Buint16(n.r.Intn(0xFFFF))%7D%0A%7D%0A&withVet=true' --compressed > compile-output-1.txt
$ curl -s 'https://play.golang.org/compile' --data 'version=3&body=package+main%0A%0Aimport+(%0A%09%22image%22%0A%09%22image%2Fcolor%22%0A%09%22image%2Fdraw%22%0A%09%22math%2Frand%22%0A%0A%09%22golang.org%2Fx%2Ftour%2Fpic%22%0A)%0A%0Aconst+(%0A%09%2F%2F+Change+this+number+to+control+amount+of+output.%0A%09%2F%2F%0A%09%2F%2F+A+value+of+100+works+okay+always.%0A%09%2F%2F%0A%09%2F%2F+Values+between+1000-2000+sometimes+work+okay+(a+complete+PNG+image+is+shown)%2C%0A%09%2F%2F+but+result+in+truncated+output+other+times+(displayed+as+a+broken+PNG+image).%0A%09size+%3D+1500%0A%0A%09randomSeed+%3D+175432523+%2F%2F+Change+this+number+to+force+new+executions.%0A)%0A%0Afunc+main()+%7B%0A%09m+%3A%3D+image.NewNRGBA(image.Rect(0%2C+0%2C+size%2C+size))%0A%09draw.Draw(m%2C+m.Bounds()%2C+Noise%7Brand.New(rand.NewSource(randomSeed))%7D%2C+image.Point%7B%7D%2C+draw.Src)%0A%09pic.ShowImage(m)%0A%7D%0A%0Atype+Noise+struct%7B+r+*rand.Rand+%7D%0A%0Afunc+(Noise)+ColorModel()+color.Model+%7B+return+color.Gray16Model+%7D%0Afunc+(Noise)+Bounds()+image.Rectangle+%7B%0A%09return+image.Rectangle%7Bimage.Point%7B-1e9%2C+-1e9%7D%2C+image.Point%7B1e9%2C+1e9%7D%7D%0A%7D%0Afunc+(n+Noise)+At(x%2C+y+int)+color.Color+%7B%0A%09return+color.Gray16%7Buint16(n.r.Intn(0xFFFF))%7D%0A%7D%0A&withVet=true' --compressed > compile-output-2.txt
$ ls -la compile-output*
-rw-r--r-- 1 dmitshur primarygroup 7639457 May 4 12:16 compile-output-1.txt
-rw-r--r-- 1 dmitshur primarygroup 7639505 May 4 12:16 compile-output-2.txt
The JSON field "Message" contains some characters that are clearly not base64:
[...]
Hx8fBwbGzs0NJSbm9va2gqJDF\u0000\u0000PB\u0011t���\u0018`\u0000\u0000s�VV1fb29nv37p2dnamoqPz+/VtRUXFsbKympmZ
[...]
mZlBQkKqrq\u0000\u0000PB\u0011t���\u0018`\u0000\u0000r\u00006GF7ampubZ2VlXVxcKCoqLi4uenh
[...]
ERHBx8+/\u0000\u0000PB\u0011t���\u0018`\u0000\u0000q\u0000ZtVlZWY2NjGxsbLS0taB
[...]
It happened around offsets 69k, 168k, and 233k in the execution above.
Edit: In another run, I measured the deltas between the "\u0000\u0000PB" fragments more precisely, and they were exactly 65536 bytes (64 KiB) apart in most cases, sometimes a bit more (69632 bytes).
I’m curious if we should provide a method to avoid the cache to help with these types of issues. Uncertain whether it would be worth the effort but it would help in debugging at times.
a method to avoid the cache
I've been relying on adding whitespace in a program when I want to explicitly avoid the cache.
I've been relying on adding whitespace in a program when I want to explicitly avoid the cache.
Yes but that’s pretty tedious when you’re attempting to debug an issue.
the \u0000\u0000PB prefix in the corrupted segments indicates that the interrupting blob is generated by the faketime patch:
https://github.com/golang/go/blob/a38a917aee626a9b9d5ce2b93964f586bf759ea0/src/runtime/time_fake.go#L75-L78
@bcmills Thanks! That's consistent with the theory that something is writing to STDOUT in parallel. It seems to indicate that it's the program itself that is printing something in parallel (as only the snippet binary is built with the faketime patch).
Could it be a warning or error from the runtime? These images are "large", in that they're many megabytes, and we have hard constraints set on our sandbox containers.
By modifying the program to print to Stdout more efficiently, I haven't been able to trigger a broken image. I have been able to trigger a 500 with some large images, which means we may not handle capped output gracefully.
Another side thought: random noise doesn't compress well to the PNG output (or anywhere else for that matter). I've been able to generate larger images using the inefficient print with a random-offset gradient: https://play.golang.org/p/Ca7xtHuq7RX
The faketime framing happens at the syscall.Write level:
https://github.com/golang/go/blob/4c003f6b780b471afbf032438eb6c7519458855b/src/syscall/syscall_unix.go#L207
https://github.com/golang/go/blob/4c003f6b780b471afbf032438eb6c7519458855b/src/syscall/time_fake.go#L20-L26
https://github.com/golang/go/blob/4c003f6b780b471afbf032438eb6c7519458855b/src/runtime/sys_linux_amd64.s#L96-L103
So if the OS happens to split the write, as it is allowed to do, we can end up writing multiple headers within a given payload.
Where is the Playground-side code that parses this output? (Is it assuming any particular framing?)
I'm guessing that's where our bug will be found...
Oh-ho-ho! This is a fundamental problem with the faketime patch.
faketime writes a header followed by a payload length. However, nothing ensures that the OS will actually write that same length: it may write an arbitrarily truncated packet, in which case the header for the next packet will appear midway through what was declared as the previous payload.
I'm not very familiar with NaCL, but it is quite possible that NaCL always did complete writes. gVisor, like Linux, on the other hand, with certainly do partial writes.
I think the solution may be to rework faketime so that the locking (and header injection) occurs around the loop in (*internal/poll.FD).Write:
https://github.com/golang/go/blob/4c003f6b780b471afbf032438eb6c7519458855b/src/internal/poll/fd_unix.go#L244-L254
If we move the header-injection there, then we can guarantee that a the payload is written to completion atomically after each header, even if it happens to require multiple write syscalls.
Change https://golang.org/cl/232177 mentions this issue: pic: avoid triggering write issue in faketime
Here's a dead simple program to reproduce the issue:
https://play.golang.org/p/qiktfHPSqAy
That also explains why the updated program to have the png to base64 encoder print directly to stdout did not trigger this, as it's less likely to happen with smaller writes.
Change https://golang.org/cl/232867 mentions this issue: pic: don't make a string copy when writing image
Most helpful comment
I'm not very familiar with NaCL, but it is quite possible that NaCL always did complete writes. gVisor, like Linux, on the other hand, with certainly do partial writes.