$ go version
go version go1.15 linux/amd64
I'm able to reproduce this since go1.15, including the latest go1.15.2.
go env Output
$ go env
GO111MODULE="off"
GOARCH="amd64"
GOBIN="/usr/local/google/home/yekuang/infra/go/bin"
GOCACHE="/usr/local/google/home/yekuang/infra/go/.cache"
GOENV="/usr/local/google/home/yekuang/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/usr/local/google/home/yekuang/infra/go/.vendor/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/google/home/yekuang/infra/go/.vendor:/usr/local/google/home/yekuang/infra/go"
GOPRIVATE=""
GOPROXY="off"
GOROOT="/usr/local/google/home/yekuang/golang/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/yekuang/golang/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build699870689=/tmp/go-build -gno-record-gcc-switches"
We compressed the same data using zlib.NewWriterLevel(writer, zlib.BestSpeed) and found that the outputs from go1.15 and go1.14.2 were different. This difference eventually led to data corruption when we uploaded the compressed file to GCS.
Here's the minimal reproducible example I have:
package main

import (
	"bufio"
	"compress/zlib"
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func main() {
	var filename string
	flag.StringVar(&filename, "f", "", "filename")
	flag.Parse()

	fi, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer fi.Close()

	outname := filepath.Base(filename) + "-gzip"
	fo, err := os.Create(outname)
	if err != nil {
		panic(err)
	}
	defer fo.Close()

	fmt.Printf("%s -> %s\n", filename, outname)

	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	compressor, err := zlib.NewWriterLevel(foWr, zlib.BestSpeed)
	if err != nil {
		panic(err)
	}
	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(compressor, fi, buf); err != nil {
		compressor.Close()
		panic(err)
	}
	// compressor needs to be closed first to flush the rest of the data
	// into the bufio.Writer
	if err := compressor.Close(); err != nil {
		panic(err)
	}
	if err := foWr.Flush(); err != nil {
		panic(err)
	}
}
The input data is too big to share (2.5GB), but I can share it internally (FYI, my LDAP is yekuang@).
What did you expect to see?

No difference in the compressed data between go1.14.2 and go1.15.

What did you see instead?

The compressed data were different.

Size of the compressed data:

go1.14.2: 728571269
go1.15:   728570333

I also ran cmp; the two outputs start to differ at byte 363266597.
Let me know if you need more information, thanks!
cc @dsnet
FWIW, this could be related to the input file size. I tested the script with a smaller file (102MB), and the outputs were identical across Go versions.
This difference has eventually led to a data corruption when we uploaded the compressed file to GCS.
Isn't running the decompressor sufficient to verify the data corruption?
And maybe the CLs for https://github.com/golang/go/issues/34121 are related.
It is unclear to me whether this issue is about the compressed output simply being different or that the compressed output is actually invalid (i.e., cannot be decompressed). Can you please clarify?
If the output is valid (e.g., can be decompressed), then this is working as expected. The compression libraries make no guarantees that the output remains stable for all time. While that guarantee can be useful in some contexts, it unfortunately means that we could never change the compression algorithm to improve either speed or compression ratio, both of which are properties generally considered more important than stability.
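(To make the distinction concrete: the meaningful check is a round trip, not a byte comparison of the compressed streams. A minimal sketch, using only the zlib API shown above; roundTripOK is a hypothetical helper:)

import (
	"bytes"
	"compress/zlib"
	"io/ioutil"
)

// roundTripOK reports whether data survives a compress/decompress cycle.
// The zlib reader verifies the stream's checksum, so real corruption
// surfaces as an error here even when the compressed bytes legitimately
// differ between Go versions.
func roundTripOK(data []byte) bool {
	var buf bytes.Buffer
	zw, err := zlib.NewWriterLevel(&buf, zlib.BestSpeed)
	if err != nil {
		return false
	}
	if _, err := zw.Write(data); err != nil {
		return false
	}
	if err := zw.Close(); err != nil {
		return false
	}
	zr, err := zlib.NewReader(&buf)
	if err != nil {
		return false
	}
	defer zr.Close()
	out, err := ioutil.ReadAll(zr) // returns zlib.ErrChecksum on corruption
	return err == nil && bytes.Equal(out, data)
}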
Thanks for the advice. Unfortunately, it looks like the data was indeed corrupted: I got an invalid checksum error when trying to decompress the output of the compression step.
I added the decompression logic:
package main

import (
	"bufio"
	"compress/zlib"
	"flag"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"runtime"
	"strings"
)

var fileSplit = "---"

func getBaseName(p string) string {
	s := filepath.Base(p)
	return strings.Split(s, fileSplit)[0]
}

func doCompression(filename string) error {
	fi, err := os.Open(filename)
	if err != nil {
		return err
	}
	defer fi.Close()

	outname := fmt.Sprintf("%s%scompressed-%s", getBaseName(filename), fileSplit, runtime.Version())
	fo, err := os.Create(outname)
	if err != nil {
		return err
	}
	defer fo.Close()

	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	compressor, err := zlib.NewWriterLevel(foWr, zlib.BestSpeed)
	if err != nil {
		return err
	}
	fmt.Printf("Compress: %s -> %s\n", filename, outname)
	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(compressor, fi, buf); err != nil {
		compressor.Close()
		return err
	}
	// compressor needs to be closed first to flush the rest of the data
	// into the bufio.Writer
	if err := compressor.Close(); err != nil {
		return err
	}
	if err := foWr.Flush(); err != nil {
		return err
	}
	return nil
}

func doDecompression(filename string) error {
	fi, err := os.Open(filename)
	if err != nil {
		return err
	}
	defer fi.Close()

	fiRd, err := zlib.NewReader(fi)
	if err != nil {
		return err
	}
	outname := fmt.Sprintf("%s%sdecompressed-%s", getBaseName(filename), fileSplit, runtime.Version())
	fo, err := os.Create(outname)
	if err != nil {
		return err
	}
	defer fo.Close()

	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	fmt.Printf("Decompress: %s -> %s\n", filename, outname)
	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(foWr, fiRd, buf); err != nil {
		fiRd.Close()
		return err
	}
	if err := fiRd.Close(); err != nil {
		return err
	}
	if err := foWr.Flush(); err != nil {
		return err
	}
	return nil
}

func main() {
	var filename string
	var decompression bool
	flag.StringVar(&filename, "f", "", "filename")
	flag.BoolVar(&decompression, "d", false, "should decompress")
	flag.Parse()

	var err error
	if decompression {
		err = doDecompression(filename)
	} else {
		err = doCompression(filename)
	}
	if err != nil {
		panic(err)
	}
}
I first tested this with go1.14.2 by compressing the input and then decompressing the result of the previous step. The decompressed output is identical to the original input.
$ ./cli -f ~/chromium/src/out/Release/content_browsertests
Compress: ~/chromium/src/out/Release/content_browsertests -> content_browsertests---compressed-go1.14.2
$ ./cli -f content_browsertests---compressed-go1.14.2 -d
Decompress: content_browsertests---compressed-go1.14.2 -> content_browsertests---decompressed-go1.14.2
$ diff ~/chromium/src/out/Release/content_browsertests content_browsertests---decompressed-go1.14.2
# No diff
Then I switched to go1.15 and did the same:
$ ./cli -f ~/chromium/src/out/Release/content_browsertests
Compress: ~/chromium/src/out/Release/content_browsertests -> content_browsertests---compressed-go1.15
$ ./cli -f content_browsertests---compressed-go1.15 -d
Decompress: content_browsertests---compressed-go1.15 -> content_browsertests---decompressed-go1.15
panic: zlib: invalid checksum
goroutine 1 [running]:
main.main()
.../main.go:105 +0x1f9
And there was a difference between the original input and the decompressed file.
A bit more of an update:
I used go1.15 to decompress the data produced by go1.14.2. This time the decompression went OK, and the decompressed output matched the original data. So it seems that at least the decompression part in go1.15 is good.
On the other hand, I also tried the reverse -- using go1.14.2 to decompress the data produced by go1.15. I still got the invalid checksum error, and the decompressed data didn't even match content_browsertests---decompressed-go1.15.
I need more information to investigate this. The code snippets provided only do basic compression/decompression. I need real test data that exhibits the problem.
Sent you the data link in an email.
Does this issue not happen if you use randomly generated data having similar size?
No, it didn't happen with random data. I tested with /bin/dd if=/dev/random of=rand_bytes.dat bs=1M count=3000 to produce a 3GB file, and the script worked fine. Although, to be fair, little compression can be achieved on random data (if the data is truly random).
You might be able to shrink the test case down from 3GB with https://github.com/dgryski/go-ddmin.
I was at first able to reproduce only a part of this. I downloaded this small dataset and appended it to a file 28 times until it was 1139520480 bytes.
I could then get a zlib file of 291858894 bytes in 1.14 and 291858627 bytes in 1.15. The first time the output changed was in 8d4330742c1866faa8b1ef575877e5afb8a4355c.
cmp: foo14 foo15 differ: char 275153560, line 1014680
BUT, the zlib readers in 1.14 and 1.15 can read each other's output, so there doesn't appear to be any corruption there, just a change in layout.
Then I tried it on a different file (~4.8 GB), sadly a proprietary dataset, and got a bit further. When decompressing the just-compressed file to calculate its checksum, I get a panic:
$ go run cmp.go -f filename.csv
panic: zlib: invalid checksum
goroutine 1 [running]:
main.sumZFile(0xc00001a094, 0xc, 0xc0000141c0, 0x40)
/Users/okokes/tmp/zlib/cmp.go:37 +0x29b
main.main()
/Users/okokes/tmp/zlib/cmp.go:53 +0x1a5
exit status 2
There's a difference here:

all14.gz all15.gz differ: char 199653755, line 768787
git bisect points to the same commit as before though.
It's a shame I can't share this file, not even privately, but I can try and run it again using a different Go build to see if it's fixed.
code to (de)compress
package main

import (
	"bufio"
	"compress/zlib"
	"crypto/sha256"
	"flag"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
)

func sumFile(filename string) string {
	f, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	hasher := sha256.New()
	if _, err := io.Copy(hasher, f); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", hasher.Sum(nil))
}

func sumZFile(filename string) string {
	f, err := os.Open(filename)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	zr, err := zlib.NewReader(f)
	if err != nil {
		panic(err)
	}
	hasher := sha256.New()
	if _, err := io.Copy(hasher, zr); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", hasher.Sum(nil))
}

func main() {
	var filename string
	flag.StringVar(&filename, "f", "", "filename")
	flag.Parse()

	outname, err := compress(filename)
	if err != nil {
		log.Fatal(err)
	}
	out := sumZFile(outname)
	inc := sumFile(filename)
	if inc != out {
		log.Fatal("mismatch")
	}
}

func compress(filename string) (string, error) {
	fi, err := os.Open(filename)
	if err != nil {
		return "", err
	}
	defer fi.Close()

	outname := filepath.Base(filename) + "-gzip"
	fo, err := os.Create(outname)
	if err != nil {
		return "", err
	}
	defer fo.Close()

	// fmt.Printf("%s -> %s\n", filename, outname)
	const outBufSize = 1024 * 1024
	foWr := bufio.NewWriterSize(fo, outBufSize)
	compressor, err := zlib.NewWriterLevel(foWr, zlib.BestSpeed)
	if err != nil {
		return "", err
	}
	buf := make([]byte, outBufSize*3)
	if _, err := io.CopyBuffer(compressor, fi, buf); err != nil {
		compressor.Close()
		return "", err
	}
	// compressor needs to be closed first to flush the rest of the data
	// into the bufio.Writer
	if err := compressor.Close(); err != nil {
		return "", err
	}
	if err := foWr.Flush(); err != nil {
		return "", err
	}
	return outname, nil
}
bisection code
package main

import (
	"log"
	"os/exec"
)

func main() {
	build := exec.Command("/Users/okokes/git/go/src/make.bash")
	build.Dir = "/Users/okokes/git/go/src"
	err := build.Run()
	if err != nil {
		log.Fatal("toolchain build failed: ", err)
	}
	cmd := exec.Command("/Users/okokes/git/go/bin/go", "run", "cmp.go", "-f", "all.csv")
	cmd.Dir = "/Users/okokes/git/go/src/"
	err = cmd.Run()
	if err != nil {
		log.Fatal("program run failed: ", err)
	}
}
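(For context, a wrapper like this is typically driven by git bisect run, which treats a non-zero exit status as "bad"; a hypothetical invocation, assuming go1.14 is a known-good tag:)

$ cd ~/git/go
$ git bisect start go1.15 go1.14
$ git bisect run go run /path/to/bisect.go

(Having the wrapper exit with status 125 when make.bash fails would tell bisect to skip unbuildable commits instead of marking them bad.)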
@kokes, difference in output in itself is not particularly interesting since the compress/flate library provides no guarantees about output stability. However, producing corrupted output is something we take seriously.
Thank you @k-ye. I am able to reproduce this, which does indeed indicate that it's producing corrupted output. I'm bisecting the root cause now (it may very well be the CLs related to #34121 that you pointed to earlier).
Adding to the Go1.15.3 milestone since this is a regression relative to Go1.14, and a serious one.
@dsnet, requesting a backport adds it to 1.15.x milestone. This issue should target 1.16.
Thanks @networkimprov. I've been a bit distant from Go toolchain development for a while, so I've gotten behind on the latest in processes.
@gopherbot please open a backport issue for Go 1.15.
The compress/flate package can produce corrupt output. There is no workaround.
Backport issue(s) opened: #41463 (for 1.15).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Bisect implicates https://go-review.googlesource.com/c/go/+/193605
\cc @klauspost
@dsnet Any way you can share the reproducer? My github name @gmail.com
Not sure whether this is helpful, but it seems setting bufferReset = maxMatchOffset<<4 - maxStoreBlockSize*2 makes tests fail. I would think that changing that constant shouldn't make tests fail.
@klauspost should the loop at https://github.com/golang/go/blob/master/src/compress/flate/deflatefast.go#L296 read:
for i := range e.table[:] {
	v := e.table[i].offset - e.cur + maxMatchOffset
	if v < 0 {
		e.table[i] = tableEntry{}
		continue
	}
	e.table[i].offset = v
}
At least this seems to fix the tests for the smaller bufferReset value.
@egonelbre This also fixes my reproducer. (A 3.6GB file of concatenated Go source files from my $GOPATH).
@egonelbre fixes my reproducer as well
If I am reading the code right, it looks like any input over 2,147,352,577 bytes (bufferReset) could be corrupted, and corruption is more likely to occur at every multiple of 32,768 bytes (maxMatchOffset) past that. Is that accurate?
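(For reference, that threshold is consistent with the constants in compress/flate as of go1.15; a quick check of the arithmetic, with the values copied here as an assumption:)

package main

import (
	"fmt"
	"math"
)

// Constants as defined in compress/flate (go1.15), copied here only to
// verify the numbers quoted above.
const (
	maxStoreBlockSize = 65535
	maxMatchOffset    = 1 << 15 // 32768, the largest match offset deflate allows
	bufferReset       = math.MaxInt32 - maxStoreBlockSize*2
)

func main() {
	// 2147483647 - 2*65535 = 2147352577
	fmt.Println(bufferReset, maxMatchOffset)
}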
@egonelbre's change fixed my reproducer as well 🎉
@egonelbre No, that cannot be correct. The value may not matter; that will only fix 16383 out of 16384 cases, but will still result in a false hit when hash(0)&16383 matches the index.
The offset must be enough to invalidate it completely. I will see if I can figure out what is causing the false match.
My own branch has a defensive if offset < maxMatchOffset to reject long offset matches, whereas this has if offset > maxMatchOffset, which is technically correct; but since we only bump by maxMatchOffset, a match at offset 0 would still be considered valid.
So we will need to bump offset by an extra one in this case. Let me cook up a PR.
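(A sketch of the shape being described: shift by maxMatchOffset+1 so that a clamped, stale table entry can never sit at a distance that still passes the validity check. Based on this comment, not necessarily the exact code of the eventual CL:)

func (e *deflateFast) shiftOffsets() {
	if len(e.prev) == 0 {
		// No history to match against: just clear the table.
		for i := range e.table[:] {
			e.table[i] = tableEntry{}
		}
		e.cur = maxMatchOffset + 1
		return
	}
	// Shift down everything in the table that isn't already too far away.
	// The extra +1 keeps clamped (stale) entries at distance
	// maxMatchOffset+1, outside the valid match window.
	for i := range e.table[:] {
		v := e.table[i].offset - e.cur + maxMatchOffset + 1
		if v < 0 {
			v = 0
		}
		e.table[i].offset = v
	}
	e.cur = maxMatchOffset + 1
}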
If someone has reproducers, please throw everything you have at #41477
Change https://golang.org/cl/255879 mentions this issue: compress/flate: fix corrupted output
@klauspost fixes mine from https://github.com/golang/go/issues/41420#issuecomment-694217284
Does anyone have a test case that can be included within the standard repo?
It sounds like testing this requires a large (multi-GB) input, which presumably also takes considerable CPU time to deflate/inflate. So if the test needs to be disabled by default, that's fine. The test input needs to either be synthesized locally or downloaded as-needed from the network, though.
If a test could somehow change bufferReset, a smaller reproducer could be used. Alternatively, maybe it's possible to invoke reset or shiftOffsets manually from tests?
Also, maybe there's some input that deflates/inflates relatively fast?
Was AFK and preoccupied.
Alternatively, maybe it's possible to invoke reset or shiftOffsets manually from tests?
Yeah, probably a reasonable thing (shiftOffsets; reset clears the history). I will see if I can add a reasonable test with this.
Also, maybe there's some input that deflates/inflates relatively fast?
Zeroes. Should also be doable: encode a MB of semi-compressible content, then a bit more than 2GB of zeroes, then re-add some semi-compressible content that would find matches in the first part if the encoder is broken.
However, decompressing 2GB will probably be by far the slowest part. I could see that taking several seconds, even with this.
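(A rough illustration of that recipe; hypothetical helper, not the test that eventually landed:)

import (
	"bytes"
	"io"
	"math/rand"
)

// corruptionProneInput builds an input that crosses the ~2GB bufferReset
// threshold cheaply: 1MB of semi-compressible bytes, ~2.1GB of zeroes
// (which deflate very quickly), then the same semi-compressible bytes
// again, which a broken encoder could falsely match against stale table
// entries from the first chunk.
func corruptionProneInput() io.Reader {
	semi := make([]byte, 1<<20)
	rnd := rand.New(rand.NewSource(1))
	for i := range semi {
		semi[i] = byte(rnd.Intn(16)) // low entropy: compressible, but not trivially so
	}
	zeroes := make([]byte, 1<<20) // one reusable 1MB zero block
	readers := []io.Reader{bytes.NewReader(semi)}
	for i := 0; i < 2100; i++ { // 2100MB of zeroes, past bufferReset (~2047.9MB)
		readers = append(readers, bytes.NewReader(zeroes))
	}
	readers = append(readers, bytes.NewReader(semi))
	return io.MultiReader(readers...)
}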
PR/CL updated with a test that catches the regression and sanity-checks a few more things.
Friendly ping as this issue has the Soon label.
@mdempsky Are you able to take another look at the updated CL?
I commented on the CL. I think it's good, but I had a few questions to double-check that I understand the code correctly.
Change https://golang.org/cl/266177 mentions this issue: [release-branch.go1.15] compress/flate: fix corrupted output