Victoriametrics: While under heavy load getting error panic

Created on 22 Oct 2019 · 19 Comments · Source: VictoriaMetrics/VictoriaMetrics

Describe the bug
Server will not start. Getting:
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: panic: runtime error: index out of range [32] with length 32
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: goroutine 192 [running]:
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/klauspost/compress/zstd.(*fseEncoder).writeCount(0xc00d674000, 0xc0141ae000, 0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/vendor/github.com/klauspost/compress/zstd/fse_encoder.g
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/klauspost/compress/zstd.(*blockEnc).encode(0xc0002a0ee0, 0xc0002a0ee0, 0xc0052
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/vendor/github.com/klauspost/compress/zstd/blockenc.go:6
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/klauspost/compress/zstd.(*Encoder).EncodeAll(0xc0001cad80, 0xc0052e5d6e, 0x0,
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/vendor/github.com/klauspost/compress/zstd/encoder.go:47
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd.CompressLevel(0xc00a990000,
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/encoding/zstd/zstd_pure.go:43 +0x68
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding.CompressZSTDLevel(0xc00a990000, 0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/encoding/compress.go:16 +0x98
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*inmemoryBlock).marshalData(0xc0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/encoding.go:241 +0x5ec
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*inmemoryBlock).MarshalSortedDat
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/encoding.go:166 +0x98
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*blockStreamWriter).WriteBlock(0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/block_stream_writer.go:167 +0x6c
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*blockStreamMerger).flushIB(0xc0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/merge.go:193 +0x488
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*blockStreamMerger).Merge(0xc000
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/merge.go:134 +0xf0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.mergeBlockStreams(0xc00df89cd8, 0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/merge.go:35 +0x11c
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).mergeParts(0xc0001ca900,
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/table.go:774 +0x4fc
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).mergeExistingParts(0xc00
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/table.go:653 +0x120
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).partMerger(0xc0001ca900,
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/table.go:667 +0x90
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).startPartMergers.func1(0
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/table.go:635 +0x30
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: created by github.com/VictoriaMetrics/VictoriaMetrics/lib/mergeset.(*Table).startPartMerg
Oct 21 04:57:16 linux-infrastructure-chicagodbaas-influx-10.novalocal victoria-metrics-pure-1.28.0.ppc64le[54144]: /root/vendor_repo/VictoriaMetrics/lib/mergeset/table.go:634 +0x68

To Reproduce
Unknown how to reproduce. The instance is under heavy load, and the error started to occur after about 1 week of running.

Expected behavior
No error should occur.

Version
1.28.0 (ppc64le pure compiled version)

Additional context
(five attached images)

bug

All 19 comments

This looks like a bug in the pure Go zstd implementation. cc'ing @klauspost. VictoriaMetrics calls Encoder.EncodeAll concurrently on the same Encoder, so I suspect there is a data race in this case.

In the meantime, you can try building VictoriaMetrics with gozstd, a Go wrapper for the upstream C implementation of zstd. Run make clean libzstd.a inside the gozstd directory before building VictoriaMetrics with make victoria-metrics on the ppc64le machine.

Yes. Definitely looks like something that shouldn't happen. I will take a look.

@nickmy9729 Do you have the trace where the ends of the lines haven't been cut off?

@nickmy9729 v1.9.0 has been released, which should fix the issue.

https://github.com/klauspost/compress

@klauspost Still getting the error after using v1.9.0 of compress. Here is the stack trace:
victoria_stack_trace.txt

@nickmy9729 the fix is not merged yet. PR https://github.com/VictoriaMetrics/VictoriaMetrics/pull/217

I will take a look. This looks like a logic error. I will try to see if I can set up a fuzz test for this.

If it is at all possible to grab the input sent at VictoriaMetrics/lib/encoding/compress.go:16 when the crash happens, that would be a great help.

I am not sure I am able to do that... Given that VictoriaMetrics uses your
library, there might be a steep learning curve to get that data.

-Nick


Maybe it could be something like this:

func CompressZSTDLevel(dst, src []byte, compressLevel int) []byte {
    // finished stays false if zstd.CompressLevel panics before returning.
    var finished bool
    defer func() {
        if !finished {
            // Dump the input that triggered the panic so it can be replayed.
            ioutil.WriteFile("/tmp/crashfile.bin", src, os.ModePerm)
        }
    }()
    compressCalls.Inc()
    originalBytes.Add(len(src))
    dstLen := len(dst)
    dst = zstd.CompressLevel(dst, src, compressLevel)
    compressedBytes.Add(len(dst) - dstLen)
    finished = true
    return dst
}

and add "os" and "io/ioutil" to the imports at the top.

I've been running a fuzz test with a hard check for this condition for 24h and still found nothing, so I would really appreciate a reproduction case.
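As a rough illustration of what such a randomized round-trip check can look like, here is a sketch under loud assumptions: it is not the fuzz setup used for klauspost/compress, it uses the standard library's compress/flate as a stand-in codec so that it runs without third-party packages, and roundTrip and findFailure are hypothetical helper names.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"math/rand"
)

// roundTrip compresses and decompresses src with compress/flate (a stand-in
// for zstd) and reports whether the result matches. A panic inside the codec
// is converted into an error so the caller can keep the offending input.
func roundTrip(src []byte) (ok bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			ok, err = false, fmt.Errorf("panic: %v", r)
		}
	}()
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		return false, err
	}
	if _, err := w.Write(src); err != nil {
		return false, err
	}
	if err := w.Close(); err != nil {
		return false, err
	}
	out, err := io.ReadAll(flate.NewReader(&buf))
	if err != nil {
		return false, err
	}
	return bytes.Equal(out, src), nil
}

// findFailure tries n pseudo-random inputs of up to maxLen bytes and returns
// the first one that fails the round trip, or nil if none does.
func findFailure(seed int64, n, maxLen int) []byte {
	rng := rand.New(rand.NewSource(seed))
	for i := 0; i < n; i++ {
		src := make([]byte, rng.Intn(maxLen+1))
		for j := range src {
			// A tiny alphabet keeps the data compressible, which exercises
			// the entropy-coding paths more than uniform random bytes would.
			src[j] = byte(rng.Intn(4))
		}
		if ok, err := roundTrip(src); !ok || err != nil {
			return src
		}
	}
	return nil
}

func main() {
	fmt.Println("failing input found:", findFailure(1, 200, 4096) != nil)
}
```

Random inputs like these can run for a long time without hitting a rare data-dependent bug, which is why a captured real-world input is so much more useful for reproduction.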

There is a small chance this could be down to a compiler error, so I will not throw in a quick workaround since that could lead to data loss (invalid stream), if that is indeed the case.

I will put that in later tonight, recompile, and get you the crashfile.bin.


@klauspost

Here is the crashfile.bin

I have changed the extension to .zip to allow it to be posted here. I did not zip it; just remove the .zip extension and you will have the bin file.
crashfile.bin.zip

Thanks! That reproduced the issue!

@klauspost I just recompiled VictoriaMetrics with your latest patch. So far everything is looking good. I assume you will create a new version of compress once you have completed all your regression tests etc.

Thanks @klauspost for all your help! You have been very generous with your time.

No problem. Thanks for reporting the issue and helping to fix it. Tagging a release tomorrow.

@klauspost , thanks for fixing the issue!

@nickmy9729 , the issue should be fixed in the commit 6ab48838bff62228dc660580fa01ee7ed95c434c . The fix will be available in the next VictoriaMetrics release.

Agreed. Been using VM with latest patch compiled and no issues. Thanks again!

FYI, the bugfix is available starting from v1.28.1.
