Thanos, Prometheus and Golang version used:
Thanos master-2020-02-13-adfef4b5
Object Storage Provider: AWS S3
What happened: Store errored with segfault.
What you expected to happen: Store does not segfault :slightly_smiling_face:
How to reproduce it (as minimally and precisely as possible): Unclear, but the thing to note probably is that this store has the experimental binary index header feature enabled.
Full logs to relevant components:
Full log from start of the store:
https://gist.github.com/brancz/2cd7d982b844b521bf4301895bafb61b
Anything else we need to know:
n/a
Thanks for report.
Segfault seems to be here:
// removeElement is used to remove a given list element from the cache
func (c *LRU) removeElement(e *list.Element) {
c.evictList.Remove(e)
kv := e.Value.(*entry)
-> delete(c.items, kv.key)
if c.onEvict != nil {
c.onEvict(kv.key, kv.value)
}
}
And particularly for cacheKeyPostings which is created here:
// StorePostings sets the postings identified by the ulid and label to the value v,
// if the postings already exists in the cache it is not mutated.
func (c *InMemoryIndexCache) StorePostings(ctx context.Context, blockID ulid.ULID, l labels.Label, v []byte) {
c.set(cacheTypePostings, cacheKey{blockID, cacheKeyPostings(l)}, v)
It might be the case that l labels.Label are backed up by mmaped memory, so if:
We will have SEGFault. Implementing a fix now.
The exact issue is when using hash key in ASM:
// load data to be hashed
MOVOU (AX), X2
Bad news is that I cannot reproduce this locally with the test cases I thought they will trigger this...
Trying to repro this here: https://github.com/thanos-io/thanos/pull/2078/commits/ea6a26c5f76774bc0e0bebcf99d81118c07b8d44
No luck yet.
Ok, got it. Was actually pretty close with reproduction (:
https://github.com/thanos-io/thanos/pull/2151
Fix will be shipped tomorrow, hopefully before RC.
cc @metalmatze @brancz
Most helpful comment
Thanks for report.
Segfault seems to be here:
And particularly for
cacheKeyPostingswhich is created here:It might be the case that
l labels.Labelare backed up by mmaped memory, so if:We will have SEGFault. Implementing a fix now.