Thanos: segfault in store binary index header feature

Created on 18 Feb 2020  路  5Comments  路  Source: thanos-io/thanos

Thanos, Prometheus and Golang version used:

Thanos master-2020-02-13-adfef4b5

Object Storage Provider: AWS S3

What happened: Store errored with segfault.

What you expected to happen: Store does not segfault :slightly_smiling_face:

How to reproduce it (as minimally and precisely as possible): Unclear, but the thing to note probably is that this store has the experimental binary index header feature enabled.

Full logs to relevant components:

Full log from start of the store:

https://gist.github.com/brancz/2cd7d982b844b521bf4301895bafb61b

Anything else we need to know:

n/a

bug store P0

Most helpful comment

Thanks for report.

Segfault seems to be here:

// removeElement is used to remove a given list element from the cache
func (c *LRU) removeElement(e *list.Element) {
    c.evictList.Remove(e)
    kv := e.Value.(*entry)
->  delete(c.items, kv.key)
    if c.onEvict != nil {
        c.onEvict(kv.key, kv.value)
    }
}

And particularly for cacheKeyPostings which is created here:

// StorePostings sets the postings identified by the ulid and label to the value v,
// if the postings already exists in the cache it is not mutated.
func (c *InMemoryIndexCache) StorePostings(ctx context.Context, blockID ulid.ULID, l labels.Label, v []byte) {
    c.set(cacheTypePostings, cacheKey{blockID, cacheKeyPostings(l)}, v)

It might be the case that l labels.Label are backed up by mmaped memory, so if:

  • Block is removed/unloaded
  • Mmap unloads for other reasons?

We will have SEGFault. Implementing a fix now.

All 5 comments

Thanks for report.

Segfault seems to be here:

// removeElement is used to remove a given list element from the cache
func (c *LRU) removeElement(e *list.Element) {
    c.evictList.Remove(e)
    kv := e.Value.(*entry)
->  delete(c.items, kv.key)
    if c.onEvict != nil {
        c.onEvict(kv.key, kv.value)
    }
}

And particularly for cacheKeyPostings which is created here:

// StorePostings sets the postings identified by the ulid and label to the value v,
// if the postings already exists in the cache it is not mutated.
func (c *InMemoryIndexCache) StorePostings(ctx context.Context, blockID ulid.ULID, l labels.Label, v []byte) {
    c.set(cacheTypePostings, cacheKey{blockID, cacheKeyPostings(l)}, v)

It might be the case that l labels.Label are backed up by mmaped memory, so if:

  • Block is removed/unloaded
  • Mmap unloads for other reasons?

We will have SEGFault. Implementing a fix now.

The exact issue is when using hash key in ASM:

    // load data to be hashed
    MOVOU   (AX), X2

Bad news is that I cannot reproduce this locally with the test cases I thought they will trigger this...

Ok, got it. Was actually pretty close with reproduction (:

https://github.com/thanos-io/thanos/pull/2151

Fix will be shipped tomorrow, hopefully before RC.

cc @metalmatze @brancz

Was this page helpful?
0 / 5 - 0 ratings