Dgraph: panic: runtime error during snapshotting

Created on 12 Nov 2018 · 2 Comments · Source: dgraph-io/dgraph

If you suspect this could be a bug, follow the template.

  • What version of Dgraph are you using?
    release/v1.0.10

  • Have you tried reproducing the issue with latest release?
    Yes.

  • What is the hardware spec (RAM, OS)?
    CPU: 32-core
    MEM: 64GB
    STORAGE: SSD
    OS: CentOS 7.4

  • Steps to reproduce the issue (command/config used to run Dgraph).
    Couldn't reproduce this so far.

I exported the database from 1.0.9 and imported it into 1.0.10.
The panic occurred unexpectedly during snapshotting a few days later.
We've stopped running this node for now.

  • Expected behaviour and actual result.
    syslog recorded the following logs:
Nov  9 15:51:43 dgraph101 dgraph[45581]: panic: runtime error: invalid memory address or nil pointer dereference
Nov  9 15:51:43 dgraph101 dgraph[45581]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xbd8aee]
Nov  9 15:51:43 dgraph101 dgraph[45581]: goroutine 8325297 [running]:
Nov  9 15:51:43 dgraph101 dgraph[45581]: github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*valueLog).write(0xc0000992c8, 0xc1ab5a2b40, 0x1, 0xa, 0x0, 0x0)
Nov  9 15:51:43 dgraph101 dgraph[45581]: /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/value.go:929 +0x1ee
Nov  9 15:51:43 dgraph101 dgraph[45581]: github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*DB).writeRequests(0xc000099180, 0xc1ab5a2b40, 0x1, 0xa, 0xc14c079aa0, 0x0)
Nov  9 15:51:43 dgraph101 dgraph[45581]: /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/db.go:593 +0x106
Nov  9 15:51:43 dgraph101 dgraph[45581]: github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*DB).doWrites.func1(0xc1ab5a2b40, 0x1, 0xa)
Nov  9 15:51:43 dgraph101 dgraph[45581]: /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/db.go:662 +0x55
Nov  9 15:51:43 dgraph101 dgraph[45581]: created by github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger.(*DB).doWrites
Nov  9 15:51:43 dgraph101 dgraph[45581]: /ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/db.go:711 +0x30e
Nov  9 15:51:43 dgraph101 dgraph[45581]: W1109 15:51:43.024667   45581 draft.go:327] Error while calling CreateSnapshot: requested index is older than the existing snapshot. Retrying...

The panic occurred here in badger.

// write is thread-unsafe by design and should not be called concurrently.
func (vlog *valueLog) write(reqs []*request) error {
    vlog.filesLock.RLock()
    maxFid := atomic.LoadUint32(&vlog.maxFid)
    curlf := vlog.filesMap[maxFid]
    vlog.filesLock.RUnlock()
    ...
    for i := range reqs {
        b := reqs[i]
        b.Ptrs = b.Ptrs[:0]
        for j := range b.Entries {
            e := b.Entries[j]
            var p valuePointer

            p.Fid = curlf.fid // <-- PANIC

Badger appears to assume that vlog.filesMap[maxFid] always holds a non-nil value. I think it would be safer to check for nil here, because a map lookup returns nil when the key is missing.

func TestNil(t *testing.T) {
    m := map[uint32]*logFile{}
    var p valuePointer
    curlf := m[0]
    p.Fid = curlf.fid // <-- PANIC
}

Thank you!

All 2 comments

That fid should not be nil. If it is, we have a bug.

I reproduced this, and there were too many open files before the panic occurred.

Nov 14 23:19:19 dgraph101 dgraph[141898]: E1114 23:19:19.722793  141898 lists.go:97] Can't read the proc file. Err: open /proc/self/stat: too many open files

Dgraph looks fine for now after increasing LimitNOFILE.

Sorry for taking up your time with this issue :-(
Thank you!
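For anyone hitting the same symptom: LimitNOFILE is the systemd directive that sets a service's RLIMIT_NOFILE, the per-process limit on open file descriptors. A small Go sketch for checking the limit a process is actually running with might look like the following. `openFileLimit` is a hypothetical helper written for this example, not part of Dgraph or badger, and it assumes a Linux or other Unix-like system where `syscall.Getrlimit` is available.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard limits on open file descriptors
// (RLIMIT_NOFILE) for the current process.
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	// Convert explicitly because the field types differ between platforms.
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("open file limit: soft=%d hard=%d\n", soft, hard)
}
```

Run from the same service unit or shell that launches Dgraph, this reports the limits that unit passes on to its processes, which helps confirm whether a LimitNOFILE change took effect.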
