Dgraph: Crash when online write

Created on 18 May 2020 · 20 Comments · Source: dgraph-io/dgraph

What version of Dgraph are you using?

Dgraph version : v1.2.4
Dgraph SHA-256 : 38ca7ecc19103d21bdd1eddf0d2695e872883b37d4c270916606b0b51f1d0a1b
Commit SHA-1 : b51171c
Commit timestamp : 2020-05-15 15:42:10 -0700
Branch : HEAD
Go version : go1.13.6

Have you tried reproducing the issue with the latest release?

v1.2.4

What is the hardware spec (RAM, OS)?

128G mem & 1.8T SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

Just normal online writes through the Java API. It's weird.

Expected behaviour and actual result.

Alpha crashes and the cluster is blocked. The log is as follows:

`
panic: runtime error: slice bounds out of range [:135467521] with capacity 7968

goroutine 10106572 [running]:
github.com/dgraph-io/badger/v2/table.(*blockIterator).setIdx(0xc001b41120, 0x0)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/iterator.go:89 +0x52c
github.com/dgraph-io/badger/v2/table.(*blockIterator).seekToFirst(...)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/iterator.go:147
github.com/dgraph-io/badger/v2/table.(*Iterator).next(0xc001b41110)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/iterator.go:314 +0x12b
github.com/dgraph-io/badger/v2/table.(*Iterator).next(0xc001b41110)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/iterator.go:323 +0x1b3
github.com/dgraph-io/badger/v2/table.(*Iterator).Next(0xc001b41110)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/iterator.go:379 +0x47
github.com/dgraph-io/badger/v2/table.(*ConcatIterator).Next(0xc682d840a0)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/iterator.go:497 +0x33
github.com/dgraph-io/badger/v2/table.(*node).next(0xc112aec210)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/merge_iterator.go:82 +0x3d
github.com/dgraph-io/badger/v2/table.(*MergeIterator).Next(0xc112aec210)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/merge_iterator.go:158 +0x38
github.com/dgraph-io/badger/v2/table.(*node).next(0xc112aec300)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/merge_iterator.go:80 +0x72
github.com/dgraph-io/badger/v2/table.(*MergeIterator).Next(0xc112aec2c0)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/table/merge_iterator.go:158 +0x38
github.com/dgraph-io/badger/v2.(*Iterator).parseItem(0xc22baf0000, 0x1)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:621 +0x17f
github.com/dgraph-io/badger/v2.(*Iterator).Next(0xc22baf0000)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:566 +0x10b
github.com/dgraph-io/badger/v2.(*Stream).produceKVs.func1(0xc83b156a40, 0x17, 0x20, 0xc83b156a60, 0x17, 0x20, 0x0, 0x0, 0x0)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/stream.go:176 +0xa48
github.com/dgraph-io/badger/v2.(*Stream).produceKVs(0xc5d2ba5ea0, 0x1a971a0, 0xc000038118, 0x0, 0x0)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/stream.go:234 +0x2c6
github.com/dgraph-io/badger/v2.(*Stream).Orchestrate.func1(0xc5d4a7daa0, 0xc5d2ba5ea0, 0x1a971a0, 0xc000038118, 0xc5d4a53ec0)
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/stream.go:337 +0x7f
created by github.com/dgraph-io/badger/v2.(*Stream).Orchestrate
/root/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/stream.go:334 +0x175
`

area/crash kind/bug priority/P0 status/accepted

All 20 comments

When I shut down and restart Alpha with v1.2.4, the problem is still there. With v1.2.3, there is no problem.

I've seen similar failures in badger on PR https://github.com/dgraph-io/badger/pull/1308 . This is an issue in badger.

We are under heavy load and the server crashes every 10 minutes. How can we handle it?
Could increasing the number of Alpha groups be an option?

@H4midR Can you downgrade to v20.03.1? The latest version has this failure. @JimWen you should also switch to v1.2.3 if the crash happens frequently.

We're working on fixing this and we'll do a patch release soon.

My Dgraph version is 20.03.2.
I recorded my screen and can share it with you. I really need to find a solution. We have been working on this project for more than 1.5 years. We grew up with you, and now, in real use and in our first global launch, we are screwed. If you are interested in the recordings or can help me, I look forward to hearing from you.

@H4midR Did you try downgrading your version? Dgraph v20.03.1 should work fine.

@H4midR Have you tried using Dgraph v20.03.1 and are still seeing this issue? We have reached out to you over email as well.

> @H4midR Can you downgrade to v20.03.1? The latest version has this failure. @JimWen you should also switch to v1.2.3 if the crash happens frequently.
>
> We're working on fixing this and we'll do a patch release soon.

Yes, v1.2.3 is fine.

> @H4midR Have you tried using Dgraph v20.03.1 and are still seeing this issue? We have reached out to you over email as well.

I'll do it, but we are not in a situation where we can run tests right now. As soon as exams end, I'm going to simulate the process to see how it happens again.
Thank you all, friends.

@JimWen @H4midR when does the crash happen? Do you have any logs or any information about when the crash shows up? I'm trying to reproduce the failure.

> @JimWen @H4midR when does the crash happen? Do you have any logs or any information about when the crash shows up? I'm trying to reproduce the failure.

After replacing the binary with the latest v1.2.4, the crash happens. The related log is in the issue description.

It seems to happen when the number of mutation requests increases. The time between two failures depends on the request rate.
At low rates, it's fine.
With read requests, it's fine.
I can help you reproduce it, but only after my work here is done on 28th May.

This is a bug in badger and not in dgraph. It can be reproduced with the following steps:

git clone https://github.com/dgraph-io/badger
cd badger
git fetch origin ibrahim/panic-test
git checkout ibrahim/panic-test
cd badger
go run main.go bank test --dir ./badgerdb -d 24h 2>&1
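
For context on what a bank-style test looks for, here is a minimal in-memory sketch in Go. It assumes that the `bank test` subcommand moves balances between accounts and checks that the overall total never changes. It does not use Badger at all, and `transfer`, `total`, `runBank`, and the constants are hypothetical names chosen for this illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

const (
	numAccounts    = 10  // hypothetical account count for this sketch
	initialBalance = 100 // hypothetical starting balance per account
)

// transfer moves amount between two accounts and refuses self-transfers
// and overdrafts, so the sum of all balances is preserved.
func transfer(balances []int, from, to, amount int) bool {
	if from == to || amount <= 0 || balances[from] < amount {
		return false
	}
	balances[from] -= amount
	balances[to] += amount
	return true
}

// total returns the sum of all balances.
func total(balances []int) int {
	sum := 0
	for _, b := range balances {
		sum += b
	}
	return sum
}

// runBank performs random transfers and verifies the invariant after each one.
func runBank(rounds int, seed int64) (int, error) {
	rng := rand.New(rand.NewSource(seed))
	balances := make([]int, numAccounts)
	for i := range balances {
		balances[i] = initialBalance
	}
	want := total(balances)
	for i := 0; i < rounds; i++ {
		transfer(balances, rng.Intn(numAccounts), rng.Intn(numAccounts), rng.Intn(50)+1)
		if got := total(balances); got != want {
			return got, fmt.Errorf("round %d: total is %d, want %d", i, got, want)
		}
	}
	return want, nil
}

func main() {
	sum, err := runBank(1000, 1)
	fmt.Println("total:", sum, "error:", err)
}
```

Against a real store, the same invariant is checked through transactions and iterators running for a long time (the command above uses `-d 24h`), which is what makes such a test useful for shaking out rare storage-layer failures like this one.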

So, finally: does version 1.2.3 have this problem too?

@H4midR No, this affects only v1.2.4 and v20.03.2.
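
If you want to guard a deployment script against the two affected releases, a check along these lines may help. This is only a sketch: `isAffected` and the `affected` table are hypothetical names, and the version list is taken solely from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// affected lists the releases named in this thread as hitting the crash.
var affected = map[string]bool{
	"1.2.4":   true,
	"20.03.2": true,
}

// isAffected reports whether a version string such as "v1.2.4" or "20.03.2"
// is one of the releases listed above. A leading "v" and surrounding
// whitespace are ignored.
func isAffected(version string) bool {
	v := strings.TrimPrefix(strings.TrimSpace(version), "v")
	return affected[v]
}

func main() {
	for _, v := range []string{"v1.2.3", "v1.2.4", "v20.03.1", "v20.03.2"} {
		fmt.Printf("%s affected: %t\n", v, isAffected(v))
	}
}
```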

This is not yet fixed in Dgraph. I'll close this issue once badger is updated in Dgraph.

This has been fixed by updating badger in https://github.com/dgraph-io/dgraph/pull/5404

Hey @JimWen and @H4midR, we've released a new patch release candidate with the fix for this issue https://github.com/dgraph-io/dgraph/releases v20.03.3-rc1 and v1.2.5-rc1 . I would really appreciate it if you guys can help us test it.

We believe it is fixed but it would be very helpful if someone running dgraph can also confirm it.

Thank you very much.
We are in a production environment and our customers are not very bored, but I will try to test it locally.

> Hey @JimWen and @H4midR, we've released a new patch release candidate with the fix for this issue https://github.com/dgraph-io/dgraph/releases v20.03.3-rc1 and v1.2.5-rc1 . I would really appreciate it if you guys can help us test it.
>
> We believe it is fixed but it would be very helpful if someone running dgraph can also confirm it.

I tested with v20.03.3, and online writes have worked fine for 2 days so far. @jarifibrahim
However, it seems to introduce another crash. Please have a look at #5573.
