Dgraph version: 1.05
yes
RDF data set: 400G
RAM: 128G
OS: CentOS
When using `dgraph bulk` (map/reduce) to batch-load data, the error "program exceeds 10000-thread limit" is raised at the beginning of the reduce phase.
Full log:
MAP 05h41m10s rdf_count:4.165G rdf_speed:203.4k/sec edge_count:13.98G edge_speed:682.8k/sec
REDUCE 05h41m11s [0.00%] edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion
runtime stack:
runtime.throw(0x1368a7c, 0x11)
/home/travis/.gimme/versions/go1.9.4.linux.amd64/src/runtime/panic.go:605 +0x95
runtime.checkmcount()
/home/travis/.gimme/versions/go1.9.4.linux.amd64/src/runtime/proc.go:525 +0xa4
runtime.mcommoninit(0xc5f0d73400)
/home/travis/.gimme/versions/go1.9.4.linux.amd64/src/runtime/proc.go:545 +0x9f
runtime.allocm(0xc42046ec00, 0x0, 0xc700000000)
/home/travis/.gimme/versions/go1.9.4.linux.amd64/src/runtime/proc.go:1344 +0x99
runtime.newm(0x0, 0xc42046ec00)
/home/travis/.gimme/versions/go1.9.4.linux.amd64/src/runtime/proc.go:1637 +0x39
runtime.startm(0xc42046ec00, 0x1afb500)
/home/travis/.gimme/versions/go1.9.4.linux.amd64/src/runtime/proc.go:1728 +0x13f
runtime.handoffp(0xc42046ec00)
What's the command that you're using? How many files?
@manishrjain
dgraph bulk -r file.rdf -s file.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080
There is only one file. On a previous attempt, file.rdf (350G) loaded without problems.
After I added some facets to relationships, the data set grew to 400G, and the error occurred on this attempt.
Are you on a slow disk? I think this might be because disk reads are so slow that Go keeps on creating threads, and ends up hitting the limit.
Can you run your program on an SSD?
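To check whether the process really is piling up OS threads, a Go program can inspect its own thread creation. The sketch below uses only the Go standard library: the built-in `threadcreate` profile from `runtime/pprof`, and `runtime/debug.SetMaxThreads`, which sets the ceiling behind the "10000-thread limit" message (10000 is the Go runtime default). The helper names `threadsCreated` and `raiseThreadLimit` are hypothetical, and nothing in this thread says `dgraph bulk` exposes a flag for either; this is a diagnostic sketch, not part of Dgraph.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"runtime/pprof"
)

// threadsCreated (hypothetical helper) reports how many OS threads the Go
// runtime has created so far, using the built-in "threadcreate" profile.
func threadsCreated() int {
	return pprof.Lookup("threadcreate").Count()
}

// raiseThreadLimit (hypothetical helper) sets the runtime's maximum number of
// OS threads and returns the previous limit. Raising it only buys headroom;
// it does not make a slow disk any faster.
func raiseThreadLimit(n int) int {
	return debug.SetMaxThreads(n)
}

func main() {
	fmt.Println("OS threads created so far:", threadsCreated())
	prev := raiseThreadLimit(20000)
	// prev is 10000 unless something earlier in the process changed it.
	fmt.Println("previous thread limit:", prev)
}
```

Logging `threadsCreated()` periodically during a long load would show whether the count climbs steadily as the reduce phase starts, which is what the slow-disk explanation predicts.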
@manishrjain
Yes, I ran it on an HDD.
After switching to an SSD, the problem no longer occurs.
However, I think automatically creating threads when the disk is too slow may be a bug that should be fixed for users who don't have an SSD.
That's an artifact of Go. If a goroutine blocks in a system call for a while, the runtime sets that thread aside and spawns a new one so other goroutines can keep running. In this case, your reads were taking so long that Go created too many system threads. There's not much we can do from within Badger to tackle this.