Considering that the default datastore will be transitioning from flatfs to badger soon (#4279) it would be useful to have an approximate idea of the performance gains (which are considerable) and also the losses (e.g., #4298).
I'll be working in a separate repository (it can be integrated here later) to develop some profiling statistics about it; this will also help build a better understanding of Badger's internals (at least that's my expectation).
I'm open to (and in need of) suggestions for use cases to test (random reads, rewriting/deleting the same value several times, GC'ing while reading/writing, etc).
I think the time to complete some common operations would be interesting for repos with varying sizes, pin counts, and varying amounts of unpinned content:
ipfs pin ls
ipfs add-ing various directory and file mixes
ipfs repo verify
ipfs repo gc
If memory utilization comparisons can also be made I think it would be useful. I've seen some really high memory usage from the IPFS daemon in certain cases when working with large (300+ GB) repos, which I attributed to Badger, but I haven't gone back to test with flatfs to see if it actually was Badger or just having a large repo (or something else).
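A minimal harness for timing those commands could be sketched in Go as below. This is purely illustrative: it assumes an ipfs binary on PATH and an already-initialized repo, and the operation list just mirrors the one above.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// timeCommand runs an external command and returns its wall-clock duration.
func timeCommand(name string, args ...string) (time.Duration, error) {
	start := time.Now()
	err := exec.Command(name, args...).Run()
	return time.Since(start), err
}

func main() {
	// Operations from the list above; assumes 'ipfs' on PATH and an
	// initialized repo (IPFS_PATH). Add 'ipfs add' runs as needed.
	ops := [][]string{
		{"ipfs", "pin", "ls"},
		{"ipfs", "repo", "verify"},
		{"ipfs", "repo", "gc"},
	}
	for _, op := range ops {
		d, err := timeCommand(op[0], op[1:]...)
		fmt.Printf("%v took %v (err: %v)\n", op, d, err)
	}
}
```

Running the same harness against both a flatfs and a badgerds repo (switching IPFS_PATH between runs) would give directly comparable numbers.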
@schomatis see also https://github.com/Kubuxu/go-ds-bench
It is a bit old but should work after a few fixes.
See dgraph-io/badger#446 for a discussion of search key performance in the IPFS case.
@leerspace : Btw, Badger's memory usage can be reduced via options, for example by mmap-ing the LSM tree instead of loading it into RAM, or by keeping the value log on disk instead of mmap-ing it.
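A sketch of how those two options might be set (option names from the badger v1.x API; the loading-mode fields may differ in the exact vendored version, and the /tmp path is illustrative):

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger"
	"github.com/dgraph-io/badger/options"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "/tmp/badger"      // illustrative path
	opts.ValueDir = "/tmp/badger" // illustrative path
	// mmap the LSM tree instead of loading it into RAM.
	opts.TableLoadingMode = options.MemoryMap
	// Keep the value log on disk instead of mmap-ing it.
	opts.ValueLogLoadingMode = options.FileIO
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```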
Hello
I am totally new at ipfs. I am a developer of https://github.com/recoilme/slowpoke datastore
Maybe you could use it instead of Badger? Slowpoke has ideas similar to Badger's, but without the boring LSM tree.
@schomatis I created a PR with a slowpoke test: https://github.com/schomatis/datastore_benchmarks/pull/1
Quick summary
Slowpoke:
Put time: 1.16221931s
Get time: 805.776917ms
Db size: 1 048 570 bytes (Zero bytes overhead)
Index size: 5 033 136 bytes
Badger:
Put time: 902.318742ms
Get time: 723.95486ms
Vlog size: 7 247 634 bytes
Sst size: 6 445 276 bytes
Slowpoke looks a little slower than Badger, but not dramatically so, and it has many other advantages.
@recoilme how does slowpoke scale in the few-TB to PB range? Btw, I'd put this discussion in a separate issue as it's kind of off-topic here.
Ok @magik6k, please let me link to this issue from there.
In general, slowpoke may be a little slower than Badger on synthetic benchmarks, but it scales better on big databases. Slowpoke is a proxy to the filesystem, like flatfs plus indexes plus memory management.
Or like Badger without the LSM tree inside. Each table works in a separate goroutine, and each key store is a map holding each value's size and address in the file.
@recoilme please also test "repo gc" when say 99% of the repo is not pinned.
Badger seems an order of magnitude (at least) slower than flatfs (i.e. slowpoke), but this needs verification.
@kevina Could you provide a simple example of a GC operation that takes an order of magnitude longer than with flatfs, so I can take a deeper look into this performance issue?
@schomatis on an empty repo do a
ipfs ls --resolve-type=false QmXNHWdf9qr7A67FZQTFVb6Nr1Vfp4Ct3HXLgthGG61qy1
This could take a very long time, you may need to use https://github.com/ipfs/go-ipfs/pull/4979.
Then do a "ipfs repo gc".
Note: if you use flatfs you should turn the sync option off before populating the datastore, to help with performance.
I have not done the former test yet. I am guessing you should see the same problem with any repo that contains lots of small objects (over 100k) with very few of them pinned.
Also see #4908.
@schomatis here is a script to reproduce the problem
#!/bin/bash
# requires 'jq': https://stedolan.github.io/jq/
set -e
# note: RANDOM is special in bash (it expands to a random number),
# so the path to the 'random' utility needs a different variable name
RANDOM_BIN=$GOPATH/src/github.com/ipfs/go-ipfs/test/bin/random
TMP=/aux/scratch/tmp
RNDFILE="$TMP/128MB-rnd-file"
"$RANDOM_BIN" 134217728 > "$RNDFILE"  # 128 MiB of random data
export IPFS_PATH="$TMP/ipfs-tmp"
ipfs init > /dev/null
mv "$TMP/ipfs-tmp"/config "$TMP/ipfs-tmp"/config.bk
jq '.Datastore.Spec.mounts[0].child.sync = false' "$TMP/ipfs-tmp"/config.bk > "$TMP/ipfs-tmp"/config
ipfs add --pin=false --chunker=size-1024 "$RNDFILE"
mv "$TMP/ipfs-tmp"/config.bk "$TMP/ipfs-tmp"/config
echo "calling repo gc, default config"
time ipfs repo gc > /dev/null
rm -rf "$TMP/ipfs-tmp"
ipfs init -p badgerds > /dev/null
ipfs add --pin=false --chunker=size-1024 "$RNDFILE"
echo "calling repo gc, badgerds"
time ipfs repo gc > /dev/null
rm -rf "$TMP/ipfs-tmp"
rm "$RNDFILE"
When TMP pointed to "/tmp", which uses tmpfs, the 'repo gc' was fine. When TMP pointed to a non-memory filesystem, it was very slow. I'll try to let it complete and report the results.
Okay. I gave up and killed "repo gc" when badgerds is used. Here are the results:
Flatfs: 47 sec
Badgerds: >26m17s or 1577s (killed the process)
So Badgerds is at least 30 times slower.
Great work @kevina! Thanks for the test script. I'll try to reproduce it on my end and see if I can pinpoint the bottleneck on the Badger implementation.
The GC is supposed to be slower in Badger due to the added complexity of checking for the deletion marks in the value log, but an order of magnitude slower (or more) would be too much to ask the user to bear during this transition.
Let me see if I can run this. I'm doing a whole bunch of improvements related to GC, and versioning. I feel those should fix this up nicely.
What version of Badger are you guys on?
Update: When I run the script by @kevina above, it fails with
$ ./t.sh ~/test
./t.sh: line 11: 20034: command not found
My go-ipfs/test/bin doesn't have a random binary, after make install. What am I missing?
My go-ipfs/test/bin doesn't have a random binary, after make install. What am I missing?
That is probably installed with the tests; "cd go-ipfs/test && make deps" should install it. You also need jq installed; if you don't have it you can comment out those lines, though the ipfs add may be a bit slow.
Hi @manishrjain, thanks for stepping in here. The master branch of IPFS is using v1.3.0, I'm not sure what version @kevina used for the tests, but feel free to assume we're using the latest version in Badger's master branch, Badger is still not the default datastore so we have some latitude here.
To use the latest version of Badger inside IPFS you can run the following commands:
export BADGER_HASH=QmdKhi5wUQyV9i3GcTyfUmpfTntWjXu8DcyT9HyNbznYrn
rm -rfv $GOPATH/src/gx/ipfs/$BADGER_HASH
git clone https://github.com/dgraph-io/badger $GOPATH/src/gx/ipfs/$BADGER_HASH/badger
wget https://ipfs.io/ipfs/$BADGER_HASH/badger/package.json -O $GOPATH/src/gx/ipfs/$BADGER_HASH/badger/package.json
cd $GOPATH/src/github.com/ipfs/go-ipfs
make install
Sorry for the convoluted commands, the current tool to work with development packages in gx (gx-go link) has an issue at the moment which prevents a more elegant solution (AFAIK).
@schomatis I am using master at Fri Apr 27 12:52:01 2018 +0900 commit 2162f7e681b87e6b322712870f9f883bf9f9ec40.
@manishrjain The random util as mentioned by @kevina needs to be installed separately (it's a dependency of the sharness tests):
cd $GOPATH/src/github.com/ipfs/go-ipfs/test/sharness/
make deps
@schomatis I am using master at Fri Apr 27 12:52:01 2018 +0900 commit 2162f7e.
Great, so that should indeed be Badger's last stable version 1.3.0.
I pulled in Badger master head, not sure if it makes any difference here.
I set SyncWrites to false, which sped up the repo gc command significantly -- for GC, having async writes makes sense anyway. So, you might want to turn that on for the GC command.
Another thing I noticed is that each "removed Q.." is run as a separate transaction, serially. If deletions can be batched up in one transaction, that'd speed up things a lot. Though, looking at the code, it might be easier to run bs.DeleteBlock(k) concurrently in multiple goroutines. The input and output is already done via channels, so this should be feasible.
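The concurrent-deletion suggestion above can be sketched as a stdlib-only worker pool. The store type here is a hypothetical stand-in for the real blockstore; the fan-out over a keys channel is the point, since the input/output in go-ipfs is already channel-based.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a stand-in for the blockstore; the real bs.DeleteBlock(k)
// would go where delete(...) is below.
type store struct {
	mu   sync.Mutex
	data map[string][]byte
}

func (s *store) DeleteBlock(k string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, k)
}

// deleteAll fans keys out to nWorkers goroutines so deletions are
// issued concurrently rather than serially, one per transaction.
func deleteAll(s *store, keys <-chan string, nWorkers int) {
	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for k := range keys {
				s.DeleteBlock(k)
			}
		}()
	}
	wg.Wait()
}

func main() {
	s := &store{data: map[string][]byte{}}
	keys := make(chan string, 100)
	for i := 0; i < 100; i++ {
		k := fmt.Sprintf("Qm%d", i)
		s.data[k] = []byte{0}
		keys <- k
	}
	close(keys)
	deleteAll(s, keys, 8)
	fmt.Println("remaining:", len(s.data)) // prints "remaining: 0"
}
```

With Badger specifically, batching several deletions into one transaction (instead of one transaction per key) would reduce commit overhead further; the goroutine fan-out above is the lower-effort variant mentioned in the comment.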
On my laptop, with $TMP set to $HOME/test/tmp:
$ ipfs add --pin=false --chunker=size-1024 128MB-rnd-file
Creating badger
added QmfYcagv5oqa2cKKaxiDBJaU5bXu4VwKTUNz5h1DQx3Uye 128MB-rnd-file
$ time ipfs repo gc > /dev/null
ipfs repo gc > /dev/null  11.28s user 1.47s system 179% cpu 7.085 total
I can confirm that setting SyncWrites to false brings Badger GC time down to almost the flatfs value. Still, in the example provided, 128 MB is below the 1 GB default of ValueLogFileSize, so no GC is actually triggered on the Badger side; only the deletion writes are issued on the IPFS side. Lowering ValueLogFileSize to force Badger GC increases the time of the ipfs repo gc command for the badger datastore, but only to 1.25-1.5x. More tests with bigger repo sizes are needed, but this is very promising.
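For reference, forcing value-log GC in such a test might look like the following sketch (badger v1.x API; the 8 MB file size and 0.5 discard ratio are illustrative choices, not values from the thread):

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "/tmp/badger-gc"      // illustrative path
	opts.ValueDir = "/tmp/badger-gc" // illustrative path
	// Lower the value log file size so a 128 MB repo spans several
	// log files, giving badger GC files to actually inspect.
	opts.ValueLogFileSize = 8 << 20 // 8 MB, illustrative
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// ... write and then delete data here ...
	// Rewrite value log files whose discardable fraction exceeds 0.5.
	if err := db.RunValueLogGC(0.5); err != nil && err != badger.ErrNoRewrite {
		log.Fatal(err)
	}
}
```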
Thanks a lot @manishrjain!! This is a big win, please let me know if I can be of help with the ongoing GC improvements of https://github.com/dgraph-io/badger/pull/454.
@schomatis the idea behind adding a small file with very small chunks was to approximate how a very large shared directory will be stored.
@kevina I see, my comment above is not about the file size (or chunk size) itself, but rather that the repo needs to surpass ValueLogFileSize (whatever that size is) to ensure there is more than one value log file, so Badger actually runs a GC over one of them. If there is only one log file, as in this example, only the deletion writes are measured, not Badger's search time when it inspects a value log file to see which keys are deleted and which aren't.
Sorry, just my 5 cents about nosync:
Setting SyncWrites to false == corrupted database.
Without fsync, data is not durably stored; if the app/OS crashes or shuts down unexpectedly, all data will be lost and the database may (will, in the case of LSM-tree-based DBs) be corrupted.
Async writes to a file may lead to corrupted data too.
For async writes to a file you must use only one file descriptor per file and mutexes, like in this library: https://github.com/recoilme/syncfile
Or goroutines, like in https://github.com/recoilme/slowpoke
But how to do that with Badger? You can't open a Badger file from another file descriptor with a guarding library like syncfile.
Slowpoke doesn't have a "nosync" option, but it has batch writes (the Sets method) with fsync at the end; it works like a transaction. It also has DeleteFile and Close methods, so it could store unpinned/pinned items separately: you can close (free keys from memory) data that isn't needed and delete all files holding unneeded data quickly and safely.
Thanks a lot @manishrjain!! This is a big win, please let me know if I can be of help with the ongoing GC improvements of dgraph-io/badger#454.
Thanks for the offer, @schomatis . I could definitely use your help on various issues. That PR you referenced is part of a v2.0 launch. Want to chat over email? Mine is my first name at dgraph.io.
@recoilme That's a good point on syncing I/O (I'll have that in mind for the Badger tests), also thanks for submitting the PR with the slowpoke benchmarks.
Please note that this issue concerns the Badger benchmarks as a datastore for IPFS, if you would like to add slowpoke as an experimental datastore for IPFS I'm all for it but please open a new issue to discuss that in detail.
Regarding the benchmark results, I would like to point out that although performance is the main motivation for transitioning to Badger, there are other aspects of choosing a DB that in my opinion are also important: how many developers are actively working on it, how many users in the community are using/testing it, who else has adopted it as their default DB, what documentation the project has, and the system's capacity to adapt (to the IPFS use case). All of those (and many more I'm missing) matter beyond the fact that a benchmark might suggest one DB outperforms another by ~5% in some particular store/search scenario.
@schomatis Thanks for the detailed answer. It seems to me that Badger is an excellent choice, but I would be happy to add slowpoke as an experimental datastore for research. I just want to solve some problems specific to IPFS with my storage engine, because it's interesting. I'll implement the datastore interface after my vacation and open an issue for discussion, if you like.
@recoilme Great!
Is anyone actively working on this?
Hey @ajbouh, I am (sort of, I've been distracted with other issues), but any help is more than welcome :)
If you were just asking to use Badger as your datastore you can enable it as an experimental feature.
I'm trying to make IPFS performance benchmarks real partially in service of:
And partly to help support some of the awesome efforts already underway like:
IPFS + TensorFlow :heart_eyes:
So, these were just some informal benchmarks to get an idea of the already known benefits of using Badger over a flat architecture, and as they stand they are just an incomplete skeleton. As mentioned by @Stebalien in https://github.com/ipfs/go-ipfs/issues/4279#issuecomment-405730422 the priority right now is on making Badger as stable as possible, so benchmarking performance is not really at the top of the list at the moment. That being said, if you want to continue with this work feel free to write me and we could coordinate a plan forward so I could guide you through it.
Covered by #6523.