Go-ipfs: Make badger-ds the default datastore

Created on 5 Oct 2017 · 15 comments · Source: ipfs/go-ipfs

This is the master issue to centralize all other issues/PRs related to the Badger transition.

The priorities to check before the transition are:

  • DB integrity. Besides minimizing data loss on errors such as a system crash or running out of disk space, this means always keeping the database consistent enough for Badger (and therefore IPFS) to start. Some truncation may be acceptable, but the scenario to avoid is Badger encountering a DB it can't work with (e.g., a failed assertion it doesn't know how to recover from) and refusing to start, which would leave IPFS unable to run without manual intervention that can't be expected from the average end-user (see the sketch after this list).

  • Performance in worst-case scenarios. We are transitioning from flat file-system storage (one key, one file), which in most cases performs (much) worse than Badger, but in some scenarios (e.g., GC or certain search patterns) a flat architecture may outperform Badger (or any other LSM architecture, for that matter). Those cases should be minimized as much as possible so the end-user won't notice the transition.
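To make the integrity point concrete, here is a minimal sketch of the idea, assuming go-ds-badger's options embed Badger's `Truncate` flag; the repo path is illustrative, not a real go-ipfs default:

```go
package main

import (
	"log"

	badgerds "github.com/ipfs/go-ds-badger"
)

func main() {
	// Assumption: go-ds-badger's Options embed Badger's own options, so
	// Truncate is promoted. Truncate tells Badger to drop a corrupt
	// value-log tail (e.g. after a crash or a full disk) instead of
	// failing an assertion and refusing to open.
	opts := badgerds.DefaultOptions
	opts.Truncate = true

	ds, err := badgerds.NewDatastore("/path/to/repo/badgerds", &opts) // illustrative path
	if err != nil {
		log.Fatal(err) // without truncation, a corrupt log can make this fail on every start
	}
	defer ds.Close()
}
```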

The active issues (mostly the ones tagged with badger) are:

Labels: epic, kind/feature, status/deferred, topic/badger, topic/meta, topic/repo

All 15 comments

@Stebalien Do you mind if I hijack this issue to keep track of all the other issues related to the Badger transition?

@schomatis go right ahead!

Is this still on track to happen sometime soon?

In addition to the issues listed in the description, ~~we're still working through some recovery issues~~ (not a bug), and memory usage is pretty bad (we may be able to tune this a bit, ~~but I'm getting some really weird behavior on Linux~~ (can't reproduce anymore)).

Basically, we can't roll this out until:

  1. We can always recover after a crash.
  2. It doesn't eat RAM needlessly (a tuning sketch follows).
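On the RAM point, a hedged tuning sketch, assuming go-ds-badger still exposes Badger v1's loading-mode knobs (`TableLoadingMode`, `ValueLogLoadingMode`) through its embedded options; the path is illustrative:

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger/options"
	badgerds "github.com/ipfs/go-ds-badger"
)

func main() {
	opts := badgerds.DefaultOptions
	// Read SSTables and the value log via plain file I/O instead of
	// mmap/LoadToRAM: slower lookups, but a much smaller resident set.
	opts.TableLoadingMode = options.FileIO
	opts.ValueLogLoadingMode = options.FileIO

	ds, err := badgerds.NewDatastore("/path/to/repo/badgerds", &opts) // illustrative path
	if err != nil {
		log.Fatal(err)
	}
	defer ds.Close()
}
```

Switching both loading modes to plain file I/O trades lookup latency for memory, which is the relevant trade-off for low-memory nodes.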

Thanks for the reference, I should add those.

I doubt this affects most users, but I'm linking it anyway.
My own instance of IPFS runs with flatfs hosted on an SMB/CIFS share.
Badger doesn't currently handle this, although it could: https://github.com/dgraph-io/badger/issues/699

For full context, I do this because my local disks are small, and I can't run IPFS on the remote machine because components of libp2p don't build on Solaris yet.
(While trying to port it, I encountered an oddity where the Go standard library claims something is implemented but it isn't.)

@djdv that is why we provide other datastore implementations, and simple switches to initialise a repo with different configurations.
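As an illustration of that switch, here is a sketch under the assumption that both backends are driven through the common go-datastore interface; `openDatastore` and the paths are hypothetical helpers for this example, not go-ipfs code:

```go
package main

import (
	"log"

	ds "github.com/ipfs/go-datastore"
	badgerds "github.com/ipfs/go-ds-badger"
	flatfs "github.com/ipfs/go-ds-flatfs"
)

// openDatastore is a hypothetical helper: both backends sit behind the
// same go-datastore interface, so the rest of the node doesn't care
// which one was picked at init time.
func openDatastore(backend, path string) (ds.Datastore, error) {
	switch backend {
	case "badgerds":
		opts := badgerds.DefaultOptions
		return badgerds.NewDatastore(path, &opts)
	default:
		// flatfs: one key per file, sharded by the next-to-last two
		// characters of the key, synced on write.
		return flatfs.CreateOrOpen(path, flatfs.NextToLast(2), true)
	}
}

func main() {
	d, err := openDatastore("badgerds", "/tmp/badger-example") // backend and path are illustrative
	if err != nil {
		log.Fatal(err)
	}
	defer func() {
		// both concrete datastores implement io.Closer
		if c, ok := d.(interface{ Close() error }); ok {
			_ = c.Close()
		}
	}()
}
```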

@magik6k IMO, we _should_ be able to graduate badger from experimental even if we don't go ahead and make it the default. However, we may want to land https://github.com/ipfs/go-ds-badger/issues/51 first.

Hey, what's the status here? I see that https://github.com/ipfs/go-ds-badger/issues/51 is closed; does that mean the badger datastore can be considered pretty mature now?

We should probably update to badger v2 before using it as default.

I just want to note that flatfs has some advantages: when using ZFS as the underlying filesystem (with the ZFS blocksize set to 256K and raw-leaves for IPFS), you can dedup the block storage against a copy of the data you might need to hold outside it for a different service, like HTTP. This isn't possible with data stored inside a database.

It would be nice if support for flatfs isn't dropped in the future. :)

At the moment, we plan on keeping flatfs. It has a tendency to "just work everywhere". The main downside is that it's impossible to optimize.

Well, that's neat! :)

I think optimizations depend on the filesystem: you could, for example, add a fast SSD, like an Intel Optane drive, as a ZFS cache device or as a 'small files' vdev.

This should give a major boost in read performance.

The clear advantage of something like ZFS is that it can roll back to a clean state after a power outage, even with sync writes disabled. So writes can be accepted in bulk and committed to storage slowly, in an orderly manner, while the device isn't serving read requests - like a RAID controller's write cache, but with essentially the whole free main memory.

Is there a middle ground where flatfs remains the default for low-power configurations, and badger becomes the default otherwise?

Maybe it would be useful to specify target levels for excess disk space / memory / compaction behavior that would be acceptable to make the switch.

Yes, but I don't actually think we have to. Once we fix the final memory issue (it shouldn't be too hard), low-power nodes should be just fine.

