Hive: Hive sync: Let's discuss it.

Created on 12 Sep 2019 · 20 Comments · Source: hivedb/hive

In the future I want to support syncing Hive with a remote database.

It would be helpful if you could share your needs & ideas.

One of the most obvious use cases is backing up data (for example settings or messages). I think Firebase should be one of the first supported remotes.

enhancement

All 20 comments

I like the idea. If it has functions that need to be set up on initialization like FCM does, it would be easy for a custom solution to be added. But it still will require a lot of input from the user because of handling updating Hive when the app reopens but the remote db has changed, etc

> But it still will require a lot of input from the user because of handling updating Hive when the app reopens but the remote db has changed, etc

Yes that's true. Much easier would be an implementation which just creates a backup of Hive.

The goal is to support full sync (including support for remote changes)
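For comparison, the backup-only approach mentioned above can be sketched in a few lines. This is only an illustration: a plain `Map` stands in for a box, the function names `snapshotBox` and `restoreBox` are made up, and every value is assumed to be JSON-encodable. Full sync needs much more than this, because a snapshot says nothing about remote changes.

```dart
import 'dart:convert';

// Hypothetical helpers; a plain Map stands in for a Hive box.

/// Serializes the box contents to a JSON string. Assumes every value is
/// JSON-encodable (String, num, bool, null, List, Map).
String snapshotBox(Map<String, dynamic> box) => jsonEncode(box);

/// Restores a snapshot into a fresh map, discarding nothing on the remote
/// side and knowing nothing about changes made elsewhere.
Map<String, dynamic> restoreBox(String snapshot) =>
    Map<String, dynamic>.from(jsonDecode(snapshot) as Map);
```

The resulting string can be uploaded to any service that stores opaque files.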

It could possibly be simpler if you made adapters for different remote db types (SQL, NoSQL)? That way the similarities could be abstracted or simplified for the user.

Yes, I'll start experimenting once queries and asset DBs are ready...

If anyone has time and wants to contribute, I'll be there to help.

Are S3, OneDrive, Google Drive, and Dropbox planned?

I hope to make it very easy to write a sync solution for every service. This will take time tho.

You could provide an interface so every user can implement what they want.

It would be great to know which information users actually need. Do you have something in mind?

I'm thinking as I write, so forgive some reasoning holes.

My idea is to have a simple interface with 2 or 4 functions like these:

// Placeholder name; the user implements this and hands it to Hive.
abstract class RemoteSync {
  // Read and write a single element.
  bool writeToServer(String key, dynamic value, String boxName, Type runtimeType);
  T? readFromServer<T>(String key, String boxName, Type runtimeType);

  // Read and write all values at once. Dart has no overloading, so these need their own names.
  bool writeAllToServer(Map<String, dynamic> values, String boxName, Type runtimeType);
  Map<String, dynamic> readAllFromServer(String boxName);
}

The boxName parameter (and the others) can help the user decide which service to call.

I suppose that the first two functions can be used with a lazy box.

The user then has to implement this interface and inject the implementation into Hive, so Hive can call this object to sync remotely.
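To make the "implement and inject" idea concrete, here is a minimal sketch of one possible implementation. All names are hypothetical and nothing here is part of Hive's API. The bulk methods get distinct names because Dart does not support overloading, and the in-memory map only stands in for a real backend such as Firestore or S3.

```dart
// Hypothetical names throughout; nothing here is part of Hive's API.
abstract class RemoteSync {
  bool writeToServer(String key, dynamic value, String boxName);
  T? readFromServer<T>(String key, String boxName);
  bool writeAllToServer(Map<String, dynamic> values, String boxName);
  Map<String, dynamic> readAllFromServer(String boxName);
}

/// In-memory stand-in for a real backend, useful for tests.
class InMemoryRemote implements RemoteSync {
  final Map<String, Map<String, dynamic>> _boxes = {};

  // Returns the storage for a box, creating it on first use.
  Map<String, dynamic> _box(String name) =>
      _boxes.putIfAbsent(name, () => <String, dynamic>{});

  @override
  bool writeToServer(String key, dynamic value, String boxName) {
    _box(boxName)[key] = value;
    return true;
  }

  @override
  T? readFromServer<T>(String key, String boxName) =>
      _box(boxName)[key] as T?;

  @override
  bool writeAllToServer(Map<String, dynamic> values, String boxName) {
    _box(boxName).addAll(values);
    return true;
  }

  @override
  Map<String, dynamic> readAllFromServer(String boxName) =>
      Map<String, dynamic>.of(_box(boxName));
}
```

A real backend would almost certainly be asynchronous, so these methods would return `Future`s; the synchronous signatures are kept only to stay close to the proposal.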

This is just a raw idea.

PS. Sorry for my pseudocode

Looks good! Thanks.

@chemickypes The problem with this approach is that we have to encrypt and decrypt the data on both the client and the server.

The best way is to reuse Hive's binary format, as I posted in this issue.

Also, if we sync to S3, OneDrive, Google Drive, or Dropbox, we have to do the encryption and decryption on the client.

@leedstyh
Since the data is stored unencrypted in memory, it will not be necessary to decrypt it before syncing.

I'm not sure about using the binary format. It would be necessary to run Hive on the server too since the binary format can only be used by Hive.

Nope, the server doesn't need to run Hive. It will not process the data, just store it. Think about syncing to Google Drive.

@leedstyh I think that we can split this problem into two smaller problems:

  • Use the remote server as an extension of the local Hive.
  • Use Hive as a delegate whose job is to fetch data from the server when needed. The user cannot tell where the data comes from, because there is only one access point.
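The second option is essentially a read-through cache. A minimal sketch, assuming a plain `Map` in place of a Hive box and a caller-supplied `fetchRemote` function in place of a real network call (both names, and the `ReadThroughStore` class itself, are hypothetical):

```dart
// Hypothetical sketch: a Map stands in for the local box, and the
// injected function stands in for whatever network call the app makes.
typedef RemoteFetch = Future<dynamic> Function(String key);

class ReadThroughStore {
  ReadThroughStore(this._fetchRemote);

  final RemoteFetch _fetchRemote;
  final Map<String, dynamic> _local = {};

  /// Returns the local value if present; otherwise asks the remote,
  /// caches a non-null result locally, and returns it.
  Future<dynamic> get(String key) async {
    if (_local.containsKey(key)) return _local[key];
    final value = await _fetchRemote(key);
    if (value != null) _local[key] = value;
    return value;
  }
}
```

The caller only ever talks to `get`, so it has a single access point. Invalidation (noticing that the remote value changed after it was cached) is deliberately left out here.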

I don't know what @leisim wants to do with Hive.

If the goal is offline editing then we are into the CRDT and vector clocks territory maybe.

Typically the domain model needs to think in terms of mutations or ops. This works when offline editing is not allowed, because the server time is the global time.
When you want to support offline editing you need a way to merge changes.
CRDTs, ops, and vector clocks cover this area. The changes are now happening in different time domains.
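As an illustration of the "different time domains" point, here is a small vector clock sketch. It assumes each replica has a string id and keeps a counter per replica; the function and enum names are made up for this example. Two clocks are concurrent when each has seen something the other has not, and that is exactly the case where a merge strategy is needed.

```dart
import 'dart:math' as math;

// Hypothetical names; a clock is a map from replica id to counter.
enum ClockOrder { before, after, equal, concurrent }

/// Compares two vector clocks. Missing entries count as zero.
ClockOrder compareClocks(Map<String, int> a, Map<String, int> b) {
  var aAhead = false;
  var bAhead = false;
  for (final id in {...a.keys, ...b.keys}) {
    final x = a[id] ?? 0;
    final y = b[id] ?? 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return ClockOrder.concurrent;
  if (aAhead) return ClockOrder.after;
  if (bAhead) return ClockOrder.before;
  return ClockOrder.equal;
}

/// Merges two clocks by taking the per-replica maximum.
Map<String, int> mergeClocks(Map<String, int> a, Map<String, int> b) => {
      for (final id in {...a.keys, ...b.keys})
        id: math.max(a[id] ?? 0, b[id] ?? 0),
    };
```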

Here is something to get the ball rolling maybe..
It is a flutter example that has basic support for offline editing.
https://github.com/memspace/zefyr
It uses Operations and logs them.
But it does not have vector clock support.
The data model is using the quill approach.
https://github.com/memspace/zefyr/blob/master/packages/notus/lib/src/heuristics.dart#L6

https://github.com/pulyaevskiy/quill-delta-dart

  • This is where the real OT (Operational Transform) guts are.

Now rather than use vector clocks, sometimes you can use Context within the data.
I think this is what zefyr uses, but am not sure.

I happened to stumble on this CRDT implementation.

docs: https://cluster.ipfs.io/documentation/guides/consensus/
At the bottom, it's nice to see that they make the distinction between CRDT and Raft properly.

It means that you can have data of the same types on many devices, and merge them independently.
You don't need to write OTs (Operational Transforms), which is very painful and limiting.
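To show what "merge them independently" can look like for a key-value store, here is a sketch of a last-writer-wins map, one of the simplest state-based CRDTs. It is not how go-ds-crdt works internally; it is a swapped-in, simpler technique for illustration. The names `Entry` and `mergeLww` are hypothetical, and timestamps are assumed to be comparable integers with the replica id breaking ties.

```dart
// Hypothetical last-writer-wins map; not the go-ds-crdt design.
class Entry {
  const Entry(this.value, this.timestamp, this.replicaId);

  final dynamic value;
  final int timestamp; // logical or wall-clock time, assumed comparable
  final String replicaId; // breaks ties between equal timestamps

  /// True when this entry should replace `other`.
  bool winsOver(Entry other) {
    if (timestamp != other.timestamp) return timestamp > other.timestamp;
    return replicaId.compareTo(other.replicaId) > 0;
  }
}

/// Merges two replica states key by key, keeping the winning entry.
/// The result does not depend on argument order, and merging a state
/// with itself changes nothing, so replicas converge.
Map<String, Entry> mergeLww(Map<String, Entry> a, Map<String, Entry> b) {
  final result = Map<String, Entry>.of(a);
  b.forEach((key, incoming) {
    final current = result[key];
    if (current == null || incoming.winsOver(current)) {
      result[key] = incoming;
    }
  });
  return result;
}
```

The trade-off is that concurrent writes to the same key are resolved by discarding one of them, which is acceptable for settings but not for collaborative text.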

This is the Core lib.
https://github.com/ipfs/go-ds-crdt
That lib is used for IPFS Cluster to allow it to synchronise data.
https://github.com/ipfs/ipfs-cluster/blob/master/consensus/crdt/consensus.go

I really hope this is picked up with hive.

I think it's an excellent basis for Hive Sync.

Thanks for your valuable input. I'll definitely take a look at these projects and try to implement something similar with hive.

Any plans to sync with Firestore? That would be awesome

Are there any updates on this? For a uni project, I was looking into a synchronization layer design using CRDTs. Here is my repo, written in Dart: https://github.com/Manuelbaun/sync_layer_crdt_playground

I actually use some form of delta-CRDTs, sending only the mutation instead of the operation or the full state. It works nicely, but it adds a lot of overhead. For time tracking, I use hybrid logical clocks. In my use cases (just prototyping) my design worked fine, but it still needs a lot of work. For instance, I am not deleting anything, and garbage collection will be needed at some point.
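For readers unfamiliar with hybrid logical clocks, here is a minimal sketch of the usual update rules. It is not taken from the linked repo; the class name `Hlc` and its fields are hypothetical, and the physical clock reading is passed in as `now` (milliseconds) so the logic stays deterministic.

```dart
import 'dart:math' as math;

// Hypothetical HLC sketch; not the implementation from the linked repo.
class Hlc implements Comparable<Hlc> {
  const Hlc(this.wallMillis, this.counter);

  final int wallMillis; // highest physical time seen so far
  final int counter; // orders events that share the same wallMillis

  /// Timestamp for a local or send event, given the physical clock `now`.
  Hlc tick(int now) {
    final wall = math.max(wallMillis, now);
    return Hlc(wall, wall == wallMillis ? counter + 1 : 0);
  }

  /// Timestamp after receiving `remote`, given the physical clock `now`.
  Hlc receive(Hlc remote, int now) {
    final wall = math.max(math.max(wallMillis, remote.wallMillis), now);
    if (wall == wallMillis && wall == remote.wallMillis) {
      return Hlc(wall, math.max(counter, remote.counter) + 1);
    }
    if (wall == wallMillis) return Hlc(wall, counter + 1);
    if (wall == remote.wallMillis) return Hlc(wall, remote.counter + 1);
    return Hlc(wall, 0);
  }

  @override
  int compareTo(Hlc other) => wallMillis != other.wallMillis
      ? wallMillis.compareTo(other.wallMillis)
      : counter.compareTo(other.counter);
}
```

Timestamps produced this way stay close to wall-clock time while still ordering causally related events, even when device clocks drift.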

And of course, there are a lot of design issues 😆 and I haven't made it into a library just yet.
