Hive sync: Let's discuss it.

Created on 12 Sep 2019 · 20 comments · Source: hivedb/hive

In the future I want to support syncing Hive with a remote database.

It would be helpful if you could share your needs & ideas.

One of the most obvious use cases is backing up data (for example, settings or messages). I think Firebase should be one of the first supported remotes.

enhancement


leisim

👍22 ❤3


All 20 comments

I like the idea. If it has functions that need to be set up on initialization, like FCM does, it would be easy to add a custom solution. But it will still require a lot of input from the user, because of handling updates to Hive when the app reopens after the remote DB has changed, etc.

ThinkDigitalSoftware on 12 Sep 2019

But it still will require a lot of input from the user because of handling updating Hive when the app reopens but the remote db has changed, etc

Yes, that's true. A much easier option would be an implementation that just creates a backup of Hive.

The goal is to support full sync (including support for remote changes).

leisim on 13 Sep 2019

It could possibly be simpler if you made adapters for different remote DB types (SQL, NoSQL). That way the similarities could be abstracted or simplified for the user.


ThinkDigitalSoftware on 13 Sep 2019

Yes, I'll start experimenting once queries and asset DBs are ready...

If anyone has time and wants to contribute, I'll be there to help.

leisim on 13 Sep 2019

Are S3, OneDrive, Google Drive, and Dropbox on the roadmap?

leedstyh on 13 Sep 2019

👍7

I hope to make it very easy to write a sync solution for any service. This will take time, though.

leisim on 13 Sep 2019

You could provide an interface so every user can implement what they want.

chemickypes on 17 Sep 2019

It would be great to know which information users actually need. Do you have something in mind?

leisim on 17 Sep 2019

I'm thinking as I write, so forgive some reasoning holes.

My idea is to have a simple interface with 2 or 4 functions like these:

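```dart
// Hypothetical interface name; methods are async because real syncing hits the network.
// Dart has no overloading, so the bulk variants need distinct names.
abstract class RemoteSync {
  // To read and write only one element.
  Future<bool> writeToServer(String key, dynamic value, String boxName, Type runtimeType);
  Future<T?> readFromServer<T>(String key, String boxName, Type runtimeType);

  // To read and write all values at once.
  Future<bool> writeAllToServer(Map<String, dynamic> entries, String boxName, Type runtimeType);
  Future<Map<String, dynamic>> readAllFromServer(String boxName);
}
```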

The boxName parameter (and the others) can help the implementer decide which service to call.

I suppose the first two functions could be used with lazy boxes.

The user then has to implement this interface and inject the implementation into Hive, so Hive can call this object to sync remotely.

This is just a raw idea.

PS. Sorry for my pseudocode

chemickypes on 17 Sep 2019

Looks good! Thanks.

leisim on 17 Sep 2019

👍3
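To make the injection idea concrete, here is a minimal Dart sketch. Everything in it is hypothetical: `SimpleRemoteSync` is a trimmed-down, async take on the interface proposed above, `InMemoryRemote` stands in for a real server, and `SyncedBox` uses a plain `Map` where Hive would use a box. It is not Hive's API, just one way the pieces could fit together.

```dart
// Hypothetical, trimmed-down interface (not part of Hive's API).
abstract class SimpleRemoteSync {
  Future<bool> writeToServer(String key, dynamic value, String boxName);
  Future<Map<String, dynamic>> readAllFromServer(String boxName);
}

// In-memory stand-in for a remote service; handy in tests.
class InMemoryRemote implements SimpleRemoteSync {
  final Map<String, Map<String, dynamic>> _boxes = {};

  @override
  Future<bool> writeToServer(String key, dynamic value, String boxName) async {
    (_boxes[boxName] ??= <String, dynamic>{})[key] = value;
    return true;
  }

  @override
  Future<Map<String, dynamic>> readAllFromServer(String boxName) async =>
      Map<String, dynamic>.of(_boxes[boxName] ?? <String, dynamic>{});
}

// Hypothetical wrapper: a plain Map plays the role of the local Hive box,
// and the injected remote is called on every write.
class SyncedBox {
  final String name;
  final SimpleRemoteSync remote;
  final Map<String, dynamic> _local = {};

  SyncedBox(this.name, this.remote);

  dynamic get(String key) => _local[key];

  // Write locally first (fast, works offline), then push to the remote.
  Future<bool> put(String key, dynamic value) async {
    _local[key] = value;
    return remote.writeToServer(key, value, name);
  }

  // Replace local contents with the remote snapshot, e.g. on app start.
  Future<void> pullAll() async {
    final snapshot = await remote.readAllFromServer(name);
    _local
      ..clear()
      ..addAll(snapshot);
  }
}
```

Because the wrapper writes locally before pushing, reads never wait on the network, and a second `SyncedBox` sharing the same remote can call `pullAll()` to restore the data. There is no conflict handling here at all, which is the hard part the rest of the thread gets into.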

@chemickypes The problem with this approach is that we would have to encrypt and decrypt the data on both the client and the server.

The best way is to reuse Hive's binary format, as I posted in this issue.

Also, if we sync to S3, OneDrive, Google Drive, or Dropbox, we have to do the encryption and decryption on the client.

leedstyh on 18 Sep 2019

@leedstyh
Since the data is stored unencrypted in memory, it will not be necessary to decrypt it before syncing.

I'm not sure about using the binary format. It would be necessary to run Hive on the server too since the binary format can only be used by Hive.

leisim on 18 Sep 2019

Nope, the server doesn't need to run Hive. The server won't process the data, it just stores it. Think about syncing to Google Drive.

leedstyh on 18 Sep 2019

👍1
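The "server just stores it" idea amounts to pushing opaque bytes to a dumb blob store. The sketch below assumes a hypothetical `BlobStore` interface (something S3, Google Drive, or Dropbox could back) and uses JSON via `dart:convert` purely as a stand-in for Hive's binary format, which it does not reproduce. Any client-side encryption would be applied to the bytes before `upload`.

```dart
import 'dart:convert';
import 'dart:typed_data';

// Hypothetical storage interface: the remote only stores opaque bytes.
abstract class BlobStore {
  Future<void> upload(String path, Uint8List bytes);
  Future<Uint8List?> download(String path);
}

// In-memory stand-in for a cloud drive or bucket.
class InMemoryBlobStore implements BlobStore {
  final Map<String, Uint8List> _files = {};

  @override
  Future<void> upload(String path, Uint8List bytes) async {
    _files[path] = Uint8List.fromList(bytes); // keep a copy
  }

  @override
  Future<Uint8List?> download(String path) async => _files[path];
}

// JSON stands in for Hive's binary encoding; the store never parses it.
Uint8List encodeBox(Map<String, dynamic> entries) =>
    Uint8List.fromList(utf8.encode(jsonEncode(entries)));

Map<String, dynamic> decodeBox(Uint8List bytes) =>
    jsonDecode(utf8.decode(bytes)) as Map<String, dynamic>;

// Whole-box backup and restore under a hypothetical path scheme.
Future<void> backupBox(
        BlobStore store, String boxName, Map<String, dynamic> entries) =>
    store.upload('hive-backup/$boxName', encodeBox(entries));

Future<Map<String, dynamic>?> restoreBox(BlobStore store, String boxName) async {
  final bytes = await store.download('hive-backup/$boxName');
  return bytes == null ? null : decodeBox(bytes);
}
```

This covers the backup case only: the whole box is overwritten on every upload, so concurrent edits from two devices would simply clobber each other.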

@leedstyh I think that we can split this problem into two smaller problems:

1. Use the remote server as an extension of the local Hive.
2. Use Hive as a delegate whose goal is to get data from the server when it's needed; the user can't tell where the data comes from, because there is only one access point.

I don't know what @leisim wants to do with Hive.

chemickypes on 18 Sep 2019
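The second option above is essentially a read-through cache. A minimal sketch, assuming a hypothetical `RemoteFetch` callback for the server lookup and a plain `Map` in place of the Hive box:

```dart
// Hypothetical remote lookup; returns null when the key does not exist remotely.
typedef RemoteFetch = Future<Object?> Function(String boxName, String key);

class ReadThroughBox {
  final String name;
  final RemoteFetch fetchRemote;
  final Map<String, Object?> _local = {}; // stands in for the Hive box

  ReadThroughBox(this.name, this.fetchRemote);

  // Single access point: callers cannot tell where the value came from.
  Future<Object?> get(String key) async {
    if (_local.containsKey(key)) return _local[key]; // local hit
    final value = await fetchRemote(name, key); // miss: ask the server
    if (value != null) _local[key] = value; // cache for later/offline reads
    return value;
  }
}
```

Only non-null results are cached, so missing keys are re-queried on every read, and nothing here invalidates a cached value when the remote copy changes; a real design would need answers for both.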

If the goal is offline editing, then we are probably in CRDT and vector clock territory.

Typically the domain model needs to think in terms of mutations or operations (ops). This works when offline editing is not allowed, because the server time is the global time.
When you want to support offline editing, you need a way to merge changes.
CRDTs, ops, and vector clocks belong to this area, because the changes now happen in different time domains.

Here is something to get the ball rolling, maybe.
It is a Flutter example that has basic support for offline editing.
https://github.com/memspace/zefyr
It uses operations and logs them,
but it does not have vector clock support.
The data model uses the Quill approach.
https://github.com/memspace/zefyr/blob/master/packages/notus/lib/src/heuristics.dart#L6

https://github.com/pulyaevskiy/quill-delta-dart

This is where the real OT (Operational Transformation) guts are.

Rather than using vector clocks, you can sometimes use context within the data.
I think this is what zefyr uses, but I'm not sure.

joeblew99 on 19 Sep 2019
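For readers unfamiliar with vector clocks: each replica keeps one counter per replica ID, bumps its own counter on every local change, and merges by taking the element-wise maximum. Comparing two clocks tells you whether one change happened before the other or whether they are concurrent, i.e. a real conflict that needs a merge strategy. A minimal sketch; `VectorClock` and the replica IDs are illustrative and not taken from zefyr or Hive:

```dart
enum ClockOrder { before, after, equal, concurrent }

class VectorClock {
  final Map<String, int> counters;

  VectorClock([Map<String, int>? counters])
      : counters = Map.of(counters ?? const <String, int>{});

  // Record a local change on [replica]; returns a new clock.
  VectorClock tick(String replica) =>
      VectorClock({...counters, replica: (counters[replica] ?? 0) + 1});

  // Element-wise maximum: what a replica knows after receiving [other].
  VectorClock merge(VectorClock other) {
    final result = Map.of(counters);
    other.counters.forEach((id, n) {
      if (n > (result[id] ?? 0)) result[id] = n;
    });
    return VectorClock(result);
  }

  // Partial order: before/after only if one clock dominates the other.
  ClockOrder compare(VectorClock other) {
    var less = false, greater = false;
    for (final id in {...counters.keys, ...other.counters.keys}) {
      final a = counters[id] ?? 0, b = other.counters[id] ?? 0;
      if (a < b) less = true;
      if (a > b) greater = true;
    }
    if (less && greater) return ClockOrder.concurrent;
    if (less) return ClockOrder.before;
    if (greater) return ClockOrder.after;
    return ClockOrder.equal;
  }
}
```

For example, if a phone and a tablet both edit after seeing the clock `{phone: 1}`, the phone ends at `{phone: 2}` and the tablet at `{phone: 1, tablet: 1}`; `compare` reports these as concurrent, which is exactly the case a CRDT or OT scheme has to resolve.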

I happened to stumble on this CRDT implementation.

docs: https://cluster.ipfs.io/documentation/guides/consensus/
At the bottom, it's nice to see that they properly distinguish between CRDT and Raft.

It means that you can have data of the same types on many devices and merge it independently.
You don't need OTs (Operational Transformations), which are very painful and limiting.

This is the core library:
https://github.com/ipfs/go-ds-crdt
That library is used by IPFS Cluster to synchronise data.
https://github.com/ipfs/ipfs-cluster/blob/master/consensus/crdt/consensus.go

I really hope this gets picked up for Hive.

I think it's an excellent basis for Hive sync.

joeblew99 on 24 Sep 2019

🎉1
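As a taste of the "merge independently" property, here is one of the simplest state-based CRDTs, a last-writer-wins (LWW) map, in Dart. This is a swapped-in textbook example, not the Merkle-CRDT design used by go-ds-crdt, and the class names are illustrative. Merging is commutative, associative, and idempotent, so replicas can exchange state in any order and still converge.

```dart
// One write in a last-writer-wins map: the value plus when/where it was written.
class LwwEntry {
  final Object? value;
  final int timestamp; // wall-clock or logical time of the write
  final String replica; // tie-breaker when timestamps are equal

  const LwwEntry(this.value, this.timestamp, this.replica);

  // Total order on writes: later timestamp wins, replica ID breaks ties.
  bool beats(LwwEntry other) => timestamp != other.timestamp
      ? timestamp > other.timestamp
      : replica.compareTo(other.replica) > 0;
}

class LwwMap {
  final Map<String, LwwEntry> entries = {};

  // Apply a write only if it beats what we already have for that key.
  void put(String key, Object? value, int timestamp, String replica) {
    final incoming = LwwEntry(value, timestamp, replica);
    final current = entries[key];
    if (current == null || incoming.beats(current)) entries[key] = incoming;
  }

  Object? get(String key) => entries[key]?.value;

  // State-based merge: replay the other replica's entries through put().
  void merge(LwwMap other) {
    other.entries.forEach((key, e) => put(key, e.value, e.timestamp, e.replica));
  }
}
```

Deletes would be modelled as writes of a tombstone (for example a `null` value) so they can win a merge too, and the timestamps need to be reasonably well-ordered across devices, which is where logical or hybrid clocks come in.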

Thanks for your valuable input. I'll definitely take a look at these projects and try to implement something similar in Hive.

leisim on 24 Sep 2019

Any plans to sync with Firestore? That would be awesome

zenkog on 3 Dec 2019

👍6 🚀1 ❤1

+1000 to this! https://github.com/hivedb/hive/issues/49#issuecomment-561233736

ghost on 26 Dec 2019

👍3

Are there any updates on this? For a university project, I was looking into a synchronization layer design using CRDTs. Here is my repo in Dart: https://github.com/Manuelbaun/sync_layer_crdt_playground

I actually use a form of delta-CRDTs, sending only the mutation instead of the operation or the full state. It works nicely, but it adds a lot of overhead. For time tracking, I use hybrid logical clocks. In my use cases (just prototyping), my design worked fine, but there is still a lot of work to do. For instance, I am not deleting anything, and garbage collection will be needed at some point.

And of course, there are a lot of design issues 😆, and I haven't made it into a library just yet.
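Since hybrid logical clocks came up: an HLC timestamp pairs wall-clock milliseconds with a logical counter, so timestamps stay close to physical time while still respecting causality when device clocks drift. Below is a minimal sketch of the standard HLC send/receive update rules; `Hlc` and `HlcClock` are illustrative names, not taken from the linked repo, and the injectable `now` callback exists only to make it testable.

```dart
// An HLC timestamp: physical milliseconds plus a logical counter.
class Hlc implements Comparable<Hlc> {
  final int millis;
  final int counter;

  const Hlc(this.millis, this.counter);

  @override
  int compareTo(Hlc other) => millis != other.millis
      ? millis.compareTo(other.millis)
      : counter.compareTo(other.counter);

  @override
  String toString() => '$millis:$counter';
}

class HlcClock {
  Hlc _last = const Hlc(0, 0);
  final int Function() _now; // physical clock source

  HlcClock({int Function()? now})
      : _now = now ?? (() => DateTime.now().millisecondsSinceEpoch);

  // Timestamp a local event or an outgoing message.
  Hlc send() {
    final pt = _now();
    final l = pt > _last.millis ? pt : _last.millis;
    final c = l == _last.millis ? _last.counter + 1 : 0;
    return _last = Hlc(l, c);
  }

  // Advance the clock on receiving a remote timestamp.
  Hlc receive(Hlc remote) {
    final pt = _now();
    final l = [_last.millis, remote.millis, pt].reduce((a, b) => a > b ? a : b);
    final int c;
    if (l == _last.millis && l == remote.millis) {
      c = (_last.counter > remote.counter ? _last.counter : remote.counter) + 1;
    } else if (l == _last.millis) {
      c = _last.counter + 1;
    } else if (l == remote.millis) {
      c = remote.counter + 1;
    } else {
      c = 0; // physical time is ahead of everything seen so far
    }
    return _last = Hlc(l, c);
  }
}
```

Because `Hlc` is `Comparable`, these timestamps could serve directly as the ordering key for a last-writer-wins merge.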