Ssdb: SSDB new node in OUT_OF_SYNC / INIT state

Created on 28 Aug 2016 · 5 Comments · Source: ideawu/ssdb

The brand-new node in a master-master replica pair synced almost all of the data and then stopped in the OUT_OF_SYNC state. How can I get the nodes into the SYNC state?

Server with data:

tools/ssdb-cli -n info                                                                                           

version 
        1.9.2
links   
        9
total_calls
        147
dbsize  
        11445687379
binlogs
            capacity : 20000000
            min_seq  : 470675797
            max_seq  : 490675798
replication
        client 10.0.1.90:56799
            type     : mirror
            status   : SYNC
            last_seq : 490675798
replication
        slaveof 10.0.1.90:8888
            id         : 10.0.1.90|8888
            type       : mirror
            status     : INIT
            last_seq   : 490632888
            copy_count : 0
            sync_count : 0


New server:

version
        1.9.2
links   
        11
total_calls
        165
dbsize
        9749034199
binlogs
            capacity : 20000000
            min_seq  : 1
            max_seq  : 5253935
replication
        client 10.0.1.72:34747
            type     : mirror
            status   : OUT_OF_SYNC
            last_seq : 490632888
replication
        slaveof 10.0.1.72:8888
            id         : 10.0.1.72|8888
            type       : mirror
            status     : SYNC
            last_seq   : 490675798
            copy_count : 0
            sync_count : 0
serv_key_range

Source

sirkubax
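
One possible reading of the two outputs above, offered as an interpretation rather than anything the thread or the SSDB documentation states: a peer can only resume replication if the sequence number it last received still falls inside the serving node's binlog range. The Python sketch below parses a trimmed copy of the new server's output and applies that check. The helper names `parse_info` and `in_binlog_range` are hypothetical, and the range rule itself is an assumption.

```python
import re

def parse_info(text):
    """Parse `ssdb-cli -n info` text into binlog bounds and replication entries."""
    binlogs, replication = {}, []
    section, current = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace():              # unindented line starts a section
            section, current = line, None
            continue
        m = re.match(r"(\w+)\s*:\s*(\S+)$", line)
        if section == "binlogs" and m:
            binlogs[m.group(1)] = int(m.group(2))
        elif section == "replication":
            if m and current is not None:     # "key : value" under an entry
                current[m.group(1)] = m.group(2)
            elif not m:                       # e.g. "client 10.0.1.72:34747"
                role, peer = line.split(None, 1)
                current = {"role": role, "peer": peer}
                replication.append(current)
    return binlogs, replication

def in_binlog_range(binlogs, last_seq):
    """Assumed rule: resuming needs last_seq inside [min_seq, max_seq]."""
    return binlogs["min_seq"] <= last_seq <= binlogs["max_seq"]

# Trimmed copy of the new server's output from the question.
NEW_SERVER_INFO = """\
binlogs
    capacity : 20000000
    min_seq  : 1
    max_seq  : 5253935
replication
    client 10.0.1.72:34747
        type     : mirror
        status   : OUT_OF_SYNC
        last_seq : 490632888
"""

binlogs, replication = parse_info(NEW_SERVER_INFO)
for entry in replication:
    ok = in_binlog_range(binlogs, int(entry["last_seq"]))
    print(entry["role"], entry["peer"], entry["status"],
          "in binlog range" if ok else "outside binlog range")
```

Under that assumption, the old server's last_seq of 490632888 lies far outside the new server's binlog range of 1 to 5253935, which would be consistent with the OUT_OF_SYNC status reported for that client.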

All 5 comments

When a node gets into the OUT_OF_SYNC state, you must:

stop it
remove its data and meta folders
restart it

Since you set those nodes up in a master-master architecture, you must instead:

stop both the old server and the new server
remove the old server's data and meta folders
remove the new server's meta folder
restart the old and new servers

ideawu on 29 Aug 2016

Just to sum up the procedure; please be careful where you remove the data from :)

First determine which node has the correct/more recent data; we will call this node A.

Procedure for setting up replication:

make a backup of both nodes (in case of failure, it may help to recover data)
stop both nodes
delete the data and meta folders on node B (an example location of those folders is /data/ssdb/var)
delete the meta folder on node A
start node A, and wait a moment until it rebuilds the meta folder
start node B
the nodes are in sync when the replication status reported by ssdb-cli -n info is SYNC (during synchronization the status is COPY)
check the log file for potential errors

sirkubax on 10 Jan 2017

👍1
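
The delete steps in the procedure above are where a mistake is most costly, so here is a minimal Python sketch of them with a dry-run default. It assumes the `data` and `meta` folders sit directly under the var directory (the thread gives /data/ssdb/var only as an example location), and the function name `wipe_for_resync` is hypothetical. It does not stop or start SSDB; do that separately, after taking the backup.

```python
import shutil
from pathlib import Path

def wipe_for_resync(var_dir, keep_data, dry_run=True):
    """Remove meta (and data, unless keep_data) under var_dir.

    Returns the list of targeted paths. With dry_run=True nothing is deleted.
    """
    names = ["meta"] if keep_data else ["data", "meta"]
    targets = [Path(var_dir) / name for name in names]
    for path in targets:
        if not dry_run and path.is_dir():
            shutil.rmtree(path)
    return targets

if __name__ == "__main__":
    # Node A (good data): only meta would be removed. Dry run, prints only.
    for path in wipe_for_resync("/data/ssdb/var", keep_data=True):
        print("node A: would remove", path)
    # Node B (stale or new): both data and meta would be removed.
    for path in wipe_for_resync("/data/ssdb/var", keep_data=False):
        print("node B: would remove", path)
```

Keeping `keep_data` a required argument forces the caller to state explicitly, for each node, whether its data should survive.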

Yes, making backups before any operation is a good practice.

ideawu on 13 Jan 2017

There is a better sync procedure that does not involve downtime of the cluster as a whole, but you need to switch to the other master.

Node A is the one holding the most up-to-date data; node B is corrupt or simply a new node:

stop node B, remove its data and meta directories
start node B, let it sync with node A
node A is now OUT_OF_SYNC with respect to node B
switch your master service over to node B
stop node A, remove only its meta folder
start node A, let it sync with node B
now both nodes will be SYNC again
optional: switch master service back to node A

ghen2 on 3 Sep 2019

👍2
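
Each "let it sync" step above means waiting until the replication status turns to SYNC. A rough Python sketch of that wait follows, assuming the status lines look like the `ssdb-cli -n info` output shown in the question. The function names are hypothetical, and `fetch_info_local` simply runs the same command as the question from the SSDB directory; how to point the CLI at a remote node is left out on purpose.

```python
import re
import subprocess
import time

def fetch_info_local():
    """Run the command from the question and return its text output."""
    result = subprocess.run(["tools/ssdb-cli", "-n", "info"],
                            capture_output=True, text=True, check=True)
    return result.stdout

def replication_statuses(info_text):
    """Return every 'status : X' value found in the info output, in order."""
    pattern = r"^[ \t]*status[ \t]*:[ \t]*(\w+)[ \t]*$"
    return re.findall(pattern, info_text, flags=re.MULTILINE)

def wait_until_sync(fetch_info, interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll until all replication entries report SYNC; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        statuses = replication_statuses(fetch_info())
        if statuses and all(status == "SYNC" for status in statuses):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A call such as `wait_until_sync(fetch_info_local)` fits the final check, when both directions are expected to be SYNC. During the intermediate steps one entry is expected to stay OUT_OF_SYNC, so there it is better to inspect the `slaveof` entry by hand.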

@geert-hendrickx-be your procedure seems good. However, in practical scenarios you sometimes want to add new nodes to an existing cluster in order to scale dynamically. In your example, if node B is a completely new node, then node A still needs to be restarted to allow B to sync to it, because node A needs to know about B in its configuration file, and that change requires a restart.