Ssdb: SSDB new node in OUT_OF_SYNC / INIT state

Created on 28 Aug 2016  ·  5 Comments  ·  Source: ideawu/ssdb

A brand-new node in a master-master replica pair syncs almost all of the data and then stops in the OUT_OF_SYNC state. How can I get the nodes into the SYNC state?

Server with data:

tools/ssdb-cli -n info                                                                                           

version 
        1.9.2
links   
        9
total_calls
        147
dbsize  
        11445687379
binlogs
            capacity : 20000000
            min_seq  : 470675797
            max_seq  : 490675798
replication
        client 10.0.1.90:56799
            type     : mirror
            status   : SYNC
            last_seq : 490675798
replication
        slaveof 10.0.1.90:8888
            id         : 10.0.1.90|8888
            type       : mirror
            status     : INIT
            last_seq   : 490632888
            copy_count : 0
            sync_count : 0


New server:

version
        1.9.2
links   
        11
total_calls
        165
dbsize
        9749034199
binlogs
            capacity : 20000000
            min_seq  : 1
            max_seq  : 5253935
replication
        client 10.0.1.72:34747
            type     : mirror
            status   : OUT_OF_SYNC
            last_seq : 490632888
replication
        slaveof 10.0.1.72:8888
            id         : 10.0.1.72|8888
            type       : mirror
            status     : SYNC
            last_seq   : 490675798
            copy_count : 0
            sync_count : 0
serv_key_range
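
One possible reading of the two outputs above, offered as an assumption rather than something confirmed in this thread: a replication client can only keep streaming while the entry following its last_seq is still inside the serving node's binlog window (min_seq to max_seq). The sketch below applies that check to the numbers shown; the function name is hypothetical and is not part of SSDB.

```python
def can_resume_from_binlog(min_seq: int, max_seq: int, replica_last_seq: int) -> bool:
    """Return True if the entry after replica_last_seq could still be served.

    Assumption (not confirmed by the thread): the serving node needs
    replica_last_seq + 1 to fall inside its binlog window, or the replica
    to be exactly caught up (replica_last_seq == max_seq).
    """
    return min_seq - 1 <= replica_last_seq <= max_seq


# Server with data: binlog 470675797..490675798, client last_seq 490675798
print(can_resume_from_binlog(470675797, 490675798, 490675798))

# New server: binlog 1..5253935, client last_seq 490632888
print(can_resume_from_binlog(1, 5253935, 490632888))
```

Under that assumption, the new server's binlog does not contain the sequence number the old server is waiting for, which would match the OUT_OF_SYNC status it reports for that client.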

All 5 comments

When a node gets into the OUT_OF_SYNC state, you must:

  1. stop it
  2. remove its data and meta folders
  3. restart it

Since you set those nodes up in a master-master architecture, you must instead:

  1. stop the old server and the new server
  2. remove the old server's data and meta folders
  3. remove the new server's meta folder
  4. restart the old and new servers.

Just to sum up the procedure; please be careful which node you remove the data from :)

First determine which node has the correct/more recent data; we will call this node A.

Procedure for setting up replication:

  1. make a backup of both nodes (in case of failure, it may help to recover data)
  2. stop both nodes
  3. delete the data and meta folders on node B (an example location of those folders is /data/ssdb/var)
  4. delete the meta folder on node A
  5. start node A, and wait a moment until it rebuilds the meta folder
  6. start node B
  7. the nodes are in sync when the replication status reported by ssdb-cli -n info is SYNC (during synchronization the status is COPY)
  8. check the log file for potential errors
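
The folder-removal steps in the list above are the risky part, so here is a minimal sketch of a helper that makes the choice between "meta only" and "data and meta" explicit. It assumes the node is already stopped and that the data and meta folders sit directly under one directory, as in the /data/ssdb/var example; the function name and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def wipe_node_state(work_dir, wipe_data):
    """Remove the meta folder, and the data folder too if wipe_data is True.

    Assumes the SSDB node using work_dir is stopped. Returns the list of
    paths that were actually removed.
    """
    root = Path(work_dir)
    targets = ["meta"] + (["data"] if wipe_data else [])
    removed = []
    for name in targets:
        path = root / name
        if path.is_dir():
            shutil.rmtree(path)  # irreversible: back up first
            removed.append(path)
    return removed
```

With the roles used above, node B (the node being rebuilt) would be handled with wipe_node_state("/data/ssdb/var", wipe_data=True) and node A (the node holding the good data) with wipe_data=False.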

Yes, making backups before any operation is a good practice.

There is a better sync procedure that does not involve downtime of the cluster as a whole, but you need to switch to the other master.

Node A is the one holding the most up to date data, node B is corrupt or simply a new node:

  1. stop node B, remove its data and meta directories
  2. start node B, let it sync with node A.
  3. node A is now OUT_OF_SYNC with respect to node B
  4. switch your master service over to node B
  5. stop node A, remove only its meta folder
  6. start node A, let it sync with node B
  7. now both nodes will be SYNC again.
  8. optional: switch master service back to node A
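
To confirm that both nodes really report SYNC after a procedure like this, the replication entries can be pulled out of the text printed by ssdb-cli -n info. The following is a rough sketch that assumes the output layout shown in the question (a "client" or "slaveof" line followed by indented "key : value" lines); the function names are hypothetical, and the parser is not an official SSDB tool.

```python
def parse_replication(info_text):
    """Extract replication entries from `ssdb-cli -n info` style text."""
    links = []
    current = None
    for raw in info_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace():
            # A top-level key (e.g. "binlogs", "replication") closes any open entry.
            current = None
            continue
        first, _, rest = line.partition(" ")
        if first in ("client", "slaveof"):
            current = {"role": first, "peer": rest.strip()}
            links.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return links


def all_in_sync(links):
    """True only if there is at least one entry and every status is SYNC."""
    return bool(links) and all(l.get("status") == "SYNC" for l in links)
```

Run against the output captured from each node, all_in_sync(parse_replication(text)) should be true on both nodes once both directions of the mirror have caught up.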

@geert-hendrickx-be your procedure seems good. However, in practical scenarios you sometimes want to add new nodes to an existing cluster in order to scale dynamically. In your example, if node B is a completely new node, then node A still needs to be restarted to allow B to sync to it, because node A needs to know about B in its configuration file, and that change requires a restart.
