Redis replication lag

Created on 9 Feb 2015 · 4 Comments · Source: redis/redis

In my production environment, slave replication always lags by many hours!
127.0.0.1:6379> info replication

Replication

role:master
connected_slaves:2
slave0:ip=10.xxx.xxx.xxx,port=6379,state=online,offset=416543935501,lag=1
slave1:ip=10.xxx.xxx.xxx,port=6379,state=online,offset=416543965574,lag=1
master_repl_offset:416543969598
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:416542921023
repl_backlog_histlen:1048576

The offsets of the slaves and master_repl_offset are very different.
How can I solve or mitigate this problem?

All 4 comments

repl_backlog_size is equal to the repl_backlog_histlen value, so the backlog is full; the buffer size (default 1048576 bytes = 1 MB) is too small.

The values here all make sense, they just aren't clearly explained and have some weird names.

When you see repl_backlog those are only for PSYNC. So, they could probably be named psync_buffer for the same effect.

repl_backlog_size is the capacity of a buffer holding data for PSYNC. repl_backlog_histlen is how much actual data is in the PSYNC buffer. They will usually be equal since repl_backlog_histlen can only grow as big as repl_backlog_size.

Also notice how the distance from the backlog first byte offset (repl_backlog_first_byte_offset) to master_repl_offset matches the maximum PSYNC buffer size (repl_backlog_size), which is also equal to the currently populated PSYNC buffer data (repl_backlog_histlen). So, master_repl_offset - repl_backlog_first_byte_offset = repl_backlog_size: 416543969598 - 416542921023 = 1048575 (one short of 1048576, most likely because the first byte offset is itself counted as part of the backlog).
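
The backlog arithmetic can be checked with a few lines of Python. This is only a sketch using the numbers from the INFO output in the report. It assumes repl_backlog_first_byte_offset is inclusive, so the byte count is last - first + 1. The helper offset_in_backlog is a hypothetical name, and its check is a simplification, not Redis's actual partial-resync test.

```python
# Values copied from the INFO replication output in the report.
master_repl_offset = 416543969598
repl_backlog_first_byte_offset = 416542921023
repl_backlog_size = 1048576
repl_backlog_histlen = 1048576

# Assumption: the first-byte offset is inclusive, so the number of
# bytes currently held in the backlog is last - first + 1.
bytes_in_backlog = master_repl_offset - repl_backlog_first_byte_offset + 1


def offset_in_backlog(offset):
    """Hypothetical helper: is `offset` still covered by the backlog window?

    Simplified check for illustration only.
    """
    return repl_backlog_first_byte_offset <= offset <= master_repl_offset


print(bytes_in_backlog)                         # bytes held in the backlog
print(bytes_in_backlog == repl_backlog_histlen)  # does it match histlen?
print(offset_in_backlog(416543935501))           # slave0's reported offset
```

With a 1 MB backlog, a slave that falls more than about 1 MB behind while disconnected would no longer find its offset in this window, which is why the buffer size matters for PSYNC.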

The actual lag is the difference between each slave offset and the master_repl_offset. So, in this case, slave0 is 416543969598 - 416543935501 = 34097 bytes (about 34 KB) behind the master and slave1 is 416543969598 - 416543965574 = 4024 bytes (about 4 KB) behind the master.
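
Since INFO does not report this difference directly, it can be computed from the raw text. The following is a rough sketch, not an official tool: replica_lag_bytes is a hypothetical helper name, and the input is the relevant part of the INFO replication output from the report, pasted in as a string.

```python
import re

# Relevant lines from the INFO replication output in the report.
INFO_TEXT = """\
role:master
connected_slaves:2
slave0:ip=10.xxx.xxx.xxx,port=6379,state=online,offset=416543935501,lag=1
slave1:ip=10.xxx.xxx.xxx,port=6379,state=online,offset=416543965574,lag=1
master_repl_offset:416543969598
"""


def replica_lag_bytes(info_text):
    """Return {slave name: bytes behind master_repl_offset}."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines, section headers, and anything without a key.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value

    master_offset = int(fields["master_repl_offset"])
    lags = {}
    for key, value in fields.items():
        if re.fullmatch(r"slave\d+", key):
            # Each slave line is a comma-separated list of name=value pairs.
            attrs = dict(item.split("=", 1) for item in value.split(","))
            lags[key] = master_offset - int(attrs["offset"])
    return lags


lags = replica_lag_bytes(INFO_TEXT)
for name in sorted(lags):
    print(f"{name}: {lags[name]} bytes behind the master")
```

The same function could be fed the output of redis-cli info replication captured on the master. The lag= field on each slave line is a separate value and is not used here.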

The _actual_ replication lag could be reported nicer in the INFO output, but... it isn't. :-\

Useful explanation @mattsta, would be great to have it on http://redis.io/commands/info

@antirez Would you be willing to accept a patchset that adds a field with the difference between slavex.offset and master_repl_offset to be able to see at a glance the number of bytes that each slave lags behind the master?
