
Binlog files under the slave's dump directory are incomplete, so restoring to a given point in time is impossible #63

Open
AlbertGithubHome opened this issue Jan 22, 2021 · 8 comments
Labels
enhancement New feature or request

Comments

@AlbertGithubHome

Description

http://tendis.cn/#/Tendisplus/%E8%BF%90%E7%BB%B4/backup
According to the documentation above, in a master-slave setup a snapshot combined with the binlog should allow restoring to any point in time. However, the binlog on the slave appears to be incomplete; part of it seems to linger in memory. For example, with 10 write operations only 3 show up in the binlog, and at around 20 writes roughly 5 entries appear. Is this expected behaviour?

Current Behavior

Slave info:

> info replication
# Replication
role:slave
master_host:10.2.49.172
master_port:51003
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:26
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:26
rocksdb0_master:ip=10.2.49.172,port=51003,src_store_id=0,state=online,binlog_pos=1,lag=1
rocksdb1_master:ip=10.2.49.172,port=51003,src_store_id=1,state=online,binlog_pos=2,lag=0
rocksdb2_master:ip=10.2.49.172,port=51003,src_store_id=2,state=online,binlog_pos=3,lag=0
rocksdb3_master:ip=10.2.49.172,port=51003,src_store_id=3,state=online,binlog_pos=4,lag=0
rocksdb4_master:ip=10.2.49.172,port=51003,src_store_id=4,state=online,binlog_pos=2,lag=0
rocksdb5_master:ip=10.2.49.172,port=51003,src_store_id=5,state=online,binlog_pos=2,lag=0
rocksdb6_master:ip=10.2.49.172,port=51003,src_store_id=6,state=online,binlog_pos=2,lag=0
rocksdb7_master:ip=10.2.49.172,port=51003,src_store_id=7,state=online,binlog_pos=4,lag=0
rocksdb8_master:ip=10.2.49.172,port=51003,src_store_id=8,state=online,binlog_pos=4,lag=0
rocksdb9_master:ip=10.2.49.172,port=51003,src_store_id=9,state=online,binlog_pos=2,lag=0

Binlog query output; it always falls slightly short of the binlog_pos shown in info replication:

# a @ a-pc in ~/WorkSpace/tendisplus/tendisplus-2.1.2-rocksdb-v5.13.4/scripts/slave/dump
$ ../../../bin/binlog_tool --logfile=9/binlog-9-0000001-20210121115133.log
storeid:9 binlogid:1 txnid:34333 chunkid:15759 ts:1611231479844 cmdstr:set
  op:1 fkey:i skey: opvalue:9

# a @ a-pc in ~/WorkSpace/tendisplus/tendisplus-2.1.2-rocksdb-v5.13.4/scripts/slave/dump
$ ../../../bin/binlog_tool --logfile=8/binlog-8-0000002-20210121113821.log
storeid:8 binlogid:2 txnid:14092 chunkid:3168 ts:1611212979523 cmdstr:set
  op:1 fkey:f skey: opvalue:6
storeid:8 binlogid:3 txnid:35952 chunkid:11958 ts:1611232954249 cmdstr:set
  op:1 fkey:q skey: opvalue:17

Steps to Reproduce (for bugs)

Set up a new master-slave Tendis pair, set the values of 20 keys on the master, then inspect the binlog files on the slave.

Your Environment

  • Ubuntu 16.04
  • Latest downloaded release package
  • Master and slave on the same physical machine
@TendisDev
Collaborator

# a @ a-pc in ~/WorkSpace/tendisplus/tendisplus-2.1.2-rocksdb-v5.13.4/scripts/slave/dump
$ ../../../bin/binlog_tool --logfile=9/binlog-9-0000001-20210121115133.log
storeid:9 binlogid:1 txnid:34333 chunkid:15759 ts:1611231479844 cmdstr:set
  op:1 fkey:i skey: opvalue:9

# a @ a-pc in ~/WorkSpace/tendisplus/tendisplus-2.1.2-rocksdb-v5.13.4/scripts/slave/dump
$ ../../../bin/binlog_tool --logfile=8/binlog-8-0000002-20210121113821.log
storeid:8 binlogid:2 txnid:14092 chunkid:3168 ts:1611212979523 cmdstr:set
  op:1 fkey:f skey: opvalue:6
storeid:8 binlogid:3 txnid:35952 chunkid:11958 ts:1611232954249 cmdstr:set
  op:1 fkey:q skey: opvalue:17

Here you only inspected one binlog file from each of the 8/ and 9/ directories.

Do directories 0/ through 7/ have no content?

Also, the binlog has its own cache, so some entries may not have been flushed to disk yet. You can run binlogflush to force a flush and confirm whether entries were still sitting in the buffer.

@AlbertGithubHome
Author

@TendisDev Directories 1/ through 7/ do have content, but compared with my test data some entries are always missing. Running binlogflush did not change the binlog files at all. However, after I set 4 new keys, the binlog gained two of the newly set values plus two values set long ago. It behaves as if the cache always retains part of the binlog and only flushes older entries to disk when new writes arrive. Could it be that my data volume is simply too small?

@TendisDev
Collaborator

Try adding kvstorecount 1 to the config so there is only one rocksdb instance, then test again.
That way all binlogs go into a single directory, which makes analysis easier.

Then post the commands you ran, the config, and the binlog_tool output so we can look at them together.
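A minimal sketch of the suggested config change (the file name and comment are assumptions; only the kvstorecount line comes from the advice above):

```
# tendisplus.conf (assumed file name)
kvstorecount 1    # single rocksdb instance; all binlogs land in one dump subdirectory
```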

@AlbertGithubHome
Author

AlbertGithubHome commented Jan 26, 2021

@TendisDev After configuring kvstorecount 1, the last binlog entry is missing.

Master operations:

10.2.49.172:51003> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.2.49.172,port=51004,state=online,offset=0,lag=0
master_repl_offset:0
rocksdb0_slave0:ip=10.2.49.172,port=51004,dest_store_id=0,state=online,binlog_pos=0,lag=0,binlog_lag=0

10.2.49.172:51003> set a 1
OK
10.2.49.172:51003> set a 2
OK
10.2.49.172:51003> set a 3
OK
10.2.49.172:51003> set a 1
OK
10.2.49.172:51003> set b 2
OK
10.2.49.172:51003> set c 3
OK
10.2.49.172:51003> set d 4
OK
10.2.49.172:51003> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.2.49.172,port=51004,state=online,offset=7,lag=0
master_repl_offset:7
rocksdb0_slave0:ip=10.2.49.172,port=51004,dest_store_id=0,state=online,binlog_pos=7,lag=0,binlog_lag=0

Slave operations:

10.2.49.172:51004> slaveof 10.2.49.172 51003
OK
10.2.49.172:51004> info replication
# Replication
role:slave
master_host:10.2.49.172
master_port:51003
master_link_status:up
master_last_io_seconds_ago:339
master_sync_in_progress:0
slave_repl_offset:0
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
rocksdb0_master:ip=10.2.49.172,port=51003,src_store_id=0,state=online,binlog_pos=0,lag=1611647851

10.2.49.172:51004> info replication
# Replication
role:slave
master_host:10.2.49.172
master_port:51003
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:7
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:7
rocksdb0_master:ip=10.2.49.172,port=51003,src_store_id=0,state=online,binlog_pos=7,lag=0

10.2.49.172:51004> get a
"1"
10.2.49.172:51004> get d
"4"
10.2.49.172:51004> binlogflush all
OK
(1.03s)

Querying the binlog yields the following:

# a @ a-pc in ~/WorkSpace/tendisplus/tendisplus-2.1.2-rocksdb-v5.13.4/scripts/slave/dump/0 [15:59:53]
$ ../../../../bin/binlog_tool --logfile=binlog-0-0000001-20210126155152.log
storeid:0 binlogid:1 txnid:75 chunkid:15495 ts:1611647885630 cmdstr:set
  op:1 fkey:a skey: opvalue:1
storeid:0 binlogid:2 txnid:81 chunkid:15495 ts:1611647888862 cmdstr:set
  op:1 fkey:a skey: opvalue:2
storeid:0 binlogid:3 txnid:89 chunkid:15495 ts:1611647891919 cmdstr:set
  op:1 fkey:a skey: opvalue:3
storeid:0 binlogid:4 txnid:104 chunkid:15495 ts:1611647898046 cmdstr:set
  op:1 fkey:a skey: opvalue:1
storeid:0 binlogid:5 txnid:121 chunkid:3300 ts:1611647905140 cmdstr:set
  op:1 fkey:b skey: opvalue:2
storeid:0 binlogid:6 txnid:129 chunkid:7365 ts:1611647907905 cmdstr:set
  op:1 fkey:c skey: opvalue:3

Even after running binlogflush all, the binlog for the set d 4 operation is still missing. The larger kvstorecount is, the more pronounced this loss becomes.

@TendisDev
Collaborator

In its current implementation, Tendis always keeps one binlog entry in rocksdb, controlled by the slaveBinlogKeepNum parameter, so the last entry is not exported.
If you want it exported, either keep a steady stream of operations, or write an extra monitoring key to generate one more binlog entry.

We plan to add a heartbeat binlog so that instances with no traffic for a long time can still export their binlogs promptly.
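The retention behaviour described above can be sketched as a toy model (this is not Tendis code; only slaveBinlogKeepNum and the "last entry stays in rocksdb" rule come from the comment above, and the function name is made up for illustration):

```python
def dumpable_binlogs(binlogs, slave_binlog_keep_num=1):
    """Toy model: the newest slave_binlog_keep_num entries stay in
    rocksdb and are not written out to the dump directory."""
    if slave_binlog_keep_num <= 0:
        return list(binlogs)
    return list(binlogs)[:-slave_binlog_keep_num]

# The seven writes from the master session above: with the default
# keep-num of 1, only the first six reach the binlog file, and
# "set d 4" stays behind in rocksdb.
ops = ["set a 1", "set a 2", "set a 3", "set a 1",
       "set b 2", "set c 3", "set d 4"]
print(dumpable_binlogs(ops))       # six entries, ending with "set c 3"
```

This matches the binlog_tool output above, which stops at binlogid:6 (set c 3) even after binlogflush all.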

@AlbertGithubHome
Author

The documentation says this parameter defaults to 1, i.e. each rocksdb instance keeps one entry. What would be the impact of setting it to 0? In production, writing an extra monitoring record cannot be targeted precisely at every rocksdb instance anyway, although on a system with continuous writes this should not be a problem.

@TendisDev
Collaborator

Currently the minimum value is 1. We will look into optimizing this.

@TendisDev TendisDev added the enhancement New feature or request label Jan 27, 2021
@AlbertGithubHome
Author

thanks O(∩_∩)O

TendisDev added a commit that referenced this issue Apr 21, 2021
[bugfix] lua support cjson