2.1.2 Resuming an interrupted transfer (partial resynchronization)

2.2 The PSYNC protocol

2.3 Heartbeat mechanism

3 Sentinel

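1. Configuring master and replica nodes

A master-replica relationship can be configured in three ways:

Configuration file: set the slaveof parameter in redis.conf, as described in Cache Study (5): Redis Installation and Configuration
Startup option: pass the --slaveof option when starting redis-server
Command: run the slaveof command on a running instance

Note that if the master sets requirepass, the replica must set masterauth before it can connect. Below is a connection example (the instance on port 6379 is the master, and the instance started here on port 6380 is the replica):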

root@Yhc-Surface:~# redis-server --port 6380 --slaveof localhost 6379
...
24:S 28 Apr 2019 14:31:19.041 * Connecting to MASTER localhost:6379
24:S 28 Apr 2019 14:31:19.046 * MASTER <-> REPLICA sync started
24:S 28 Apr 2019 14:31:19.048 * Non blocking connect for SYNC fired the event.
24:S 28 Apr 2019 14:31:19.049 * Master replied to PING, replication can continue...
24:S 28 Apr 2019 14:31:19.050 * Partial resynchronization not possible (no cached master)
24:S 28 Apr 2019 14:31:19.056 * Full resync from master: 1197e315344ccb5e97581f126ca28388080feba1:0
24:S 28 Apr 2019 14:31:19.082 * MASTER <-> REPLICA sync: receiving 1070 bytes from master
24:S 28 Apr 2019 14:31:19.084 * MASTER <-> REPLICA sync: Flushing old data
24:S 28 Apr 2019 14:31:19.084 * MASTER <-> REPLICA sync: Loading DB in memory
24:S 28 Apr 2019 14:31:19.086 * MASTER <-> REPLICA sync: Finished with success

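As the log shows, the 6380 node connected to the master automatically after startup and performed one full resynchronization. Now insert a key on 6379 and query it on 6380: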

127.0.0.1:6379> set hello world
OK

127.0.0.1:6380> get hello
"world"

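To disconnect, run slaveof no one on the replica. Likewise, to switch to a different master, run slaveof with the new master's address. Be aware that after switching masters the replica's existing data is lost, whereas merely disconnecting from the master does not clear it: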

127.0.0.1:6380> slaveof localhost 6381
OK
127.0.0.1:6380> get hello
(nil)

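In Redis replication, a full resynchronization happens on the first connection and after a long disconnection. After that, every write executed on the master is propagated to the replica automatically. If the link drops only briefly, the replica performs an incremental (partial) resynchronization once it reconnects. Whichever form is used, replication is asynchronous by default. By default, a replica serves only read requests.

Redis replication can be deployed in three topologies:

The simplest is one master with one replica. The replica can take a large share of the read requests off the master, and persistence can be turned off on the master so that the cost of persistence moves to the replica. The price is that if the master goes down and restarts immediately, it comes back empty because it has no log or snapshot, and after synchronization the replica is emptied as well. In that situation, first break the link between master and replica, then restart the master, then reverse the master-replica roles.

The second is the star topology, one master with several replicas. It suits read-heavy, write-light workloads. Its drawback is that the master is easily congested when many replicas connect for the first time; one mitigation is to bring the replicas online in batches.

The third is the tree topology, in which some replicas act as masters for other replicas (cascading replication). This spares the master the load of feeding many replicas directly and lets data fan out exponentially from level to level; the drawback is that it is harder to manage.

Replication information can be inspected with the info replication or role commands, for example: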

127.0.0.1:6379> role
1) "master"
2) (integer) 8637
3) 1) 1) "127.0.0.1"
      2) "6380"
      3) "8637"
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=8637,lag=1
master_replid:32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:8637
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:8637

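On a master, role returns three parts: 1) the role, which is one of master, slave, or sentinel; 2) the master's replication offset; 3) the connected replicas, each listed as IP, port, and acknowledged offset. On a replica it returns five parts: the role (slave), the master's IP, the master's port, the replication state (connect, connecting, sync, or connected), and the amount of data received from the master so far. The sentinel role is covered below.

The output of info replication is longer, but the field names are largely self-describing, so they are not walked through here.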
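Because info replication is plain key:value text, monitoring code often parses it. The following is a minimal sketch in plain Java (no Redis client involved) that parses output shaped like the sample above and computes how many bytes a replica trails the master. The class and method names (ReplicationInfo, replicaAttr, lagBytes) are made up for this illustration and are not part of Redis or Jedis.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReplicationInfo {
    /** Parses "key:value" lines of INFO output, skipping blank lines and "# Section" headers. */
    static Map<String, String> parse(String info) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : info.split("\r?\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon), line.substring(colon + 1));
            }
        }
        return fields;
    }

    /** Extracts one attribute (e.g. "offset") from a slaveN value such as "ip=...,port=...,offset=8637,lag=1". */
    static String replicaAttr(String slaveValue, String name) {
        for (String pair : slaveValue.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(name)) {
                return kv[1];
            }
        }
        return null;
    }

    /** Bytes by which replica slaveN trails the master's replication offset. */
    static long lagBytes(Map<String, String> fields, int n) {
        long master = Long.parseLong(fields.get("master_repl_offset"));
        long replica = Long.parseLong(replicaAttr(fields.get("slave" + n), "offset"));
        return master - replica;
    }

    public static void main(String[] args) {
        String sample = "# Replication\nrole:master\nconnected_slaves:1\n"
            + "slave0:ip=127.0.0.1,port=6380,state=online,offset=8637,lag=1\n"
            + "master_repl_offset:8637\n";
        Map<String, String> f = parse(sample);
        System.out.println(f.get("role") + " lagBytes=" + lagBytes(f, 0)); // master lagBytes=0
    }
}
```

In the sample from this article both offsets are 8637, so the replica is fully caught up.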

2. How replication works

2.1 Replication flow

2.1.1 Full resynchronization

Start with the logs on both sides, beginning with the master:

192:M 28 Apr 2019 15:25:47.935 * Replica 127.0.0.1:6380 asks for synchronization
192:M 28 Apr 2019 15:25:47.936 * Full resync requested by replica 127.0.0.1:6380
192:M 28 Apr 2019 15:25:47.936 * Starting BGSAVE for SYNC with target: disk
192:M 28 Apr 2019 15:25:47.948 * Background saving started by pid 200
200:C 28 Apr 2019 15:25:47.959 * DB saved on disk
192:M 28 Apr 2019 15:25:48.050 * Background saving terminated with success
192:M 28 Apr 2019 15:25:48.052 * Synchronization with replica 127.0.0.1:6380 succeeded

Then the replica:

196:S 28 Apr 2019 15:25:47.922 * Connecting to MASTER localhost:6379
196:S 28 Apr 2019 15:25:47.931 * MASTER <-> REPLICA sync started
196:S 28 Apr 2019 15:25:47.931 * Non blocking connect for SYNC fired the event.
196:S 28 Apr 2019 15:25:47.933 * Master replied to PING, replication can continue...
196:S 28 Apr 2019 15:25:47.933 * Partial resynchronization not possible (no cached master)
196:S 28 Apr 2019 15:25:47.950 * Full resync from master: 32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2:0
196:S 28 Apr 2019 15:25:48.051 * MASTER <-> REPLICA sync: receiving 175 bytes from master
196:S 28 Apr 2019 15:25:48.053 * MASTER <-> REPLICA sync: Flushing old data
196:S 28 Apr 2019 15:25:48.056 * MASTER <-> REPLICA sync: Loading DB in memory
196:S 28 Apr 2019 15:25:48.057 * MASTER <-> REPLICA sync: Finished with success

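From these logs, the following replication flow can be inferred:

First, the replica connects to the master. Because this is the first connection, a full resynchronization is needed once the connection succeeds; to confirm that replication can proceed, the replica first sends a PING
If the PING succeeds, the replica sends SYNC to request a full resynchronization
On receiving SYNC, the master starts a BGSAVE and saves the current dataset as a snapshot
Once the snapshot is saved, the master sends it to the replica, which flushes its old data and loads the new data from the master

SYNC is Redis's original synchronization command. Current versions use PSYNC instead, which additionally supports partial resynchronization. Sending SYNC directly to port 6379 produces the following output: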

root@Yhc-Surface:~# echo 'SYNC' | nc localhost 6379
$410
REDIS0009       redis-ver5.0.4
redis-bits@ctime¨]used-memB repl-stream-db repl-id(32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2repl-offsetMaof-pre:hobbymalet@K@__   K   programming
sexualage redisgood helloworld testmyteststream  jb        <<     nameagesexual   zhangsan  male  jb  dO*1
$4
PING
*1
$4
PING
*1
$4
PING
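... // every few seconds another *1\r\n$4\r\nPING\r\n is printed

SYNC essentially transfers the contents of the RDB file. PSYNC is described below.

During a full resynchronization, the replica logs the following line: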

Full resync from master: 32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2:0

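The master here is identified by a string made of two parts separated by a colon. The long first part is the replication ID, a pseudo-random string that identifies the dataset. The trailing 0 is the replication offset, which marks the position in the replication stream from which the replica continues after loading the snapshot. It is 0 here because the master had only just started; a full resynchronization does not always begin at 0 (the FULLRESYNC reply in section 2.2 carries offset 4423).

This flow assumes repl-diskless-sync is disabled. With that option enabled, the master does not save the RDB file to disk first but sends the data to the replica directly.

Two things the logs above do not show: 1) write requests that reach the master during replication are answered normally and also buffered on the master, then sent to the replica once the snapshot has been delivered; 2) if AOF is enabled on the replica, it performs a rewrite immediately after synchronization completes.

2.1.2 Resuming an interrupted transfer (partial resynchronization)

The flow above is a full resynchronization. When master and replica are disconnected for a while and then reconnect, a complete copy is clearly unnecessary; this is called resuming from the breakpoint, or partial resynchronization. (To reproduce it, shut the replica down rather than running slaveof no one: the latter switches the replica into master mode and changes its replication ID, so every reconnection would trigger a full resynchronization.) Again, start with the master log: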

192:M 28 Apr 2019 15:47:47.750 * Synchronization with replica 127.0.0.1:6380 succeeded
192:M 28 Apr 2019 15:48:38.363 # Connection with replica 127.0.0.1:6380 lost.
192:M 28 Apr 2019 15:48:40.331 * Replica 127.0.0.1:6380 asks for synchronization
192:M 28 Apr 2019 15:48:40.332 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 70 bytes of backlog starting from offset 1897.

Then the replica log:

212:S 28 Apr 2019 15:48:40.323 * Connecting to MASTER localhost:6379
212:S 28 Apr 2019 15:48:40.328 * MASTER <-> REPLICA sync started
212:S 28 Apr 2019 15:48:40.328 * Non blocking connect for SYNC fired the event.
212:S 28 Apr 2019 15:48:40.329 * Master replied to PING, replication can continue...
212:S 28 Apr 2019 15:48:40.330 * Trying a partial resynchronization (request 32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2:1897).
212:S 28 Apr 2019 15:48:40.332 * Successful partial resynchronization with master.
212:S 28 Apr 2019 15:48:40.334 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

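This time the replica asked to resume from offset 1897. After verifying the replication ID, the master sent only 70 bytes of backlog instead of performing a full resynchronization.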
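The master's decision can be sketched as follows. This is a simplified model, not the actual Redis implementation: it assumes the master accepts a partial resynchronization only when the requested replication ID matches its own and the requested offset still lies inside the backlog window described by repl_backlog_first_byte_offset and repl_backlog_histlen, and it ignores the secondary ID (master_replid2). The backlog length of 1966 used in main is inferred from the log (1897 + 70 - 1), not read from the article's output, and the class name is hypothetical.

```java
final class PartialResyncCheck {
    /**
     * Simplified acceptance rule: the IDs must match and the requested
     * offset must still be covered by the replication backlog.
     */
    static boolean canContinue(String masterReplId, long backlogFirstOffset, long backlogHistLen,
                               String requestedReplId, long requestedOffset) {
        if (!masterReplId.equals(requestedReplId)) {
            return false; // different history (or "?"): full resync needed
        }
        return requestedOffset >= backlogFirstOffset
            && requestedOffset <= backlogFirstOffset + backlogHistLen;
    }

    /** Number of backlog bytes the master sends when resuming at requestedOffset. */
    static long bytesToSend(long backlogFirstOffset, long backlogHistLen, long requestedOffset) {
        return backlogFirstOffset + backlogHistLen - requestedOffset;
    }

    public static void main(String[] args) {
        String id = "32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2";
        // Assumed backlog state: first byte at offset 1, 1966 bytes of history.
        System.out.println(canContinue(id, 1, 1966, id, 1897)); // true
        System.out.println(bytesToSend(1, 1966, 1897));         // 70
        System.out.println(canContinue(id, 1, 1966, "?", -1));  // false
    }
}
```

Under these assumptions the sketch reproduces the 70 bytes reported in the master log above.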

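2.2 The PSYNC protocol

In section 2.1.1 we sent SYNC directly to port 6379 and received the RDB content. Sending a bare PSYNC, however, fails with a complaint about the number of arguments: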

root@Yhc-Surface:~# echo 'PSYNC' | nc localhost 6379
-ERR wrong number of arguments for 'psync' command

The arguments of PSYNC are the replication ID and the offset, so let's try again:

root@Yhc-Surface:~# echo 'PSYNC 32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2 1897' | nc localhost 6379
+CONTINUE
... // many *1\r\n$4\r\nPING\r\n messages
*2
$6
SELECT
$1
0
*3
$3
set
$5
redis
$4
good
... // many *1\r\n$4\r\nPING\r\n messages
*9
$4
xadd
$12
myteststream
$15
1556437967499-0
$4
name
$8
zhangsan
$3
age
$2
18
$6
sexual
$4
male
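... // rest omitted

For a full resynchronization, the replication ID is a question mark "?" and the offset is -1; the effect is similar to sending SYNC: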

root@Yhc-Surface:~# echo 'PSYNC ? -1' | nc localhost 6379
+FULLRESYNC 32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2 4423
$410
REDIS0009       redis-ver5.0.4
redis-bits@ctimeGaused-memXC repl-stream-db repl-id(32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2repl-offsetGaof-pre:hobbymalet@K@__   K   programming
sexualage redisgood helloworld testmyteststream  jb        <<     nameagesexual   zhangsan  male  jb  A|

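Judging from these two examples, the first line of the reply to a full resynchronization is +FULLRESYNC replicationId offset, and the first line of the reply to a partial resynchronization is +CONTINUE. The case of an unrecognized command is not shown in the examples; the reply in that case is an error line beginning with -ERR.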
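The three reply forms can be told apart from the first line alone. Below is a minimal sketch of such a classifier in plain Java; it only parses text and does not talk to a server. The class name PsyncReply and its fields are hypothetical, and the handling of an optional ID after +CONTINUE is an assumption about newer protocol versions rather than something shown in the captures above.

```java
final class PsyncReply {
    enum Kind { FULLRESYNC, CONTINUE, ERROR, UNKNOWN }

    final Kind kind;
    final String replId; // null when the reply carries no replication ID
    final long offset;   // -1 when the reply carries no offset

    private PsyncReply(Kind kind, String replId, long offset) {
        this.kind = kind;
        this.replId = replId;
        this.offset = offset;
    }

    /** Classifies the first line of a reply to PSYNC. */
    static PsyncReply parse(String firstLine) {
        String line = firstLine.trim();
        String[] parts = line.split("\\s+");
        if (parts[0].equals("+FULLRESYNC") && parts.length == 3) {
            return new PsyncReply(Kind.FULLRESYNC, parts[1], Long.parseLong(parts[2]));
        }
        if (parts[0].equals("+CONTINUE")) {
            // Assumption: an ID may follow +CONTINUE; keep it if present.
            String newId = parts.length > 1 ? parts[1] : null;
            return new PsyncReply(Kind.CONTINUE, newId, -1);
        }
        if (line.startsWith("-")) {
            return new PsyncReply(Kind.ERROR, null, -1);
        }
        return new PsyncReply(Kind.UNKNOWN, null, -1);
    }

    public static void main(String[] args) {
        PsyncReply full = parse("+FULLRESYNC 32e1f0aa4381c3fd906a21b7a748ff1fa93be2b2 4423");
        System.out.println(full.kind + " " + full.offset);   // FULLRESYNC 4423
        System.out.println(parse("+CONTINUE").kind);         // CONTINUE
        System.out.println(parse("-ERR wrong number of arguments for 'psync' command").kind); // ERROR
    }
}
```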

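2.3 Heartbeat mechanism

The traffic between master and replica also contains many PING messages; these are the heartbeat of Redis replication. For heartbeats, master and replica each act as a client of the other: the master PINGs the replica every repl-ping-slave-period seconds (10 by default), and the replica sends replconf ack offset to the master once per second.

The heartbeat mainly serves to:

Probe the network state between master and replica
Let the replica report its replication offset via replconf
Control writes based on the number of replicas and their lag (the min-slaves-to-write and min-slaves-max-lag settings rely on it)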
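The *1\r\n$4\r\nPING\r\n sequences seen in the captures are ordinary commands in the Redis wire format: an array header, then each argument as a length-prefixed bulk string. The sketch below encodes a command that way in plain Java so the heartbeat messages can be recognized by eye; RespEncoder is a hypothetical helper name, and nothing here opens a connection.

```java
import java.nio.charset.StandardCharsets;

final class RespEncoder {
    /** Encodes a command as an array of bulk strings: *<argc>, then $<byte length> and the bytes of each argument. */
    static String encode(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");
        for (String arg : args) {
            int byteLen = arg.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLen).append("\r\n").append(arg).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape CRLF so the result prints on a single line.
        System.out.println(encode("PING").replace("\r\n", "\\r\\n"));
        // *1\r\n$4\r\nPING\r\n
        System.out.println(encode("REPLCONF", "ACK", "1897").replace("\r\n", "\\r\\n"));
        // *3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n$4\r\n1897\r\n
    }
}
```

The second line is what a replica's once-per-second acknowledgement would look like for offset 1897.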

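3 Sentinel

3.1 Introduction

A one-master, multi-replica Redis deployment is still not highly available on its own: operators cannot tell promptly when a node becomes unreachable, promoting a replica requires manual intervention, and every affected application has to be reconfigured, which is a lot of work. One idea is to put a proxy in front that exposes a single access address; this reduces the impact of node replacement on applications but leaves the other two problems unsolved.

Manual intervention means implementing the failover logic in code or by hand: when the master goes down for some reason, a replica is promoted to be the new master and serves traffic. One workable approach is a program that communicates with every node periodically. When a replica goes down, it is temporarily removed from the replica list until it recovers and rejoins; when the master goes down, one of the replicas is chosen as the new master.

This approach has two main problems:

1) How should the replica to promote be chosen? The new master should perform as well as possible (processing power, network latency, and so on) and hold data that is as recent as possible. More than one node may satisfy these requirements, so how is the choice made?

2) How is the availability of the monitoring program itself guaranteed? A single instance is not robust enough, and if it is a cluster of several nodes, how are those managed? Adding yet another monitor on top leads to an endless regress.

Sentinel is the replication monitoring, alerting, and failover mechanism provided by Redis, and it addresses the problems above.

Sentinel instances have no master/replica distinction; they are fully decentralized and are all simply called Sentinel nodes, and a group of them is called a Sentinel node set. Redis masters and replicas are called data nodes. The Sentinel node set together with the data nodes makes up Redis Sentinel.

The following looks at how Sentinel behaves in several topologies:

1) One master and one replica, with each data node sharing a box with one Sentinel node: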

+----+         +----+
| M1 |---------| R1 |
| S1 |         | S2 |
+----+         +----+

Configuration: quorum = 1

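If the network between the two boxes is cut, the Sentinels cannot see each other either. Neither side then holds a majority of the Sentinels, so no failover can be authorized and the deployment becomes unavailable

2) One master with multiple replicas, with each data node sharing a box with one Sentinel node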

       +----+
       | M1 |
       | S1 |
       +----+
          |
+----+    |    +----+
| R2 |----+----| R3 |
| S2 |         | S3 |
+----+         +----+

Configuration: quorum = 2

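In this case, even if a network partition cuts off both M1 and S1, enough Sentinel nodes remain to reach agreement and elect a new master

3) Sentinels placed on the client boxes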

            +----+         +----+
            | M1 |----+----| R1 |
            |    |    |    |    |
            +----+    |    +----+
                      |
         +------------+------------+
         |            |            |
         |            |            |
      +----+        +----+      +----+
      | C1 |        | C2 |      | C3 |
      | S1 |        | S2 |      | S3 |
      +----+        +----+      +----+

      Configuration: quorum = 2

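This layout also offers good availability: if M1 becomes unreachable, R1 can be promoted to master without trouble

4) Fewer than three clients

In this case Sentinels must also be deployed on the server boxes so that failover can still complete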

            +----+         +----+
            | M1 |----+----| R1 |
            | S1 |    |    | S2 |
            +----+    |    +----+
                      |
               +------+-----+
               |            |  
               |            |
            +----+        +----+
            | C1 |        | C2 |
            | S3 |        | S4 |
            +----+        +----+

      Configuration: quorum = 3
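The four layouts can be compared with a little arithmetic. The sketch below assumes the rule described in the Redis Sentinel documentation: quorum Sentinels must agree before a master is flagged objectively down, while actually starting a failover needs the votes of a majority of all known Sentinels (or of quorum Sentinels, if the quorum is larger). The class and method names are hypothetical, and the reachable counts in main are this example's reading of each diagram after the master's box is lost.

```java
final class SentinelQuorum {
    /** Strict majority of all known Sentinels. */
    static int majority(int totalSentinels) {
        return totalSentinels / 2 + 1;
    }

    /** Enough agreeing Sentinels to flag the master objectively down? */
    static boolean canMarkObjectivelyDown(int agreeing, int quorum) {
        return agreeing >= quorum;
    }

    /** Enough reachable Sentinels to authorize a failover? */
    static boolean canAuthorizeFailover(int totalSentinels, int reachable, int quorum) {
        int needed = Math.max(majority(totalSentinels), quorum);
        return reachable >= needed;
    }

    public static void main(String[] args) {
        // Layout 1: 2 Sentinels, quorum 1, the master's box is cut off.
        System.out.println(canMarkObjectivelyDown(1, 1));   // true
        System.out.println(canAuthorizeFailover(2, 1, 1));  // false
        // Layout 2: 3 Sentinels, quorum 2, M1 and S1 are cut off.
        System.out.println(canAuthorizeFailover(3, 2, 2));  // true
        // Layout 3: 3 Sentinels on client boxes, all still reachable.
        System.out.println(canAuthorizeFailover(3, 3, 2));  // true
        // Layout 4: 4 Sentinels, quorum 3, M1 and S1 are cut off.
        System.out.println(canAuthorizeFailover(4, 3, 3));  // true
    }
}
```

Under these assumptions, layout 1 is the only one in which the surviving side can detect the failure but cannot act on it.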

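A Sentinel node can be started with the --sentinel option of redis-server. Like a data node, it has a configuration file, sentinel.conf by default.

3.2 Configuration

Many Sentinel settings are similar to those of data nodes; see Cache Study (5): Redis Installation and Configuration.

1) Network settings

bind: same meaning as for data nodes, but unset by default; the official position is that Sentinel nodes should not be reachable from outside, or at least should be adequately protected
protected-mode: same meaning as for data nodes, unset by default
port: the port the Sentinel listens on, 26379 by default
sentinel announce-ip, sentinel announce-port: similar in purpose to slave-announce-ip and slave-announce-port

2) Basic settings

daemonize: whether to run in the background, no by default
pidfile, logfile: same purpose as for data nodes
dir: working directory
sentinel rename-command: same purpose as rename-command on data nodes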

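3) Monitoring settings

sentinel monitor <master-name> <ip> <redis-port> <quorum>: specifies the master to monitor. quorum means that once quorum Sentinel nodes consider the master down (subjectively down), it is judged objectively down. The default is 2, and it is usually set to half the number of Sentinel nodes plus 1
sentinel auth-pass <master-name> <password>: same purpose as the masterauth setting
sentinel down-after-milliseconds <master-name> <milliseconds>: the number of milliseconds after which an unresponsive master is considered subjectively down, 30000 by default
sentinel parallel-syncs <master-name> <numreplicas>: limits how many replicas may replicate from the new master at the same time after it has been elected. The default is 1, meaning one at a time. Keep this value low: loading an RDB file is blocking, so if too many replicas replicate at once, the new master may come under heavy load and read requests may go unanswered for a while
sentinel failover-timeout <master-name> <milliseconds>: the failover timeout, 3 minutes by default. It is used in four ways, with T denoting the configured value:
- If a failover of a given master fails, a new failover of the same master can be started only after 2*T
- If an error occurs while promoting the replica (slaveof no one), for example because the new master goes down as well, the Sentinel retries for at most T and then considers the failover failed
- If slaveof no one did not fail but the promotion still cannot be confirmed through the info command within T, the failover is considered failed
- If the other replicas take longer than T to resynchronize fully with the new master, the failover is considered failed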

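4) Notification settings

sentinel notification-script <master-name> <script-path>: specifies an alert script, invoked for events including subjective down (sdown) and objective down (odown). The script is a shell script, not a Lua script. It may run for at most 60 seconds; on timeout it is terminated and retried. A script that exits with 1 is retried, up to 10 times; a script that exits with 2 or higher is not retried
sentinel client-reconfig-script <master-name> <script-path>: used to tell clients to change the address they access after a failover. This is also a shell script, not a Lua script. The arguments passed are <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>, where state is "failover" and role is either "leader" or "observer"
sentinel deny-scripts-reconfig: whether changing the two scripts above at runtime is denied; the default is yes, meaning they cannot be modified

The master-specific settings above can be changed from redis-cli with the sentinel set command, and the change is flushed to the configuration file immediately.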

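3.3 Setup

A more production-oriented article is available for reference: https://blog.51cto.com/dengaosky/2091877. The example here uses 3 Redis nodes and 3 Sentinel nodes. Redis uses ports 6379, 6380, and 6381, with 6379 as the master, and the Sentinels use ports 26379, 26380, and 26381. In production, deploy an odd number of Sentinel nodes, three or more, spread across different physical machines.

First start the three Redis nodes: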

root@Yhc-Surface:~/redis-5.0.4# redis-server ~/redis-5.0.4/redis.conf
root@Yhc-Surface:~/redis-5.0.4# redis-server ~/redis-5.0.4/redis.conf --port 6380 --replicaof localhost 6379
root@Yhc-Surface:~/redis-5.0.4# redis-server ~/redis-5.0.4/redis.conf --port 6381 --replicaof localhost 6379

Then start the three Sentinel nodes:

root@Yhc-Surface:~/redis-5.0.4# redis-server ~/redis-5.0.4/sentinel.conf --sentinel
339:X 28 Apr 2019 21:11:08.393 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
339:X 28 Apr 2019 21:11:08.393 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=339, just started
339:X 28 Apr 2019 21:11:08.395 # Configuration loaded
root@Yhc-Surface:~/redis-5.0.4# redis-server ~/redis-5.0.4/sentinel-26380.conf --sentinel
343:X 28 Apr 2019 21:11:31.790 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
343:X 28 Apr 2019 21:11:31.791 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=343, just started
343:X 28 Apr 2019 21:11:31.791 # Configuration loaded
root@Yhc-Surface:~/redis-5.0.4# redis-server ~/redis-5.0.4/sentinel-26381.conf --sentinel
345:X 28 Apr 2019 21:11:35.856 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
345:X 28 Apr 2019 21:11:35.856 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=345, just started
345:X 28 Apr 2019 21:11:35.857 # Configuration loaded

Next, verify that the setup works. First connect to the master (6379) and run the role command:

127.0.0.1:6379> role
1) "master"
2) (integer) 100921
3) 1) 1) "127.0.0.1"
      2) "6381"
      3) "100522"
   2) 1) "127.0.0.1"
      2) "6380"
      3) "100522"

No Sentinel information is visible here yet, but that is fine. Connect to the 26379 node and run sentinel masters and info sentinel:

127.0.0.1:26379> sentinel masters
1)  1) "name"
    2) "mymaster"
    3) "ip"
    4) "127.0.0.1"
    5) "port"
    6) "6379"
    ... // rest omitted

127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=2,sentinels=3

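Not only was the master recognized, the number of Sentinel nodes was detected correctly as well. In fact, each Sentinel registers with the master as a client (running info clients shows text such as clients:3) and then subscribes to a designated channel. A newly added Sentinel publishes a message containing its own information to that channel, which is how the other Sentinels learn about the new member. Removal is more involved: besides shutting down the target Sentinel, you must connect to each remaining Sentinel node and run sentinel reset *, and the official guidance is to wait at least 30 seconds between executions.

If we now shut down the 6379 instance, the Sentinel log shows the following: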

84:X 29 Apr 2019 13:48:22.961 # +sdown master mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.047 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
84:X 29 Apr 2019 13:48:23.048 # +new-epoch 1
84:X 29 Apr 2019 13:48:23.048 # +try-failover master mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.057 # +vote-for-leader 7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23 1
84:X 29 Apr 2019 13:48:23.072 # ad9a9c0fb810571249f4711cbe13408ff06258d7 voted for 7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23 1
84:X 29 Apr 2019 13:48:23.074 # fcdc259825dd36869af4f2a3168a925186fd20b0 voted for 7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23 1
84:X 29 Apr 2019 13:48:23.115 # +elected-leader master mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.116 # +failover-state-select-slave master mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.175 # +selected-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.176 * +failover-state-send-slaveof-noone slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.236 * +failover-state-wait-promotion slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.650 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.650 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:23.733 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:24.195 # -odown master mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:24.682 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:24.682 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:24.783 # +failover-end master mymaster 127.0.0.1 6379
84:X 29 Apr 2019 13:48:24.783 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
84:X 29 Apr 2019 13:48:24.785 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380

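From this log the whole failover flow can be reconstructed:

First, when a Sentinel node finds the master data node unreachable, it marks it sdown (subjectively down) and then collects the opinions of the other Sentinels

Once quorum Sentinels agree that the master is unreachable, it is marked odown (objectively down). The Sentinels then elect a Leader, which selects the new master (6380 in this example) and promotes it

Finally, the configuration of the remaining 6381 node is changed so that it becomes a replica of 6380, which completes the failover away from 6379.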

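3.4 Commands

sentinel masters: seen above; shows information about all monitored masters
sentinel master master-name: shows information about the specified master
sentinel slaves master-name: shows the replicas of the specified master
sentinel sentinels master-name: shows the Sentinel nodes monitoring the specified master
sentinel get-master-addr-by-name master-name: shows the IP and port of the specified master
sentinel reset pattern: resets every master whose name matches the pattern, clearing its state, resetting its replica list, and so on
sentinel failover master-name: forces a failover of the specified master
sentinel ckquorum master-name: checks whether the number of Sentinels monitoring the master meets the requirement
sentinel flushconfig: writes the current Sentinel node's configuration to disk
sentinel remove master-name: makes the current Sentinel node stop monitoring the specified master
sentinel monitor master-name host port quorum: makes the current Sentinel node monitor the specified master
sentinel set master-name key value: changes the Sentinel's settings for the specified master; it plays the same role for Sentinel that config set plays for a data node
sentinel is-master-down-by-addr host port epoch runid: asks the other Sentinel nodes whether the specified master is down
- If runid is "*", the Sentinels are exchanging their verdicts on whether the master is down
- If runid is the current Sentinel's own ID, this node is asking to become Leader
- The reply has three parts: 1) the down state, 1 for down and 0 for up; 2) the Leader's ID, where * means the reply only concerns whether the master is up, and a concrete node ID means that node is the Leader; 3) the Leader's epoch. The Leader is covered in the section on internals.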

3.5 How Sentinel works

3.5.1 Pub/Sub

As mentioned above, a Sentinel is in essence a publish/subscribe client connected to the master:

127.0.0.1:6379> pubsub channels *
1) "__sentinel__:hello"

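Subscribing to this channel shows that every 2 seconds the Sentinel nodes exchange their views of the master; automatic discovery among Sentinel nodes also works through this channel. The message format is:

Sentinel IP,Sentinel port,Sentinel ID,Sentinel epoch,master name,master IP,master port,master epoch

The epoch is the number of the election currently in progress: every time an election is held, whether it succeeds or fails, the epoch is incremented by 1.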
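As a rough illustration of the eight-field format, the sketch below splits one such message into named fields in plain Java. The sample message in main is assembled by hand from values that appear elsewhere in this article and is not a captured message; the class and field names are hypothetical.

```java
final class SentinelHello {
    final String sentinelIp;
    final int sentinelPort;
    final String sentinelId;
    final long sentinelEpoch;
    final String masterName;
    final String masterIp;
    final int masterPort;
    final long masterEpoch;

    private SentinelHello(String[] p) {
        sentinelIp = p[0];
        sentinelPort = Integer.parseInt(p[1]);
        sentinelId = p[2];
        sentinelEpoch = Long.parseLong(p[3]);
        masterName = p[4];
        masterIp = p[5];
        masterPort = Integer.parseInt(p[6]);
        masterEpoch = Long.parseLong(p[7]);
    }

    /** Splits a hello message into its eight comma-separated fields. */
    static SentinelHello parse(String message) {
        String[] parts = message.trim().split(",");
        if (parts.length != 8) {
            throw new IllegalArgumentException("expected 8 fields, got " + parts.length);
        }
        return new SentinelHello(parts);
    }

    public static void main(String[] args) {
        // Hand-built sample, not a real capture.
        SentinelHello h = parse(
            "127.0.0.1,26379,7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23,1,mymaster,127.0.0.1,6380,1");
        System.out.println(h.masterName + " -> " + h.masterIp + ":" + h.masterPort);
        // mymaster -> 127.0.0.1:6380
    }
}
```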

Subscribing on a Sentinel node itself reveals many more channels:

127.0.0.1:26380> psubscribe *
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "*"
3) (integer) 1
1) "pmessage"
2) "*"
3) "+new-epoch"
4) "11"
1) "pmessage"
2) "*"
3) "+config-update-from"
4) "sentinel 7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23 127.0.0.1 26379 @ mymaster 127.0.0.1 6379"
1) "pmessage"
2) "*"
3) "+switch-master"
4) "mymaster 127.0.0.1 6379 127.0.0.1 6381"
...

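When a Sentinel encounters certain events, it publishes a message to the corresponding channel. The channels are:

+reset-master: the master has been reset
+slave: a new replica was detected
+failover-state-reconf-slaves: the failover state changed to reconf-slaves
+failover-detected: a failover was detected
+slave-reconf-sent: the Leader Sentinel sent the slaveof command to the replica
+slave-reconf-inprog: the replica has switched to the new master, but synchronization is not finished yet
+slave-reconf-done: the replica has finished synchronizing with the new master
-dup-sentinel: a Sentinel node registered for the master was removed as a duplicate
+sentinel: a new Sentinel node was detected
+sdown: the node is now considered subjectively down
-sdown: the subjective down state was withdrawn
+odown: the master is now considered objectively down
-odown: the objective down state was withdrawn
+new-epoch: a new epoch has begun
+try-failover: a new failover is starting
+elected-leader: the node was elected Leader and may start the failover
+failover-state-select-slave: the failover state changed to select-slave; the search for a suitable replica begins
no-good-slave: there is no replica that can be promoted
+selected-slave: a replica suitable for promotion was found
+failover-state-send-slaveof-noone: the promotion is being carried out
+failover-end-for-timeout: the failover ended because of a timeout; replicas that are already executing slaveof will still end up pointing at the new master
+failover-end: the failover finished successfully
+switch-master: reports the IP and port of the old and new masters
+tilt: Tilt mode entered
-tilt: Tilt mode exited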

3.5.2 Leader与Epoch

The Sentinel log contains lines such as the following:

84:X 29 Apr 2019 13:48:23.057 # +vote-for-leader 7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23 1
84:X 29 Apr 2019 13:48:23.072 # ad9a9c0fb810571249f4711cbe13408ff06258d7 voted for 7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23 1
84:X 29 Apr 2019 13:48:23.074 # fcdc259825dd36869af4f2a3168a925186fd20b0 voted for 7e7d8782d2e284bbaaad3ab6915e96bc71e7cf23 1
84:X 29 Apr 2019 13:48:23.115 # +elected-leader master mymaster 127.0.0.1 6379

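This is the Leader election, which follows the leader election approach of the Raft algorithm. Consider first what goes wrong without a Leader. Suppose there are 3 replicas and 3 Sentinels; in the extreme case, in every round each Sentinel backs a different replica for promotion, and the failover can never complete.

With a Leader, the Sentinels only need to elect the Leader, which then designates one replica for promotion. Take the same example: the 3 Sentinel nodes hold an election. In principle each node could vote only for itself, but this almost never happens, because the election starts only after a Sentinel has judged the master to be down, and whichever Sentinel raises the failure first usually becomes Leader.

Before the Leader election, the log shows the epoch being updated. The epoch uniquely identifies a failover version. Consider an extreme case: the Leader itself fails and goes offline. The remaining Sentinels will certainly elect a new Leader and start another failover. Without epochs, the old Leader would resume its work once it came back online, pick a node to promote, and leave the deployment with two masters. With epochs, every failover must update the epoch, so when the old Leader comes back and tries to promote a node, it finds that its own epoch is lower and abandons the promotion.

Epochs are also used for configuration propagation. When a failover completes, the new configuration is published to the __sentinel__:hello channel. Each Sentinel node that receives the message compares the epoch of the new configuration with that of its local configuration and overwrites the local configuration if the new one is larger.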
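The propagation rule amounts to "the larger epoch wins". A minimal sketch in plain Java, assuming a configuration is just a master address tagged with the epoch of the failover that produced it; MasterAddress and adopt are hypothetical names, and real Sentinels track considerably more state than this.

```java
final class MasterAddress {
    final String host;
    final int port;
    final long configEpoch;

    MasterAddress(String host, int port, long configEpoch) {
        this.host = host;
        this.port = port;
        this.configEpoch = configEpoch;
    }

    /** Returns the configuration to keep after hearing an announcement: the announced one only if its epoch is larger. */
    static MasterAddress adopt(MasterAddress local, MasterAddress announced) {
        return announced.configEpoch > local.configEpoch ? announced : local;
    }

    public static void main(String[] args) {
        MasterAddress local = new MasterAddress("127.0.0.1", 6379, 0);
        MasterAddress afterFailover = new MasterAddress("127.0.0.1", 6380, 1);
        System.out.println(adopt(local, afterFailover).port); // 6380
        // A stale announcement from an older epoch is ignored.
        System.out.println(adopt(afterFailover, local).port); // 6380
    }
}
```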

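3.5.3 Monitoring and replica selection

Every 10 seconds, each Sentinel node sends the INFO command to the master and replicas to learn the topology. Every second, it checks whether the master, the replicas, and the other Sentinel nodes are alive, and marks failed nodes as down.

The new master is chosen according to the following criteria, applied in order:

Filter out nodes that are not healthy enough: for example subjectively down nodes, nodes that have not replied to PING within 5 seconds, and nodes that have been disconnected from the master for more than 10 times down-after-milliseconds
Prefer the node with the smaller slave-priority value (higher priority)
Then choose the node with the largest replication offset
Then choose the node with the smallest run ID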
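The ordering above maps naturally onto a filter followed by a chained comparator. The sketch below is a simplified model in plain Java: the health checks are collapsed into a single boolean, a priority of 0 is treated as "never promote" (an assumption based on how slave-priority is documented), and the ID tiebreak is a plain string comparison. The class names and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ReplicaSelection {
    static final class Replica {
        final String name;
        final boolean healthy;
        final int priority;
        final long replOffset;
        final String runId;

        Replica(String name, boolean healthy, int priority, long replOffset, String runId) {
            this.name = name;
            this.healthy = healthy;
            this.priority = priority;
            this.replOffset = replOffset;
            this.runId = runId;
        }
    }

    /** Lower priority value first, then larger offset, then smaller ID. */
    static final Comparator<Replica> PROMOTION_ORDER =
        Comparator.comparingInt((Replica r) -> r.priority)
            .thenComparing(Comparator.comparingLong((Replica r) -> r.replOffset).reversed())
            .thenComparing(Comparator.comparing((Replica r) -> r.runId));

    /** Picks the best candidate, or empty if no replica qualifies. */
    static Optional<Replica> select(List<Replica> candidates) {
        return candidates.stream()
            .filter(r -> r.healthy && r.priority != 0)
            .min(PROMOTION_ORDER);
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
            new Replica("6380", true, 100, 100522, "bbb"),
            new Replica("6381", true, 100, 100522, "aaa"),
            new Replica("6382", false, 1, 100522, "ccc"));
        System.out.println(select(replicas).map(r -> r.name).orElse("none")); // 6381
    }
}
```

In main the two healthy replicas tie on priority and offset, so the smaller ID decides; the unhealthy replica is ignored even though its priority value is the lowest.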

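3.6 Sentinel support in Jedis

Although Redis officially advises against exposing Sentinels to the outside, the need still arises, and Jedis supports Sentinel through JedisSentinelPool.

Here is a usage example: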

public static void main(String[] args) {
    Set<String> sentinels=new HashSet<>();
    sentinels.add("localhost:26379");
    sentinels.add("localhost:26380");
    sentinels.add("localhost:26381");
    JedisSentinelPool pool=new JedisSentinelPool("mymaster",sentinels);
    Jedis sentinel=pool.getResource();
    System.out.println(sentinel.sentinelMasters());
}

Running it, however, raises an error: the sentinel masters command cannot be executed:

Exception in thread "main" redis.clients.jedis.exceptions.JedisDataException: ERR unknown command `SENTINEL`, with args beginning with: `masters`, 
	at redis.clients.jedis.Protocol.processError(Protocol.java:132)
	at redis.clients.jedis.Protocol.process(Protocol.java:166)
	at redis.clients.jedis.Protocol.read(Protocol.java:220)
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:309)
	at redis.clients.jedis.Connection.getRawObjectMultiBulkReply(Connection.java:276)
	at redis.clients.jedis.Connection.getObjectMultiBulkReply(Connection.java:280)
	at redis.clients.jedis.Jedis.sentinelMasters(Jedis.java:2985)
	at SentinelTest.main(SentinelTest.java:15)

Let's look at the source to investigate. Tracing the JedisSentinelPool constructor shows that it calls the initSentinels method:

    private HostAndPort initSentinels(Set<String> sentinels, String masterName) {
        ...
        HostAndPort hap;
        while(var5.hasNext()) {
            sentinel = (String)var5.next();
            hap = HostAndPort.parseString(sentinel);
            this.log.debug("Connecting to Sentinel {}", hap);
            Jedis jedis = null;

            try {
                jedis = new Jedis(hap);
                List<String> masterAddr = jedis.sentinelGetMasterAddrByName(masterName);
                sentinelAvailable = true;
                if (masterAddr != null && masterAddr.size() == 2) {
                    master = this.toHostAndPort(masterAddr);
                    break;
                }
            } catch (JedisException var13) {
                
            } finally {
                if (jedis != null) {
                    jedis.close();
                }

            }
        }
        ...
        var5 = sentinels.iterator();
        while(var5.hasNext()) {
            sentinel = (String)var5.next();
            hap = HostAndPort.parseString(sentinel);
            JedisSentinelPool.MasterListener masterListener = new JedisSentinelPool.MasterListener(masterName, hap.getHost(), hap.getPort());
            masterListener.setDaemon(true);
            this.masterListeners.add(masterListener);
            masterListener.start();
        }
        return master;
}

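Two things happen here: 1) the master is looked up so that the connection pool actually connects to the master (this step is completed in the initPool method); 2) a MasterListener is added to handle failover-related events

Because the connection pool actually points at the master, sentinel XX commands cannot be executed through it. MasterListener simply subscribes to the +switch-master channel to obtain the new master's address and then calls initPool again to re-initialize the pool against the new master.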

this.j.subscribe(new JedisPubSub() {
    public void onMessage(String channel, String message) {
        JedisSentinelPool.this.log.debug("Sentinel {}:{} published: {}.", new Object[]{MasterListener.this.host, MasterListener.this.port, message});
        String[] switchMasterMsg = message.split(" ");
        if (switchMasterMsg.length > 3) {
            if (MasterListener.this.masterName.equals(switchMasterMsg[0])) {
                JedisSentinelPool.this.initPool(JedisSentinelPool.this.toHostAndPort(Arrays.asList(switchMasterMsg[3], switchMasterMsg[4])));
            } else {
                JedisSentinelPool.this.log.debug("Ignoring message on +switch-master for master name {}, our master name is {}", switchMasterMsg[0], MasterListener.this.masterName);
            }
        } else {
            JedisSentinelPool.this.log.error("Invalid message received on Sentinel {}:{} on channel +switch-master: {}", new Object[]{MasterListener.this.host, MasterListener.this.port, message});
        }
    }
}, new String[]{"+switch-master"});

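In other words, a Jedis instance obtained from JedisSentinelPool always points at the current master of the deployment, with no need to worry about nodes going down.

To connect to a Sentinel node itself, you still need JedisPool (or a plain Jedis connection) pointed at the Sentinel's address.

Original link: https://blog.csdn.net/u010670411/article/details/89637227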