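I have been studying Redis recently, so I am writing down how I built a Redis cluster, hoping that I will still remember it a few months from now.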
Building a Redis cluster
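Redis officially introduced clustering in version 3.0. Redis Cluster is a distributed deployment in which data is shared across multiple Redis nodes.
Redis Cluster does not support multi-key commands whose keys belong to different hash slots, because that would require moving data between nodes; it could not match standalone Redis performance and could cause unpredictable errors under heavy load.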
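Through partitioning, Redis Cluster also provides a degree of availability: in a real deployment it keeps processing commands when a node goes down or becomes unreachable. Advantages of Redis Cluster:
- Data is automatically split across the nodes.
- The cluster keeps processing commands when some of its nodes fail or become unreachable.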
Environment A: 3 servers, each running two Redis instances on ports 6379 and 6380; suitable for a test environment.
192.168.11.137:6379/6380 liu-node1
192.168.11.138:6379/6380 liu-node2
192.168.11.139:6379/6380 liu-node3
One more server is kept in reserve for testing node addition:
192.168.11.140:6379/6380 liu-test
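Notes:
- Every Redis node uses the same hardware configuration, the same password (if one is set) and the same Redis version.
- None of the Redis servers may contain any data.
- Each instance is first started as a standalone Redis holding no keys.
- The replicas use a different port on the same servers: port 6380.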
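1. The first thing a cluster needs is a set of Redis instances running in cluster mode. Install Redis on all three hosts. On each machine create a new directory /redis-cluster with two subdirectories named after the port numbers, 6379 and 6380; a Redis instance will later run in each of them. (node2 and node3 are set up the same way.)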
[root@liu-node1 ~]# mkdir /redis-cluster
[root@liu-node1 ~]# cd /redis-cluster
[root@liu-node1 redis-cluster]# mkdir 6379 6380
[root@liu-node1 redis-cluster]# wget https://download.redis.io/releases/redis-6.0.10.tar.gz
[root@liu-node1 redis-cluster]# wget https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2
[root@liu-node1 redis-cluster]# yum -y install gcc-c++ automake autoconf libtool make
[root@liu-node1 redis-cluster]# bzip2 -dv jemalloc-5.2.1.tar.bz2
[root@liu-node1 redis-cluster]# tar xvf jemalloc-5.2.1.tar
[root@liu-node1 redis-cluster]# cd jemalloc-5.2.1/
[root@liu-node1 jemalloc-5.2.1]# ./autogen.sh
[root@liu-node1 jemalloc-5.2.1]# make -j 6
[root@liu-node1 jemalloc-5.2.1]# make install
[root@liu-node1 jemalloc-5.2.1]# cd ..
[root@liu-node1 redis-cluster]# tar xzf redis-6.0.10.tar.gz
[root@liu-node1 redis-cluster]# cd redis-6.0.10/deps
[root@liu-node1 deps]# make lua hiredis linenoise
[root@liu-node1 deps]# cd /redis-cluster/redis-6.0.10/
[root@liu-node1 redis-6.0.10]# make
2. In the 6379 and 6380 subdirectories on each node, create a redis.conf file with the following content (the port in each file must match its directory name):
[root@liu-node3 6379]# cat redis.conf
bind 0.0.0.0
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
daemonize yes
masterauth liu
requirepass liu
[root@liu-node3 6380]# cat redis.conf
port 6380
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
daemonize yes
masterauth liu
requirepass liu
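The cluster-enabled option turns on cluster mode for the instance, while cluster-config-file sets the path of the node configuration file (default nodes.conf). This file is not meant to be edited by hand: Redis Cluster creates it at startup and updates it automatically whenever needed.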
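Since the two files differ essentially only in the port, they can also be generated. A minimal sketch, assuming the /redis-cluster layout and the password liu used in this setup; write_conf is a hypothetical helper name. Relative paths such as nodes.conf are resolved against the instance's working directory, which is why each instance is started from inside its own port directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write <base>/<port>/redis.conf with the directives shown above.
# Assumptions: base directory and password are supplied by the caller.
write_conf() {
  local base=$1 port=$2 pass=$3
  mkdir -p "$base/$port"
  # Unquoted EOF so $port and $pass are expanded into the file.
  cat > "$base/$port/redis.conf" <<EOF
bind 0.0.0.0
port $port
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
daemonize yes
masterauth $pass
requirepass $pass
EOF
}
```

On each host this would be run as write_conf /redis-cluster 6379 liu and write_conf /redis-cluster 6380 liu.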
3. Start each instance with commands like the following:
[root@liu-node1 6379]# ./../redis-6.0.10/src/redis-server ./redis.conf
[root@liu-node1 6379]# cd ../6380/
[root@liu-node1 6380]# ./../redis-6.0.10/src/redis-server ./redis.conf
[root@liu-node1 6380]# ps -ef | grep redis
The ps output shows whether both instances are running.
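Starting both instances can be scripted the same way. A minimal sketch with a hypothetical start_instances helper: it enters each port directory in a subshell (so the relative nodes.conf and appendonly.aof land next to that port's redis.conf) and runs the server binary passed as the first argument. With daemonize yes in the config, each call returns once the server has gone into the background.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start_instances <server-binary> <base-dir> <port>...
# Assumes the <base>/<port>/redis.conf layout used above.
start_instances() {
  local bin=$1 base=$2 port
  shift 2
  for port in "$@"; do
    # Subshell: the cd does not leak into the caller; stop at the first failure.
    ( cd "$base/$port" && "$bin" ./redis.conf ) || return 1
  done
}
```

Example: start_instances /redis-cluster/redis-6.0.10/src/redis-server /redis-cluster 6379 6380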
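Creating the cluster
There are now six running Redis instances. Next, use them to create the cluster; this also writes a configuration file (nodes.conf) for each node.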
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster create 192.168.11.137:6379 192.168.11.137:6380 192.168.11.138:6379 192.168.11.138:6380 192.168.11.139:6379 192.168.11.139:6380 --cluster-replicas 1
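-a: the password that was set (not needed if no password is configured)
--cluster-replicas 1: each master gets one replica (slave)
Troubleshooting:
Problem: creating the cluster fails with the error [ERR] Node is not empty. Either the node already knows other...
Solution: this happens because an earlier attempt to build the cluster failed but left behind each node's configuration file and database backup files. To fix it:
A. First delete each Redis node's backup file, database file and cluster configuration file (in the subdirectory named after the port).
B. Log in to each Redis node with redis-cli -c -h <host> -p <port> and reset it: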
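The node list and the replica setting together determine the layout: redis-cli creates nodes / (replicas + 1) masters and refuses to build a cluster with fewer than three. A minimal sketch, with hypothetical helpers node_list (builds the host:port arguments in the same order as the command above) and master_count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: node_list "<hosts>" "<ports>" -> "h1:p1 h1:p2 h2:p1 ..."
node_list() {
  local hosts=$1 ports=$2 h p
  local out=()
  for h in $hosts; do        # unquoted on purpose: split the space-separated lists
    for p in $ports; do
      out+=("$h:$p")
    done
  done
  echo "${out[*]}"
}

# Hypothetical helper: number of masters for <nodes> and <replicas per master>.
master_count() {
  echo $(( $1 / ($2 + 1) ))
}
```

For example: ./src/redis-cli -a liu --cluster create $(node_list "192.168.11.137 192.168.11.138 192.168.11.139" "6379 6380") --cluster-replicas 1, which gives master_count 6 1 = 3 masters.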
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu cluster reset
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
OK
C. Restart the Redis services, then run the cluster creation command again:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster create 192.168.11.137:6379 192.168.11.137:6380 192.168.11.138:6379 192.168.11.138:6380 192.168.11.139:6379 192.168.11.139:6380 --cluster-replicas 1
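Step A can be scripted per port directory. A minimal sketch, assuming Redis 6.0's default file names (dump.rdb for RDB snapshots, appendonly.aof for the AOF) and the nodes.conf name from the config above; clean_node_dir is a hypothetical helper, and redis.conf itself is kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove leftovers of a failed cluster attempt from one port dir.
# Assumes Redis 6.0 default file names; redis.conf is left untouched.
clean_node_dir() {
  local dir=$1 f
  for f in nodes.conf appendonly.aof dump.rdb; do
    rm -f "$dir/$f"
  done
}
```

Example: clean_node_dir /redis-cluster/6379; clean_node_dir /redis-cluster/6380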
4. Check the master/replica status
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu info replication
5. Verify the cluster state
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu cluster info
Check the cluster state as seen from any node (specify the node to query):
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster info 192.168.11.139:6379
6. View the mapping between cluster nodes
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu cluster nodes
7. Verify writing a key to the cluster
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu set name liu
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(error) MOVED 5798 192.168.11.138:6379
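The key is hashed to a slot, and it must be written to the node that owns that slot. The error shows that slot 5798 is not on the current node but on the node at 192.168.11.138:6379.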
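The slot can also be computed locally. According to the Redis Cluster specification, slot = CRC16(key) mod 16384 with the CRC16-XMODEM variant, and a hash tag ({...}) restricts hashing to part of the key so that related keys share a slot (which is how multi-key commands can still work in a cluster). A minimal bash sketch, assuming ASCII keys; crc16 and key_slot are hypothetical helper names. For the key name it yields 5798, the slot in the MOVED reply above; on a live node, CLUSTER KEYSLOT <key> returns the same number.

```shell
#!/usr/bin/env bash
# CRC16-XMODEM: polynomial 0x1021, initial value 0 (assumes ASCII input).
crc16() {
  local s=$1 crc=0 i j byte
  for (( i = 0; i < ${#s}; i++ )); do
    printf -v byte '%d' "'${s:i:1}"        # character code of position i
    crc=$(( crc ^ (byte << 8) ))
    for (( j = 0; j < 8; j++ )); do
      if (( crc & 0x8000 )); then
        crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
      else
        crc=$(( (crc << 1) & 0xFFFF ))
      fi
    done
  done
  echo "$crc"
}

# slot = CRC16(key or hash tag) mod 16384
key_slot() {
  local key=$1 rest tag
  # Hash tag: text between the first '{' and the next '}', if non-empty.
  if [[ $key == *"{"* ]]; then
    rest=${key#*"{"}
    if [[ $rest == *"}"* ]]; then
      tag=${rest%%"}"*}
      if [[ -n $tag ]]; then
        key=$tag
      fi
    fi
  fi
  echo $(( $(crc16 "$key") % 16384 ))
}
```

Example: key_slot name prints 5798, and {user1000}.following and {user1000}.followers map to the same slot.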
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu -h 192.168.11.138 set name liu
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
OK
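On the corresponding replica the key is listed by keys *, but get fails: in cluster mode a replica redirects reads to the master that owns the slot unless the client first sends READONLY.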
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu -h 192.168.11.139 -p 6380 keys "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1) "name"
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu -h 192.168.11.139 -p 6380 get name
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(error) MOVED 5798 192.168.11.138:6379
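A MOVED reply carries the slot and the address of the node that owns it. redis-cli follows such redirects by itself when started with -c (for example ./src/redis-cli -c -a liu set name liu). To show what a client extracts, here is a minimal sketch with a hypothetical parse_moved helper for replies of the form MOVED <slot> <host>:<port>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse_moved "<reply>" -> "<slot> <host> <port>", status 1 if not MOVED.
parse_moved() {
  local prefix='(error) '
  local reply=${1#"$prefix"}          # tolerate redis-cli's "(error) " prefix
  local word slot addr
  read -r word slot addr <<< "$reply"
  [ "$word" = MOVED ] || return 1
  echo "$slot ${addr%:*} ${addr##*:}" # slot, host, port
}
```

Example: parse_moved '(error) MOVED 5798 192.168.11.138:6379' prints 5798 192.168.11.138 6379.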
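8. Simulate a master failure: the corresponding replica is automatically promoted to master
Simulate a failure of node2 (192.168.11.138:6379) by shutting that instance down; failover takes some time (cluster-node-timeout is 5000 ms in the config above).
[root@liu-node2 redis-6.0.10]# ./src/redis-cli -a liu shutdown
Check the cluster state: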
[root@liu-node2 redis-6.0.10]# ./src/redis-cli -a liu --cluster info 192.168.11.137:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at 192.168.11.138:6379: Connection refused
192.168.11.137:6379 (e3c5b449...) -> 0 keys | 5461 slots | 1 slaves.
192.168.11.139:6380 (c96ab3f8...) -> 1 keys | 5462 slots | 0 slaves. // 192.168.11.139:6380 is the new master
192.168.11.139:6379 (c0be9fbc...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 3 masters.
0.00 keys per slot on average.
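After a failover, a quick sanity check is that the surviving masters still cover all 16384 slots (5461 + 5462 + 5461 above). A minimal sketch, assuming the --cluster info line format shown above; total_slots is a hypothetical helper that reads that output on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the "N slots" column of `redis-cli --cluster info` output.
total_slots() {
  # Lines look like: host:port (id...) -> K keys | N slots | R slaves.
  awk -F'|' '/ slots /{ split($2, a, " "); sum += a[1] } END { print sum + 0 }'
}
```

Example: ./src/redis-cli -a liu --cluster info 192.168.11.137:6379 | total_slots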
Check the new master's information:
[root@liu-node2 redis-6.0.10]# ./src/redis-cli -a liu -h 192.168.11.139 -p 6380 info replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:0 // no replicas are left at this point
master_replid:e6e46433c5bab449fca958db986c258e52865aad
master_replid2:fbb5ea8abd4a04518e6d8b4c7fe45f216861a518
master_repl_offset:5977
second_repl_offset:5978
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:5977
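When this check is scripted, single fields can be pulled out of the INFO output. A minimal sketch with a hypothetical info_field helper; it strips a trailing carriage return because INFO lines may be CRLF-terminated when redis-cli writes raw output to a pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: info_field <name> < INFO output -> value of "<name>:<value>".
info_field() {
  awk -F: -v k="$1" '$1 == k { sub(/\r$/, "", $2); print $2; exit }'
}
```

Example: ./src/redis-cli -a liu -h 192.168.11.139 -p 6380 info replication | info_field role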
Recover the failed node node2 (192.168.11.138:6379):
[root@liu-node2 6379]# ./../redis-6.0.10/src/redis-server redis.conf
[root@liu-node2 6379]# ps -ef | grep redis
root 13526 1 0 20:23 ? 00:00:18 ./src/redis-server *:6380 [cluster]
root 15107 1 0 22:11 ? 00:00:00 ./../redis-6.0.10/src/redis-server 0.0.0.0:6379 [cluster]
root 15115 13938 0 22:11 pts/1 00:00:00 grep --color=auto redis
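List the cluster nodes (run this on node1): 192.168.11.138:6379 on node2 has automatically rejoined as a replica of the new master 192.168.11.139:6380.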
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
c96ab3f80ff943d60c8463689266d949dd6a665c 192.168.11.139:6380@16380 master - 0 1611285123544 8 connected 5461-10922
e3c5b4492cafc7b54de7f2647e8543fd9a86017f 192.168.11.137:6379@16379 myself,master - 0 1611285122000 2 connected 0-5460
2e6ceaec6ffd22fc24689f5fa2f19bb94cff6d9f 192.168.11.138:6379@16379 slave c96ab3f80ff943d60c8463689266d949dd6a665c 0 1611285122000 8 connected
de9c0fa3adf78cebbaa96f1cf343312abf0e0a67 192.168.11.137:6380@16380 slave c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c 0 1611285123543 5 connected
ffe733d21b156ba29a3875aae5ec0e1b37ea7027 192.168.11.138:6380@16380 slave e3c5b4492cafc7b54de7f2647e8543fd9a86017f 0 1611285123543 2 connected
c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c 192.168.11.139:6379@16379 master - 0 1611285122535 5 connected 10923-16383
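These relationships can be read mechanically: in CLUSTER NODES output, field 1 is the node ID, field 2 the address (ip:port@cport), field 3 the flags and field 4 the master's ID for a replica ("-" for a master). A minimal sketch with a hypothetical replica_map helper that resolves each replica's master ID to an address; on the output above it reports that 192.168.11.138:6379 now replicates 192.168.11.139:6380.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "replica -> master" by address from CLUSTER NODES on stdin.
replica_map() {
  awk '
    {
      a = $2; sub(/@.*/, "", a)                      # drop the @cport suffix
      addr[$1] = a                                   # node ID -> ip:port
      if (("," $3 ",") ~ /,slave,/) rep[a] = $4      # replica -> its master ID
    }
    END { for (r in rep) print r " -> " addr[rep[r]] }
  ' | LC_ALL=C sort
}
```

Example: ./src/redis-cli -a liu cluster nodes | replica_map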
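Maintaining the cluster nodes
After a cluster has run for a long time, it inevitably needs adjusting because of hardware failures, network re-planning, business growth and so on: adding Redis nodes, removing nodes, migrating nodes, replacing servers, etc. Adding and removing nodes involves reassigning existing slots and migrating data.
Scenario: the business is growing fast and the current three-master, three-replica Redis Cluster may no longer keep up with concurrent writes, so the company urgently purchased the server liu-test with two instances (192.168.11.140:6379 and 192.168.11.140:6380). They must be added to the cluster dynamically, without disrupting the business or losing data.
Adding nodes to the cluster
New Redis nodes must run the same Redis version with the same configuration as the existing ones. Start the two new Redis instances; they will become one master and one replica.
Add a new node with the following command.
Format: add-node new_host:new_port existing_host:existing_port
new_host:new_port is the IP and port of the node being added;
existing_host:existing_port is the IP and port of any node already in the cluster.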
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster add-node 192.168.11.140:6379 192.168.11.137:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 192.168.11.140:6379 to cluster 192.168.11.137:6379
>>> Performing Cluster Check (using node 192.168.11.137:6379)
M: e3c5b4492cafc7b54de7f2647e8543fd9a86017f 192.168.11.137:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: c96ab3f80ff943d60c8463689266d949dd6a665c 192.168.11.139:6380
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 2e6ceaec6ffd22fc24689f5fa2f19bb94cff6d9f 192.168.11.138:6379
slots: (0 slots) slave
replicates c96ab3f80ff943d60c8463689266d949dd6a665c
S: de9c0fa3adf78cebbaa96f1cf343312abf0e0a67 192.168.11.137:6380
slots: (0 slots) slave
replicates c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c
S: ffe733d21b156ba29a3875aae5ec0e1b37ea7027 192.168.11.138:6380
slots: (0 slots) slave
replicates e3c5b4492cafc7b54de7f2647e8543fd9a86017f
M: c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c 192.168.11.139:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.11.140:6379 to make it join the cluster.
[OK] New node added correctly.
The node has joined the cluster; list the cluster nodes:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
c96ab3f80ff943d60c8463689266d949dd6a665c 192.168.11.139:6380@16380 master - 0 1611300791618 8 connected 5461-10922
e3c5b4492cafc7b54de7f2647e8543fd9a86017f 192.168.11.137:6379@16379 myself,master - 0 1611300789000 2 connected 0-5460
2e6ceaec6ffd22fc24689f5fa2f19bb94cff6d9f 192.168.11.138:6379@16379 slave c96ab3f80ff943d60c8463689266d949dd6a665c 0 1611300790509 8 connected
de9c0fa3adf78cebbaa96f1cf343312abf0e0a67 192.168.11.137:6380@16380 slave c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c 0 1611300791618 5 connected
ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18 192.168.11.140:6379@16379 master - 0 1611300791000 0 connected
ffe733d21b156ba29a3875aae5ec0e1b37ea7027 192.168.11.138:6380@16380 slave e3c5b4492cafc7b54de7f2647e8543fd9a86017f 0 1611300791000 2 connected
c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c 192.168.11.139:6379@16379 master - 0 1611300791115 5 connected 10923-16383
The new node has joined, but it owns no slots, and it has joined as a master.
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster info 192.168.11.137:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.11.137:6379 (e3c5b449...) -> 0 keys | 5461 slots | 1 slaves.
192.168.11.139:6380 (c96ab3f8...) -> 1 keys | 5462 slots | 1 slaves.
192.168.11.140:6379 (ca5aeb97...) -> 0 keys | 0 slots | 0 slaves.
192.168.11.139:6379 (c0be9fbc...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 4 masters.
0.00 keys per slot on average.
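Reassigning slots
A node added to the cluster is a master by default but owns no slots. After adding it, reshard the cluster so that the new node gets slots; without any slots it cannot accept writes.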
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster reshard 192.168.11.140:6379
......
Do you want to proceed with the proposed reshard plan (yes/no)? yes   # confirm the plan
......
Confirm that the slots were assigned:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster info 192.168.11.137:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.11.137:6379 (e3c5b449...) -> 0 keys | 4096 slots | 1 slaves.
192.168.11.139:6380 (c96ab3f8...) -> 0 keys | 4096 slots | 1 slaves.
192.168.11.140:6379 (ca5aeb97...) -> 1 keys | 4096 slots | 0 slaves.
192.168.11.139:6379 (c0be9fbc...) -> 0 keys | 4096 slots | 1 slaves.
[OK] 1 keys in 4 masters.
0.00 keys per slot on average.
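The numbers above follow from simple arithmetic: with four masters each holds 16384 / 4 = 4096 slots, so the new node received 1365 + 1366 + 1365 = 4096 slots from masters that previously held 5461, 5462 and 5461. A minimal sketch of an even split; split_slots is a hypothetical helper that hands the remainder to the first masters. That is only a planning convention: the cluster created earlier gave the extra slot to the middle master (5461/5462/5461).

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_slots <total> <masters> -> per-master counts.
split_slots() {
  local total=$1 n=$2 i
  local base=$(( total / n ))
  local extra=$(( total % n ))
  local out=()
  for (( i = 0; i < n; i++ )); do
    # The first <extra> masters get one more slot.
    out+=( $(( base + (i < extra ? 1 : 0) )) )
  done
  echo "${out[*]}"
}
```

Example: split_slots 16384 4 prints 4096 4096 4096 4096.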
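Adding a replica for the new master
Another standalone Redis instance, 192.168.11.140:6380, has to be added to the cluster to guard against a failure of the single 192.168.11.140:6379 instance, i.e. to make it highly available. There are two ways to do this:
- Method 1: add it directly as a replica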
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster add-node 192.168.11.140:6380 192.168.11.137:6379 --cluster-slave --cluster-master-id ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18
Be careful to get the master ID right!
Verify the result:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster check 192.168.11.137:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.11.137:6379 (e3c5b449...) -> 0 keys | 4096 slots | 1 slaves.
192.168.11.139:6380 (c96ab3f8...) -> 0 keys | 4096 slots | 1 slaves.
192.168.11.140:6379 (ca5aeb97...) -> 1 keys | 4096 slots | 1 slaves.
192.168.11.139:6379 (c0be9fbc...) -> 0 keys | 4096 slots | 1 slaves.
[OK] 1 keys in 4 masters.
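- Method 2: first add the node to the cluster, then change it into a replica (an alternative to method 1, not an additional step)
Add 192.168.11.140:6380 to the cluster:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster add-node 192.168.11.140:6380 192.168.11.137:6379
Change the new node's role to replica: it must be made the replica of a specific master by hand, otherwise its role stays master.
Log in to the newly added node:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu -h 192.168.11.140 -p 6380
List the cluster nodes and find the ID of the target master:
192.168.11.140:6380> CLUSTER NODES
Make this node a replica of that master (format: CLUSTER REPLICATE <master-id>):
192.168.11.140:6380> CLUSTER REPLICATE ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18
List the nodes again to verify that the node is now a replica of the specified master:
192.168.11.140:6380> CLUSTER NODES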
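Whichever method is used, the master ID must be copied exactly. A minimal sketch that looks the ID up by address in CLUSTER NODES output instead of copying it by hand; node_id is a hypothetical helper, and the usage lines reuse the addresses and the add-node command from this walkthrough.

```shell
#!/usr/bin/env bash
# Hypothetical helper: node_id <ip:port> < CLUSTER NODES output -> node ID (empty if absent).
node_id() {
  awk -v want="$1" '{ a = $2; sub(/@.*/, "", a) } a == want { print $1; exit }'
}
# Usage (sketch):
#   id=$(./src/redis-cli -a liu cluster nodes | node_id 192.168.11.140:6379)
#   ./src/redis-cli -a liu --cluster add-node 192.168.11.140:6380 192.168.11.137:6379 \
#       --cluster-slave --cluster-master-id "$id"
```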
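Dynamic scale-in
Scenario: the server 192.168.11.140 has been in use for more than three years, is out of the vendor's warranty, and its disk is raising error alarms. The operations architect proposed a plan and, after a meeting with the developers, it was decided to temporarily take the master 192.168.11.140:6379 and its replica 192.168.11.140:6380 out of the cluster, which currently has four masters; the concurrent write capacity of the three remaining servers is enough for the business needs of the next one to two years.
Removing a node:
A node is added by first joining it to the cluster and then assigning it slots. Removing a node is the exact reverse: first migrate the slots of the Redis node being removed to the other Redis nodes in the cluster, then delete the node. If its slots have not all been migrated, deleting the node fails with a message saying it still holds data.
Migrating the master's slots to the other masters
The keys on the source master are migrated along with its slots (in this run the single key on 192.168.11.140:6379 ends up on 192.168.11.139:6379); if a migration fails, the reshard reports an error and is aborted.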
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster reshard 192.168.11.137:6379
Then move 1365 slots from 192.168.11.140:6379 to each of the other two masters, 192.168.11.139:6379 (c0be9fbc...) and 192.168.11.139:6380 (c96ab3f8...):
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster reshard 192.168.11.137:6379 --cluster-slots 1365 --cluster-from ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18 --cluster-to c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster check 192.168.11.137:6379
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster reshard 192.168.11.137:6379 --cluster-slots 1365 --cluster-from ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18 --cluster-to c96ab3f80ff943d60c8463689266d949dd6a665c
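The drain moves 4096 slots in total, 1366 + 1365 + 1365: the final counts below (192.168.11.137:6379 ends at 5462 slots, the others at 5461) imply that the first, interactive move took 1366. A minimal sketch that prints (but does not run) equivalent non-interactive commands, using the same flags as above plus --cluster-yes to skip the confirmation prompt; plan_drain is hypothetical, and the entry node and password are hard-coded from this walkthrough.

```shell
#!/usr/bin/env bash
# Hypothetical helper: plan_drain <source-id> <slots> <target-id>...
# Prints one reshard command per target; slots are split as evenly as possible.
plan_drain() {
  local from=$1 total=$2 n i to
  shift 2
  n=$#
  i=0
  for to in "$@"; do
    echo "./src/redis-cli -a liu --cluster reshard 192.168.11.137:6379 --cluster-from $from --cluster-to $to --cluster-slots $(( total / n + (i < total % n ? 1 : 0) )) --cluster-yes"
    i=$(( i + 1 ))
  done
}
```

Example: plan_drain ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18 4096 e3c5b4492cafc7b54de7f2647e8543fd9a86017f c0be9fbcb7a052f2a7c2eb146e631e30947bfc9c c96ab3f80ff943d60c8463689266d949dd6a665c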
Confirm that all slots of liu-test (140:6379) have been moved away and that its replica has become a replica of another master:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster check 192.168.11.137:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.11.137:6379 (e3c5b449...) -> 0 keys | 5462 slots | 2 slaves.
192.168.11.139:6380 (c96ab3f8...) -> 0 keys | 5461 slots | 1 slaves.
192.168.11.140:6379 (ca5aeb97...) -> 0 keys | 0 slots | 0 slaves.
192.168.11.139:6379 (c0be9fbc...) -> 1 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 4 masters.
0.00 keys per slot on average.
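Removing the server from the cluster
Although the slots have been migrated, the server's IP information is still in the cluster, so it must also be removed from the cluster: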
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster del-node 192.168.11.137:6379 ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Removing node ca5aeb97c8b3a5a578dfd5615e79fe27ae281e18 from cluster 192.168.11.137:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
Check the cluster information:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster check 192.168.11.137:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.11.137:6379 (e3c5b449...) -> 0 keys | 5462 slots | 2 slaves.
192.168.11.139:6380 (c96ab3f8...) -> 0 keys | 5461 slots | 1 slaves.
192.168.11.139:6379 (c0be9fbc...) -> 1 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 3 masters.
0.00 keys per slot on average.
Delete the node's data on liu-test:
[root@liu-test redis-6.0.10]# rm -rf /redis-cluster/6379
Remove the now-redundant replica node:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu --cluster del-node 192.168.11.137:6379 eac3c947cefe1c0d481f32c826c1872baa2006d8   # ID of 192.168.11.140:6380
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Removing node eac3c947cefe1c0d481f32c826c1872baa2006d8 from cluster 192.168.11.137:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
Check the cluster state:
[root@liu-node1 redis-6.0.10]# ./src/redis-cli -a liu -h 192.168.11.137 -p 6379 cluster info
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:14
cluster_my_epoch:14
cluster_stats_messages_ping_sent:51362
cluster_stats_messages_pong_sent:51568
cluster_stats_messages_auth-ack_sent:1
cluster_stats_messages_update_sent:20
cluster_stats_messages_sent:102951
cluster_stats_messages_ping_received:51561
cluster_stats_messages_pong_received:51362
cluster_stats_messages_meet_received:7
cluster_stats_messages_fail_received:1
cluster_stats_messages_auth-req_received:1
cluster_stats_messages_update_received:1
cluster_stats_messages_received:102933
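For monitoring, the same CLUSTER INFO output can be turned into a pass/fail check. A minimal sketch with a hypothetical cluster_healthy helper: it succeeds only when cluster_state is ok, all 16384 slots are assigned and ok, and no slot is in pfail or fail state.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if the CLUSTER INFO text on stdin looks healthy.
cluster_healthy() {
  awk -F: '
    { sub(/\r$/, ""); v[$1] = $2 }          # strip CR, then key -> value
    END {
      ok = v["cluster_state"] == "ok" &&
           v["cluster_slots_assigned"] == 16384 &&
           v["cluster_slots_ok"] == 16384 &&
           v["cluster_slots_pfail"] == 0 &&
           v["cluster_slots_fail"] == 0
      exit !ok
    }'
}
```

Example: ./src/redis-cli -a liu -h 192.168.11.137 -p 6379 cluster info | cluster_healthy && echo healthy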