Redis Cluster Architecture
1. What is Redis Cluster?
Redis Cluster is designed to overcome the memory, concurrency, and network-traffic bottlenecks of a single machine by spreading load across nodes.
It also addresses a key limitation of the Sentinel architecture: storage capacity that cannot scale horizontally.
2. How Redis Cluster Works
Redis Cluster distributes data across nodes according to specific rules, allowing both write throughput and storage capacity to scale horizontally.
2.1 Data Distribution
There are two common partitioning schemes, hash partitioning and range partitioning; hash partitioning is the more widely used, and it comes in the following variants.
2.1.1 Modulo Partitioning
In short, the node for a key is chosen by computing hash(key) % N, where N is the number of nodes.
The problem with this scheme: whenever the number of nodes changes, the key-to-node mapping must be recomputed, triggering large-scale data migration.
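The cost is easy to demonstrate. Below is a small sketch (Python, with zlib.crc32 standing in for the hash function, an assumption made only for this example) showing that growing from 3 to 4 nodes remaps roughly three quarters of the keys:

```python
import zlib

# hash(key) % N node selection; zlib.crc32 is a stand-in for the hash function.
def node_for(key: str, n: int) -> int:
    return zlib.crc32(key.encode()) % n

keys = [f"key:{i}" for i in range(10_000)]
moved = sum(node_for(k, 3) != node_for(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys change nodes")  # roughly 75%
```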
2.1.2 Consistent Hashing
Consistent hashing works as follows: each node in the system is assigned a token in the range 0 to 2^32, and together these tokens form a hash ring.
For a read or write, first compute the key's hash, then walk clockwise to the first node whose token is greater than that hash, and perform the operation there.
The benefit of consistent hashing is that adding or removing a node only affects its neighbors on the ring; all other nodes are untouched.
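Here is a minimal hash-ring sketch (Python; one token per node for brevity, whereas real deployments place many virtual tokens per node, and zlib.crc32 is again a stand-in hash):

```python
import bisect
import zlib

def token(name: str) -> int:
    return zlib.crc32(name.encode())  # a point on the 0..2^32-1 ring

class Ring:
    def __init__(self, nodes):
        self.tokens = sorted((token(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        # First node clockwise from hash(key), wrapping past the top of the ring.
        i = bisect.bisect(self.tokens, (token(key), ""))
        return self.tokens[i % len(self.tokens)][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.lookup("user:42"))
```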
However, it also has the following problems:
- Adding or removing a node leaves some keys unreachable; that data must be handled manually or simply discarded, which is why consistent hashing mainly suits caching scenarios
- With only a few nodes, a membership change shifts a large share of the ring's key mappings, so consistent hashing is a poor fit for small clusters
- Plain consistent hashing requires the node count to be doubled or halved to keep data and load balanced
2.1.3 Virtual Slot Partitioning
Because of these drawbacks, distributed systems usually refine consistent hashing with virtual slots.
Virtual slots use a well-dispersed hash function to map all data into a fixed range of integers, where each integer is defined as a slot; this range is generally far larger than the number of nodes.
In Redis the slot range is 0 to 16383. The slot is the basic unit of data management and migration inside the cluster, and the large slot range exists precisely to make data splitting and scaling easy.
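As a sketch of the mapping (Redis itself computes CRC16(key) % 16384; CRC32 is used below only as a stand-in so the example needs nothing beyond the standard library, and the slot ranges mirror the manual assignment in section 3.1.3):

```python
import zlib

SLOTS = 16384

def slot_for(key: str) -> int:
    return zlib.crc32(key.encode()) % SLOTS  # Redis: crc16(key) % 16384

# Slot ranges -> nodes, mirroring the assignment used later in this post.
RANGES = [(0, 5461, "6379"), (5462, 10922, "6380"), (10923, 16383, "6381")]

def node_for(key: str) -> str:
    s = slot_for(key)
    return next(node for lo, hi, node in RANGES if lo <= s <= hi)

print(slot_for("key:test:1"), node_for("key:test:1"))
```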
Virtual slots have the following strengths:
- They decouple data from nodes, simplifying scale-out and scale-in
- Each node maintains the slot mapping itself; no client or proxy has to track it
- They support node/slot/key lookups, which helps request routing and online resizing
Virtual slots also have drawbacks:
- The key is the smallest unit of mapping, so a single large value such as a list, hash, or set cannot be split across partitions
- Batch key operations are limited: only keys mapping to the same slot can be operated on together (see the hash-tag sketch after this list)
- Transaction support is limited: multi-key transactions only work when all keys sit on the same node
- Multiple keyspaces are not supported: a standalone instance offers 16 databases, but Cluster supports only database 0
- The replication topology is limited to one level: a replica can only replicate a master, with no nested tree-style replication
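The two multi-key limitations are usually worked around with hash tags: when a key contains a {...} section with a non-empty body, Redis hashes only that body, so keys sharing a tag land in the same slot and can be used together in batch commands and transactions. A sketch of the extraction rule:

```python
def hash_tag(key: str) -> str:
    # Redis Cluster hashes only the substring between the first '{' and the
    # first following '}', provided that substring is non-empty.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # an empty "{}" tag is ignored
            return key[start + 1:end]
    return key

print(hash_tag("user:{1000}:name"))  # -> "1000"
print(hash_tag("user:{1000}:age"))   # -> "1000", same slot, so MGET works
print(hash_tag("plainkey"))          # -> "plainkey"
```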
2.2 Node Communication
2.2.1 Communication Flow
Distributed storage systems maintain a class of data called metadata: which node is responsible for which data, whether a node has failed, and other state information.
There are two common ways to maintain metadata: centralized, and P2P.
Redis uses P2P communication based on the Gossip protocol. In short, nodes keep exchanging state with one another until the whole cluster eventually converges on a consistent view.
Every node in a Redis cluster listens on a dedicated TCP port for receiving and answering messages from other nodes; these messages carry events such as node failures, new nodes joining, master/replica role changes, and slot changes.
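A toy simulation (illustrative only, not the real Redis message format) shows why this converges: each round every node contacts one random peer, both sides merge views keeping the highest epoch per entry, and after a few rounds all nodes hold the same picture:

```python
import random

def gossip_round(views):
    nodes = list(views)
    for node in nodes:
        peer = random.choice([n for n in nodes if n != node])
        keys = set(views[node]) | set(views[peer])
        merged = {k: max(views[node].get(k, 0), views[peer].get(k, 0)) for k in keys}
        views[node] = dict(merged)  # the "ping" carries state out ...
        views[peer] = dict(merged)  # ... and the "pong" carries it back

# Each node initially knows only its own state (node -> epoch).
views = {f"n{i}": {f"n{i}": i + 1} for i in range(6)}
rounds = 0
while len({frozenset(v.items()) for v in views.values()}) > 1:
    gossip_round(views)
    rounds += 1
print(f"all views converged after {rounds} rounds")
```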
2.2.2 Gossip Messages
Gossip messages come in the following types:
- ping: the most frequent message; nodes use it to check that peers are alive and to exchange their own state
- pong: sent back to the sender in response to a ping or meet message
- meet: announces a new node; the sender invites the receiver to join the cluster, and once joined, the receiver periodically exchanges ping/pong messages with the others
- fail: when a node decides another node is down, it sends a fail message telling the other nodes to mark that node as offline
2.2.3 Peer Selection
Gossip-based exchange is naturally decentralized, but ping/pong messages carry the state of the sender and of other nodes it knows about. Sent too frequently, they consume significant network bandwidth; sent too rarely, failure detection and new-node discovery slow down.

The cost of the exchange comes down to two factors: how many nodes send messages per unit of time, and how much data each message carries.
If bandwidth is tight, consider raising cluster_node_timeout from the default 15 s to 30 s (cluster-node-timeout 30000 in the config file).
Note, however, that an overly large cluster_node_timeout slows down failure detection and the discovery of new nodes.
2.3 Request Routing
2.3.1 Request Redirection
In cluster mode, when Redis receives a key operation it first computes the key's slot, then finds the node that owns that slot. If that node is itself, it executes the command directly; otherwise it returns a MOVED redirection error telling the client which node to contact.
When the slot belongs to the current node:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli
127.0.0.1:6379> set key:test:1 value1
OK
127.0.0.1:6379> cluster keyslot key:test:1
(integer) 5191
127.0.0.1:6379> cluster nodes
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552899281435 11 connected 6825-6826 10923-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552899282436 9 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552899281435 10 connected 4096 5461-6824 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552899280433 10 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552899279431 11 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 9 connected 0-4095 4097-5460
When the slot belongs to another node, the error names the node that actually manages it:
127.0.0.1:6379> set key:test:2 value1
(error) MOVED 9252 127.0.0.1:6380
Alternatively, start redis-cli with the -c flag and the client handles redirection errors automatically:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -c
127.0.0.1:6379> set key:test:2 value1
-> Redirected to slot [9252] located at 127.0.0.1:6380
OK
127.0.0.1:6380> get key:test:2
"value1"
2.3.2 ASK Redirection (todo)
3. Managing a Redis Cluster
3.1 Manual Setup
3.1.1 Preparing the Nodes
I won't record the installation itself; the directory layout convention is:
root@VM-128-6-ubuntu:/tmp/redis# tree
.
|-- config
| |-- redis_server_6379.conf
| |-- redis_server_6380.conf
| |-- redis_server_6381.conf
| |-- redis_server_6382.conf
| |-- redis_server_6383.conf
| `-- redis_server_6384.conf
|-- data
`-- run
All nodes use near-identical configuration; since this experiment runs on a single machine, the port numbers must differ:
daemonize yes
port 6379
bind 127.0.0.1
dbfilename "6379.rdb"
appendfilename "6379.aof"
dir "/tmp/redis/data"
logfile "/tmp/redis/data/6379.log"
pidfile "/tmp/redis/run/redis_6379.pid"
appendonly yes
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file "nodes-6379.conf"
Start the nodes:
root@VM-128-6-ubuntu:/tmp/redis# redis-server config/redis_server_6379.conf
root@VM-128-6-ubuntu:/tmp/redis# redis-server config/redis_server_6380.conf
root@VM-128-6-ubuntu:/tmp/redis# redis-server config/redis_server_6381.conf
root@VM-128-6-ubuntu:/tmp/redis# redis-server config/redis_server_6382.conf
root@VM-128-6-ubuntu:/tmp/redis# redis-server config/redis_server_6383.conf
root@VM-128-6-ubuntu:/tmp/redis# redis-server config/redis_server_6384.conf
The logs show a successful start. The line No cluster configuration found means no cluster config file exists yet, so the node generates one automatically.
2777:M 18 Mar 12:09:52.249 * Increased maximum number of open files to 10032 (it was originally set to 1024).
2777:M 18 Mar 12:09:52.250 * No cluster configuration found, I'm 948b60efa1db1f09421c22d5030401a8dc09c0b3
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 3.0.7 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in cluster mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 2777
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
2777:M 18 Mar 12:09:52.261 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower val
ue of 128.
2777:M 18 Mar 12:09:52.261 # Server started, Redis version 3.0.7
2777:M 18 Mar 12:09:52.261 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.o
vercommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2777:M 18 Mar 12:09:52.261 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage
issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc
.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
2777:M 18 Mar 12:09:52.261 * The server is now ready to accept connections on port 6379
Inspect the directory tree:
root@VM-128-6-ubuntu:/tmp/redis# tree
.
|-- config
| |-- redis_server_6379.conf
| |-- redis_server_6380.conf
| |-- redis_server_6381.conf
| |-- redis_server_6382.conf
| |-- redis_server_6383.conf
| `-- redis_server_6384.conf
|-- data
| |-- 6379.aof
| |-- 6379.log
| |-- 6380.aof
| |-- 6380.log
| |-- 6381.aof
| |-- 6381.log
| |-- 6382.aof
| |-- 6382.log
| |-- 6383.aof
| |-- 6383.log
| |-- 6384.aof
| |-- 6384.log
| |-- nodes-6379.conf
| |-- nodes-6380.conf
| |-- nodes-6381.conf
| |-- nodes-6382.conf
| |-- nodes-6383.conf
| `-- nodes-6384.conf
`-- run
|-- redis_6379.pid
|-- redis_6380.pid
|-- redis_6381.pid
|-- redis_6382.pid
|-- redis_6383.pid
`-- redis_6384.pid
3 directories, 30 files
Cluster node configuration files have now appeared:
# cat data/nodes-6379.conf
948b60efa1db1f09421c22d5030401a8dc09c0b3 :0 myself,master - 0 0 0 connected
vars currentEpoch 0 lastVoteEpoch 0
- Node ID: the leading 40-character hex string identifies the node within the cluster. Unlike the run ID, the node ID is generated only at cluster initialization and is reused ever after.
3.1.2 Node Handshake
The node handshake is how a group of nodes running in cluster mode discover one another over the Gossip protocol. Here is the manual procedure:
root@VM-128-6-ubuntu:/tmp/redis# redis-cli
# Initially the cluster contains only this node
127.0.0.1:6379> cluster nodes
948b60efa1db1f09421c22d5030401a8dc09c0b3 :6379 myself,master - 0 0 0 connected
# cluster meet starts the handshake over the Gossip protocol; it is asynchronous
127.0.0.1:6379> cluster meet 127.0.0.1 6380
OK
# The cluster now has two nodes
127.0.0.1:6379> cluster nodes
948b60efa1db1f09421c22d5030401a8dc09c0b3 127.0.0.1:6379 myself,master - 0 0 0 connected
040cd061d69dff6c3f11142c132501f43c94601e 127.0.0.1:6380 master - 0 1552883009915 1 connected
127.0.0.1:6379> cluster meet 127.0.0.1 6381
OK
127.0.0.1:6379> cluster meet 127.0.0.1 6382
OK
127.0.0.1:6379> cluster meet 127.0.0.1 6383
OK
127.0.0.1:6379> cluster meet 127.0.0.1 6384
OK
# All nodes have now joined the cluster
127.0.0.1:6379> cluster nodes
a5561be500bd02dee9e407b6ab397226b09060f7 127.0.0.1:6384 master - 0 1552883023341 0 connected
948b60efa1db1f09421c22d5030401a8dc09c0b3 127.0.0.1:6379 myself,master - 0 0 3 connected
2be0421d33b46ecfe208b932b59538fd553583c4 127.0.0.1:6382 master - 0 1552883024996 3 connected
040cd061d69dff6c3f11142c132501f43c94601e 127.0.0.1:6380 master - 0 1552883022941 1 connected
06cb8009df15484922e2f307ec2c42d69d5b8e30 127.0.0.1:6383 master - 0 1552883023942 4 connected
44afcd878b18f9c2b6a048acf04f81cdc9bf52f7 127.0.0.1:6381 master - 0 1552883021937 2 connected
Internally, the handshake goes roughly like this:
- The client sends cluster meet <ip> <port> to node 6379
- Node 6379 creates a node-info object locally, then sends a meet message to 6380
- Node 6380 receives the meet, stores 6379's node info, and replies with a pong
Even after the handshake completes, the cluster cannot work yet: it is still in the down state, and all reads and writes are rejected.
127.0.0.1:6379> set hello world
(error) CLUSTERDOWN The cluster is down
cluster info shows the details. cluster_slots_assigned is 0: since no slots have been assigned to any node, the cluster cannot map data, and it only becomes usable once slots are assigned.
127.0.0.1:6379> cluster info
cluster_state:fail
cluster_slots_assigned:0
cluster_slots_ok:0
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:0
cluster_current_epoch:5
cluster_my_epoch:3
cluster_stats_messages_sent:771
cluster_stats_messages_received:771
3.1.3 Assigning Slots (Masters)
Assign slot ranges to nodes with the cluster addslots <slot range> command:
# In Redis Cluster, a node that has been assigned slots is called a master
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6379 cluster addslots {0..5461}
OK
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6380 cluster addslots {5462..10922}
OK
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6381 cluster addslots {10923..16383}
OK
After the assignment, the cluster state becomes ok:
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6380 cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:5
cluster_my_epoch:1
cluster_stats_messages_sent:2769
cluster_stats_messages_received:2799
Three nodes remain. In Redis Cluster, nodes that hold slots are masters, and replicas must replicate a master's slot data so that failover can happen automatically.
3.1.4 Replicating Slots (Replicas)
So the next step is to configure the remaining nodes as replicas. First, fetch all node IDs:
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6380 cluster nodes
040cd061d69dff6c3f11142c132501f43c94601e 127.0.0.1:6380 myself,master - 0 0 1 connected 5462-10922
44afcd878b18f9c2b6a048acf04f81cdc9bf52f7 127.0.0.1:6381 master - 0 1552884575649 2 connected 10923-16383
948b60efa1db1f09421c22d5030401a8dc09c0b3 127.0.0.1:6379 master - 0 1552884573645 3 connected 0-5461
06cb8009df15484922e2f307ec2c42d69d5b8e30 127.0.0.1:6383 master - 0 1552884574647 4 connected
a5561be500bd02dee9e407b6ab397226b09060f7 127.0.0.1:6384 master - 0 1552884572644 0 connected
2be0421d33b46ecfe208b932b59538fd553583c4 127.0.0.1:6382 master - 0 1552884571641 5 connected
Then configure replication:
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6382 cluster replicate 948b60efa1db1f09421c22d5030401a8dc09c0b3
OK
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6383 cluster replicate 040cd061d69dff6c3f11142c132501f43c94601e
OK
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6384 cluster replicate 44afcd878b18f9c2b6a048acf04f81cdc9bf52f7
OK
The cluster structure is now:
6379 ---replicate--> 6382
6380 ---replicate--> 6383
6381 ---replicate--> 6384
Check node status:
root@VM-128-6-ubuntu:/tmp/redis# redis-cli -h 127.0.0.1 -p 6380 cluster nodes
040cd061d69dff6c3f11142c132501f43c94601e 127.0.0.1:6380 myself,master - 0 0 1 connected 5462-10922
44afcd878b18f9c2b6a048acf04f81cdc9bf52f7 127.0.0.1:6381 master - 0 1552884776018 2 connected 10923-16383
948b60efa1db1f09421c22d5030401a8dc09c0b3 127.0.0.1:6379 master - 0 1552884775015 3 connected 0-5461
06cb8009df15484922e2f307ec2c42d69d5b8e30 127.0.0.1:6383 slave 040cd061d69dff6c3f11142c132501f43c94601e 0 1552884771009 4 connected
a5561be500bd02dee9e407b6ab397226b09060f7 127.0.0.1:6384 slave 44afcd878b18f9c2b6a048acf04f81cdc9bf52f7 0 1552884774014 2 connected
2be0421d33b46ecfe208b932b59538fd553583c4 127.0.0.1:6382 slave 948b60efa1db1f09421c22d5030401a8dc09c0b3 0 1552884773012 5 connected
At this point the manual cluster build is complete: three nodes handle slots and data, and three nodes stand by for failover.
Building the cluster by hand is mostly about understanding the process and principles. To make this efficient in practice, Redis ships the redis-trib.rb tool for building clusters quickly.
3.2 Scripted Setup
Setting up with the redis-trib.rb script is far more convenient.
3.2.1 Environment Setup
Install Ruby:
$ wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.1.tar.bz2
$ tar xf ruby-2.3.1.tar.bz2
$ cd ruby-2.3.1
$ ./configure --prefix=/usr/local/ruby
$ make && make install
$ cat >> /etc/profile << EOF
export PATH=$PATH:/usr/local/ruby/bin
EOF
$ . /etc/profile
Install the redis rubygem dependency:
$ wget https://rubygems.org/downloads/redis-3.3.0.gem
$ gem install -l redis-3.3.0.gem
$ gem list|grep redis
redis (3.3.0)
$ cp /opt/redis/src/redis-trib.rb /usr/local/bin/
Test it:
$ redis-trib.rb
Usage: redis-trib <command> <options> <arguments ...>
create host1:port1 ... hostN:portN
--replicas <arg>
check host:port
info host:port
fix host:port
--timeout <arg>
reshard host:port
--from <arg>
--to <arg>
--slots <arg>
--yes
--timeout <arg>
--pipeline <arg>
rebalance host:port
--weight <arg>
--auto-weights
--use-empty-masters
--timeout <arg>
--simulate
--pipeline <arg>
--threshold <arg>
add-node new_host:new_port existing_host:existing_port
--slave
--master-id <arg>
del-node host:port node_id
set-timeout host:port milliseconds
call host:port command arg arg .. arg
import host:port
--from <arg>
--copy
--replace
help (show this help)
For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.
3.2.2 Preparing the Nodes
Same as before: six nodes, with configurations identical except for the ports.
root@VM-128-6-ubuntu:/tmp/redis# tree
.
|-- config
| |-- redis_server_6379.conf
| |-- redis_server_6380.conf
| |-- redis_server_6381.conf
| |-- redis_server_6382.conf
| |-- redis_server_6383.conf
| `-- redis_server_6384.conf
|-- data
`-- run
The redis-server configuration:
root@VM-128-6-ubuntu:/tmp/redis# cat config/redis_server_6379.conf
daemonize yes
port 6379
bind 127.0.0.1
dbfilename "6379.rdb"
appendfilename "6379.aof"
dir "/tmp/redis/data"
logfile "/tmp/redis/data/6379.log"
pidfile "/tmp/redis/run/redis_6379.pid"
appendonly yes
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file "nodes-6379.conf"
Start the nodes:
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6379.conf
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6380.conf
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6381.conf
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6382.conf
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6383.conf
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6384.conf
root@VM-128-6-ubuntu:~# tree /tmp/redis/config/
/tmp/redis/config/
|-- redis_server_6379.conf
|-- redis_server_6380.conf
|-- redis_server_6381.conf
|-- redis_server_6382.conf
|-- redis_server_6383.conf
`-- redis_server_6384.conf
0 directories, 6 files
root@VM-128-6-ubuntu:~# tree /tmp/redis/
/tmp/redis/
|-- config
| |-- redis_server_6379.conf
| |-- redis_server_6380.conf
| |-- redis_server_6381.conf
| |-- redis_server_6382.conf
| |-- redis_server_6383.conf
| `-- redis_server_6384.conf
|-- data
| |-- 6379.aof
| |-- 6379.log
| |-- 6380.aof
| |-- 6380.log
| |-- 6381.aof
| |-- 6381.log
| |-- 6382.aof
| |-- 6382.log
| |-- 6383.aof
| |-- 6383.log
| |-- 6384.aof
| |-- 6384.log
| |-- nodes-6379.conf
| |-- nodes-6380.conf
| |-- nodes-6381.conf
| |-- nodes-6382.conf
| |-- nodes-6383.conf
| `-- nodes-6384.conf
`-- run
|-- redis_6379.pid
|-- redis_6380.pid
|-- redis_6381.pid
|-- redis_6382.pid
|-- redis_6383.pid
`-- redis_6384.pid
3 directories, 30 files
Check the log output:
$ more /tmp/redis/data/6379.log
27632:M 18 Mar 13:22:00.229 * Increased maximum number of open files to 10032 (it was originally set to 1024).
# ...
27632:M 18 Mar 13:22:00.242 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower va
lue of 128.
27632:M 18 Mar 13:22:00.242 # Server started, Redis version 3.0.7
27632:M 18 Mar 13:22:00.242 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.
overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
27632:M 18 Mar 13:22:00.242 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usag
e issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/r
c.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
27632:M 18 Mar 13:22:00.242 * The server is now ready to accept connections on port 6379
3.2.3 Creating the Cluster
Create the cluster with redis-trib.rb create. The --replicas argument is the number of replicas per master; here it is 1, meaning each master gets one replica.
$ redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384
>>> Creating cluster
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
127.0.0.1:6379
127.0.0.1:6380
127.0.0.1:6381
Adding replica 127.0.0.1:6382 to 127.0.0.1:6379
Adding replica 127.0.0.1:6383 to 127.0.0.1:6380
Adding replica 127.0.0.1:6384 to 127.0.0.1:6381
M: b758e6a0c9da00b65785420799e4af532a5528a1 127.0.0.1:6379
slots:0-5460 (5461 slots) master
M: 4ab96902ae5860023c4aff27f04903d0dab4268f 127.0.0.1:6380
slots:5461-10922 (5462 slots) master
M: 4e52b7d18ac8602a25f8ac39e62da2706c0b93ba 127.0.0.1:6381
slots:10923-16383 (5461 slots) master
S: 3bd7b3dd35e31db25167e760e5feac99999997ea 127.0.0.1:6382
replicates b758e6a0c9da00b65785420799e4af532a5528a1
S: 39edc3ff4f1b4b4b42f9b4714b05580e264c6ce9 127.0.0.1:6383
replicates 4ab96902ae5860023c4aff27f04903d0dab4268f
S: bfd946d66e80d8db240da667b056bb30aa53c46c 127.0.0.1:6384
replicates 4e52b7d18ac8602a25f8ac39e62da2706c0b93ba
Can I set the above configuration? (type 'yes' to accept):
redis-trib.rb automatically picked three masters and three replicas, then asked us to confirm the configuration.
Type yes to continue:
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join....
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: b758e6a0c9da00b65785420799e4af532a5528a1 127.0.0.1:6379
slots:0-5460 (5461 slots) master
M: 4ab96902ae5860023c4aff27f04903d0dab4268f 127.0.0.1:6380
slots:5461-10922 (5462 slots) master
M: 4e52b7d18ac8602a25f8ac39e62da2706c0b93ba 127.0.0.1:6381
slots:10923-16383 (5461 slots) master
M: 3bd7b3dd35e31db25167e760e5feac99999997ea 127.0.0.1:6382
slots: (0 slots) master
replicates b758e6a0c9da00b65785420799e4af532a5528a1
M: 39edc3ff4f1b4b4b42f9b4714b05580e264c6ce9 127.0.0.1:6383
slots: (0 slots) master
replicates 4ab96902ae5860023c4aff27f04903d0dab4268f
M: bfd946d66e80d8db240da667b056bb30aa53c46c 127.0.0.1:6384
slots: (0 slots) master
replicates 4e52b7d18ac8602a25f8ac39e62da2706c0b93ba
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
The tool did a great deal of work for us: updating node configuration, sending meet messages, assigning slots to masters, checking the slot configuration, and so on.
Verifying once more, everything is now ok:
# redis-cli cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_sent:236
cluster_stats_messages_received:236
With that, the cluster deployment is complete.
3.2.4 Verification
Create 1000 keys in a for loop:
root@VM-128-6-ubuntu:/tmp/redis/config# for i in `seq 1 1000`; do redis-cli -h 127.0.0.1 -p 6379 -c set key$i key$i; done
...
ok
Afterwards, query each master's key count: all 1000 keys are there, spread evenly across the three nodes.
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6379 dbsize
(integer) 333
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6380 dbsize
(integer) 338
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6381 dbsize
(integer) 329
3.3 Cluster Scaling
Redis Cluster can scale out and in without interrupting service. For Redis Cluster, scaling amounts to moving slots and data between nodes.
3.3.1 Scaling Out
(1) Prepare new nodes
Steps:
- Prepare the new nodes
- Join them to the cluster
- Migrate slots and data
Create and start two new nodes; for now they are orphan nodes:
root@VM-128-6-ubuntu:/tmp/redis/config# cp redis_server_6379.conf redis_server_6385.conf
root@VM-128-6-ubuntu:/tmp/redis/config# cp redis_server_6379.conf redis_server_6386.conf
root@VM-128-6-ubuntu:/tmp/redis/config# sed -r -i s/6379/6385/g redis_server_6385.conf
root@VM-128-6-ubuntu:/tmp/redis/config# sed -r -i s/6379/6386/g redis_server_6386.conf
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6385.conf
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6386.conf
(2) Join the cluster
There are two ways to add the new nodes to the cluster.
Option 1: cluster meet
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -h 127.0.0.1 -p 6379 cluster meet 127.0.0.1 6385
OK
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -h 127.0.0.1 -p 6379 cluster meet 127.0.0.1 6386
OK
Option 2: redis-trib.rb, which can add a node to an existing cluster and even configure it directly as a replica:
redis-trib.rb add-node 127.0.0.1:6385 127.0.0.1:6379
redis-trib.rb add-node 127.0.0.1:6386 127.0.0.1:6379
The second option is recommended: it checks cluster state when joining, and if it finds the new node already belongs to a cluster it aborts automatically.
With cluster meet, by contrast, if the node being added already belongs to another cluster, the data of both clusters can end up lost or corrupted. The consequences are severe, so be very careful in production!
View the cluster nodes:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli cluster nodes
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 master - 0 1552891919735 0 connected
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552891920737 3 connected 10923-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552891918732 4 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552891916729 2 connected 5461-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552891921739 5 connected
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 master - 0 1552891922742 7 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552891918231 6 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 1 connected 0-5460
New nodes start out as masters, but with no slots assigned they cannot serve reads or writes. There are generally two follow-up actions for a new node:
- Migrate slots from other nodes to it
- Make it a replica of an existing master, ready for failover
(3) Migrate slots to the new master
Before migrating slots and data, draw up a migration plan that leaves each node with a similar number of slots, so data stays evenly distributed. For example:
Before:
- 6379 owns 5461 slots
- 6380 owns 5462 slots
- 6381 owns 5461 slots
After:
- 6379 owns 4096 slots
- 6380 owns 4096 slots
- 6381 owns 4096 slots
- 6385 owns 4096 slots
① Manual migration
The manual migration flow:
- On the target node 6385, prepare to import slot 4096:
127.0.0.1:6385> cluster setslot 4096 importing 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
OK
Confirm that the importing state for slot 4096 is set on 6385:
127.0.0.1:6385> cluster nodes
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 master - 0 1552891976687 7 connected
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552891974679 1 connected
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552891974679 2 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552891976186 2 connected 5461-10922
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 myself,master - 0 0 0 connected [4096-<-620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39]
...
- On the source node 6379, mark slot 4096 as migrating out to 6385:
127.0.0.1:6379> cluster setslot 4096 migrating f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
OK
Confirm that the migrating state is set on 6379:
127.0.0.1:6379> cluster nodes
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 master - 0 1552892023429 0 connected
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552892021928 3 connected 10923-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552892016917 4 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552892022929 2 connected 5461-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552892017919 5 connected
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 master - 0 1552892023930 7 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552892019923 6 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 1 connected 0-5460 [4096->-f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56]
Fetch the keys in slot 4096 in batches:
# In my case slot 4096 holds no keys
127.0.0.1:6379> CLUSTER GETKEYSINSLOT 4096 100
(empty list or set)
# So MIGRATE reports NOKEY
127.0.0.1:6379> MIGRATE 127.0.0.1 6385 "" 0 5000 keys
NOKEY
# Tell every master that slot 4096 is now owned by 6385
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6379 CLUSTER SETSLOT 4096 node f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
OK
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6380 CLUSTER SETSLOT 4096 node f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
OK
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6381 CLUSTER SETSLOT 4096 node f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
OK
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6385 CLUSTER SETSLOT 4096 node f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
OK
Check which slots 6385 is now responsible for:
127.0.0.1:6385> cluster nodes
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 master - 0 1552892350453 7 connected
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552892351455 1 connected
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552892346946 2 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552892347949 2 connected 5461-10922
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 myself,master - 0 0 8 connected 4096
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 master - 0 1552892347447 1 connected 0-4095 4097-5460
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552892349451 3 connected
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552892348450 3 connected 10923-16383
As shown, 6379 now owns slots 0-4095 and 4097-5460, while 6385 owns the single slot 4096.
② Scripted migration
redis-trib.rb can migrate slots and data quickly. The command syntax is:
redis-trib.rb reshard host:port --from <arg> --to <arg> --slots <arg> --yes --timeout <arg> --pipeline <arg>
- --from: source node ID(s), comma-separated if there are several
- --to: ID of the target node; only one is allowed
- --slots: total number of slots to migrate
- --yes: proceed without prompting after the migration plan is printed
- --timeout: migration timeout, default 60000 ms
- --pipeline: number of keys migrated per batch, default 10
One slot was already migrated by hand; let redis-trib.rb handle the rest:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-trib.rb reshard 127.0.0.1:6379
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379
slots:0-4095,4097-5460 (5460 slots) master
1 additional replica(s)
M: f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385
slots:4096 (1 slots) master
0 additional replica(s)
M: 897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382
slots: (0 slots) slave
replicates 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
M: 9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383
slots: (0 slots) slave
replicates 9f2ea00de86082644b5960fd1b63071bbc6cdd79
M: 92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386
slots: (0 slots) master
0 additional replica(s)
S: d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384
slots: (0 slots) slave
replicates 897f924f94ec11fffa2d29004758426900da5816
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)?
Enter how many slots to move and which nodes to take them from; done ends the source list:
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Source node #2:9f2ea00de86082644b5960fd1b63071bbc6cdd79
Source node #3:897f924f94ec11fffa2d29004758426900da5816
Source node #4:done
Before migrating, the full slot migration plan is printed; once it checks out, type yes to start:
...
Moving slot 1359 from 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Moving slot 1360 from 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Moving slot 1361 from 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Moving slot 1362 from 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Moving slot 1363 from 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Moving slot 1364 from 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Do you want to proceed with the proposed reshard plan (yes/no)? yes
The terminal keeps printing migration progress and exits automatically when finished:
Moving slot 1357 from 127.0.0.1:6379 to 127.0.0.1:6385:
Moving slot 1358 from 127.0.0.1:6379 to 127.0.0.1:6385:
Moving slot 1359 from 127.0.0.1:6379 to 127.0.0.1:6385: .
Moving slot 1360 from 127.0.0.1:6379 to 127.0.0.1:6385: .
Moving slot 1361 from 127.0.0.1:6379 to 127.0.0.1:6385:
Moving slot 1362 from 127.0.0.1:6379 to 127.0.0.1:6385:
Moving slot 1363 from 127.0.0.1:6379 to 127.0.0.1:6385:
Moving slot 1364 from 127.0.0.1:6379 to 127.0.0.1:6385:
root@VM-128-6-ubuntu:/tmp/redis/config#
Check the slot assignment after the migration completes:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli cluster nodes
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 master - 0 1552893305413 8 connected 0-1364 4096 5461-6826 10923-12287
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552893311426 3 connected 12288-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552893308419 4 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552893309422 2 connected 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552893310926 5 connected
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 master - 0 1552893310424 7 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552893307417 6 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 1 connected 1365-4095 4097-5460
6379, 6380, 6381, and 6385 all took part in the reshard, and the existing keys are now spread evenly across the four masters:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6379 dbsize
(integer) 248
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6380 dbsize
(integer) 255
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6381 dbsize
(integer) 250
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6385 dbsize
(integer) 247
To confirm that slots are balanced across the masters, run:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-trib.rb rebalance 127.0.0.1:6380
>>> Performing Cluster Check (using node 127.0.0.1:6380)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
*** No rebalancing needed! All nodes are within the 2.0% threshold.
The slot-count difference is under the 2.0% threshold, so no rebalancing is needed.
(4) Add the new replica
Make node 6386 a replica of node 6385:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6386 cluster replicate f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
OK
Check the cluster node status:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6386 cluster nodes
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 master - 0 1552893608367 1 connected 1365-4095 4097-5460
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552893610371 3 connected 12288-16383
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552893606874 2 connected
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 myself,slave f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 0 0 7 connected
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552893609880 1 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552893607363 3 connected
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 master - 0 1552893612376 8 connected 0-1364 4096 5461-6826 10923-12287
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552893611374 2 connected 6827-10922
3.3.2 Scaling In
The removal flow is roughly:
- If the departing node owns slots, migrate them away first so the cluster's slot-to-node mapping stays complete
- Once the node owns no slots (or if it is a replica), tell the other nodes to forget it, then shut it down cleanly
(1) Migrate slots off the departing node
After the scale-out above there are four masters (6379, 6380, 6381, 6385) and four replicas (6382, 6383, 6384, 6386). To take 6385 offline, the slots it holds must be spread evenly across the other three masters.
6385 becomes the source node and each remaining master a target, so the reshard runs three times, moving 1365, 1365, and 1366 slots respectively.
Reshard from node 6385:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-trib.rb reshard 127.0.0.1:6385
>>> Performing Cluster Check (using node 127.0.0.1:6385)
M: f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385
slots:0-1364,4096,5461-6826,10923-12287 (4097 slots) master
1 additional replica(s)
S: 92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386
slots: (0 slots) slave
replicates f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
S: d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382
slots: (0 slots) slave
replicates 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
S: 4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383
slots: (0 slots) slave
replicates 9f2ea00de86082644b5960fd1b63071bbc6cdd79
M: 9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380
slots:6827-10922 (4096 slots) master
1 additional replica(s)
M: 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379
slots:1365-4095,4097-5460 (4095 slots) master
1 additional replica(s)
S: d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384
slots: (0 slots) slave
replicates 897f924f94ec11fffa2d29004758426900da5816
M: 897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381
slots:12288-16383 (4096 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
# Enter the number of slots to move
How many slots do you want to move (from 1 to 16384)? 1365
# Enter the ID of the target node, 6379
What is the receiving node ID? 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
# Enter the ID of the source node, 6385
Source node #1:f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
...
Moving slot 1364 from f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
Do you want to proceed with the proposed reshard plan (yes/no)? yes
...
Next, migrate the remaining slots to the other two masters:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-trib.rb reshard 127.0.0.1:6385
>>> Performing Cluster Check (using node 127.0.0.1:6385)
...
# Enter the number of slots to move
How many slots do you want to move (from 1 to 16384)?1365
# Enter the ID of the target node, 6380
What is the receiving node ID? 9f2ea00de86082644b5960fd1b63071bbc6cdd79
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
# Enter the ID of the source node, 6385
Source node #1:f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
Source node #2: done
Do you want to proceed with the proposed reshard plan (yes/no)? yes
root@VM-128-6-ubuntu:/tmp/redis/config# redis-trib.rb reshard 127.0.0.1:6385
>>> Performing Cluster Check (using node 127.0.0.1:6385)
...
# Enter the number of slots to move
How many slots do you want to move (from 1 to 16384)?1366
# Enter the ID of the target node, 6381
What is the receiving node ID? 897f924f94ec11fffa2d29004758426900da5816
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
# Enter the ID of the source node, 6385
Source node #1:f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
Source node #2: done
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Check whether 6385 still holds any slots or data. Oops: one slot got left behind!
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6379 CLUSTER nodes
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 master - 0 1552897498850 8 connected 12287
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552897493841 11 connected 6825-6826 10923-12286 12288-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552897495845 9 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552897493841 10 connected 4096 5461-6824 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552897494843 10 connected
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 slave f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 0 1552897496846 8 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552897497849 11 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 9 connected 0-4095 4097-5460
Migrate the one remaining slot on 6385 over to 6381:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-trib.rb reshard 127.0.0.1:6385
>>> Performing Cluster Check (using node 127.0.0.1:6385)
...
How many slots do you want to move (from 1 to 16384)? 1
What is the receiving node ID? 897f924f94ec11fffa2d29004758426900da5816
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
Source node #2:done
Ready to move 1 slots.
Source nodes:
M: f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385
slots:12287 (1 slots) master
1 additional replica(s)
Destination node:
M: 897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381
slots:6825-6826,10923-12286,12288-16383 (5462 slots) master
1 additional replica(s)
Resharding plan:
Moving slot 12287 from f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 12287 from 127.0.0.1:6385 to 127.0.0.1:6381:
Confirm again: 6385 no longer owns any slots or data.
root@VM-128-6-ubuntu:/tmp/redis/data# redis-cli -p 6379 CLUSTER nodes
f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 127.0.0.1:6385 master - 0 1552897641130 8 connected
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552897637122 11 connected 6825-6826 10923-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552897636121 9 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552897636621 10 connected 4096 5461-6824 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552897639125 10 connected
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552897640127 11 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552897638123 11 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 9 connected 0-4095 4097-5460
Check data integrity: all good!
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6379 dbsize
(integer) 333
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6380 dbsize
(integer) 338
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6381 dbsize
(integer) 329
(2) Forget the node
Because Redis nodes exchange data over the Gossip protocol, there has to be a mechanism that makes them stop talking to a node that has been taken offline.
Redis provides the cluster forget <downNodeID> command for this. A node that receives cluster forget puts the target node ID on a ban list that lasts 60 seconds; within that window, every node in the cluster must be told to forget the offline node.
This approach is not recommended in production: it means interacting with many nodes, which is tedious and easy to get wrong.
Use the redis-trib.rb del-node <host:port> {downNodeID} command instead.
Internally it works roughly like this:
- Check whether the node still owns slots; if it does, abort
- Iterate over the nodes, skipping the one whose ID is being forgotten
- If the departing node has replicas, repoint them to the master with the fewest replicas
- Finally, send the forget command
Steps:
Take master 6385 offline and forget it:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-trib.rb del-node 127.0.0.1:6379 f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56
>>> Removing node f460cfa36701e253fa9e5e9ec0cbef6f0ef1df56 from cluster 127.0.0.1:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
root@VM-128-6-ubuntu:/tmp/redis/config# netstat -tunlp|grep 85
The 6385 process is now gone. Next, check what became of 6385's replica, 6386:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6379 cluster nodes
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552898687238 11 connected 6825-6826 10923-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552898685234 9 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552898681726 10 connected 4096 5461-6824 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552898682228 10 connected
92f5c3c965182016fe0c2f106d3eed595f785603 127.0.0.1:6386 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552898686236 11 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552898684231 11 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 9 connected 0-4095 4097-5460
Node 6386 is now a replica of node 6381.
Take replica 6386 offline:
# redis-trib.rb del-node 127.0.0.1:6379 92f5c3c965182016fe0c2f106d3eed595f785603
>>> Removing node 92f5c3c965182016fe0c2f106d3eed595f785603 from cluster 127.0.0.1:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
root@VM-128-6-ubuntu:/tmp/redis/config# netstat -tunlp|grep 86
6386 has been stopped as well. Finally, check the cluster state:
root@VM-128-6-ubuntu:/tmp/redis/config# redis-cli -p 6379 cluster nodes
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552898842574 11 connected 6825-6826 10923-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552898844579 9 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552898843576 10 connected 4096 5461-6824 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552898845581 10 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552898841571 11 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 9 connected 0-4095 4097-5460
The cluster is back to its original three-master, three-replica layout.
3.4 Data Migration
The official redis-trib.rb supports data migration, but with several problems:
- It can only migrate from a standalone instance into a cluster
- It does not support online migration; the application must stop reads and writes for the duration
- If the migration fails midway, it cannot resume from where it stopped
- It is single-threaded, so large datasets migrate slowly
The recommended alternative is redis-migrate-tool, an open-source tool from VIPShop, which supports:
- A rich set of source types: standalone, Twemproxy, Redis Cluster, RDB/AOF files, and more
- Online migration: it impersonates a replica and migrates data as a stream, so application reads and writes are unaffected
- Multi-threaded migration for speed, plus data verification and migration-status reporting
3.4.1 Installing the Tool
Install redis-migrate-tool:
$ git clone https://github.com/vipshop/redis-migrate-tool.git
$ cd redis-migrate-tool/
$ autoreconf -fvi
$ ./configure && make
$ cp src/redis-migrate-tool /usr/local/bin/
3.4.2 Preparing Node Data
Create a standalone node and generate some random test data:
# Configuration for 6385
daemonize yes
port 6385
bind 127.0.0.1
dbfilename "6385.rdb"
appendfilename "6385.aof"
dir "/tmp/redis/data"
logfile "/tmp/redis/data/6385.log"
pidfile "/tmp/redis/run/redis_6385.pid"
appendonly yes
Start the service:
# redis-server /tmp/redis/config/redis_server_6385.conf
# for i in `seq 300`; do redis-cli -h 127.0.0.1 -p 6385 set data$i data$i; done
# redis-cli -p 6385 dbsize
(integer) 300
3.4.3 Standalone -> Cluster
Import the standalone data into the cluster:
# cat migrate-rmt.conf
[source]
# Migrate from a standalone instance
type: single
servers:
# Source node list
- 127.0.0.1:6385
[target]
# The target is a Redis cluster
type: redis cluster
servers:
# Address of a cluster master node
- 127.0.0.1:6379
[common]
# Listen on 8888 for status queries
listen: 0.0.0.0:8888
Run the migration:
# redis-migrate-tool -c migrate-rmt.conf -o /tmp/migrate-console.log -d
Check the migration log:
# cat /tmp/migrate-console.log
...
[2019-03-20 10:17:05.729] rmt_core.c:2551 migrate job is running...
[2019-03-20 10:17:05.730] rmt_redis.c:1706 Start connecting to MASTER[127.0.0.1:6385].
[2019-03-20 10:17:05.730] rmt_redis.c:1740 Master[127.0.0.1:6385] replied to PING, replication can continue...
[2019-03-20 10:17:05.730] rmt_redis.c:1051 Partial resynchronization for MASTER[127.0.0.1:6385] not possible (no cached master).
[2019-03-20 10:17:05.730] rmt_redis.c:1110 Full resync from MASTER[127.0.0.1:6385]: e7f2988dfe40ea670b5288cde78019f2f5f41c68:1
[2019-03-20 10:17:05.858] rmt_redis.c:1517 MASTER <-> SLAVE sync: receiving 4904 bytes from master[127.0.0.1:6385]
[2019-03-20 10:17:05.858] rmt_redis.c:1623 MASTER <-> SLAVE sync: RDB data for node[127.0.0.1:6385] is received, used: 0 s
[2019-03-20 10:17:05.858] rmt_redis.c:1643 rdb file node127.0.0.1:6385-1553048225730532-10072.rdb write complete
[2019-03-20 10:17:05.860] rmt_redis.c:6601 Rdb file for node[127.0.0.1:6385] parsed finished, use: 0 s.
[2019-03-20 10:17:05.860] rmt_redis.c:6709 All nodes' rdb file parsed finished for this write thread(0).
...
3.5 Failover
3.5.1 Failure Detection
(1) Subjectively down
- Subjectively down: one node judges another node unreachable; this is a single node's opinion and may be a false positive
- Objectively down: multiple nodes in the cluster agree the node is unreachable, marking it as genuinely offline; if the failed node is a slot-holding master, a failover must follow
Flow diagram:

(2) Objectively down
- When a node judges another node subjectively down, it spreads that pfail verdict to other masters inside the ping/pong message body. Only masters take part in the failure decision, because masters are the ones serving reads/writes and maintaining slot state
- If fewer than half of the slot-holding masters have marked the node, the objective-down flow is abandoned. Once more than half of the slot-holding masters mark a node subjectively down, the objective-down flow is triggered. Each node keeps a list of down reports, and every report has a validity window (cluster-node-timeout * 2) after which it is dropped; this is why cluster-node-timeout should not be set too low
- A fail message is then broadcast across the cluster, telling every node to mark the failed node as objectively down

3.5.2 Failure Recovery
After a failed master is marked objectively down, one of its replicas must be promoted to keep the cluster available.
The recovery flow is roughly:
- Eligibility check
- Schedule the election
- Start the election
- Vote
- Replace the master
(1) Eligibility check
Every replica checks how long ago it last communicated with its master. If that gap exceeds cluster-node-timeout * cluster-slave-validity-factor, it is disqualified from the election. cluster-slave-validity-factor is the replica validity factor and defaults to 10, so with the default 15 s timeout the window is 150 s.
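The check itself is a simple threshold; a sketch with the default values:

```python
# A replica disconnected from its master for longer than
# cluster-node-timeout * cluster-slave-validity-factor is disqualified.
def can_failover(disconnect_ms, node_timeout_ms=15_000, validity_factor=10):
    return disconnect_ms <= node_timeout_ms * validity_factor

print(can_failover(60_000))   # True: within the 150 s window
print(can_failover(200_000))  # False: data too stale, replica is skipped
```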
(2) Scheduling the election
The point of a scheduled election time is to implement priority through delay: for example, a replica with a larger replication offset (more complete data) deserves higher priority and therefore a shorter delay.
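The Redis Cluster design describes this delay as a fixed part plus random jitter plus a rank penalty, where rank 0 is the replica with the largest replication offset; the sketch below reproduces that formula, and it is consistent with the "delayed for 952 milliseconds (rank #0)" log line in the failover simulation later in this post:

```python
import random

def failover_election_delay(rank: int) -> int:
    # 500 ms fixed + 0..500 ms jitter + 1000 ms per rank position;
    # better-synced replicas (lower rank) start their election sooner.
    return 500 + random.randint(0, 500) + rank * 1000

for rank in range(3):
    print(rank, failover_election_delay(rank), "ms")
```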
(3) Starting the election
When the failover election time failover_auth_time arrives, the replica starts an election. The first step is to bump the configuration epoch.
The config epoch is a monotonically increasing integer. Every master maintains its own config epoch, no two masters share the same value, and the cluster additionally tracks a global config epoch.
The config epoch serves to:
- Identify each master's current version and the cluster's largest version
- Record major events: it is incremented whenever something significant happens, such as a master joining or a replica being promoted
- Resolve conflicts: a larger config epoch means newer information, so slot-ownership conflicts and election conflicts are settled in favor of the larger epoch
The replica increments the cluster's global config epoch, stores the value in the clusterState.failover_auth_epoch variable to mark this round of the election, and finally broadcasts an election message to the cluster.
(4) Voting
Each slot-holding master has exactly one vote. When it receives the first vote request from a replica, it replies with FAILOVER_AUTH_ACK and from then on ignores election messages from other replicas.
Once a replica has collected votes from a majority (N/2+1) of the slot-holding masters, it begins the master-replacement flow.
(5) Replacing the master
- First, the replica cancels replication and becomes a master
- It runs clusterDelSlot to revoke the slots owned by the failed master, then clusterAddSlot to assign them to itself
- It broadcasts its own pong message, announcing to every node that it is now a master and has taken over the failed master's slots
3.5.3 Simulating a Failure
(1) Confirm the current state
root@VM-128-6-ubuntu:~# redis-cli cluster nodes
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 master - 0 1552910228109 11 connected 6825-6826 10923-16383
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552910225103 9 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552910229111 10 connected 4096 5461-6824 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552910226104 10 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 slave 897f924f94ec11fffa2d29004758426900da5816 0 1552910227107 11 connected
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 9 connected 0-4095 4097-5460
(2) Simulate a failure with kill
Force-kill the 6381 node process:
root@VM-128-6-ubuntu:~# ps -ef|grep 6381
root 5967 1 0 14:39 ? 00:00:17 redis-server 127.0.0.1:6381 [cluster]
root 17360 17106 0 19:58 pts/0 00:00:00 grep --color=auto 6381
root@VM-128-6-ubuntu:~# kill -9 5967
(3) Analyze the logs
Watch the log of replica 6384:
# Start syncing from the master
5987:S 18 Mar 19:58:38.644 * MASTER <-> SLAVE sync started
# Sync failed: connection refused
5987:S 18 Mar 19:58:38.644 # Error condition on socket for SYNC: Connection refused
# FAIL message received about master 6381
5987:S 18 Mar 19:58:39.238 * FAIL message received from 9f2ea00de86082644b5960fd1b63071bbc6cdd79 about 897f924f94ec11fffa2d29004758426900da5816
# Cluster state changes to fail
5987:S 18 Mar 19:58:39.238 # Cluster state changed: fail
# Objective-down confirmed; schedule the election (note the rank and offset)
5987:S 18 Mar 19:58:39.246 # Start of election delayed for 952 milliseconds (rank #0, offset 46216).
5987:S 18 Mar 19:58:39.648 * Connecting to MASTER 127.0.0.1:6381
5987:S 18 Mar 19:58:39.648 * MASTER <-> SLAVE sync started
5987:S 18 Mar 19:58:39.648 # Error condition on socket for SYNC: Connection refused
# Start the election and ask the masters for votes
5987:S 18 Mar 19:58:40.250 # Starting a failover election for epoch 12.
# Having won a majority of votes, begin replacing the master
5987:S 18 Mar 19:58:40.302 # Failover election won: I'm the new master.
# Config epoch set to 12 after the successful failover
5987:S 18 Mar 19:58:40.302 # configEpoch set to 12 after successful failover
# Discard the stale cached master state
5987:M 18 Mar 19:58:40.302 * Discarding previously cached master state.
# Cluster state back to ok
5987:M 18 Mar 19:58:40.302 # Cluster state changed: ok
Watch the logs of the other masters:
# Master 6379 log
...
5939:M 18 Mar 16:19:30.153 # New configEpoch set to 9
5939:M 18 Mar 16:19:30.153 # configEpoch updated after importing slot 0
# FAIL message about master 6381, received from 6380
5939:M 18 Mar 19:58:39.238 * FAIL message received from 9f2ea00de86082644b5960fd1b63071bbc6cdd79 about 897f924f94ec11fffa2d29004758426900da5816
# Cluster state changes to fail
5939:M 18 Mar 19:58:39.254 # Cluster state changed: fail
# Vote for 6384 to be promoted to master
5939:M 18 Mar 19:58:40.301 # Failover auth granted to d98e4dbb2b558f8c7788330cd08eaaf1a5504071 for epoch 12
# Promotion complete; cluster state back to ok
5939:M 18 Mar 19:58:40.341 # Cluster state changed: ok
# Master 6380 log
...
5962:M 18 Mar 16:22:03.202 # New configEpoch set to 10
5962:M 18 Mar 16:22:03.202 # configEpoch updated after importing slot 4096
5962:M 18 Mar 19:58:39.198 * Marking node 897f924f94ec11fffa2d29004758426900da5816 as failing (quorum reached).
# Cluster state changes to fail
5962:M 18 Mar 19:58:39.238 # Cluster state changed: fail
# Vote for 6384 to be promoted to master
5962:M 18 Mar 19:58:40.301 # Failover auth granted to d98e4dbb2b558f8c7788330cd08eaaf1a5504071 for epoch 12
# Promotion complete; cluster state back to ok
5962:M 18 Mar 19:58:40.342 # Cluster state changed: ok
Watch the logs of the other replicas:
# Replica 6382 log
...
# Mark master 6381 as failing
5973:S 18 Mar 19:58:37.393 * Marking node 897f924f94ec11fffa2d29004758426900da5816 as failing (quorum reached).
5973:S 18 Mar 19:58:37.397 # Cluster state changed: fail
5973:S 18 Mar 19:58:40.340 # Cluster state changed: ok
# Replica 6383 log
# Mark master 6381 as failing
5978:S 18 Mar 19:58:37.292 * Marking node 897f924f94ec11fffa2d29004758426900da5816 as failing (quorum reached).
5978:S 18 Mar 19:58:37.305 # Cluster state changed: fail
5978:S 18 Mar 19:58:40.342 # Cluster state changed: ok
(4) Bring the failed node back
Restart the former master 6381:
root@VM-128-6-ubuntu:~# redis-server /tmp/redis/config/redis_server_6381.conf
Watch its log:
19616:M 18 Mar 20:14:56.135 # Configuration change detected. Reconfiguring myself as a replica of d98e4dbb2b558f8c7788330cd08eaaf1a5504071
19616:S 18 Mar 20:14:56.136 # Cluster state changed: ok
19616:S 18 Mar 20:14:57.141 * Connecting to MASTER 127.0.0.1:6384
19616:S 18 Mar 20:14:57.141 * MASTER <-> SLAVE sync started
19616:S 18 Mar 20:14:57.141 * Non blocking connect for SYNC fired the event.
19616:S 18 Mar 20:14:57.142 * Master replied to PING, replication can continue...
19616:S 18 Mar 20:14:57.142 * Partial resynchronization not possible (no cached master)
19616:S 18 Mar 20:14:57.143 * Full resync from master: bec8ecac27fd0efee6e489ac9fcf6074c0029f84:1
19616:S 18 Mar 20:14:57.241 * MASTER <-> SLAVE sync: receiving 4883 bytes from master
19616:S 18 Mar 20:14:57.242 * MASTER <-> SLAVE sync: Flushing old data
19616:S 18 Mar 20:14:57.242 * MASTER <-> SLAVE sync: Loading DB in memory
19616:S 18 Mar 20:14:57.242 * MASTER <-> SLAVE sync: Finished with success
...
Check the cluster nodes:
root@VM-128-6-ubuntu:~# redis-cli cluster nodes
897f924f94ec11fffa2d29004758426900da5816 127.0.0.1:6381 slave d98e4dbb2b558f8c7788330cd08eaaf1a5504071 0 1552911543656 12 connected
d97435e9c52fcba27cf0e4eb4e680b9d08e9e272 127.0.0.1:6382 slave 620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 0 1552911544658 9 connected
9f2ea00de86082644b5960fd1b63071bbc6cdd79 127.0.0.1:6380 master - 0 1552911546662 10 connected 4096 5461-6824 6827-10922
4ec5af7256cd40393731a64e006533ae00632e4b 127.0.0.1:6383 slave 9f2ea00de86082644b5960fd1b63071bbc6cdd79 0 1552911547162 10 connected
d98e4dbb2b558f8c7788330cd08eaaf1a5504071 127.0.0.1:6384 master - 0 1552911545659 12 connected 6825-6826 10923-16383
620f0ece6d3c0301f261f23fc2ed3aa0dcf1fb39 127.0.0.1:6379 myself,master - 0 0 9 connected 0-4095 4097-5460
Once the former master 6381 comes back, it becomes a replica of the new master 6384, and its data stays in sync:
root@VM-128-6-ubuntu:~# redis-cli -p 6384 dbsize
(integer) 329
root@VM-128-6-ubuntu:~# redis-cli -p 6381 dbsize
(integer) 329