Introduction

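In a production environment we may need a Redis cluster for business or high-availability reasons, and Ansible is a convenient way to deploy one. Below is an Ansible role for that deployment. I have deployed it in a CentOS 7 test environment and the cluster starts normally. I also fixed quite a few bugs along the way (sigh).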

As usual, the full project source comes first: https://github.com/CaptainValk/ansible-redis-cluster

First, the directory layout:
[image: directory tree of the role]

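Preparation

A Redis cluster needs at least three master nodes, and with one replica per master that makes six nodes. So we go with three masters and three replicas, running one master instance and one replica instance on each of three hosts: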

hosts

[redis]
node1 ansible_host=192.168.209.131
node2 ansible_host=192.168.209.132
node3 ansible_host=192.168.209.133

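We assume passwordless SSH login for Ansible is already set up. If you use another authentication method, adjust the hosts file to match.

Up front, here is what you need to change:

  • main.yml under the vars directory
  • the host list in hosts
  • the IPs in the cluster creation command in the last step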

Creating the Ansible Role

First, let's look at vars/main.yml

---
# vars file for redis
master_port: 6379
slave_port: 6380
file_name: 'redis-6.2.5'
install_path: '/usr/local'
data_path: '/data/redis'
redis_password: 'E9QZjNtacn'

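master_port and slave_port are the ports of the master and replica instances. file_name is the name of the install package: we install from the source tarball, so this is where you pick the version, and it must match the tarball in the files folder. Change it here to deploy a different version. Note that the replace tasks below match literal default lines of this version's redis.conf, so another version may need those patterns adjusted.

data_path is where each Redis node stores its data. It normally needs no change; if you do change it, check how the tasks use it.

redis_password is the Redis password. It is written to the masterauth and requirepass fields of the config file.

Tip: do not append your comments to the end of any existing line in the vars file, or they may be read in as part of the variable's value (a painful lesson). Put comments on their own lines instead.

Next is redis.yml, the main entry playbook: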

---
- hosts: redis
  roles:
    - redis-cluster

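Finally, the main part, the tasks:

Each task's name describes what it does. If you need to change the configuration, do so in the corresponding task; by default nothing needs to change.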

---
 - name: Create Base Path
   file:
     path: '{{install_path}}/redis/{{item.port}}'
     state: 'directory'
   become: true
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
 
 - name: Create Data Path
   file:
     path: '{{data_path}}/{{item.port}}'
     state: 'directory'
   become: true
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
 
 - name: Check if redis exists
   shell: 'ps -ef | grep redis-server | grep -v "grep" | wc -l'
   register: check_redis
 
 - name: Stop redis server
   shell: 'redis-cli -h localhost -p {{item.port}} -a {{redis_password}} shutdown'
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
   when: check_redis.stdout|int > 0
 
 - name: Delete aof
   file:
     path: '{{data_path}}/{{item.port}}/appendonly.aof'
     state: absent
   become: true
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
 
 - name: Delete nodes.conf
   file:
     # Same location as cluster-config-file set in the config tasks below
     path: '{{install_path}}/redis/{{item.port}}/nodes.conf'
     state: absent
   become: true
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
 
 - name: Copy redis tar
   copy:
     src: '{{file_name}}.tar.gz'
     dest: '/tmp'
   become: true
 
 - name: Unpack redis tar
   unarchive:
     src: '/tmp/{{file_name}}.tar.gz'
     dest: '{{install_path}}'
     remote_src: yes
     owner: root
     group: root
   become: true
 
 - name: Installing Redis
   shell: make && make install
   args:
     chdir: '{{install_path}}/{{file_name}}'
   become: true
 
 - name: Replace common config
   replace:
     path: '{{install_path}}/{{file_name}}/redis.conf'
     regexp: '{{item.regexp}}'
     replace: '{{ item.line }}'
   with_items:
     - { regexp: 'bind 127.0.0.1 -::1', line: 'bind 0.0.0.0'}
     - { regexp: '# cluster-enabled yes', line: 'cluster-enabled yes'}
     - { regexp: '# cluster-config-file nodes-6379.conf', line: 'cluster-config-file {{install_path}}/redis/{{master_port}}/nodes.conf'}
     - { regexp: '# cluster-node-timeout 15000', line: 'cluster-node-timeout 5000'}
     - { regexp: 'appendonly no', line: 'appendonly yes'}
     - { regexp: 'daemonize no', line: 'daemonize yes'}
     - { regexp: '# masterauth <master-password>', line: 'masterauth {{redis_password}}'}
     - { regexp: '# requirepass foobared', line: 'requirepass {{redis_password}}'}
     - { regexp: 'pidfile /var/run/redis_6379.pid', line: 'pidfile  {{data_path}}/{{master_port}}/redis.pid'}
     - { regexp: 'dir ./', line: 'dir {{data_path}}/{{master_port}}'}
   become: true

 - name: Copy config
   copy:
     src: '{{install_path}}/{{file_name}}/redis.conf'
     dest: '{{install_path}}/redis/{{item.port}}/redis.conf'
     remote_src: true
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
   become: true
 
 - name: Replace slave config
   replace:
     path: '{{install_path}}/redis/{{slave_port}}/redis.conf'
     regexp: '{{item.regexp}}'
     replace: '{{ item.line }}'
   with_items:
     - { regexp: 'cluster-config-file {{install_path}}/redis/{{master_port}}/nodes.conf', line: 'cluster-config-file {{install_path}}/redis/{{slave_port}}/nodes.conf'}
     - { regexp: 'pidfile  {{data_path}}/{{master_port}}/redis.pid', line: 'pidfile {{data_path}}/{{slave_port}}/redis.pid'}
     - { regexp: 'dir {{data_path}}/{{master_port}}', line: 'dir {{data_path}}/{{slave_port}}'}
   become: true
 
 
 - name: Modify port
   replace:
     path: '{{item.path}}'
     regexp: '{{item.regexp}}'
     replace: '{{ item.line }}'
   become: true
   with_items:
     - {path: '{{install_path}}/redis/{{master_port}}/redis.conf', regexp: 'port 6379', line: 'port {{master_port}}'}
     - {path: '{{install_path}}/redis/{{slave_port}}/redis.conf', regexp: 'port 6379', line: 'port {{slave_port}}'}

 - name: Delete nodes conf
   file:
     path: '{{install_path}}/{{item.port}}/nodes-{{item.port}}.conf'
     state: 'absent'
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
   become: true
 
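 - name: Open redis ports
   firewalld:
     # firewalld takes one PORT/PROTOCOL string per call, so loop over the ports
     port: '{{item}}/tcp'
     permanent: yes
     state: enabled
     immediate: yes
   # Client ports plus cluster bus ports (client port + 10000; the '1' prefix
   # only works for 4-digit ports)
   with_items:
     - '{{master_port}}'
     - '1{{master_port}}'
     - '{{slave_port}}'
     - '1{{slave_port}}'
   become: true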

 - name: Start redis
   shell: '{{install_path}}/{{file_name}}/src/redis-server {{install_path}}/redis/{{item.port}}/redis.conf'
   with_items:
     - { port: '{{master_port}}' }
     - { port: '{{slave_port}}' }
   become: true

#  - name: cluster
#    shell: 'echo yes |redis-cli --cluster create 192.168.209.131:6379 192.168.209.132:6379 192.168.209.133:6379 192.168.209.131:6380 192.168.209.132:6380 192.168.209.133:6380 --cluster-replicas 1 -a {{redis_password}}'
#    run_once: true

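Note that the playbook stops at the Start redis task. In my test environment the final cluster-creation step sometimes fails with an error saying the redis-cli command does not exist. The command is in fact installed, but Ansible cannot invoke it (a bug still to be fixed), so I SSH into the first host and run the command directly: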

echo yes |redis-cli --cluster create 192.168.209.131:6379 192.168.209.132:6379 192.168.209.133:6379 192.168.209.131:6380 192.168.209.132:6380 192.168.209.133:6380 --cluster-replicas 1 -a <your redis password>
# Set the master and replica nodes here according to your own host IPs; masters and replicas are paired one to one
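Since the node list in this command has to be edited by hand for every environment, it can help to generate it. The following is a minimal sketch, not part of the role: it assumes the three inventory IPs and the two ports from vars/main.yml, lists the master-port endpoints first and the replica-port endpoints second as the command above does, and only prints the command so it can be reviewed before running. The function name build_nodes is made up for illustration.

```shell
#!/bin/sh
# Build the node list for `redis-cli --cluster create`: every host on the
# master port first, then every host on the replica port.
# The values below mirror the article's inventory and vars; adjust them.
hosts="192.168.209.131 192.168.209.132 192.168.209.133"
master_port=6379
slave_port=6380

build_nodes() {
  out=""
  for port in "$master_port" "$slave_port"; do
    for host in $hosts; do
      out="$out $host:$port"
    done
  done
  # Strip the leading space left by the first concatenation.
  printf '%s\n' "${out# }"
}

# Print the command instead of running it, so it can be checked first.
echo "redis-cli --cluster create $(build_nodes) --cluster-replicas 1 -a <your redis password>"
```

Pasting the printed line into the shell on the first host (with the real password) gives the same command as above.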

This command only needs to run once, on the first host. That is what run_once: true at the end of the commented-out task expresses.
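On the "redis-cli not found" error mentioned above: one possible cause, which is an assumption and not confirmed for this environment, is that the shell Ansible spawns sees a different PATH than an interactive login, so a binary installed by make install is not found by name. The Start redis task avoids this by calling redis-server through its full path under the build directory. The sketch below shows the same idea for redis-cli: try PATH first, then fall back to explicit candidate paths. The helper name find_redis_cli and the two candidate paths are illustrative; the second is derived from install_path and file_name in vars/main.yml.

```shell
#!/bin/sh
# Print an absolute path to redis-cli: prefer whatever is on PATH, otherwise
# fall back to the candidate locations passed as arguments.
find_redis_cli() {
  if found=$(command -v redis-cli 2>/dev/null); then
    printf '%s\n' "$found"
    return 0
  fi
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Candidate paths are assumptions based on the article's vars; adjust them.
if cli=$(find_redis_cli /usr/local/bin/redis-cli /usr/local/redis-6.2.5/src/redis-cli); then
  echo "using $cli"
else
  echo "redis-cli not found; check PATH and the install location" >&2
fi
```

In the playbook itself, the equivalent would be to write the full path to redis-cli in the shell command of the commented-out cluster task instead of relying on PATH.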

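Output of the cluster creation command (if you see the following, it worked):
Creating the cluster assigns the hash slots automatically, which makes redis-cli very convenient for this.

Check that the ID after "replicates" on each S (replica) entry is the node ID of the M (master) you expect. The capture below shows the hosts 192.168.209.131, .134 and .135 rather than the inventory IPs above, so expect your own IPs and node IDs in their place.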

Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 192.168.209.134:6380 to 192.168.209.131:6379
Adding replica 192.168.209.135:6380 to 192.168.209.134:6379
Adding replica 192.168.209.131:6380 to 192.168.209.135:6379
M: 09fa7c549e1122899b107c8e342c74ad50e08659 192.168.209.131:6379
   slots:[0-5460] (5461 slots) master
M: a62e65275aa1a932b8c4fe9490c0bf07b8bf92a1 192.168.209.134:6379
   slots:[5461-10922] (5462 slots) master
M: 9f63a6c4c9bb0a8f7b02efc933f652163daa7c3b 192.168.209.135:6379
   slots:[10923-16383] (5461 slots) master
S: 92d97912de391f46616e84985a35d9333479fc2d 192.168.209.131:6380
   replicates 9f63a6c4c9bb0a8f7b02efc933f652163daa7c3b
S: 35df17bfba95d768eb396e06de37c06b3f420dc8 192.168.209.134:6380
   replicates 09fa7c549e1122899b107c8e342c74ad50e08659
S: 925fd61585cf7b022bea75e513999d206f6a4dbf 192.168.209.135:6380
   replicates a62e65275aa1a932b8c4fe9490c0bf07b8bf92a1
Can I set the above configuration? (type 'yes' to accept): >>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join

>>> Performing Cluster Check (using node 192.168.209.131:6379)
M: 09fa7c549e1122899b107c8e342c74ad50e08659 192.168.209.131:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 925fd61585cf7b022bea75e513999d206f6a4dbf 192.168.209.135:6380
   slots: (0 slots) slave
   replicates a62e65275aa1a932b8c4fe9490c0bf07b8bf92a1
S: 92d97912de391f46616e84985a35d9333479fc2d 192.168.209.131:6380
   slots: (0 slots) slave
   replicates 9f63a6c4c9bb0a8f7b02efc933f652163daa7c3b
S: 35df17bfba95d768eb396e06de37c06b3f420dc8 192.168.209.134:6380
   slots: (0 slots) slave
   replicates 09fa7c549e1122899b107c8e342c74ad50e08659
M: a62e65275aa1a932b8c4fe9490c0bf07b8bf92a1 192.168.209.134:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: 9f63a6c4c9bb0a8f7b02efc933f652163daa7c3b 192.168.209.135:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
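The slot ranges in the output above come from dividing the 16384 hash slots as evenly as possible among the masters. The sketch below is a reconstruction of that arithmetic, not redis-cli's own code: it rounds each boundary to the nearest integer and gives the last master whatever remains. For three masters it prints the same three ranges as the capture. The function name slot_ranges is made up for illustration.

```shell
#!/bin/sh
# Split the 16384 hash slots across N masters, rounding each boundary to the
# nearest integer so that range sizes differ by at most one slot.
TOTAL_SLOTS=16384

slot_ranges() {
  masters=$1
  i=1
  first=0
  while [ "$i" -le "$masters" ]; do
    if [ "$i" -eq "$masters" ]; then
      # The last master takes everything up to the final slot.
      last=$((TOTAL_SLOTS - 1))
    else
      # round(i * TOTAL_SLOTS / masters) - 1, using integer arithmetic only
      last=$(( (2 * i * TOTAL_SLOTS + masters) / (2 * masters) - 1 ))
    fi
    echo "Master[$((i - 1))] -> Slots $first - $last"
    first=$((last + 1))
    i=$((i + 1))
  done
}

slot_ranges 3
```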

Verification

Check the node status

redis-cli -a <your redis password>

127.0.0.1:6379> cluster nodes
925fd61585cf7b022bea75e513999d206f6a4dbf 192.168.209.135:6380@16380 slave a62e65275aa1a932b8c4fe9490c0bf07b8bf92a1 0 1678960784516 2 connected
92d97912de391f46616e84985a35d9333479fc2d 192.168.209.131:6380@16380 slave 9f63a6c4c9bb0a8f7b02efc933f652163daa7c3b 0 1678960783000 3 connected
35df17bfba95d768eb396e06de37c06b3f420dc8 192.168.209.134:6380@16380 slave 09fa7c549e1122899b107c8e342c74ad50e08659 0 1678960783000 1 connected
a62e65275aa1a932b8c4fe9490c0bf07b8bf92a1 192.168.209.134:6379@16379 master - 0 1678960784000 2 connected 5461-10922
9f63a6c4c9bb0a8f7b02efc933f652163daa7c3b 192.168.209.135:6379@16379 master - 0 1678960784010 3 connected 10923-16383
09fa7c549e1122899b107c8e342c74ad50e08659 192.168.209.131:6379@16379 myself,master - 0 1678960783000 1 connected 0-5460
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.209.134,port=6380,state=online,offset=532,lag=1
master_failover_state:no-failover
master_replid:4b4185b5ac0cc3a002bbce9d420dc37adb5fed7d
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:532
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:532
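Reading six long lines by eye gets tedious, so the cluster nodes output can also be summarized. A minimal sketch, assuming the field layout shown above (flags in the third field, link state in the eighth); the function name summarize_nodes is made up for illustration.

```shell
#!/bin/sh
# Summarize `cluster nodes` output read from stdin: count masters, replicas,
# and nodes whose link state (8th field) is not "connected".
summarize_nodes() {
  awk '
    NF == 0 { next }                      # skip blank lines
    $3 ~ /master/ { masters++ }
    $3 ~ /slave/  { replicas++ }
    $8 != "connected" { down++ }
    END { printf "masters=%d replicas=%d disconnected=%d\n", masters, replicas, down }
  '
}

# Usage on a cluster host (replace the placeholder with the real password):
#   redis-cli -a <your redis password> cluster nodes | summarize_nodes
```

For the healthy three-master, three-replica cluster shown above, the expected summary is three masters, three replicas and zero disconnected nodes.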

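Additional Notes

Fix any errors during task execution as your situation requires. Here I focus on one case you may run into.

If cluster creation hangs at this point: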

>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join

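This happens because the ports are not open. Every host needs four ports open, the client port and the cluster bus port for both the master and the replica instance, for example: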

  • 6379/tcp
  • 6380/tcp
  • 16379/tcp
  • 16380/tcp

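The "client port + 10000" port is the cluster bus, which the nodes use to talk to each other over a binary protocol. If it is not open, the CLUSTER MEET messages cannot get through, the nodes never discover each other, and the command hangs. Check the Open redis ports task in tasks. I wrote that task without verifying it, and as everyone knows Ansible is strict about syntax: a mistake may mean the task is not applied or fails outright. You can comment it out, or (recommended) open the ports with the firewalld.sh script under files. Remember to change the ports to your own!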
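To make the port arithmetic concrete, here is a minimal sketch that derives the four ports from the two client ports and prints the matching firewall-cmd calls. It is not the repository's firewalld.sh, whose contents are not shown in this post; the function name ports_to_open is made up for illustration, and the commands are only printed so they can be reviewed before being run as root.

```shell
#!/bin/sh
# For each client port, list the client port and its cluster bus port
# (client port + 10000) in PORT/PROTOCOL form.
ports_to_open() {
  for p in "$@"; do
    echo "$p/tcp"
    echo "$((p + 10000))/tcp"
  done
}

# Print the firewall-cmd calls for the article's ports; change 6379 and 6380
# to your own master and replica ports.
for spec in $(ports_to_open 6379 6380); do
  echo "firewall-cmd --permanent --add-port=$spec"
done
echo "firewall-cmd --reload"
```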