The next post in the Redis replication series.
Previous parts:
Redis: replication, part 2 – Master-Slave replication, and Redis Sentinel
Redis: replication, part 3 – redis-py and work with Redis Sentinel from Python
The task now is to write an Ansible role for automated Redis replication cluster provisioning and configuration.
This role has to install and run a Redis Master node and its two Redis slaves, plus Redis Sentinel instances that monitor the Redis replicas and run a failover operation to promote a new Master if the current one goes offline.
The task is a bit more complex because we already have Redis running in our environments, and the new cluster must work alongside the existing one for some time, until the backend developers update all our projects to use replication and Sentinel.
To achieve this, the new Redis nodes will use port 6389 (the standard 6379 stays with the existing Redis nodes), and we will have to create our own systemd unit files to manage the new Redis and Sentinel instances.
The overall new scheme will be quite standard:
I.e. we will have three servers here:
- Console: the central host where some administrative tasks run. The Redis Master node and the first Sentinel instance will also be placed here
- App-1 and App-2: two of our application hosts, which will run the two Redis Slaves and two more Sentinels
Ansible role
Create directories for the new role:
$ mkdir -p roles/redis-cluster/{tasks,templates}
And add its execution to the playbook:
...
- role: redis-cluster
  tags: common, app, redis-cluster
  when: "'backend-bastion' not in inventory_hostname"
...
Create variables to be used in this role:
...
### ROLES VARS ###
# redis-cluster
redis_cluster_config_home: "/etc/redis-cluster"
redis_cluster_logs_home: "/var/log/redis-cluster"
redis_cluster_data_home: "/var/lib/redis-cluster"
redis_cluster_runtime_home: "/var/run/redis-cluster"
redis_cluster_node_port: 6389
redis_cluster_master_host: ""
redis_cluster_name: "redis-{{ env }}-cluster"
redis_cluster_sentinel_port: 26389
...
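To see how the templated values above resolve, here is a minimal Python sketch. It is not how Ansible renders variables (Ansible uses Jinja2): render_simple is a hypothetical stand-in that only substitutes plain {{ name }} placeholders, and the env value dev is an assumption taken from the redis-dev-cluster name that shows up in the Sentinel output later in this post.

```python
import re


def render_simple(template, variables):
    """Replace {{ name }} placeholders with values from `variables`.

    A deliberately tiny stand-in for Jinja2: no filters, no expressions.
    """
    def substitute(match):
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Assumed values: env=dev is inferred from the cluster name seen later on.
role_vars = {"env": "dev", "redis_cluster_node_port": 6389}

cluster_name = render_simple("redis-{{ env }}-cluster", role_vars)
port_line = render_simple("port {{ redis_cluster_node_port }}", role_vars)

print(cluster_name)
print(port_line)
```

The same substitution happens for every {{ ... }} placeholder in the config and unit-file templates used further in the role.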
Create tasks file roles/redis-cluster/tasks/main.yml
We will start writing the role with the Redis Master installation and startup.
Directories and files must be owned by the redis user.
For the Redis Master we will use the when: "'backend-console' in inventory_hostname"
condition: our Console (Master) host has backend-console in its hostname, while the Redis slave hosts do not.
Describe tasks:
- name: "Install Redis"
  apt:
    name: "redis-server"
    state: present

- name: "Create {{ redis_cluster_config_home }}"
  file:
    path: "{{ redis_cluster_config_home }}"
    state: directory
    owner: "redis"
    group: "redis"

- name: "Create {{ redis_cluster_logs_home }}"
  file:
    path: "{{ redis_cluster_logs_home }}"
    state: directory
    owner: "redis"
    group: "redis"

- name: "Create {{ redis_cluster_data_home }}"
  file:
    path: "{{ redis_cluster_data_home }}"
    state: directory
    owner: "redis"
    group: "redis"

- name: "Copy redis-cluster-master.conf to {{ redis_cluster_config_home }}"
  template:
    src: "templates/redis-cluster-master.conf.j2"
    dest: "{{ redis_cluster_config_home }}/redis-cluster.conf"
    owner: "redis"
    group: "redis"
    mode: 0644
  when: "'backend-console' in inventory_hostname"

- name: "Copy Redis replication cluster systemd unit file"
  template:
    src: "templates/redis-cluster-replica-systemd.j2"
    dest: "/etc/systemd/system/redis-cluster.service"
    owner: "root"
    group: "root"
    mode: 0644

- name: "Redis relication cluster restart"
  systemd:
    name: "redis-cluster"
    state: restarted
    enabled: yes
    daemon_reload: yes
Create the template files
We will start with systemd. As our new Redis cluster has to work alongside the already existing Redis nodes and use non-standard ports and directories, we cannot use the default Redis systemd unit file.
So we copy it and update it for our needs.
Create a roles/redis-cluster/templates/redis-cluster-replica-systemd.j2 template file:
[Unit]
Description=Redis relication cluster node

[Service]
Type=forking
ExecStart=/usr/bin/redis-server {{ redis_cluster_config_home }}/redis-cluster.conf
PIDFile={{ redis_cluster_runtime_home }}/
TimeoutStopSec=0
Restart=always
User=redis
Group=redis
RuntimeDirectory=redis-cluster
ExecStop=/bin/kill -s TERM $MAINPID
UMask=007
PrivateTmp=yes
LimitNOFILE=65535
PrivateDevices=yes
ProtectHome=yes
ReadOnlyDirectories=/
ReadWriteDirectories=-{{ redis_cluster_data_home }}
ReadWriteDirectories=-{{ redis_cluster_logs_home }}
ReadWriteDirectories=-{{ redis_cluster_runtime_home }}
CapabilityBoundingSet=~CAP_SYS_PTRACE
ProtectSystem=true
ReadWriteDirectories=-{{ redis_cluster_config_home }}

[Install]
Our own Redis config file is passed in the ExecStart=/usr/bin/redis-server {{ redis_cluster_config_home }}/redis-cluster.conf parameter.
Redis Master
Create a Redis Master config file template roles/redis-cluster/templates/redis-cluster-master.conf.j2
bind
protected-mode yes
port {{ redis_cluster_node_port }}
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile {{ redis_cluster_runtime_home }}/
loglevel notice
logfile {{ redis_cluster_logs_home }}/redis-cluster.log
databases 16
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir {{ redis_cluster_data_home }}
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
Later we will have to update it and set more appropriate parameters, but for now we can leave it with the defaults and just update bind and port.
Deploy it using
. In future it will be deployed via a Jenkins job:
$ ./ -t redis-cluster
Tags: redis-cluster
Env: mobilebackend-dev
...
Check Redis Master status:
root@bttrm-dev-console:/home/admin# systemctl status redis-cluster.service
● redis-cluster.service - Redis relication cluster node
   Loaded: loaded (/etc/systemd/system/redis-cluster.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-04-03 14:05:46 EEST; 9s ago
  Process: 22125 ExecStop=/bin/kill -s TERM $MAINPID (code=exited, status=0/SUCCESS)
  Process: 22131 ExecStart=/usr/bin/redis-server /etc/redis-cluster/redis-cluster-master.conf (code=exited, status=0/SUCCESS)
 Main PID: 22133 (redis-server)
    Tasks: 3 (limit: 4915)
   Memory: 1.1M
      CPU: 14ms
   CGroup: /system.slice/redis-cluster.service
           └─22133 /usr/bin/redis-server
Apr 03 14:05:46 bttrm-dev-console systemd[1]: Starting Redis relication cluster node...
Apr 03 14:05:46 bttrm-dev-console systemd[1]: redis-cluster.service: PID file /var/run/redis/ not readable (yet?) after start: No such file or directory
Apr 03 14:05:46 bttrm-dev-console systemd[1]: Started Redis relication cluster node.
Redis Slaves
Add a config for the Redis Slaves, roles/redis-cluster/templates/redis-cluster-slave.conf.j2.
It is almost the same as the master's config, it just adds the slaveof directive:
slaveof {{ redis_cluster_master_host }} {{ redis_cluster_node_port }}
bind
port {{ redis_cluster_node_port }}
pidfile {{ redis_cluster_runtime_home }}/
logfile {{ redis_cluster_logs_home }}/redis-cluster.log
dir {{ redis_cluster_data_home }}
protected-mode yes
tcp-backlog 511
timeout 0
tcp-keepalive 300
...
Add a task.
Here the when: "'backend-console' not in inventory_hostname"
condition is used to copy this file to App-1 and App-2 only:
...
- name: "Copy redis-cluster-slave.conf to {{ redis_cluster_config_home }}"
  template:
    src: "templates/redis-cluster-slave.conf.j2"
    dest: "{{ redis_cluster_config_home }}/redis-cluster.conf"
    owner: "redis"
    group: "redis"
    mode: 0644
  when: "'backend-console' not in inventory_hostname"
...
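The two when conditions split the hosts into one Master and the remaining Slaves. A minimal Python sketch of that selection logic is below. The function names and the sample hostnames are hypothetical and only illustrate the substring check. Ansible evaluates the real conditions itself.

```python
def redis_role_for(inventory_hostname):
    """Mirror the role's `when` conditions: Console is Master, others are Slaves."""
    if "backend-console" in inventory_hostname:
        return "master"
    return "slave"


def config_template_for(inventory_hostname):
    """Pick the config template that ends up as redis-cluster.conf on the host."""
    role = redis_role_for(inventory_hostname)
    return "templates/redis-cluster-{}.conf.j2".format(role)


# Hypothetical inventory names, for illustration only.
hosts = ["dev-backend-console", "dev-backend-app-1", "dev-backend-app-2"]
for host in hosts:
    print(host, "->", config_template_for(host))
```

Because both templates are written to the same destination file, the systemd unit can use a single ExecStart line on every host.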
Deploy, check:
root@bttrm-dev-app-1:/home/admin# redis-cli -p 6389 -a foobared info replication
# Replication
role:slave
master_port:6389
master_link_status:down
master_last_io_seconds_ago:-1
...
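The info replication output is a list of key:value lines, so it is easy to check from a script. The sketch below is only an illustration: parse_info and replica_is_synced are hypothetical helper names, and the sample text repeats the fields shown above. Note that master_link_status is down in that sample, so the replica had not connected to the Master at that moment.

```python
def parse_info(text):
    """Parse `redis-cli info` output ("key:value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Replication"
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replica_is_synced(info):
    """A replica is considered connected only when its link to the master is up."""
    return info.get("role") == "slave" and info.get("master_link_status") == "up"


sample = """# Replication
role:slave
master_port:6389
master_link_status:down
master_last_io_seconds_ago:-1
"""

info = parse_info(sample)
print(info["role"], info["master_link_status"], replica_is_synced(info))
```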
Check replication
Add a key on the Master:
root@bttrm-dev-console:/home/admin# redis-cli -p 6389 -a foobared set test 'test'
OK
Get it on slaves:
root@bttrm-dev-app-1:/home/admin# redis-cli -p 6389 -a foobared get test
"test"
root@bttrm-dev-app-2:/home/admin# redis-cli -p 6389 -a foobared get test
"test"
Redis Sentinel
Add a Redis Sentinel’s config, one for all hosts – roles/redis-cluster/templates/redis-cluster-sentinel.conf.j2
Use the sentinel announce-ip option here; see the post "Redis: Sentinel – bind, the localhost issue and the announce-ip option" for details:
sentinel monitor {{ redis_cluster_name }} {{ redis_cluster_master_host }} {{ redis_cluster_node_port }} 2
bind
port {{ redis_cluster_sentinel_port }}
sentinel announce-ip {{ hostvars[inventory_hostname]['ansible_default_ipv4']['address'] }}
sentinel down-after-milliseconds {{ redis_cluster_name }} 6001
sentinel failover-timeout {{ redis_cluster_name }} 60000
sentinel parallel-syncs {{ redis_cluster_name }} 1
daemonize yes
logfile {{ redis_cluster_logs_home }}/redis-sentinel.log
pidfile {{ redis_cluster_runtime_home }}/
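The last argument of sentinel monitor (2 here) is the quorum: the number of Sentinels that must agree the Master is unreachable before it is marked as objectively down. Starting the failover additionally needs a majority of all Sentinels to authorize a leader. The Python sketch below is a simplified model of these two checks with hypothetical function names; it is not Sentinel's actual implementation.

```python
def can_mark_odown(sentinels_reporting_down, quorum):
    """The master is objectively down once `quorum` Sentinels agree on it."""
    return sentinels_reporting_down >= quorum


def can_authorize_failover(reachable_sentinels, total_sentinels):
    """A failover leader needs votes from a majority of all known Sentinels."""
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= majority


TOTAL_SENTINELS = 3  # Console, App-1, App-2
QUORUM = 2           # last argument of `sentinel monitor`

# The Console host is lost together with its Sentinel: two Sentinels remain.
print(can_mark_odown(2, QUORUM), can_authorize_failover(2, TOTAL_SENTINELS))

# Only one Sentinel sees the problem: neither condition holds.
print(can_mark_odown(1, QUORUM), can_authorize_failover(1, TOTAL_SENTINELS))
```

With three Sentinels and a quorum of 2, the setup tolerates the loss of any single host, including the one running the Master.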
Add a template for the Sentinel service's systemd unit file, roles/redis-cluster/templates/redis-cluster-sentinel-systemd.j2:
[Unit]
Description=Redis relication Sentinel instance

[Service]
Type=forking
ExecStart=/usr/bin/redis-server {{ redis_cluster_config_home }}/redis-sentinel.conf --sentinel
PIDFile={{ redis_cluster_runtime_home }}/
TimeoutStopSec=0
Restart=always
User=redis
Group=redis
ExecStop=/bin/kill -s TERM $MAINPID
ProtectSystem=true
ReadWriteDirectories=-{{ redis_cluster_logs_home }}
ReadWriteDirectories=-{{ redis_cluster_config_home }}
ReadWriteDirectories=-{{ redis_cluster_runtime_home }}

[Install]
Add a Sentinel stop task at the very beginning of roles/redis-cluster/tasks/main.yml; otherwise, if a Sentinel instance is running during the deploy, it will overwrite Ansible's changes in its config:
- name: "Install Redis"
  apt:
    name: "redis-server"
    state: present

- name: "Redis replication Sentinel stop"
  systemd:
    name: "redis-sentinel"
    state: stopped
  ignore_errors: true
...
Add the file copy and Sentinel start tasks:
...
- name: "Copy redis-cluster-sentinel.conf to {{ redis_cluster_config_home }}"
  template:
    src: "templates/redis-cluster-sentinel.conf.j2"
    dest: "{{ redis_cluster_config_home }}/redis-sentinel.conf"
    owner: "redis"
    group: "redis"
    mode: 0644
...
- name: "Copy Redis replication Sentinel systemd unit file"
  template:
    src: "templates/redis-cluster-sentinel-systemd.j2"
    dest: "/etc/systemd/system/redis-sentinel.service"
    owner: "root"
    group: "root"
    mode: 0644
...
- name: "Redis relication Sentinel restart"
  systemd:
    name: "redis-sentinel"
    state: restarted
    enabled: yes
    daemon_reload: yes
The documentation says Sentinels must be started with a pause of at least 30 seconds, but it works (for now) without it.
We will check this during Dev/Stage testing.
Deploy, check:
root@bttrm-dev-console:/home/admin# redis-cli -p 26389 info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=redis-dev-cluster,status=ok,address=,slaves=2,sentinels=3
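The master0 line carries the numbers worth checking after a deploy: status, slaves and sentinels. As a rough sketch, it can be parsed like this; parse_sentinel_master is a hypothetical helper, and the input is the line from the output above, with the address left empty as it is shown there.

```python
def parse_sentinel_master(line):
    """Parse a `masterN:name=...,status=...` line from `info sentinel`."""
    # Split only on the first ":" so an "ip:port" address value stays intact.
    _, _, fields = line.partition(":")
    result = {}
    for field in fields.split(","):
        key, _, value = field.partition("=")
        result[key] = value
    for key in ("slaves", "sentinels"):
        result[key] = int(result[key])
    return result


line = "master0:name=redis-dev-cluster,status=ok,address=,slaves=2,sentinels=3"
master = parse_sentinel_master(line)

# Expected layout for this role: one Master, two Slaves, three Sentinels.
healthy = (
    master["status"] == "ok"
    and master["slaves"] == 2
    and master["sentinels"] == 3
)
print(master["name"], healthy)
```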
Testing Sentinel failover
Run tail -f on the logs on all instances:
root@bttrm-dev-app-1:/etc/redis-cluster# tail -f /var/log/redis-cluster/redis-sentinel.log
On the Master – check current master’s IP:
root@bttrm-dev-console:/etc/redis-cluster# redis-cli -h -p 26389 sentinel get-master-addr-by-name redis-dev-cluster
1) ""
2) "6389"
And replication status:
root@bttrm-dev-console:/etc/redis-cluster# redis-cli -h -p 6389 info replication
# Replication
role:master
connected_slaves:2
...
Role – Master, two slaves – all good.
Stop the Master’s Redis node:
root@bttrm-dev-console:/etc/redis-cluster# systemctl stop redis-cluster.service
The log on App-2:
11976:X 09 Apr 13:12:13.869 # +sdown master redis-dev-cluster 6389
11976:X 09 Apr 13:12:13.983 # +new-epoch 1
11976:X 09 Apr 13:12:13.984 # +vote-for-leader 8fd5f2bb50132db0dc528e69089cc2f9d82e01d0 1
11976:X 09 Apr 13:12:14.994 # +odown master redis-dev-cluster 6389 #quorum 2/2
11976:X 09 Apr 13:12:14.994 # Next failover delay: I will not start a failover before Tue Apr 9 13:14:14 2019
11976:X 09 Apr 13:12:15.105 # +config-update-from sentinel 8fd5f2bb50132db0dc528e69089cc2f9d82e01d0 26389 @ redis-dev-cluster 6389
11976:X 09 Apr 13:12:15.105 # +switch-master redis-dev-cluster 6389 6389
- sdown master: this Sentinel thinks the Master is down
- odown master quorum 2/2: both Sentinels, on App-1 and App-2, agreed
- switch-master ...: Sentinel reconfigured a Redis node from the Slave role to the new Master role
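For automated checks, the same events can be pulled out of the Sentinel log with a small script. The Python sketch below is an illustration only: the regular expression assumes the line layout seen in the log excerpt above (PID, role letter X, timestamp, marker, event name), which may differ in other Redis versions, and sentinel_events is a hypothetical helper name.

```python
import re

# Layout assumed from the excerpt: "<pid>:X <dd> <Mon> <time> <#|*> <+event|-event> <details>"
EVENT_RE = re.compile(r"^\d+:X \d{2} \w{3} [\d:.]+ [#*] ([+-][a-z-]+)\s*(.*)$")


def sentinel_events(log_text):
    """Return (event, details) tuples for every event line in a Sentinel log."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match.group(1), match.group(2)))
    return events


sample_log = """\
11976:X 09 Apr 13:12:13.869 # +sdown master redis-dev-cluster 6389
11976:X 09 Apr 13:12:14.994 # +odown master redis-dev-cluster 6389 #quorum 2/2
11976:X 09 Apr 13:12:14.994 # Next failover delay: I will not start a failover before Tue Apr 9 13:14:14 2019
11976:X 09 Apr 13:12:15.105 # +switch-master redis-dev-cluster 6389 6389
"""

events = sentinel_events(sample_log)
event_names = [name for name, _ in events]
failover_done = "+switch-master" in event_names
print(event_names, failover_done)
```

Free-text lines such as "Next failover delay" carry no +event or -event token, so the pattern skips them.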
Does it all work?
Check on the new Master, it's App-1:
root@bttrm-dev-app-1:/etc/redis-cluster# redis-cli -p 6389 info replication
# Replication
role:master
connected_slaves:1
...
Start Redis again on the Console (old Master) host:
root@bttrm-dev-console:/etc/redis-cluster# systemctl start redis-cluster.service
Check App-2 log:
11976:X 09 Apr 13:17:23.954 # -sdown slave 6389 @ redis-dev-cluster 6389
11976:X 09 Apr 13:17:33.880 * +convert-to-slave slave 6389 @ redis-dev-cluster 6389
Check the replication status on the old Master:
root@bttrm-dev-console:/etc/redis-cluster# redis-cli -p 6389 info replication
# Replication
role:slave
master_host:
master_port:6389
master_link_status:up
...
The old Master has become a Slave now.
All works.