Redis: “psync scheduled to be closed ASAP for overcoming of output buffer limits” and the client-output-buffer-limit

02/26/2020

We have a Redis cluster with Master-Slave replication and Sentinel – see the Redis: replication, part 2 – Master-Slave replication, and Redis Sentinel, Redis: fork – Cannot allocate memory, Linux, virtual memory and vm.overcommit_memory, and Redis: main configuration parameters and performance tuning overview posts.

The system worked great until we started using it much more actively.

Redis info

A few useful troubleshooting commands.

To check the number of keys in all databases, use the INFO command:

[simterm]

admin@bttrm-production-console:~$ redis-cli -h 10.0.3.105 -p 6389 INFO keyspace
# Keyspace
db0:keys=5,expires=0,avg_ttl=0
db4:keys=16988,expires=2698,avg_ttl=30025662
db5:keys=138713,expires=90388,avg_ttl=7435947

[/simterm]
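
To count the keys in one selected database only, there is also the DBSIZE command (here for db 5, matching the keyspace output above):

[simterm]

admin@bttrm-production-console:~$ redis-cli -h 10.0.3.105 -p 6389 -n 5 DBSIZE
(integer) 138713

[/simterm]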

And key sizes – with the --bigkeys option:

[simterm]

admin@bttrm-production-console:~$ redis-cli -h 10.0.3.105 -p 6389 -n 5 --bigkeys
...
[00.00%] Biggest string found so far 'projectname_aba3b5bf1af24246b465826658b3c375' with 45 bytes
...
[38.56%] Biggest string found so far 'projectname_44bb0e2dd7f4fabf41ee09001791bdc4' with 4627517 bytes

-------- summary -------

Sampled 150726 keys in the keyspace!
Total key length in bytes is 6782670 (avg len 45.00)

Biggest string found 'projectname_44bb0e2dd7f4fabf41ee09001791bdc4' has 4627517 bytes

[/simterm]

The useful --stat option:

[simterm]

root@bttrm-production-console:/home/admin# redis-cli -h 10.0.3.105 -p 6389 --stat -i 2
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections          
10290781   3.80G    12      0       37279511 (+0)       79301       
...    
10292946   3.81G    10      0       37283233 (+701)     79502       
...

[/simterm]

And the MONITOR command:

[simterm]

root@bttrm-production-console:/home/admin# redis-cli -h 10.0.3.105 -p 6389 monitor
...
1582037638.965389 [5 10.0.3.92:58594] "SET" "projectname_e7e0a3ed5251cb36e2925957981e9faa" "a:2:{***}
1582037638.967345 [5 10.0.3.71:40906] "SET" "projectname_80d9171937e4463d0238bcc42a54aead" "a:2:{***}
1582037638.967411 [5 10.0.3.71:40684] "SET" "projectname_e8b81cd503d87be74070aa012b22be4a" "a:2:{***}
...

[/simterm]
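
Note: keep in mind that MONITOR is a debugging command, and streaming every request back to the client noticeably reduces the server's throughput – be careful with it on a loaded production instance.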

And the SLOWLOG command:

[simterm]

10.0.3.105:6389> SLOWLOG get 2
1) 1) (integer) 9
   2) (integer) 1582119750
   3) (integer) 21507206
   4) 1) "FLUSHDB"
2) 1) (integer) 8
   2) (integer) 1582058687
   3) (integer) 71653
   4) 1) "PSYNC"
      2) "?"
      3) "-1"

[/simterm]
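
What counts as a "slow" command here is controlled by the slowlog-log-slower-than parameter, in microseconds (10000, i.e. 10 ms, by default):

[simterm]

10.0.3.105:6389> CONFIG GET slowlog-log-slower-than
1) "slowlog-log-slower-than"
2) "10000"

[/simterm]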

Also, check the latency – later I stumbled upon the redis-cli --latency -h <Redis IP> -p <Redis port> option, which helps here (the output below is illustrative):
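
[simterm]

admin@bttrm-production-console:~$ redis-cli -h 10.0.3.105 -p 6389 --latency
min: 0, max: 12, avg: 0.26 (2343 samples)

[/simterm]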

Now, let’s get to the error itself.

The “psync scheduled to be closed ASAP for overcoming of output buffer limits” error

One calm evening, our monitoring sent an alert that the cluster was down.

The Redis Master’s log during that issue:

[simterm]

2244:C 17 Feb 19:48:12.464 * DB saved on disk
2244:C 17 Feb 19:48:12.527 * RDB: 370 MB of memory used by copy-on-write
1911:M 17 Feb 19:48:12.681 * Background saving terminated with success
1911:M 17 Feb 19:48:12.681 * Starting BGSAVE for SYNC with target: disk
1911:M 17 Feb 19:48:12.744 * Background saving started by pid 2397
1911:M 17 Feb 19:48:13.251 * Slave 10.0.3.71:6389 asks for synchronization
1911:M 17 Feb 19:48:13.251 * Full resync requested by slave 10.0.3.71:6389
1911:M 17 Feb 19:48:13.251 * Waiting for end of BGSAVE for SYNC
1911:M 17 Feb 19:48:15.743 # Client id=3144 addr=10.0.3.92:40313 fd=104 name= age=31 idle=31 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=11284 oll=1203 omem=269714472 events=r cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
1911:M 17 Feb 19:48:15.743 # Client id=4300 addr=10.0.3.71:40029 fd=110 name= age=2 idle=2 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=11284 oll=1203 omem=269714472 events=r cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
1911:M 17 Feb 19:48:15.754 # Connection with slave 10.0.3.92:6389 lost.
1911:M 17 Feb 19:48:15.756 # Connection with slave 10.0.3.71:6389 lost.
1911:M 17 Feb 19:48:16.251 * Slave 10.0.3.71:6389 asks for synchronization
1911:M 17 Feb 19:48:16.251 * Full resync requested by slave 10.0.3.71:6389
1911:M 17 Feb 19:48:16.251 * Can't attach the slave to the current BGSAVE. Waiting for next BGSAVE for SYNC

[/simterm]

Of all the text above, we are most interested in the “psync scheduled to be closed ASAP for overcoming of output buffer limits” message.

A “quick-fix” – client-output-buffer-limit

The solution is to increase the client-output-buffer-limit.

In short, it is a buffer used by the Redis Master to store data that was changed during the synchronization with a slave.

Check its current value:

[simterm]

root@bttrm-production-console:/home/admin# redis-cli -h 10.0.3.105 -p 6389
10.0.3.105:6389> config get client-output-buffer-limit
1) "client-output-buffer-limit"
2) "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"

[/simterm]

The format here is:

client-output-buffer-limit <class> <hard limit> <soft limit> <soft seconds>

And default values:

  • client-output-buffer-limit normal 0 0 0
  • client-output-buffer-limit slave 256mb 64mb 60
  • client-output-buffer-limit pubsub 32mb 8mb 60

With such settings, a connection to a slave will be closed immediately if the 256 MB <hard limit> is reached, or if the 64 MB <soft limit> stays exceeded for the whole <soft seconds> period of 60 seconds.

In our case, we need to increase the values in the slave 268435456 67108864 60 part.

On the Master host, set a new value:

[simterm]

10.0.3.105:6389> config set client-output-buffer-limit "slave 4194990176 4194990176 0"
OK
10.0.3.105:6389> config get client-output-buffer-limit 
1) "client-output-buffer-limit"
2) "normal 0 0 0 slave 4194990176 4194990176 0 pubsub 33554432 8388608 60"

[/simterm]
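
Keep in mind that CONFIG SET changes the value only for the running instance: to make it survive a restart, add the same limits to the Redis config file (or run CONFIG REWRITE to have Redis update the file itself), e.g.:

client-output-buffer-limit slave 4194990176 4194990176 0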

Actually, I had to increase the buffer to 512 MB, then to 1 GB, then 2… then 4, 8 – and only with 10 GB was the service able to finish the synchronization process, although the data in memory was only about 5 GB:

[simterm]

1911:M 17 Feb 20:30:21.308 * Slave 10.0.3.71:6389 asks for synchronization
1911:M 17 Feb 20:30:21.308 * Full resync requested by slave 10.0.3.71:6389
1911:M 17 Feb 20:30:21.308 * Starting BGSAVE for SYNC with target: disk
1911:M 17 Feb 20:30:21.389 * Background saving started by pid 8166
1911:M 17 Feb 20:30:21.851 * Slave 10.0.3.92:6389 asks for synchronization
1911:M 17 Feb 20:30:21.851 * Full resync requested by slave 10.0.3.92:6389
1911:M 17 Feb 20:30:21.851 * Waiting for end of BGSAVE for SYNC
8166:C 17 Feb 20:30:55.069 * DB saved on disk
8166:C 17 Feb 20:30:55.148 * RDB: 851 MB of memory used by copy-on-write
1911:M 17 Feb 20:30:55.365 * Background saving terminated with success
1911:M 17 Feb 20:31:05.160 * Synchronization with slave 10.0.3.92:6389 succeeded
1911:M 17 Feb 20:31:05.223 * Synchronization with slave 10.0.3.71:6389 succeeded

[/simterm]

And the Slave’s log:

[simterm]

11347:S 17 Feb 20:29:06.523 * Connecting to MASTER 10.0.3.105:6389
11347:S 17 Feb 20:29:06.524 * MASTER <-> SLAVE sync started
11347:S 17 Feb 20:29:06.526 * Non blocking connect for SYNC fired the event.
11347:S 17 Feb 20:29:06.530 * Master replied to PING, replication can continue...
11347:S 17 Feb 20:29:06.532 * Partial resynchronization not possible (no cached master)
11347:S 17 Feb 20:29:06.612 * Full resync from master: e96dc8dcea06375e45f9e0796f796cb642b2a94a:243379789779
11347:S 17 Feb 20:29:36.120 * MASTER <-> SLAVE sync: receiving 1457381394 bytes from master
11347:S 17 Feb 20:29:43.069 # I/O error trying to sync with MASTER: connection lost
11347:S 17 Feb 20:29:43.191 * Connecting to MASTER 10.0.3.105:6389
11347:S 17 Feb 20:29:43.191 * MASTER <-> SLAVE sync started
11347:S 17 Feb 20:29:43.198 * Non blocking connect for SYNC fired the event.
11347:S 17 Feb 20:29:43.446 * Master replied to PING, replication can continue...
11347:S 17 Feb 20:29:43.450 * Partial resynchronization not possible (no cached master)
11347:S 17 Feb 20:29:43.532 * Full resync from master: e96dc8dcea06375e45f9e0796f796cb642b2a94a:247290941137
11347:S 17 Feb 20:30:17.763 * MASTER <-> SLAVE sync: receiving 1458314307 bytes from master
11347:S 17 Feb 20:30:20.516 # I/O error trying to sync with MASTER: connection lost
11347:S 17 Feb 20:30:21.303 * Connecting to MASTER 10.0.3.105:6389
11347:S 17 Feb 20:30:21.303 * MASTER <-> SLAVE sync started
11347:S 17 Feb 20:30:21.304 * Non blocking connect for SYNC fired the event.
11347:S 17 Feb 20:30:21.305 * Master replied to PING, replication can continue...
11347:S 17 Feb 20:30:21.307 * Partial resynchronization not possible (no cached master)
11347:S 17 Feb 20:30:21.389 * Full resync from master: e96dc8dcea06375e45f9e0796f796cb642b2a94a:251282993972
11347:S 17 Feb 20:30:55.365 * MASTER <-> SLAVE sync: receiving 1459112192 bytes from master
11347:S 17 Feb 20:31:05.299 * MASTER <-> SLAVE sync: Flushing old data
11347:S 17 Feb 20:31:20.802 * MASTER <-> SLAVE sync: Loading DB in memory
11347:S 17 Feb 20:31:41.073 * MASTER <-> SLAVE sync: Finished with success

[/simterm]

And now, with Production back to life, let’s dig deeper into the issue.

What’s going on?

Redis master-slave – the synchronization process explained

Let’s walk through the steps of the synchronization process:

  1. a Slave, on start or after a disconnect, goes to its Master and asks it to send the Master’s database
  2. the Master replies to this request and:
    1. creates a child process that dumps the database to the filesystem as a dump.rdb file (see the fork() vs fork() vs clone())
    2. during this time, the Master proceeds to serve its currently connected clients
    3. and all the data that is changed in its dataset while the dump is being created is saved to the replication buffer
  3. the Master notifies the Slave that the dump is ready, and the Slave starts copying the dump over the network to save it on its host’s disk
  4. the Slave finishes the copy, loads it into its Redis instance’s memory, and notifies the Master that it is ready to serve clients
  5. the Master, in its turn, checks its replication buffer, and if there is any data in it – starts sending the accumulated changes to the Slave, so the Slave can update the data in its memory with the new, actualized data from the Master
  6. the Slave applies those changes and starts its work (the replication state can be checked with the INFO replication command – see the example below)

Thus, the replication process consists of two parts:

  1. first, a full database copy is created, which may contain some outdated data (changed during the dump creation and transfer)
  2. after that, a consistent update from the buffer is applied to the data on the Slave, to make it the same as in the Master’s database
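
The replication state can be checked on the Master with the INFO replication command (an illustrative output for a setup like ours):

[simterm]

10.0.3.105:6389> INFO replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.3.71,port=6389,state=online,offset=251282993972,lag=0
slave1:ip=10.0.3.92,port=6389,state=online,offset=251282993972,lag=0
...

[/simterm]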

Database and buffer size

The first issue is that Redis sometimes needs up to 3x its database size in memory to create a dump, and the second – and the main – issue is the fact that while the dump is being created, transferred to a slave over the network with all its delays, and saved to the Slave’s disk, the data on the Master can change.

So, during the dump transfer, all such changes on the Master Redis instance are stored in the replication buffer. But once the buffer is exhausted, it will be purged, and the synchronization process will start over. And over. And again.

In theory, the buffer’s size should be no less than the database’s size, although in reality there is little chance that all of the data in the database will be updated during one sync.
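
A rough rule of thumb here (an estimate, not an exact formula):

buffer size >= (bytes changed on the Master per second) * (seconds to create, transfer, and load the dump)

And this is what we will try to estimate below.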

In any case, these are all really rough estimates, and you’ll have to test and monitor your real usage to find the right value for the buffer size.

But here are a couple of steps that can help with such an investigation.

Check the current size of the database in the Master’s memory:

[simterm]

root@bttrm-production-console:/home/admin# redis-cli -h 10.0.3.105 -p 6389 info memory
# Memory
used_memory:871106648
used_memory_human:830.75M
...

[/simterm]

Create its dump:

[simterm]

root@bttrm-production-console:/home/admin# redis-cli -h 10.0.3.105 -p 6389 bgsave
Background saving started

[/simterm]
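
Whether the background save has already finished can be checked with the LASTSAVE command, which returns the Unix timestamp of the last successful save (the timestamp below is just an example):

[simterm]

10.0.3.105:6389> LASTSAVE
(integer) 1582210713

[/simterm]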

Check the RDB (dump) file size:

[simterm]

root@bttrm-production-console:/home/admin# ls -lh /var/lib/redis-cluster/dump.rdb 
-rw-rw---- 1 redis redis 345M Feb 20 15:04 /var/lib/redis-cluster/dump.rdb

[/simterm]

So, we have an 830 MB database in memory, but in the configuration the memory limit is set to 50% of the host’s memory:

[simterm]

root@bttrm-production-console:/home/admin# cat /etc/redis-cluster/redis-cluster.conf  | grep maxmem
maxmemory 50%
maxmemory-policy allkeys-lru

[/simterm]

So, the database can grow up to ~16 GB (the server has 32 GB of memory in total), and judging by the ratio above – an 830 MB dataset produced a 345 MB dump – a dump created from it will be about 7-8 GB.

Now we need a rough estimate of the network transfer time: knowing the maximal file size to be transferred, let’s calculate how long the transfer will take.

To check the connection speed, the iperf tool can be used – install it on both the Slave and the Master:

[simterm]

root@bttrm-production-console:/home/admin# apt -y install iperf3

[/simterm]

On the Master host, run iperf in server mode:

[simterm]

root@bttrm-production-console:/home/admin# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
...

[/simterm]


And run it from the Slave as a client:

[simterm]

root@bttrm-production-app-1:/home/admin# iperf3 -c 10.0.3.105
Connecting to host 10.0.3.105, port 5201
[  4] local 10.0.3.71 port 47046 connected to 10.0.3.105 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   585 MBytes  4.90 Gbits/sec    0   1.37 MBytes       
...     
[  4]   9.00-10.00  sec   592 MBytes  4.97 Gbits/sec    0   2.06 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  5.78 GBytes  4.96 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  5.78 GBytes  4.96 Gbits/sec                  receiver

iperf Done.

[/simterm]

4.90 Gbit/sec is about 612 MByte/sec.

Knowing that the dump’s size will be about 7 GB, we can use the calculator here>>> – about 13 seconds to transfer it.
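
Or count it by hand: 7 GB is ~7168 MB, and 7168 MB / 612 MB/sec ≈ 11.7 seconds – with the protocol overhead, ~13 seconds looks about right.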

Now, using the --stat option, we can check how many keys are updated on the Master per second:

[simterm]

root@bttrm-production-app-1:/home/admin# redis-cli -h 10.0.3.105 -p 6389 --stat 
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections          
530178     765.42M  9       0       91699673 (+0)       2856563     
530407     766.04M  16      0       91700194 (+521)     2856580
...

[/simterm]

530407 - 530178 == 229 new keys per second.
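
Alternatively, the INFO stats section reports the current rate directly in the instantaneous_ops_per_sec field (note that it counts all commands, not only writes; the number below is illustrative):

[simterm]

root@bttrm-production-app-1:/home/admin# redis-cli -h 10.0.3.105 -p 6389 INFO stats | grep instantaneous_ops
instantaneous_ops_per_sec:231

[/simterm]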

Or you can use monitoring tools like Prometheus with redis-exporter, and then select all the SET requests, as those are the main updating requests:

irate(redis_commands_total{instance="production.bttrm-production-console",job="redis-exporter-cluster",cmd="set"}[1m])

And find the key sizes:

[simterm]

root@bttrm-production-app-1:/home/admin# redis-cli -h 10.0.3.105 -p 6389 --bigkeys
...
[00.00%] Biggest string found so far 'projectname_8caf094272eac1c6a51213c1c878767b' with 45 bytes
...
[00.00%] Biggest string found so far 'projectname_44bb0e2dd7f4fabf41ee09001791bdc4' with 4626843 bytes
...
Biggest string found 'projectname_44bb0e2dd7f4fabf41ee09001791bdc4' has 4626843 bytes

[/simterm]

229 keys multiplied by ~4.5 MB gives around 1 GB of changed data per second – so, in the worst case, when every updated key is as big as the biggest one, up to ~13 GB can be written to the buffer during those 13 seconds of the transfer (which agrees pretty well with the 10 GB limit that finally worked for us).

But again, these are very rough calculations, as we counted operations during a normal workload, while at times the load can be several times higher.

Useful links