Besides Apache Bench and JMeter, there is another load-testing utility: Yandex.Tank.
Our QA team already uses it, and now it's time for me to take a closer look at it to test an issue with our application running on a Kubernetes cluster.
This post is a short overview of its capabilities and configuration.
In contrast to Apache Bench, Yandex.Tank displays response code statistics, and it is much simpler to run and configure than JMeter. It also has a nice Autostop feature for the "Houston, we have a problem" case.
Components
See Modules.
The Yandex.Tank core is written in Python.
For load testing, it has several load generator modules; by default it uses Phantom, which is written in C++, so it's really fast.
Telegraf is a monitoring module that can connect to the host under test via SSH and run its own agent there to collect CPU, memory, and other metrics, which Yandex.Tank displays in real time during the load test.
The Overloader is a module to upload results to the Yandex Overloader or to an InfluxDB, but we will not use it here. Still, see the Artifact uploaders.
Also, the examples below don't cover the "ammo" topic, which is needed for more complicated tests with POST and other requests, as simple GET requests are enough for me for now. You can find its documentation in Preparing requests.
Running Yandex.Tank with Docker
Create a minimal config for Phantom:
phantom:
  address: rtfm.co.ua:443
  header_http: "1.1"
  headers:
    - "[Host: rtfm.co.ua]"
  uris:
    - /
  load_profile:
    load_type: rps
    schedule: const(1,30s)
  ssl: true
console:
  enabled: true
telegraf:
  enabled: false
Here:
- phantom:
  - address: an address and port of the target
  - header_http: the HTTP version used for requests; set it to 1.1 to use persistent connections (see HTTP persistent connection)
  - headers: a set of headers to be passed to the target server
  - uris: a list of URIs to make calls to
  - load_profile:
    - load_type: can be set to rps or instances:
      - rps: requests per second; sets the desired number of requests per second to be issued to the host under test
      - instances: sets the desired number of active threads, each of which performs as many RPS as it can, see Dynamic thread limit
    - schedule: can be set to const, line, or step (or a combination of them) and defines the load test profile, see the Tutorials:
      - const: set as (load,dur), where load is the RPS number and dur is the load test duration; in the example above, Yandex.Tank will run one request per second for 30 seconds
      - line: set as (a,b,dur), where a is the starting RPS, b is the final RPS, and dur is the load test duration, so RPS increases linearly from a to b
      - step: set as (a,b,step,dur), where a is the starting RPS, b is the final RPS, and step is how many requests are added on each step, every dur seconds
  - ssl: enable SSL support for HTTPS requests (add port 443 to the address)
- console: display results in the console
- telegraf: monitoring agent configuration, covered in Monitoring (Telegraf)
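To make the three schedule types more concrete, here is a rough Python sketch of the RPS values they describe. It is only an illustration of the descriptions above, not the way Phantom builds its schedule: the function names are made up, the load is sampled once per second, and the step variant assumes an increasing load (a <= b, step > 0).

```python
def const_schedule(load, dur):
    """const(load,dur): the same RPS for every one of dur seconds."""
    return [float(load)] * dur


def line_schedule(a, b, dur):
    """line(a,b,dur): RPS grows linearly from a to b over dur seconds."""
    if dur == 1:
        return [float(a)]
    # One sample per second; the first sample is a, the last one is b.
    return [a + (b - a) * t / (dur - 1) for t in range(dur)]


def step_schedule(a, b, increment, dur):
    """step(a,b,step,dur): hold each level for dur seconds, then add increment.

    Assumption: a <= b and increment > 0.
    """
    levels = range(a, b + 1, increment)
    return [float(level) for level in levels for _ in range(dur)]


if __name__ == "__main__":
    # The profile from the config above: 1 RPS for 30 seconds.
    print(const_schedule(1, 30))
    print(line_schedule(1, 10, 10))
    print(step_schedule(5, 25, 5, 2))
```

Summing the returned list gives an estimate of the total number of requests a profile will send, which is handy before pointing it at a real host.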
Run Yandex.Tank with Docker:
[simterm]
$ docker run --rm -v $(pwd):/var/loadtest -it direvius/yandex-tank
[/simterm]
And results:
Monitoring (Telegraf)
With Telegraf, Yandex.Tank can connect via SSH to the host under test and collect resource metrics from it.
Enable it in the load.yaml file:
...
telegraf:
  enabled: true
  package: yandextank.plugins.Telegraf
The metrics to collect are described in a dedicated file. Create it as monitoring.xml and point the telegraf config option at it, as in the full load.yaml at the end of this post (see more at Configuration file format):
<Monitoring>
  <Host address="rtfm.co.ua" interval="1" username="root">
    <CPU />
    <Kernel />
    <Net />
    <System />
    <Memory />
    <Disk />
    <Netstat/>
  </Host>
</Monitoring>
Here, address is the target host to collect metrics from, interval is how often to collect them, and username is the SSH user that the Telegraf module connects as.
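If you test several hosts, it can be handy to generate this file instead of editing it by hand. Below is a small sketch using only the Python standard library; the function name and its defaults are my own, and the metric names are just the ones from the example above.

```python
import xml.etree.ElementTree as ET

# Metric elements used in the monitoring.xml example above.
DEFAULT_METRICS = ("CPU", "Kernel", "Net", "System", "Memory", "Disk", "Netstat")


def build_monitoring_xml(address, username="root", interval=1, metrics=DEFAULT_METRICS):
    """Return the text of a monitoring.xml for a single host."""
    root = ET.Element("Monitoring")
    host = ET.SubElement(
        root,
        "Host",
        address=address,
        interval=str(interval),
        username=username,
    )
    for name in metrics:
        # Each metric is an empty child element, e.g. <CPU />.
        ET.SubElement(host, name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    with open("monitoring.xml", "w") as f:
        f.write(build_monitoring_xml("rtfm.co.ua"))
```

The output is written on a single line, which is still valid XML.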
On the target host, this user must have the public part of an SSH key in its ~/.ssh/authorized_keys.
The private part of this key is mounted into the Yandex.Tank Docker container as /root/.ssh/id_rsa, as all processes in the container run as the root user:
[simterm]
$ docker run --rm -v $(pwd):/var/loadtest -v /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11:/root/.ssh/id_rsa -it direvius/yandex-tank
[/simterm]
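The two docker run variants used so far differ only in the extra volume with the SSH key. If you start the tank from a script, the command can be assembled like this; it is a sketch, the helper name is hypothetical, and the paths inside the container are the ones from the commands above.

```python
import subprocess


def tank_command(workdir, ssh_key=None, image="direvius/yandex-tank"):
    """Build the docker run argument list for Yandex.Tank.

    workdir is the directory with load.yaml (and monitoring.xml);
    ssh_key is an optional private key for the Telegraf module.
    """
    cmd = ["docker", "run", "--rm", "-v", f"{workdir}:/var/loadtest"]
    if ssh_key is not None:
        # Processes in the container run as root, hence /root/.ssh/id_rsa.
        cmd += ["-v", f"{ssh_key}:/root/.ssh/id_rsa"]
    cmd += ["-it", image]
    return cmd


def run_tank(workdir, ssh_key=None):
    """Run the tank and return its exit code (requires Docker and a TTY)."""
    return subprocess.call(tank_command(workdir, ssh_key))
```

Passing a list to subprocess avoids shell quoting problems with long key file names.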
Paramiko: SSHException: not a valid RSA private key file
On the first run Telegraf failed with the Paramiko error:
[simterm]
16:32:54 [ERROR] Failed to install monitoring agent to rtfm.co.ua
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/yandextank/plugins/Telegraf/client.py", line 209, in install
    out, errors, err_code = self.ssh.execute(cmd)
  File "/usr/local/lib/python2.7/dist-packages/yandextank/common/util.py", line 72, in execute
    with self.connect() as client:
  File "/usr/local/lib/python2.7/dist-packages/yandextank/common/util.py", line 42, in connect
    timeout=self.timeout, )
  File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 437, in connect
    passphrase,
  File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 749, in _auth
    raise saved_exception
SSHException: not a valid RSA private key file
[/simterm]
That's because the RSA key for DigitalOcean is stored in the OpenSSH private key format, which the Paramiko version in this image does not accept:
[simterm]
$ file /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11
/home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11: OpenSSH private key
[/simterm]
Convert it to the PEM format (the key file is rewritten in place):
[simterm]
$ ssh-keygen -p -m PEM -f /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11
[/simterm]
And check again:
[simterm]
$ file /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11
/home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11: PEM RSA private key
[/simterm]
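The same check can be done without the file utility, because the two formats differ in the first line of the key file. Here is a minimal Python sketch; the function name is my own, and it only recognises the two header lines relevant to this error.

```python
def private_key_format(key_text):
    """Guess the private key format from its first line."""
    stripped = key_text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_line == "-----BEGIN OPENSSH PRIVATE KEY-----":
        # The format that caused the Paramiko error above.
        return "openssh"
    if first_line == "-----BEGIN RSA PRIVATE KEY-----":
        # The PEM format produced by ssh-keygen -p -m PEM for RSA keys.
        return "pem-rsa"
    return "unknown"


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as f:
        print(private_key_format(f.read()))
```

Run it with the path to the key file as the only argument before mounting the key into the container.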
Run the test again, and Telegraf will print its configuration and the metrics to be collected:
[simterm]
$ docker run --rm -v $(pwd):/var/loadtest -v /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11:/root/.ssh/id_rsa -it direvius/yandex-tank
...
16:36:38 [INFO] Detected monitoring configuration: telegraf
16:36:38 [INFO] Preparing test...
16:36:38 [INFO] Telegraf Result config {'username': 'root', 'comment': '', 'telegraf': '/usr/bin/telegraf', 'python': '/usr/bin/env python2', 'host_config': {'Kernel': {'fielddrop': '["boot_time"]', 'name': '[inputs.kernel]'}, 'Netstat': {'name': '[inputs.netstat]'}, 'System': {'fielddrop': '["n_users", "n_cpus", "uptime*"]', 'name': '[inputs.system]'}, 'Memory': {'fielddrop': '["active", "inactive", "total", "used_per*", "avail*"]', 'name': '[inputs.mem]'}, 'Net': {'interfaces': '["eth0","eth1","eth2","eth3","eth4","eth5"]', 'fielddrop': '["icmp*", "ip*", "udplite*", "tcp*", "udp*", "drop*", "err*"]', 'name': '[inputs.net]'}, 'Disk': {'name': '[inputs.diskio]', 'devices': '["vda0","sda0","vda1","sda1","vda2","sda2","vda3","sda3","vda4","sda4","vda5","sda5"]'}, 'CPU': {'fielddrop': '["time_*", "usage_guest_nice"]', 'name': '[inputs.cpu]', 'percpu': 'false'}}, 'startup': [], 'host': 'rtfm.co.ua', 'telegrafraw': [], 'shutdown': [], 'port': 22, 'interval': '1', 'custom': [], 'source': []}
16:36:38 [INFO] Installing monitoring agent at [email protected]...
16:36:38 [INFO] Creating temp dir on rtfm.co.ua
16:36:38 [INFO] Execute on rtfm.co.ua: /usr/bin/env python2 -c "import tempfile; print tempfile.mkdtemp();"
...
[/simterm]
After this, the load test will be started and on the right side you’ll see the resources used on the target server:
And the agent running on the server:
[simterm]
root@rtfm-do-production-d10:~# ps aux | grep tele
root 4580 0.5 0.4 309992 9436 pts/1 Ssl+ 15:38 0:00 python2 /tmp/tmpZez6yJ/agent.py --telegraf /tmp/telegraf --host rtfm.co.ua
root 4582 7.1 1.5 851256 31896 ? Ssl 15:38 0:01 /tmp/telegraf -config /tmp/tmpZez6yJ/agent.cfg
[/simterm]
Autostop
The Autostop module terminates a test if something goes wrong.
For example, you can configure it to stop the test if the rate of 5xx responses is higher than 10%, or if the response time is greater than a specified value.
Add the following to check it:
...
autostop:
  autostop:
    - http(2xx,100%,1s)
Here, as an example, the test is stopped as soon as 2xx responses make up 100% of all responses for 1 second, a condition chosen only to check that Autostop triggers.
Run the tests, and:
[simterm]
$ docker run --rm -v $(pwd):/var/loadtest -v /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11:/root/.ssh/id_rsa -it direvius/yandex-tank
...
16:56:24 [INFO] Monitoring received first data.
16:56:24 [WARNING] Autostop criterion requested test stop: http(2xx,100%,1s)
16:56:24 [WARNING] Autostop criterion requested test stop: 2xx codes count higher than 100.0% for 1s, since 1612889780
16:56:24 [INFO] Finishing test...
16:56:24 [INFO] Stopping load generator and aggregator
...
[/simterm]
It was immediately stopped.
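To see why the test stops right away, here is a simplified Python model of a criterion like http(codes,threshold%,duration). It is my own approximation of the behaviour seen in the log, not the Autostop module's code: it assumes the share of matching codes is computed per second and that the threshold is inclusive.

```python
import re


def code_matches(mask, code):
    """Check an HTTP code against a mask: 2xx matches 200..299, 503 only 503."""
    pattern = "^" + mask.replace("x", r"\d") + "$"
    return re.match(pattern, str(code)) is not None


def should_stop(seconds, mask, threshold_pct, duration):
    """Return the index of the second at which the criterion fires, or None.

    seconds is a list with one {http_code: count} dict per second of the test.
    """
    streak = 0
    for index, counts in enumerate(seconds):
        total = sum(counts.values())
        matched = sum(n for code, n in counts.items() if code_matches(mask, code))
        share = 100.0 * matched / total if total else 0.0
        # Assumption: a share equal to the threshold already counts.
        streak = streak + 1 if share >= threshold_pct else 0
        if streak >= duration:
            return index
    return None


if __name__ == "__main__":
    # One healthy second is enough to satisfy http(2xx,100%,1s).
    print(should_stop([{200: 1}], "2xx", 100, 1))
```

With a healthy target, every second consists of 2xx responses only, so the criterion from the config is met during the very first second. A more realistic criterion would watch 5xx codes instead.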
See more options in the documentation.
The whole load.yaml now is:
phantom:
  address: rtfm.co.ua:443
  header_http: "1.1"
  headers:
    - "[Host: rtfm.co.ua]"
  uris:
    - /
  load_profile:
    load_type: rps
    schedule: const(1,30s)
  ssl: true
console:
  enabled: true
telegraf:
  enabled: true
  package: yandextank.plugins.Telegraf
  config: monitoring.xml
autostop:
  autostop:
    - http(2xx,100%,1s)
Useful links
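All in Russian, unfortunately (titles translated here, originals in parentheses).
- Automating load testing with the Yandex.Tank tool (Автоматизация нагрузочного тестирования при помощи инструмента Яндекс.Танк)
- Load testing with Yandex.Tank and JMeter (Нагрузочное тестирование c Yandex.Tank и JMeter)
- Load testing an HTTP server (nginx), 205k+ RPS (Нагрузочное тестирование http-сервера (nginx), 205k+ RPS)
- Testing at Yandex: building our own Lunapark (Тестирование в Яндексе: строим свой Лунапарк)
- An example of load testing a website with Yandex.Tank (Пример нагрузочного тестирования сайта с Yandex.Tank)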