Yandex.Tank: load testing tool – an overview, configuration, and examples

By | 02/10/2021
 

Besides Apache Bench and JMeter, there is another load testing utility: Yandex.Tank.

It’s used by our QA team, and now it’s time for me to take a closer look at it to test an issue with our application running on a Kubernetes cluster.

This post is a short overview of its capabilities and configuration.

In contrast to Apache Bench, Yandex.Tank displays response code statistics, and it is much simpler to run and configure than JMeter. It also has a nice Autostop feature for the case when “Houston, we have a problem” (c).

Components

See Modules.

The Yandex Tank core is written in Python.

For load testing, it has several load generator modules. By default it uses Phantom, which is written in C++, so it’s really fast.

Telegraf is a monitoring module that can connect to the tested host via SSH and run its own agent there to collect CPU, memory, and other metrics, which are displayed in Yandex.Tank in real time during the load test.

The Overloader is a module to upload results to Yandex Overload or to InfluxDB, but we will not use it here. Still, see the Artifact uploaders.

Also, in the examples below I won’t cover the “ammo” topic, which is used to create more complicated tests with POST and other requests, as simple GET requests are enough for me for now. You can find its documentation in Preparing requests.

Running Yandex.Tank with Docker

Create a minimal config for Phantom and save it as load.yaml:

phantom:
  address: rtfm.co.ua:443
  header_http: "1.1"
  headers:
    - "[Host: rtfm.co.ua]"
  uris:
    - /
  load_profile:
    load_type: rps
    schedule: const(1,30s)
  ssl: true
console:
  enabled: true
telegraf:
  enabled: false

Here:

  • phantom:
    • address: the address and port of the target
    • header_http: the HTTP version used for requests, set it to 1.1 to use persistent connections (see HTTP persistent connection)
    • headers: a set of headers to be passed to the target server
    • uris: a list of URIs to make calls to
    • load_profile:
      • load_type: can be set to rps or instances:
        • rps: requests per second – the desired number of requests per second to send to the tested host
        • instances: the desired number of active threads, each of which will send as many requests per second as it can, see Dynamic thread limit
      • schedule: can be set to const, line, or step (or a combination of them) – defines the load test profile, see the Tutorials:
        • const: is set as (load,dur), where load – the RPS number, dur – the load test duration; in the example above Yandex.Tank will send one request per second for 30 seconds
        • line: is set as (a,b,dur), where a – the starting RPS, b – the final RPS, dur – the load test duration, so the RPS will increase linearly from a to b
        • step: is set as (a,b,step,dur), where a – the starting RPS, b – the final RPS, step – how many requests per second are added at each step, dur – the duration of each step
    • ssl: enable SSL support for HTTPS requests (use port 443 in the address)
  • console: display results in the console
  • telegraf: monitoring agent configuration, covered in the Monitoring (Telegraf) section
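
To make the schedule notation a bit more concrete, here is a minimal Python sketch that expands a schedule string into a target RPS value for every second of the test. It is not Yandex.Tank’s own code: the function names parse_duration and expand_schedule are made up for this illustration, only simple durations such as 30s, 2m, or 1h are handled, and the way line and step values are sampled is my assumption about the behaviour described above.

```python
import re


def parse_duration(text):
    """Convert a simple duration such as '30s', '2m' or '1h' to seconds."""
    units = {"": 1, "s": 1, "m": 60, "h": 3600}
    match = re.fullmatch(r"(\d+)([smh]?)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * units[unit]


def expand_schedule(schedule):
    """Return the target RPS for each second of the whole schedule."""
    elements = re.findall(r"(const|line|step)\(([^)]*)\)", schedule)
    if not elements:
        raise ValueError(f"no schedule elements found in {schedule!r}")
    per_second = []
    for kind, raw_args in elements:
        args = [arg.strip() for arg in raw_args.split(",")]
        if kind == "const":
            load, dur = float(args[0]), parse_duration(args[1])
            per_second += [load] * dur
        elif kind == "line":
            a, b, dur = float(args[0]), float(args[1]), parse_duration(args[2])
            # Linear growth from a towards b, sampled at the start of each second.
            per_second += [a + (b - a) * t / dur for t in range(dur)]
        else:  # step
            a, b, step = float(args[0]), float(args[1]), float(args[2])
            dur = parse_duration(args[3])
            if step <= 0:
                raise ValueError("step must be positive")
            levels = [a + i * step for i in range(int((b - a) // step) + 1)]
            # Hold every level for dur seconds before moving to the next one.
            per_second += [level for level in levels for _ in range(dur)]
    return per_second


if __name__ == "__main__":
    profile = expand_schedule("const(1,30s)")
    print(len(profile), "seconds,", sum(profile), "requests in total")
```

With const(1,30s) from the config above, the sketch yields 30 one-second slots with 1 RPS each, that is, about 30 requests for the whole test.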

Run Yandex.Tank with Docker:

docker run --rm -v $(pwd):/var/loadtest -it direvius/yandex-tank

And results:

Monitoring (Telegraf)

By using the Telegraf module, Yandex.Tank can connect via SSH to the tested host to collect resource metrics from it.

Enable it in the load.yaml file:

...
telegraf:
  enabled: true
  package: yandextank.plugins.Telegraf

The metrics to be collected are described in a dedicated file. Create it as monitoring.xml, see more in Configuration file format:

<Monitoring>
  <Host address="rtfm.co.ua" interval="1" username="root">
    <CPU />
    <Kernel />
    <Net />
    <System />
    <Memory />
    <Disk />
    <Netstat/>
  </Host>
</Monitoring>

Here, address is the tested host to collect metrics from, interval is how often to collect them, and username is the SSH user that the Telegraf module will use for the connection.
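
If you have several hosts to monitor, a file like this can also be generated instead of written by hand. Below is a minimal sketch using only the Python standard library. The build_monitoring function is a hypothetical helper of mine, not a part of Yandex.Tank, and it only reproduces the elements and attributes shown in the example above.

```python
import xml.etree.ElementTree as ET

# Metric elements taken from the monitoring.xml example above.
DEFAULT_METRICS = ("CPU", "Kernel", "Net", "System", "Memory", "Disk", "Netstat")


def build_monitoring(address, username, interval=1, metrics=DEFAULT_METRICS):
    """Return the text of a monitoring.xml file for a single host."""
    root = ET.Element("Monitoring")
    host = ET.SubElement(
        root,
        "Host",
        address=address,
        interval=str(interval),
        username=username,
    )
    for name in metrics:
        ET.SubElement(host, name)
    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_monitoring("rtfm.co.ua", "root"))
```

The result can be saved as monitoring.xml next to load.yaml.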

The SSH user set in username must have the public part of an SSH key added to its ~/.ssh/authorized_keys on the target host.

The private part of this key will be mounted into the Yandex.Tank Docker container as /root/.ssh/id_rsa, as all processes in the container run under the root user:

docker run --rm -v $(pwd):/var/loadtest -v /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11:/root/.ssh/id_rsa -it direvius/yandex-tank
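
As the docker run line grows with every new mount, it can be handy to assemble it in a small script. Here is a minimal sketch: tank_command is a hypothetical helper, and the paths in the usage example are placeholders. The image name and the container paths are the ones used in this post.

```python
import shlex

IMAGE = "direvius/yandex-tank"


def tank_command(workdir, ssh_key=None):
    """Build the docker run arguments for Yandex.Tank as a list."""
    cmd = ["docker", "run", "--rm", "-v", f"{workdir}:/var/loadtest"]
    if ssh_key is not None:
        # The container runs as root, so the key goes to /root/.ssh/id_rsa.
        cmd += ["-v", f"{ssh_key}:/root/.ssh/id_rsa"]
    cmd += ["-it", IMAGE]
    return cmd


if __name__ == "__main__":
    # Placeholder paths, replace them with your own.
    print(shlex.join(tank_command("/tmp/loadtest", "/tmp/keys/example_key")))
```

The returned list can be passed as is to subprocess.run() to start the test.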

Paramiko: SSHException: not a valid RSA private key file

On the first run, Telegraf failed with a Paramiko error:

16:32:54 [ERROR] Failed to install monitoring agent to rtfm.co.ua
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/yandextank/plugins/Telegraf/client.py", line 209, in install
out, errors, err_code = self.ssh.execute(cmd)
File "/usr/local/lib/python2.7/dist-packages/yandextank/common/util.py", line 72, in execute
with self.connect() as client:
File "/usr/local/lib/python2.7/dist-packages/yandextank/common/util.py", line 42, in connect
timeout=self.timeout, )
File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 437, in connect
passphrase,
File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 749, in _auth
raise saved_exception
SSHException: not a valid RSA private key file

This happens because the RSA key used for DigitalOcean is stored in the OpenSSH private key format, while Paramiko here expects the PEM format:

file /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11
/home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11: OpenSSH private key

Convert it to the PEM format (the command rewrites the key file in place, so make a backup first):

ssh-keygen -p -m PEM -f /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11

And check again:

file /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11
/home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11: PEM RSA private key
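
The same check can be done without the file utility by looking at the first line of the key. Here is a minimal sketch: key_format is a hypothetical helper, it recognizes only the two header lines relevant here, and its labels simply mirror the file output above.

```python
# Header lines of the two private key formats discussed above.
KEY_HEADERS = {
    "-----BEGIN OPENSSH PRIVATE KEY-----": "OpenSSH private key",
    "-----BEGIN RSA PRIVATE KEY-----": "PEM RSA private key",
}


def key_format(key_text):
    """Guess the private key format from its first line."""
    lines = key_text.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    return KEY_HEADERS.get(first_line, "unknown")


if __name__ == "__main__":
    # A truncated, fake key body: only the header line matters for the check.
    sample = "-----BEGIN OPENSSH PRIVATE KEY-----\n(key data)\n"
    print(key_format(sample))
```

To check a real key, read the file and pass its content to key_format(). If it reports the OpenSSH format, the key needs the conversion shown above before this Paramiko version can use it.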

Run the test again, and Telegraf will print its configuration and the metrics to be collected:

docker run --rm -v $(pwd):/var/loadtest -v /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11:/root/.ssh/id_rsa -it direvius/yandex-tank
...
16:36:38 [INFO] Detected monitoring configuration: telegraf
16:36:38 [INFO] Preparing test...
16:36:38 [INFO] Telegraf Result config {'username': 'root', 'comment': '', 'telegraf': '/usr/bin/telegraf', 'python': '/usr/bin/env python2', 'host_config': {'Kernel': {'fielddrop': '["boot_time"]', 'name': '[inputs.kernel]'}, 'Netstat': {'name': '[inputs.netstat]'}, 'System': {'fielddrop': '["n_users", "n_cpus", "uptime*"]', 'name': '[inputs.system]'}, 'Memory': {'fielddrop': '["active", "inactive", "total", "used_per*", "avail*"]', 'name': '[inputs.mem]'}, 'Net': {'interfaces': '["eth0","eth1","eth2","eth3","eth4","eth5"]', 'fielddrop': '["icmp*", "ip*", "udplite*", "tcp*", "udp*", "drop*", "err*"]', 'name': '[inputs.net]'}, 'Disk': {'name': '[inputs.diskio]', 'devices': '["vda0","sda0","vda1","sda1","vda2","sda2","vda3","sda3","vda4","sda4","vda5","sda5"]'}, 'CPU': {'fielddrop': '["time_*", "usage_guest_nice"]', 'name': '[inputs.cpu]', 'percpu': 'false'}}, 'startup': [], 'host': 'rtfm.co.ua', 'telegrafraw': [], 'shutdown': [], 'port': 22, 'interval': '1', 'custom': [], 'source': []}
16:36:38 [INFO] Installing monitoring agent at root@rtfm.co.ua...
16:36:38 [INFO] Creating temp dir on rtfm.co.ua
16:36:38 [INFO] Execute on rtfm.co.ua: /usr/bin/env python2 -c "import tempfile; print tempfile.mkdtemp();"
...

After this, the load test starts, and on the right side of the console you’ll see the resources used on the target server:

And the agent running on the server:

root@rtfm-do-production-d10:~# ps aux | grep tele
root      4580  0.5  0.4 309992  9436 pts/1    Ssl+ 15:38   0:00 python2 /tmp/tmpZez6yJ/agent.py --telegraf /tmp/telegraf --host rtfm.co.ua
root      4582  7.1  1.5 851256 31896 ?        Ssl  15:38   0:01 /tmp/telegraf -config /tmp/tmpZez6yJ/agent.cfg

Autostop

The Autostop module is used to terminate a test if something goes wrong.

For example, you can configure it to stop the test if the share of 5xx responses is higher than 10%, or if the response time is greater than a specified value.

To check how it works, add the following:

...
autostop:
  autostop:
    - http(2xx,100%,1s)

Here, just as an example, the test will be stopped as soon as 2xx responses make up 100% of all responses for 1 second.

Run the tests, and:

docker run --rm -v $(pwd):/var/loadtest -v /home/setevoy/.ssh/setevoy-do-nextcloud-production-d10-03-11:/root/.ssh/id_rsa -it direvius/yandex-tank
...
16:56:24 [INFO] Monitoring received first data.
16:56:24 [WARNING] Autostop criterion requested test stop: http(2xx,100%,1s)
16:56:24 [WARNING] Autostop criterion requested test stop: 2xx codes count higher than 100.0% for 1s, since 1612889780
16:56:24 [INFO] Finishing test...
16:56:24 [INFO] Stopping load generator and aggregator
...

The test was stopped immediately.
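
To illustrate the idea behind a criterion like http(2xx,100%,1s), here is a minimal Python sketch. It is not the Autostop module’s implementation: the code_matches and should_stop functions are hypothetical, the per-second data format is made up, and I assume that the criterion fires once the share of matching codes has been at or above the limit for the given number of consecutive seconds.

```python
def code_matches(code, mask):
    """Check an HTTP status code against a mask such as '2xx' or '503'."""
    text = str(code)
    return len(text) == len(mask) and all(
        m in ("x", c) for c, m in zip(text, mask)
    )


def should_stop(per_second_codes, mask, share, seconds):
    """Decide whether the test should be stopped.

    per_second_codes is a list with one dict per second of the test,
    mapping an HTTP status code to the number of such responses.
    share is a fraction, so 10% is 0.10 and 100% is 1.0.
    """
    streak = 0
    for counts in per_second_codes:
        total = sum(counts.values())
        matched = sum(n for code, n in counts.items() if code_matches(code, mask))
        if total > 0 and matched / total >= share:
            streak += 1
            if streak >= seconds:
                return True
        else:
            streak = 0  # the condition must hold for consecutive seconds
    return False


if __name__ == "__main__":
    # One second of a test with a single 200 response, as in the run above.
    print(should_stop([{200: 1}], "2xx", 1.0, 1))
```

With a healthy target that returns only 200 codes, the 2xx criterion is met in the very first second, which matches the immediate stop seen in the log above.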

See more options in the documentation.

The whole load.yaml is now:

phantom:
  address: rtfm.co.ua:443
  header_http: "1.1"
  headers:
    - "[Host: rtfm.co.ua]"
  uris:
    - /
  load_profile:
    load_type: rps
    schedule: const(1,30s)
  ssl: true
console:
  enabled: true
telegraf:
  enabled: true
  package: yandextank.plugins.Telegraf
  config: monitoring.xml
autostop:
  autostop:
    - http(2xx,100%,1s)

Useful links

All in Russian, unfortunately.



Also published on Medium.