We are currently facing outgoing traffic issues in our Production environments and can't find out what is responsible for them.
One way to try to catch the culprit is to collect statistics about DNS requests made by our AWS EC2 hosts to their local dnsmasq services, and then check whether the OUT traffic spikes correlate with those DNS requests.
The dnsmasq log is tailed by promtail, which ships the data to a monitoring host running Loki, and finally Grafana draws graphs based on the data from Loki.
The setup described below is more of a Proof of Concept, as Loki itself and its support in Grafana are still under development.
But the Explore feature in Grafana now supports aggregation and counting functions similar to those in Prometheus – sum(), rate(), etc.
Over the last year, promtail has also gained some interesting new abilities, which we will use in this post.
First, we will spin up a usual Grafana + Loki + promtail stack, then add log collection from our Production environment, and finally add a Grafana dashboard using the new LogQL functions.
Loki will be started with Docker Compose – create a loki-stack.yml file:
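The compose file itself is not reproduced in this excerpt; a minimal sketch of such a loki-stack.yml could look like the following (image tags, ports, and config paths here are assumptions – adjust them to your own setup):

```yaml
version: '3'

services:

  loki:
    image: grafana/loki:latest
    ports:
      # Loki's HTTP API, used by promtail and Grafana
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml

  grafana:
    image: grafana/grafana:latest
    ports:
      # Grafana web UI
      - "3000:3000"
```

With such a file in place, the stack would be started with docker-compose -f loki-stack.yml up, and Loki would be added to Grafana as a data source pointing at port 3100.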
A pipeline is used to transform a single log line, its labels, and its timestamp. A pipeline is comprised of a set of stages. There are 4 types of stages:
Parsing stages parse the current log line and extract data out of it. The extracted data is then available for use by other stages.
Transform stages transform extracted data from previous stages.
Action stages take extracted data from previous stages and do something with them. Actions can:
Add or modify existing labels to the log line
Change the timestamp of the log line
Change the content of the log line
Create a metric based on the extracted data
Filtering stages optionally apply a subset of stages or drop entries based on some condition.
So, in short: you can build a data-processing pipeline made up of stages of several types.
Stages can be:
Parsing stages: parse the log and extract data to pass on to the next stages
Transform stages: transform data from the previous stage(s)
Action stages: receive data from the previous stage(s) and can:
change the log line
create metrics based on the extracted data
Typical pipelines will start with a parsing stage (such as a regex or json stage) to extract data from the log line. Then, a series of action stages will be present to do something with that extracted data. The most common action stage will be a labels stage to turn extracted data into a label.
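As a sketch, such a typical pipeline might look like the following promtail pipeline_stages fragment (the log format and field names here are illustrative, not from our real config):

```yaml
pipeline_stages:
  # parsing stage: extract "level" and "msg" fields from the log line
  - regex:
      expression: '^level=(?P<level>\w+) msg="(?P<msg>.*)"$'
  # action stage: turn the extracted "level" field into a label
  - labels:
      level:
```

The regex stage only extracts data; it is the labels stage afterwards that actually attaches the extracted value as a label visible in Loki.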
Let’s go back to the very beginning of this whole story – what do we want to achieve?
We want to get all IN A requests to our dnsmasq, extract the hostnames, and display a graph showing how many requests were performed for each particular domain name.
Thus, we need to:
grab all IN A requests
save each to a label
and count them
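Running a bit ahead: once each request's hostname is saved to a query label, the counting can be done in Grafana's Explore with a LogQL query similar to Prometheus syntax (the job label name here is an assumption):

```
sum(rate({job="dnsmasq"}[5m])) by (query)
```

This would draw one series per domain name, with the per-second rate of A requests for each.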
Go to promtail on the Production host and add the stages – update the promtail-dev.yml config file:
create a regex stage which selects all lines with the query[A] string
create a regex group called query, where the resulting string up to the first space will be saved
i.e. an original log line could be: Nov 16 08:23:33 dnsmasq: query[A] backend-db3-master.example.com from 127.0.0.1
and in the query regex group will get the value: backend-db3-master.example.com
create a labels stage which will attach a new label called query with the backend-db3-master.example.com value taken from the query regex group
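The resulting promtail-dev.yml fragment could look like this sketch (the job name, log file path, and the exact regex are assumptions – adjust them to your setup):

```yaml
scrape_configs:
  - job_name: dnsmasq
    static_configs:
      - targets:
          - localhost
        labels:
          job: dnsmasq
          __path__: /var/log/dnsmasq.log
    pipeline_stages:
      # parsing stage: capture the hostname from "query[A]" lines
      # into a regex group named "query"
      - regex:
          expression: '.*query\[A\] (?P<query>[^ ]+).*'
      # action stage: attach the captured hostname as the "query" label
      - labels:
          query:
```

Note that the regex stage only extracts data from matching lines; lines without query[A] simply get no query label attached.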
root@bttrm-production-console:/home/admin# docker run -ti -v /opt/prometheus-client/promtail-dev.yml:/etc/promtail/promtail.yml -v /var/log:/var/log grafana/promtail:master-2739551 -config.file=/etc/promtail/promtail.yml
level=info ts=2019-11-16T11:56:29.760425279Z caller=server.go:121 http=[::]:9080 grpc=[::]:32945 msg="server listening on addresses"
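Before checking labels in Loki, the extraction logic itself can be sanity-checked locally. This is just an illustrative substitute using sed, mimicking the promtail regex group on the example line from above (promtail itself uses Go regular expressions, so this is only an approximation):

```shell
# take the sample dnsmasq log line from above
line='Nov 16 08:23:33 dnsmasq: query[A] backend-db3-master.example.com from 127.0.0.1'
# extract everything after "query[A] " up to the first space,
# the same value the "query" regex group would capture
echo "$line" | sed -n 's/.*query\[A\] \([^ ]*\).*/\1/p'
```

This should print backend-db3-master.example.com – the value that will end up in the query label.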