Grafana Loki: LogQL for logs and creating metrics for alerts

03/12/2023

Good – we learned how to launch Loki in Grafana Loki: architecture and running in Kubernetes with AWS S3 storage and boltdb-shipper, and we also figured out how to configure alerts in Grafana Loki: alerts from the Loki Ruler and labels from logs.

Now it’s time to figure out what we can do in Loki using its LogQL.

Preparation

For the examples, we will use two Pods: one with nginxdemos/hello for ordinary Nginx logs, and another with thorstenhans/fake-logger, which writes its logs in JSON.

For Nginx, also add a Kubernetes Service so that we can send requests with curl:

apiVersion: v1
kind: Pod
metadata:
  name: nginxdemo
  labels:
    app: nginxdemo
    logging: test
spec:
  containers:
  - name: nginxdemo
    image: nginxdemos/hello
    ports:
      - containerPort: 80
        name: nginxdemo-svc
---
apiVersion: v1
kind: Service
metadata:
  name: nginxdemo-service
  labels:
    app: nginxdemo
    logging: test
spec:
  selector:
    app: nginxdemo
  ports:
  - name: nginxdemo-svc-port
    protocol: TCP
    port: 80
    targetPort: nginxdemo-svc
---
apiVersion: v1
kind: Pod
metadata:
  name: fake-logger
  labels:
    app: fake-logger
    logging: test
spec:
  containers:
  - name: fake-logger
    image: thorstenhans/fake-logger:0.0.2

Deploy and forward the port:
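
Assuming the manifests above are saved to a file (the nginxdemo.yaml name here is just an example), apply them first:

[simterm]

$ kubectl apply -f nginxdemo.yaml

[/simterm]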

[simterm]

$ kk port-forward svc/nginxdemo-service 8080:80

[/simterm]

And let’s run curl with a regular GET in a loop:

[simterm]

$ watch -n 1 curl localhost:8080 > /dev/null 2>&1 &

[/simterm]

And another one, with POST:

[simterm]

$ watch -n 1 curl -X POST localhost:8080 > /dev/null 2>&1

[/simterm]

Let’s go.

Grafana Explore: Loki – interface

A few words about the Grafana Explore > Loki interface itself.

You can use multiple queries at the same time:

Also, it is possible to divide the interface into two parts, and perform separate queries in each:

As in the usual Grafana dashboards, it is possible to select the period for which you want to receive data and set the interval for auto-updating:

Or you can turn on the Live mode – then the data will appear as soon as it reaches Loki:

There are two modes for creating queries – Builder and Code.

In the Builder mode, Loki displays a list of available labels and filters:

In the Code mode, they will be suggested automatically as you type:

The Explain option explains exactly what your query does:

And the Inspector displays details about your query – how much time and what resources were used to form the response, which is useful for optimizing queries:

Also, you can open the Loki Cheat Sheet by clicking (?) to the right of the query field:

LogQL: overview

In general, working with Loki and its LogQL is very similar to working with Prometheus and its PromQL – almost all the same functions and the same general approach, which is even reflected in Loki's description: "Like Prometheus, but for logs".

So, the main selection is done by indexed labels, which we use to perform the basic search in the logs – that is, to select a stream.

Types of queries in Loki depend on the desired result:

  • Log queries: return lines from the logs
  • Metric queries: include Log queries, but as a result produce numerical values that can be used to draw graphs in Grafana or for alerts in the Ruler

In general, any query consists of three main parts:

{Log Stream Selectors} <Log Pipeline "Log filter">

That is, in the query:

{app="nginxdemo"} |= "172.17.0.1"

The {app="nginxdemo"} is the Log Stream Selector, with which we select a specific stream from Loki, and the |= is the beginning of the Log Pipeline, which here includes a Log Filter Expression – "172.17.0.1".

In addition to the Log filter, the pipeline can include a Log or Tag formatting expression that changes the data received in the pipeline.

The Log Stream Selector is required, while the Log Pipeline with its expressions is optional, and is used to refine or format the results.

Log queries

Log Stream Selectors

Selectors use labels, which are set by the agent that collects the logs – Promtail, Fluentd, or others.

The Log Stream Selector determines how many indexes and blocks of data will be loaded to return the result, that is, it directly affects the speed of operation and CPU/RAM resources used to generate the response.

In the example above, in the selector {app="nginxdemo"} we used the = operator, which can be one of:

  • =: is equal to
  • !=: is not equal to
  • =~: regex
  • !~: negative regex

So, with the {app="nginxdemo"} query we will receive the logs of all Pods that have the label app with the value nginxdemo:

We can combine several selectors, for example, get all logs with the logging=test but without app=nginxdemo:

{logging="test", app!="nginxdemo"}

Or use regex:

{app=~"nginx.+"}

Or simply select all logs (streams) that have the app label:
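
For example, with a regex that matches any non-empty value of the label (a sketch – any expression matching all values will do):

{app=~".+"}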

Log Pipeline

Data received from the stream can be passed to the pipeline for further filtering or formatting. At the same time, the result of one pipeline can be transferred to the next, and so on.

A pipeline can include:

  • Log line filtering expressions – for filtering previous results
  • Parser expressions – for obtaining labels from logs, which can then be passed to tag filtering
  • Tag filtering expressions – for filtering data by tags
  • Log line formatting expressions – used to edit the resulting output
  • Tag formatting expressions – for editing tags/labels

Log Line filtering

Filters are used to… filter 🙂

That is, when we have received data from a stream and want to select specific records from it, we use a Log filter.

A filter consists of an operator and a search term, which is used to select the data.

Operators can be:

  • |=: the line contains the string
  • !=: the line does NOT contain the string
  • |~: the line matches the regular expression
  • !~: the line does NOT match the regular expression

When using regex, keep in mind that the Golang RE2 syntax is used, and it is case-sensitive by default. To make a match case-insensitive, start the expression with (?i).
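
For example, a case-insensitive line filter (the search term here is just an illustration):

{app="nginxdemo"} |~ "(?i)post"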

In addition, it is better to place Log Line filters at the beginning of a query, because they are fast and save the following pipeline stages from unnecessary work.

An example of a log filter is selecting lines that contain a string:

{job=~".+"} |= "promtail"

Or several terms at once, using a regular expression:
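
A sketch of such a query (the search terms here are just examples):

{job=~".+"} |~ "promtail|loki"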

Parser expressions

Parsers… parse 🙂 They process the input data and extract labels from it, which can then be used in further filters or to build Metric queries.

Currently, LogQL supports the json, logfmt, pattern, regexp, and unpack parsers for working with tags.

json

For example, json turns all JSON keys into labels, i.e. the query {app="fake-logger"} | json, instead of a plain log line:

Creates a new set of labels:

The labels obtained via json can then be used in additional filters, for example, to select only the records with level=debug:
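
Such a query could look like this (assuming the JSON log has a level field, as the fake-logger output does):

{app="fake-logger"} | json | level="debug"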

logfmt

To get labels from logs that are not in JSON format, you can use logfmt, which will convert all found key=value fields into labels.

For example, the job="monitoring/loki-read" logs consist of key=value fields:

level=info ts=2022-12-28T14:31:11.645759285Z caller=metrics.go:170 component=frontend org_id=fake latency=fast

With the help of logfmt they will be converted into labels:
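
The query itself is just the stream selector with the parser appended:

{job="monitoring/loki-read"} | logfmt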

regexp

The regexp parser takes a regular expression as an argument, and its named groups define which labels will be created from the matched parts of the line.

For example, from the string:

10.0.44.12 - - [28/Dec/2022:14:42:58 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/" "-"

We can dynamically generate labels ip and status_code:

{container="nginx"} | regexp `^(?P<ip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*(?P<status_code>[0-9]{3})`

pattern

The pattern parser allows you to create labels based on a template of the log record, i.e. the line:

10.0.7.188 - - [28/Dec/2022:15:27:04 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.7.0" "-"

Can be described as:

{container="nginx"} | pattern `<ip> - - [<date>] <status_code> "<request> <uri> <_>" <size> "<_>" "<agent>" <_>`

where <_> means the field is skipped, that is, no label is created for it.

And as a result, we will get a set of labels according to this template:
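
These labels can then be used for filtering, for example by the status code from the line above (a sketch):

{container="nginx"} | pattern `<ip> - - [<date>] <status_code> "<request> <uri> <_>" <size> "<_>" "<agent>" <_>` | status_code="204"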

See more in Introducing the pattern parser.

Tag filtering expressions

As the name suggests, it allows you to filter by tags that are already present in the record, or that were created by a previous parser, for example logfmt.

Let’s take the following string:

level=info ts=2022-12-28T15:56:31.449162304Z caller=table_manager.go:252 msg="query readiness setup completed" duration=1.965µs distinct_users_len=0

If we pass it through the logfmt parser, we will get the labels caller, msg, duration, and distinct_users_len:

Next, we can create a filter based on these tags:
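
For example, to keep only the records from the caller seen in the log line above:

{job="monitoring/loki-read"} | logfmt | caller="table_manager.go:252"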

Available operators here are =, ==, !=, >, >=, <, and <=.

Also, we can combine conditions with the and and or operators:

{job="monitoring/loki-read"} | logfmt | caller="table_manager.go:252" or org_id="fake" and caller!~"metrics.go.+"

Log line formatting expressions

Next, we can control which data will be displayed in the resulting log line.

For example, let's take the same loki-read job, where we have the following labels:

Among them, we are only interested in displaying component and duration, so we can use the following formatting:

{job="monitoring/loki-read"} | logfmt | line_format "{{.component}} {{.duration}}"

Label format expressions

With label_format we can rename, change, or add labels.

To do so, we pass the name of the label with the operator = followed by the desired value as an argument.

For example, we have a label app:

Which we want to rename to application – for that, use label_format application=app:
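
The full query could look like this (the fake-logger stream here is just an example):

{app="fake-logger"} | label_format application=app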

Or we can use the value of an existing label to create a new one; for this, we use a template in the form of {{.field_name}}, where we can combine several fields.

That is, if we want to create a label error_message that will contain the values of the level and msg fields, we can build the following query:

{job="default/fake-logger"} | json | label_format error_message="{{.level}} {{.msg}}"

Log Metrics

Now let's see how we can create metrics from logs that can be used to draw graphs or to trigger alerts (see Grafana Loki: alerts from the Loki Ruler and labels from logs).

Range vectors

For working with range vectors, there are currently four functions available, already familiar from Prometheus:

  • rate: number of logs per second
  • count_over_time: count the number of stream entries for a given time period
  • bytes_rate: number of bytes per second
  • bytes_over_time: count the number of bytes of the stream for a given time interval

For example, to get the number of log records per second from the fake-logger job:

rate({job="default/fake-logger"}[5m])

It can be useful to create an alert for a case when some service starts writing a lot of logs, which can be a sign that “something went wrong”.
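
For example, a condition like the following could be used as the expression of a Loki Ruler alert (the threshold of 100 lines per second is just an arbitrary example):

sum(rate({job="default/fake-logger"}[5m])) > 100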

You can get the number of records with level=warning for the last 5 minutes using the following query:

count_over_time({job="default/fake-logger"} | json | level="warning" [5m])

Aggregation Functions

We can also use aggregation functions to combine the output data; these are familiar from PromQL as well:

  • sum: the sum over labels
  • min, max, and avg: the minimum, maximum, and average values
  • stddev, stdvar: standard deviation and standard variance
  • count: the number of elements in the vector
  • bottomk and topk: the k smallest and largest elements

Syntax of aggregation functions:

<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]

For example, to get the number of records per second from the fake-logger job, split by the level label:

sum(rate({job="default/fake-logger"} | json [5m])) by (level)

Or, from the examples above:

  • get records from the loki-read pods
  • from the result, create two new labels – component and duration
  • get the number of records per second
  • drop the entries without a component
  • and display the sum for each component
sum(rate({job="monitoring/loki-read"} | logfmt | line_format "{{.component}} {{.duration}}" | component != "" [10s])) by (component)

Other operators

And very briefly about other possibilities.

Mathematical operators:

  • +: addition
  • -: subtraction
  • *: multiplication
  • /: division
  • %: modulo (division remainder)
  • ^: exponentiation

Logical operators:

  • and: and
  • or: or
  • unless: except

Comparison operators:

  • ==: is equal to
  • !=: is not equal to
  • >: greater than
  • >=: greater than or equal to
  • <: less than
  • <=: less than or equal to

Again using the examples from above, let's create a request label:

{container="nginx"} | pattern `<_> - - [<_>] <_> "<request> <_> <_>" <_> "<_>" "<_>" <_>`

Let’s get the rate of POST requests per second for the last 5 minutes:

sum(rate({container="nginx"} | pattern `<_> - - [<_>] <_> "<request> <_> <_>" <_> "<_>" "<_>" <_>` | request="POST" [5m]))

First, let’s check the number of GET and POST requests on the graph:

sum(rate({container="nginx"} | pattern `<_> - - [<_>] <_> "<request> <_> <_>" <_> "<_>" "<_>" <_>` [5m]))  by (request)

And now let's get the percentage of POST requests out of the total number of requests:

  • divide all POST requests by the total number of requests
  • multiply the result by 100
sum(rate({container="nginx"} | pattern `<_> - - [<_>] <_> "<request> <_> <_>" <_> "<_>" "<_>" <_>` | request="POST" [5m])) / sum(rate({container="nginx"} | pattern `<_> - - [<_>] <_> "<request> <_> <_>" <_> "<_>" "<_>" <_>` [5m])) * 100

That’s all.

Useful links