We already have VictoriaMetrics + Grafana running on a regular EC2 instance, launched with Docker Compose – see VictoriaMetrics: an overview and its use instead of Prometheus.
That was kind of a Proof of Concept, and it's time to launch it "in an adult way" – in Kubernetes, with all the configuration stored in a GitHub repository.
VictoriaMetrics has Helm charts for each component to deploy it in Kubernetes – see Victoria Metrics Helm Charts – including charts to run the VictoriaMetrics Operator and victoria-metrics-k8s-stack – an analog of the Kube Prometheus Stack, which I've used before.
We will use the victoria-metrics-k8s-stack, which "under the hood" will launch the VictoriaMetrics Operator, Grafana, and kube-state-metrics – see its dependencies.
This post turned out to be quite long, but I tried to describe all the interesting points of deploying full-fledged monitoring with the VictoriaMetrics Kubernetes Monitoring Stack.
UPD: today I wrote documentation for the project where this stack is set up, and here is what we will have as the result:
Planning
So, what needs to be done:
- check the deployment of the victoria-metrics-k8s-stack Helm chart
- look and think about how to run Prometheus exporters – some of them have charts, but we also have self-written ones (see Prometheus: GitHub Exporter – creating own exporter for GitHub API), so those exporters will have to be pushed to the Elastic Container Registry and pulled from there to run in Kubernetes
- secrets for monitoring – Grafana passwords, exporter tokens, etc.
- IRSA for exporters – create IAM Policies and Roles for ServiceAccounts
- transfer of the existing alerts
- a config for VMAgent to collect metrics from the exporters
- run Grafana Loki
Regarding the logs: VictoriaLogs was released recently, but it is still in preview, does not yet support storing data in AWS S3, has no Grafana integration, and in general I don't want to spend time on it yet, as I already know Loki more or less. Perhaps I will launch VictoriaLogs separately to "play around and see", and once it is integrated with Grafana, I will replace Loki with it, because we already have dashboards with graphs built from Loki logs.
Also, it will be necessary to take a look at persistence in VictoriaMetrics in Kubernetes – size, types of disks, and so on. Maybe think about their backups (VMBackup?).
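For reference, disk size and retention for VMSingle look like they can be set right in the victoria-metrics-k8s-stack values – a minimal sketch, assuming the vmsingle.spec.storage field of the VMSingle CRD (the StorageClass, size, and retention below are placeholders, not values from this setup):

vmsingle:
  spec:
    # how long to keep the metrics, in months (placeholder)
    retentionPeriod: "1"
    # a regular PersistentVolumeClaimSpec for the VMSingle data
    storage:
      storageClassName: gp2        # placeholder – use whatever StorageClass the cluster has
      resources:
        requests:
          storage: 20Gi            # placeholder size

The exact disk type, size, and backups will be decided later; this only shows where the knobs are.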
We have a lot of things in the existing monitoring:
[simterm]
root@ip-172-31-89-117:/home/admin/docker-images/prometheus-grafana# tree .
.
├── alertmanager
│   ├── config.yml
│   └── notifications.tmpl
├── docker-compose.yml
├── grafana
│   ├── config.monitoring
│   └── provisioning
│       ├── dashboards
│       │   └── dashboard.yml
│       └── datasources
│           └── datasource.yml
├── prometheus
│   ├── alert.rules
│   ├── alert.templates
│   ├── blackbox-targets
│   │   └── targets.yaml
│   ├── blackbox.yml
│   ├── cloudwatch-config.yaml
│   ├── loki-alerts.yaml
│   ├── loki-conf.yaml
│   ├── prometheus.yml
│   ├── promtail.yaml
│   └── yace-config.yaml
└── prometheus.yml
[/simterm]
How to deploy it all? Through the AWS CDK and its cluster.add_helm_chart() – or with a separate step in GitHub Actions using Helm?
We will need the CDK in any case – to create certificates in ACM, the Lambda for logs in Loki, S3 buckets, IAM Roles for exporters, etc.
But I don't like the idea of dragging the chart deployments into the AWS CDK, because it is better to separate the deployment of infrastructure objects from the deployment of the monitoring stack itself.
OK – let's do it separately: the CDK will create resources in AWS, and Helm will deploy the charts. Or a single chart? Maybe just make our own Helm chart, and connect the VictoriaMetrics Stack and the exporters to it as subcharts? Seems like a good idea.
We will also need to create Kubernetes Secrets and ConfigMaps with configs for VMAgent, Loki (see Loki: collecting logs from CloudWatch Logs using Lambda Promtail), Alertmanager, etc. Make them with Kustomize? Or just as YAML manifests in the templates directory of our chart?
We'll see during setup.
Now in order – what needs to be done:
- run exporters
- connect a config to VMAgent to start collecting metrics from these exporters
- check how ServiceMonitors are configured (VMServiceScrape in VictoriaMetrics)
- Grafana:
  - data sources
  - dashboards
- add Loki
- alerts
Let's go, starting with a check of the victoria-metrics-k8s-stack chart itself.
VictoriaMetrics Stack Helm Chart installation
Add repositories with dependencies:
[simterm]
$ helm repo add grafana https://grafana.github.io/helm-charts
"grafana" has been added to your repositories
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
[/simterm]
And VictoriaMetrics itself:
[simterm]
$ helm repo add vm https://victoriametrics.github.io/helm-charts/
"vm" has been added to your repositories
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "vm" chart repository
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
[/simterm]
Check the versions of the victoria-metrics-k8s-stack chart:
[simterm]
$ helm search repo vm/victoria-metrics-k8s-stack -l
NAME                            CHART VERSION   APP VERSION     DESCRIPTION
vm/victoria-metrics-k8s-stack   0.17.0          v1.91.3         Kubernetes monitoring on VictoriaMetrics stack....
vm/victoria-metrics-k8s-stack   0.16.4          v1.91.3         Kubernetes monitoring on VictoriaMetrics stack....
vm/victoria-metrics-k8s-stack   0.16.3          v1.91.2         Kubernetes monitoring on VictoriaMetrics stack....
...
[/simterm]
All values can be taken as follows:
[simterm]
$ helm show values vm/victoria-metrics-k8s-stack > default-values.yaml
[/simterm]
Or just from the repository – values.yaml.
Minimal values for the VictoriaMetrics chart
VictoriaMetrics has very good documentation, so during the process, we will often use the API Docs.
Here, we’ll use VMSingle instead of VMCluster as our project is small, and I’m just getting to know VictoriaMetrics, so I don’t want to complicate the system.
Create a minimal configuration:
# to configure later
victoria-metrics-operator:
  serviceAccount:
    create: false

# to configure later
alertmanager:
  enabled: true

# to configure later
vmalert:
  annotations: {}
  enabled: true

# to configure later
vmagent:
  enabled: true

grafana:
  enabled: true
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/scheme: internet-facing
    hosts:
      - monitoring.dev.example.co
Deploy to a new namespace:
[simterm]
$ helm upgrade --install victoria-metrics-k8s-stack -n dev-monitoring-ns --create-namespace vm/victoria-metrics-k8s-stack -f atlas-monitoring-dev-values.yaml
[/simterm]
Check Pods:
[simterm]
$ kk -n dev-monitoring-ns get pod
NAME                                                              READY   STATUS              RESTARTS   AGE
victoria-metrics-k8s-stack-grafana-76867f56c4-6zth2               0/3     Init:0/1            0          5s
victoria-metrics-k8s-stack-kube-state-metrics-79468c76cb-75kgp    0/1     Running             0          5s
victoria-metrics-k8s-stack-prometheus-node-exporter-89ltc         1/1     Running             0          5s
victoria-metrics-k8s-stack-victoria-metrics-operator-695bdxmcwn   0/1     ContainerCreating   0          5s
vmsingle-victoria-metrics-k8s-stack-f7794d779-79d94               0/1     Pending             0          0s
[/simterm]
And Ingress:
[simterm]
$ kk -n dev-monitoring-ns get ing
NAME                                 CLASS    HOSTS                       ADDRESS                                                     PORTS   AGE
victoria-metrics-k8s-stack-grafana   <none>   monitoring.dev.example.co   k8s-devmonit-victoria-***-***.us-east-1.elb.amazonaws.com   80      6m10s
[/simterm]
Wait for a DNS update, or just open access to the Grafana Service – find it:
[simterm]
$ kk -n dev-monitoring-ns get svc
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
victoria-metrics-k8s-stack-grafana   ClusterIP   172.20.162.193   <none>        80/TCP    12m
...
[/simterm]
And run port-forward:
[simterm]
$ kk -n dev-monitoring-ns port-forward svc/victoria-metrics-k8s-stack-grafana 8080:80
Forwarding from 127.0.0.1:8080 -> 3000
Forwarding from [::1]:8080 -> 3000
[/simterm]
Go to the http://localhost:8080/ in your browser.
The default username is admin; get its generated password:
[simterm]
$ kubectl -n dev-monitoring-ns get secret victoria-metrics-k8s-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
1Ev***Ko2
[/simterm]
And we already have ready-made dashboards (the defaultDashboardsEnabled parameter in the default values):
Okay, that’s working, and it’s time to think about other settings.
Creating our own Helm chart for the monitoring stack
So, let's make a kind of "umbrella chart" that will run the VictoriaMetrics Stack itself and all the necessary Prometheus exporters, and will create all the necessary Secrets/ConfigMaps, etc.
How will it work?
- we'll create a chart
- in its dependencies we'll add the VictoriaMetrics Stack
- in the same dependencies we will add the exporters
- in the templates directory of our chart we will describe our custom resources (ConfigMaps, VMRules, Deployments for custom exporters, etc.)
Let's recall how this is generally done – Helm Create, Helm: dependencies aka subcharts – an overview and an example, How to make a Helm chart in 10 minutes, One Chart to rule them all – How to implement Helm Subcharts.
But instead of helm create, we'll create the chart manually, as helm create generates too many needless files.
Create directories in our monitoring repository:
[simterm]
$ mkdir -p victoriametrics/{templates,charts,values}
[/simterm]
Check the structure:
[simterm]
$ tree victoriametrics
victoriametrics
├── charts
├── templates
└── values
[/simterm]
Go to the victoriametrics directory and create a Chart.yaml file:
apiVersion: v2
name: atlas-victoriametrics
description: A Helm chart for Atlas Victoria Metrics kubernetes monitoring stack
type: application
version: 0.1.0
appVersion: "1.16.0"
Adding subcharts
Now it's time to add dependencies – start with the victoria-metrics-k8s-stack.
We already searched for the versions – let's recall which one was the latest:
[simterm]
$ helm search repo vm/victoria-metrics-k8s-stack -l
NAME                            CHART VERSION   APP VERSION     DESCRIPTION
vm/victoria-metrics-k8s-stack   0.17.0          v1.91.3         Kubernetes monitoring on VictoriaMetrics stack....
vm/victoria-metrics-k8s-stack   0.16.4          v1.91.3         Kubernetes monitoring on VictoriaMetrics stack....
...
[/simterm]
Add it with a ~ in the version number to include patch releases of the 0.17 version (see Dependencies):
apiVersion: v2
name: atlas-victoriametrics
description: A Helm chart for Atlas Victoria Metrics kubernetes monitoring stack
type: application
version: 0.1.0
appVersion: "1.16.0"
dependencies:
- name: victoria-metrics-k8s-stack
  version: ~0.17.0
  repository: https://victoriametrics.github.io/helm-charts/
Add values.yaml for subcharts
Next, create directories for the values:
[simterm]
$ mkdir -p values/{dev,prod}
[/simterm]
Copy our minimal config to the values/dev/ directory:
[simterm]
$ cp ../atlas-monitoring-dev-values.yaml values/dev/
[/simterm]
Later we will move all the common parameters into some common-values.yaml, and the values that differ between Dev and Prod into separate files, as sketched below.
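Just to illustrate the idea (the file names and the chosen keys here are assumptions, not the final layout) – the shared part would live in a common file, only the environment-specific bits in the per-environment files, and both files would be passed to Helm with -f:

# values/common-values.yaml – parameters shared by all environments
victoria-metrics-k8s-stack:
  alertmanager:
    enabled: true
  grafana:
    enabled: true

# values/dev/atlas-monitoring-dev-values.yaml – Dev-only overrides
victoria-metrics-k8s-stack:
  grafana:
    ingress:
      hosts:
        - monitoring.dev.example.co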
Update our values – add a victoria-metrics-k8s-stack block, because now it will be our subchart:
victoria-metrics-k8s-stack:

  # no need yet
  victoria-metrics-operator:
    serviceAccount:
      create: true

  # to configure later
  alertmanager:
    enabled: true

  # to configure later
  vmalert:
    annotations: {}
    enabled: true

  # to configure later
  vmagent:
    enabled: true

  grafana:
    enabled: true
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/target-type: ip
        alb.ingress.kubernetes.io/scheme: internet-facing
      hosts:
        - monitoring.dev.example.co
Download the charts from the dependencies:
[simterm]
$ helm dependency update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "vm" chart repository
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading victoria-metrics-k8s-stack from repo https://victoriametrics.github.io/helm-charts/
Deleting outdated charts
[/simterm]
Check the content of the charts directory:
[simterm]
$ ls -1 charts/
victoria-metrics-k8s-stack-0.17.0.tgz
[/simterm]
And run helm template for the new Helm chart with our VictoriaMetrics Stack to check that the chart itself, its dependencies, and values are working:
[simterm]
$ helm template . -f values/dev/atlas-monitoring-dev-values.yaml
---
# Source: victoriametrics/charts/victoria-metrics-k8s-stack/charts/grafana/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    helm.sh/chart: grafana-6.44.11
    app.kubernetes.io/name: grafana
    app.kubernetes.io/instance: release-name
...
[/simterm]
Looks OK – let’s try to deploy.
Delete the old release:
[simterm]
$ helm -n dev-monitoring-ns uninstall victoria-metrics-k8s-stack
release "victoria-metrics-k8s-stack" uninstalled
[/simterm]
Service Invalid value: must be no more than 63 characters
Deploy a new release, and:
[simterm]
$ helm -n dev-monitoring-ns upgrade --install atlas-victoriametrics . -f values/dev/atlas-monitoring-dev-values.yaml
Release "atlas-victoriametrics" does not exist. Installing it now.
Error: 10 errors occurred:
        * Service "atlas-victoriametrics-victoria-metrics-k8s-stack-kube-controlle" is invalid: metadata.labels: Invalid value: "atlas-victoriametrics-victoria-metrics-k8s-stack-kube-controller-manager": must be no more than 63 characters
[/simterm]
Check the length of the name:
[simterm]
$ echo atlas-victoriametrics-victoria-metrics-k8s-stack-kube-controller-manager | wc -c
73
[/simterm]
To solve this, add a fullnameOverride with a shortened name to the values:
victoria-metrics-k8s-stack:
  fullnameOverride: "vm-k8s-stack"
  ...
Deploy again:
[simterm]
$ helm -n dev-monitoring-ns upgrade --install atlas-victoriametrics . -f values/dev/atlas-monitoring-dev-values.yaml
Release "atlas-victoriametrics" has been upgraded. Happy Helming!
...
[/simterm]
Check resources:
[simterm]
$ kk -n dev-monitoring-ns get all
NAME                                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/atlas-victoriametrics-grafana              ClusterIP   172.20.93.0     <none>        80/TCP     0s
service/atlas-victoriametrics-kube-state-metrics   ClusterIP   172.20.113.37   <none>        8080/TCP   0s
...
[/simterm]
Seems like everything is fine here – let's add the exporters.
Prometheus CloudWatch Exporter subchart
To authenticate exporters to AWS, we will use IRSA, described in the AWS: CDK and Python – configure an IAM OIDC Provider, and install Kubernetes Controllers post.
So let's assume that the IAM Role for the exporter already exists – we just need to install the prometheus-cloudwatch-exporter Helm chart and specify the ARN of the IAM Role.
Check the chart’s available versions:
[simterm]
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm search repo prometheus-community/prometheus-cloudwatch-exporter
NAME                                                  CHART VERSION   APP VERSION     DESCRIPTION
prometheus-community/prometheus-cloudwatch-expo...    0.25.1          0.15.4          A Helm chart for prometheus cloudwatch-exporter
[/simterm]
Add it to the dependencies of our Chart.yaml:
...
dependencies:
- name: victoria-metrics-k8s-stack
  version: ~0.17.0
  repository: https://victoriametrics.github.io/helm-charts/
- name: prometheus-cloudwatch-exporter
  version: ~0.25.1
  repository: https://prometheus-community.github.io/helm-charts
In the values/dev/atlas-monitoring-dev-values.yaml file, add a prometheus-cloudwatch-exporter.serviceAccount.annotations parameter with the ARN of our IAM Role, and a config block with the metrics that we will collect:
prometheus-cloudwatch-exporter:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::492***148:role/atlas-monitoring-dev-CloudwatchExporterRole0613A27-EU5LW9XRWVRL
  config: |-
    region: us-east-1
    metrics:
    - aws_namespace: AWS/Events
      aws_metric_name: FailedInvocations
      aws_dimensions: [RuleName]
      aws_statistics: [Sum, SampleCount]
    - aws_namespace: AWS/Events
      aws_metric_name: Invocations
      aws_dimensions: [EventBusName, RuleName]
      aws_statistics: [Sum, SampleCount]
Although if the config grows large, it is probably better to create a dedicated ConfigMap for the exporter.
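For example, such a ConfigMap might look roughly like this – a sketch only, with an assumed ConfigMap name; how exactly the prometheus-cloudwatch-exporter chart can be pointed to an existing ConfigMap would need to be checked in its values:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudwatch-exporter-config   # hypothetical name
data:
  config.yml: |
    region: us-east-1
    metrics:
      - aws_namespace: AWS/Events
        aws_metric_name: FailedInvocations
        aws_dimensions: [RuleName]
        aws_statistics: [Sum, SampleCount]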
Update the dependencies:
[simterm]
$ helm dependency update
[/simterm]
Deploy:
[simterm]
$ helm -n dev-monitoring-ns upgrade --install atlas-victoriametrics . -f values/dev/atlas-monitoring-dev-values.yaml
[/simterm]
Check the Pod:
[simterm]
$ kk -n dev-monitoring-ns get pod | grep cloud
atlas-victoriametrics-prometheus-cloudwatch-exporter-564ccfjm9j   1/1     Running   0          53s
[/simterm]
And check that the IAM Role from the ServiceAccount was injected into the Pod:
[simterm]
$ kk -n dev-monitoring-ns get pod atlas-victoriametrics-prometheus-cloudwatch-exporter-64b6f6b9rv -o yaml
...
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::492***148:role/atlas-monitoring-dev-CloudwatchExporterRole0613A27-EU5LW9XRWVRL
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
...
[/simterm]
Run port-forward:
[simterm]
$ kk -n dev-monitoring-ns port-forward svc/atlas-victoriametrics-prometheus-cloudwatch-exporter 9106
[/simterm]
And let’s see if we have metrics there:
[simterm]
$ curl -s localhost:9106/metrics | grep aws_
# HELP aws_events_invocations_sum CloudWatch metric AWS/Events Invocations Dimensions: [EventBusName, RuleName] Statistic: Sum Unit: Count
# TYPE aws_events_invocations_sum gauge
aws_events_invocations_sum{job="aws_events",instance="",event_bus_name="***-staging",rule_name="***_WsConnectionEstablished-staging",} 2.0 1689598980000
aws_events_invocations_sum{job="aws_events",instance="",event_bus_name="***-prod",rule_name="***_ReminderTimeReached-prod",} 2.0 1689598740000
aws_events_invocations_sum{job="aws_events",instance="",event_bus_name="***-prod",rule_name="***_PushNotificationEvent-prod",} 2.0 1689598740000
[/simterm]
Great.
Now, we need to configure VMAgent to start collecting these metrics from this exporter.
Collecting metrics from exporters: VMAgent && scrape_configs
The usual way with the Kube Prometheus Stack is to simply set serviceMonitor.enabled=true in an exporter's Helm chart values, and the Prometheus Operator will create a ServiceMonitor to start collecting its metrics.
However, this won't work with VictoriaMetrics out of the box, because the ServiceMonitor CRD is a part of the kube-prometheus-stack, so the ServiceMonitor resource simply won't be created.
Instead, VictoriaMetrics has its own counterpart – VMServiceScrape, which can be created from a manifest where we configure an endpoint to collect metrics from. In addition, VictoriaMetrics can create VMServiceScrape resources from existing ServiceMonitors, but this requires installing the ServiceMonitor CRD itself.
We can also pass a list of targets with the inlineScrapeConfig or additionalScrapeConfigs – see VMAgentSpec.
Most likely, I'll use the inlineScrapeConfig for now, because our config is not too big.
It is also worth taking a look at the VMAgent values.yaml – for example, there are default scrape_configs values there.
One more nuance to keep in mind: VMAgent does not validate target configurations, i.e. if there is an error in the YAML, VMAgent simply ignores the changes, does not reload the file, and writes nothing to the log.
VMServiceScrape
First, let’s create a VMServiceScrape manually to see how it works.
Check the labels in the CloudWatch Exporter Service:
[simterm]
$ kk -n dev-monitoring-ns describe svc atlas-victoriametrics-prometheus-cloudwatch-exporter
Name:              atlas-victoriametrics-prometheus-cloudwatch-exporter
Namespace:         dev-monitoring-ns
Labels:            app=prometheus-cloudwatch-exporter
                   app.kubernetes.io/managed-by=Helm
                   chart=prometheus-cloudwatch-exporter-0.25.1
                   heritage=Helm
                   release=atlas-victoriametrics
...
[/simterm]
Describe the VMServiceScrape with the matchLabels where we specify the labels of the CloudWatch exporter's Service:
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: prometheus-cloudwatch-exporter-vm-scrape
  namespace: dev-monitoring-ns
spec:
  selector:
    matchLabels:
      app: prometheus-cloudwatch-exporter
  endpoints:
    - port: http
Deploy:
[simterm]
$ kubectl apply -f vmsvcscrape.yaml
vmservicescrape.operator.victoriametrics.com/prometheus-cloudwatch-exporter-vm-scrape created
[/simterm]
Check the vmservicescrape resources – there is already a bunch of default ones created by the VictoriaMetrics Operator:
[simterm]
$ kk -n dev-monitoring-ns get vmservicescrape
NAME                                       AGE
prometheus-cloudwatch-exporter-vm-scrape   6m45s
vm-k8s-stack-apiserver                     4d22h
vm-k8s-stack-coredns                       4d22h
vm-k8s-stack-grafana                       4d22h
vm-k8s-stack-kube-controller-manager       4d22h
...
[/simterm]
The VMAgent config is created in the Pod in the /etc/vmagent/config_out/vmagent.env.yaml file.
Let’s see if our CloudWatch Exporter has been added there:
[simterm]
$ kk -n dev-monitoring-ns exec -ti vmagent-vm-k8s-stack-98d7678d4-cn8qd -c vmagent -- cat /etc/vmagent/config_out/vmagent.env.yaml
global:
  scrape_interval: 25s
  external_labels:
    cluster: eks-dev-1-26-cluster
    prometheus: dev-monitoring-ns/vm-k8s-stack
scrape_configs:
- job_name: serviceScrape/dev-monitoring-ns/prometheus-cloudwatch-exporter-vm-scrape/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - dev-monitoring-ns
...
[/simterm]
And now we must have the metrics in VictoriaMetrics itself.
Open a port:
[simterm]
$ kk -n dev-monitoring-ns port-forward svc/vmsingle-vm-k8s-stack 8429
[/simterm]
Go to the http://localhost:8429/vmui/, and to check it, run a query for any metric from the CloudWatch Exporter:
Good – we saw how to manually create a VMServiceScrape. But what about automating this process? I don't really like the idea of creating a dedicated VMServiceScrape for each service through Kustomize.
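One option could be a single template in our umbrella chart that renders a VMServiceScrape per entry in the values – a sketch, assuming a hypothetical serviceScrapes list that does not exist in the chart yet; another option, the Prometheus converter, is described below:

# templates/vmservicescrapes.yaml – one VMServiceScrape per entry in .Values.serviceScrapes (hypothetical key)
{{- range .Values.serviceScrapes }}
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMServiceScrape
metadata:
  name: {{ .name }}-vm-scrape
spec:
  selector:
    matchLabels:
      app: {{ .appLabel }}
  endpoints:
    - port: {{ .port }}
{{- end }}

With values along the lines of:

serviceScrapes:
  - name: prometheus-cloudwatch-exporter
    appLabel: prometheus-cloudwatch-exporter
    port: http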
VMServiceScrape from a ServiceMonitor and VictoriaMetrics Prometheus Converter
So, as already mentioned, in order for a ServiceMonitor object to be created in the cluster, we need the ServiceMonitor Custom Resource Definition.
We can install it directly from the manifest in the kube-prometheus-stack repository:
[simterm]
$ kubectl apply -f https://raw.githubusercontent.com/prometheus-community/helm-charts/main/charts/kube-prometheus-stack/charts/crds/crds/crd-servicemonitors.yaml
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
[/simterm]
Then update the values – add serviceMonitor.enabled=true:
...
prometheus-cloudwatch-exporter:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::492***148:role/atlas-monitoring-dev-CloudwatchExporterRole0613A27-EU5LW9XRWVRL
      eks.amazonaws.com/sts-regional-endpoints: "true"
  serviceMonitor:
    enabled: true
...
And in the victoria-metrics-k8s-stack values, add the operator.disable_prometheus_converter=false parameter:
victoria-metrics-k8s-stack:
  fullnameOverride: "vm-k8s-stack"

  # no need yet
  victoria-metrics-operator:
    serviceAccount:
      create: true
    operator:
      disable_prometheus_converter: false
...
Deploy and check whether a servicemonitor was created:
[simterm]
$ kk -n dev-monitoring-ns get servicemonitors
NAME                                                    AGE
atlas-victoriametrics-prometheus-cloudwatch-exporter    2m22s
[/simterm]
And we should have a vmservicescrape created automatically:
[simterm]
$ kk -n dev-monitoring-ns get vmservicescrape
NAME                                                    AGE
atlas-victoriametrics-prometheus-cloudwatch-exporter    2m11s
...
[/simterm]
Check the targets:
Everything is there.
The only nuances here are that when a ServiceMonitor is deleted, the corresponding vmservicescrape will remain in the cluster, and that we need to install a third-party CRD, which will have to be updated somehow over time, preferably automatically.
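One way to keep that CRD managed together with everything else might be to pull it in as yet another subchart – the prometheus-community repository has a dedicated prometheus-operator-crds chart; a sketch of the Chart.yaml entry (the version constraint below is a placeholder and should be pinned to whatever helm search repo prometheus-community/prometheus-operator-crds returns):

dependencies:
  ...
  - name: prometheus-operator-crds
    repository: https://prometheus-community.github.io/helm-charts
    version: "*"    # placeholder – pin a real version here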
inlineScrapeConfig
Probably the simplest option is to describe a config using inlineScrapeConfig directly in the values of our chart:
...
  vmagent:
    enabled: true
    spec:
      externalLabels:
        cluster: "eks-dev-1-26-cluster"
      inlineScrapeConfig: |
        - job_name: cloudwatch-exporter-inline-job
          metrics_path: /metrics
          static_configs:
            - targets: ["atlas-victoriametrics-prometheus-cloudwatch-exporter:9106"]
...
Deploy and check the vmagent:
[simterm]
$ kk -n dev-monitoring-ns get vmagent -o yaml
apiVersion: v1
items:
- apiVersion: operator.victoriametrics.com/v1beta1
  kind: VMAgent
  ...
    inlineScrapeConfig: |
      - job_name: cloudwatch-exporter-inline-job
        metrics_path: /metrics
        static_configs:
          - targets: ["atlas-victoriametrics-prometheus-cloudwatch-exporter:9106"]
...
[/simterm]
Let’s look at the targets again:
additionalScrapeConfigs
A more secure way if there are any access tokens or keys in the parameters, but it requires creating a separate Kubernetes Secret object.
Actually, that is not a problem, because we will have additional ConfigMaps/Secrets anyway, and since I'll want to keep the targets config in a separate file, most likely I will convert it to the additionalScrapeConfigs.
Now we will create it manually, just to see how it will work. Take an example directly from the documentation:
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
stringData:
  prometheus-additional.yaml: |
    - job_name: cloudwatch-exporter-secret-job
      metrics_path: /metrics
      static_configs:
        - targets: ["atlas-victoriametrics-prometheus-cloudwatch-exporter:9106"]
Do not forget to deploy it:
[simterm]
$ kubectl -n dev-monitoring-ns apply -f vmagent-targets-secret.yaml
secret/additional-scrape-configs created
[/simterm]
Update the VMAgent values – add the additionalScrapeConfigs block:
...
  vmagent:
    enabled: true
    spec:
      externalLabels:
        cluster: "eks-dev-1-26-cluster"
      additionalScrapeConfigs:
        name: additional-scrape-configs
        key: prometheus-additional.yaml
      inlineScrapeConfig: |
        - job_name: cloudwatch-exporter-inline-job
          metrics_path: /metrics
          static_configs:
            - targets: ["atlas-victoriametrics-prometheus-cloudwatch-exporter:9106"]
...
Update the deployment and check the targets:
Now that we have the metrics, we can move on to the Grafana.
Grafana provisioning
What do we need for Grafana? Plugins, data sources, and dashboards.
First, let’s add Data Sources, see the documentation.
Adding Data Sources && Plugins
If everything is more or less simple with dashboards, then with Data Sources there is a question: how do we pass secrets to them? For example, the Sentry data source needs a token, which I do not want to keep in the chart's values, because we do not encrypt data in GitHub, even though the repositories are private (check git-crypt if you are thinking about encrypting data in a Git repository).
Let's first see how it works in general, and then think about how to pass the token.
We will add a Sentry Data Source, see grafana-sentry-datasource. We already have a token created in sentry.io > User settings > User Auth Tokens.
In the Grafana values, we'll add the plugins where we set the name of the plugin, grafana-sentry-datasource (the value of the type field from the documentation above), and describe the additionalDataSources block with the secureJsonData field with the token itself:
...
  grafana:
    enabled: true
    ingress:
      enabled: true
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/target-type: ip
        alb.ingress.kubernetes.io/scheme: internet-facing
      hosts:
        - monitoring.dev.example.co
    plugins:
      - grafana-sentry-datasource
    additionalDataSources:
      - name: Sentry
        type: grafana-sentry-datasource
        access: proxy
        orgId: 1
        version: 1
        editable: true
        jsonData:
          url: https://sentry.io
          orgSlug: ***
        secureJsonData:
          authToken: 974***56b
...
Deploy, and check the plugin:
And the Data Source:
Okay, it works.
A Token for a Data Source with the envFromSecret
Now let's try to use a variable with a value from a Kubernetes Secret taken by the envFromSecret.
Create a Secret:
---
apiVersion: v1
kind: Secret
metadata:
  name: grafana-datasource-sentry-token
stringData:
  SENTRY_TOKEN: 974***56b
Update the Grafana values – add the envFromSecret to set the $SENTRY_TOKEN variable, and then use it in the additionalDataSources:
...
  grafana:
    ...
    envFromSecret: grafana-datasource-sentry-token
    additionalDataSources:
      - name: Sentry
        type: grafana-sentry-datasource
        access: proxy
        orgId: 1
        version: 1
        editable: true
        jsonData:
          url: https://sentry.io
          orgSlug: ***
        secureJsonData:
          authToken: ${SENTRY_TOKEN}
...
Deploy and check the variable in the Grafana’s Pod:
[simterm]
$ kk -n dev-monitoring-ns exec -ti atlas-victoriametrics-grafana-64d9db677-g7l25 -c grafana -- printenv | grep SENTRY_TOKEN
SENTRY_TOKEN=974***56b
[/simterm]
Check the config of the data sources:
[simterm]
$ kk -n dev-monitoring-ns exec -ti atlas-victoriametrics-grafana-64d9db677-bpkw8 -c grafana -- cat /etc/grafana/provisioning/datasources/datasource.yaml
...
apiVersion: 1
datasources:
- name: VictoriaMetrics
  type: prometheus
  url: http://vmsingle-vm-k8s-stack.dev-monitoring-ns.svc:8429/
  access: proxy
  isDefault: true
  jsonData:
    {}
- access: proxy
  editable: true
  jsonData:
    orgSlug: ***
    url: https://sentry.io
  name: Sentry
  orgId: 1
  secureJsonData:
    authToken: ${SENTRY_TOKEN}
  type: grafana-sentry-datasource
  version: 1
[/simterm]
And again check the Data Source:
So with this approach, we can use the AWS Secrets and Configuration Provider (ASCP) (see AWS: Kubernetes – AWS Secrets Manager and Parameter Store Integration):
- create a secret variable $SECRET_NAME_VAR in the GitHub Actions Secrets
- during the AWS CDK deployment, read its value with os.getenv("SECRET_NAME_VAR") and create a secret in the AWS Secrets Manager
- in our chart's templates directory, create a SecretProviderClass with the secretObjects.secretName field to create a Kubernetes Secret (a rough sketch of such a SecretProviderClass is shown below)
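For reference, such a SecretProviderClass might look roughly like this – a sketch, assuming the Secrets Store CSI Driver with the AWS provider is installed in the cluster, and using hypothetical secret names:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: grafana-sentry-token-spc
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "atlas-monitoring-dev-sentry-token"    # name in AWS Secrets Manager (hypothetical)
        objectType: "secretsmanager"
  secretObjects:
    - secretName: grafana-datasource-sentry-token          # the Kubernetes Secret to be created
      type: Opaque
      data:
        - objectName: "atlas-monitoring-dev-sentry-token"
          key: SENTRY_TOKEN

Keep in mind that ASCP creates the Kubernetes Secret only when a Pod actually mounts the corresponding CSI volume.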
And when the Grafana Pod is created, this Secret will be attached to it:
[simterm]
$ kk -n dev-monitoring-ns get pod atlas-victoriametrics-grafana-64d9db677-dlqfr -o yaml
...
    envFrom:
    - secretRef:
        name: grafana-datasource-sentry-token
...
[/simterm]
And will pass the value to the Grafana itself.
Okay, this might work, although it looks a bit confusing.
But there is another option – with the sidecar.datasources.
A Kubernetes Secret with a Data Source with sidecar.datasources
There is a second option – to configure data sources through a sidecar container: we can create a Kubernetes Secret with specific labels, and add a data source to this Secret. See Sidecar for datasources.
And that's a pretty nice idea: create a manifest with a Kubernetes Secret in the templates directory, and pass the value with --set during helm install in GitHub Actions, taking it from the GitHub Actions Secrets. It looks simpler. Let's try.
Describe a Kubernetes Secret in the templates/grafana-datasources-secret.yaml file:
apiVersion: v1
kind: Secret
metadata:
  name: grafana-datasources
  labels:
    grafana_datasource: 'true'
stringData:
  sentry.yaml: |-
    apiVersion: 1
    datasources:
      - name: Sentry
        type: grafana-sentry-datasource
        access: proxy
        orgId: 1
        version: 1
        editable: true
        jsonData:
          url: https://sentry.io
          orgSlug: ***
        secureJsonData:
          authToken: {{ .Values.grafana.sentry_token }}
Deploy it with --set grafana.sentry_token=TOKEN:
[simterm]
$ helm -n dev-monitoring-ns upgrade --install atlas-victoriametrics . -f values/dev/atlas-monitoring-dev-values.yaml --set grafana.sentry_token="974***56b"
[/simterm]
Check the configuration of data sources in the Grafana Pod:
[simterm]
$ kk -n dev-monitoring-ns exec -ti atlas-victoriametrics-grafana-5967b494f6-5zmjb -c grafana -- ls -l /etc/grafana/provisioning/datasources
total 8
-rw-r--r--    1 grafana  472            187 Jul 19 13:36 datasource.yaml
-rw-r--r--    1 grafana  472            320 Jul 19 13:36 sentry.yaml
[/simterm]
And the sentry.yaml file's content:
[simterm]
$ kk -n dev-monitoring-ns exec -ti atlas-victoriametrics-grafana-5967b494f6-5zmjb -c grafana -- cat /etc/grafana/provisioning/datasources/sentry.yaml
apiVersion: 1
datasources:
  - name: Sentry
    type: grafana-sentry-datasource
    access: proxy
    orgId: 1
    version: 1
    editable: true
    jsonData:
      url: https://sentry.io
      orgSlug: ***
    secureJsonData:
      authToken: 974***56b
[/simterm]
And once again the data source in the Grafana itself:
It's magic!
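And since the plan is to take the token from the GitHub Actions Secrets, the deployment step might look something like this – a sketch only; the workflow layout, the SENTRY_TOKEN secret name, and the kubeconfig/AWS authentication steps are assumptions, not the real pipeline:

# .github/workflows/deploy-monitoring.yml (hypothetical)
...
      - name: Deploy the monitoring chart
        working-directory: victoriametrics
        run: |
          helm -n dev-monitoring-ns upgrade --install atlas-victoriametrics . \
            -f values/dev/atlas-monitoring-dev-values.yaml \
            --set grafana.sentry_token="${{ secrets.SENTRY_TOKEN }}"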
Adding Dashboards
So we already have Grafana in our PoC monitoring, and there are dashboards that we need to move to the new monitoring stack and deploy from the GitHub repository.
Documentation on importing dashboards – Import dashboards.
To create a dashboard through the Helm chart, we have a grafana-sc-dashboard sidecar container, similar to the grafana-sc-datasources one, which will check all ConfigMaps with a specific label and attach them to the Pod. See Sidecar for dashboards.
Keep in mind the recommendation:
A recommendation is to use one configmap per dashboard, as a reduction of multiple dashboards inside one configmap is currently not properly mirrored in grafana.
That is, one ConfigMap for each dashboard.
So what we need to do is describe a ConfigMap for each dashboard, and Grafana will add them to the /tmp/dashboards directory.
Export of existing dashboard and the “Data Source UID not found” error
To avoid an error with the UID (“Failed to retrieve datasource Datasource f0f2c234-f0e6-4b6c-8ed1-01813daa84c9 was not found”) – go to the dashboard in the existing Grafana instance and add a new variable with the Data Source type:
Repeat for Loki, Sentry:
And update the panels – set a datasource from the variable:
Repeat the same for all queries in the Annotations and Variables:
Create a directory for the dashboard files that will be imported into Kubernetes:
[simterm]
$ mkdir -p grafana/dashboards/
[/simterm]
And export a dashboard to JSON, saving it as grafana/dashboards/overview.json:
Dashboard ConfigMap
In the templates directory, create a manifest for the ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: overview-dashboard
  labels:
    grafana_dashboard: "1"
data:
  overview.json: |
{{ .Files.Get "grafana/dashboards/overview.json" | indent 4 }}
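Later, when there are many dashboards, a single template that generates one ConfigMap per JSON file could replace the per-dashboard manifests – a sketch using .Files.Glob, assuming all the dashboards live in grafana/dashboards/:

# templates/grafana-dashboards-configmaps.yaml (hypothetical) – one ConfigMap per dashboard file
{{- range $path, $_ := .Files.Glob "grafana/dashboards/*.json" }}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ base $path | trimSuffix ".json" }}-dashboard
  labels:
    grafana_dashboard: "1"
data:
  {{ base $path }}: |
{{ $.Files.Get $path | indent 4 }}
{{- end }}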
Now all the files of our project look like this:
Deploy the chart and check the ConfigMap:
[simterm]
$ kk -n dev-monitoring-ns get cm overview-dashboard -o yaml | head
apiVersion: v1
data:
  overview.json: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": {
              "type": "grafana",
[/simterm]
And check in the Pod whether the file was added to the /tmp/dashboards:
[simterm]
$ kubectl -n dev-monitoring-ns exec -ti atlas-victoriametrics-grafana-5967b494f6-gs4jm -c grafana -- ls -l /tmp/dashboards
total 1032
...
-rw-r--r--    1 grafana  472          74821 Jul 19 10:31 overview.json
...
[/simterm]
And in the Grafana itself:
And we have our graphs – not all of them yet, because only one exporter has been launched so far:
Let’s move on.
What else do we have to do?
- GitHub exporter – create a chart, add it as a dependency to the general chart (or just create a manifest with Deployment? we will have just one Pod there)
- launch Loki
- configure alerts
For the GitHub exporter, I'll probably just make a Deployment manifest in the templates directory of the main chart, along the lines of the sketch below.
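For reference, such a manifest might look roughly like this – a sketch only; the image URL, the container port, and the Secret name are assumptions, since the real exporter image lives in ECR and needs its own GitHub token:

# templates/github-exporter-deployment.yaml (hypothetical)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: github-exporter
  template:
    metadata:
      labels:
        app: github-exporter
    spec:
      containers:
        - name: github-exporter
          # hypothetical ECR image
          image: 492***148.dkr.ecr.us-east-1.amazonaws.com/github-exporter:latest
          ports:
            - name: metrics
              containerPort: 8000
          envFrom:
            - secretRef:
                # hypothetical Secret with the GitHub API token
                name: github-exporter-token
---
apiVersion: v1
kind: Service
metadata:
  name: github-exporter
  labels:
    app: github-exporter
spec:
  selector:
    app: github-exporter
  ports:
    - name: metrics
      port: 8000
      targetPort: metrics

A matching VMServiceScrape or a scrape_configs entry would then point at this Service.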
So, let's now recall the Loki installation, because when I did it six months ago, it was a bit hard. I hope not too much has changed, and I can just take the config from the Grafana Loki: architecture and running in Kubernetes with AWS S3 storage and boltdb-shipper post.
Running Grafana Loki with AWS S3
What do we need here?
- create an S3 bucket
- create an IAM Policy && IAM Role to access the bucket
- create a ConfigMap with the Loki config
- add the Loki chart as a subchart of our main chart
AWS CDK for S3 and IAM Role
Describe the S3 bucket and the IAM resources – we are using AWS CDK:
...
##################################
### Grafana Loki AWS resources ###
##################################

### AWS S3 to store logs data and indexes
loki_bucket_name = f"{environment}-grafana-loki"
bucket = s3.Bucket(
    self,
    'GrafanaLokiBucket',
    bucket_name=loki_bucket_name,
    block_public_access=s3.BlockPublicAccess.BLOCK_ALL
)

# Create an IAM Role to be assumed by Loki
grafana_loki_role = iam.Role(
    self,
    'GrafanaLokiRole',
    # for Role's Trust relationships
    assumed_by=iam.FederatedPrincipal(
        federated=oidc_provider_arn,
        conditions={
            'StringEquals': {
                f'{oidc_provider_url.replace("https://", "")}:sub': f'system:serviceaccount:{monitoring_namespace}:loki'
            }
        },
        assume_role_action='sts:AssumeRoleWithWebIdentity'
    )
)

# Attach an IAM Policy to that Role
grafana_loki_policy = iam.PolicyStatement(
    actions=[
        "s3:ListBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
    ],
    resources=[
        f"arn:aws:s3:::{loki_bucket_name}",
        f"arn:aws:s3:::{loki_bucket_name}/*"
    ]
)
grafana_loki_role.add_to_policy(grafana_loki_policy)

...

CfnOutput(
    self,
    'GrafanaLokiRoleArn',
    value=grafana_loki_role.role_arn
)
...
Deploy it:
[simterm]
$ cdk deploy atlas-monitoring-dev
...
Outputs:
atlas-monitoring-dev.CloudwatchExporterRoleArn = arn:aws:iam::492***148:role/atlas-monitoring-dev-CloudwatchExporterRole0613A27-EU5LW9XRWVRL
atlas-monitoring-dev.GrafanaLokiRoleArn = arn:aws:iam::492***148:role/atlas-monitoring-dev-GrafanaLokiRole27EECE19-1HLODQFKFLDNK
...
[/simterm]
And now we can add the subchart.
Loki Helm chart installation
Add the repository and find the latest version of the chart:
[simterm]
$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm search repo grafana/loki
NAME                            CHART VERSION   APP VERSION     DESCRIPTION
grafana/loki                    5.8.9           2.8.2           Helm chart for Grafana Loki in simple, scalable...
grafana/loki-canary             0.12.0          2.8.2           Helm chart for Grafana Loki Canary
grafana/loki-distributed        0.69.16         2.8.2           Helm chart for Grafana Loki in microservices mode
grafana/loki-simple-scalable    1.8.11          2.6.1           Helm chart for Grafana Loki in simple, scalable...
grafana/loki-stack              2.9.10          v2.6.1          Loki: like Prometheus, but for logs.
[/simterm]
Here, again, there are a bunch of charts:
- loki-canary: a system to check that Loki works
- loki-distributed: Loki in the microservices mode
- loki-simple-scalable: deprecated, now it's just the loki chart
- loki-stack: all together – Grafana, Promtail, etc.
We will use grafana/loki 5.8.9.
Add the dependency to our chart in the Chart.yaml:
apiVersion: v2
name: atlas-victoriametrics
description: A Helm chart for Atlas Victoria Metrics kubernetes monitoring stack
type: application
version: 0.1.0
appVersion: "1.16.0"
dependencies:
- name: victoria-metrics-k8s-stack
  version: ~0.17.0
  repository: https://victoriametrics.github.io/helm-charts/
- name: prometheus-cloudwatch-exporter
  version: ~0.25.1
  repository: https://prometheus-community.github.io/helm-charts
- name: loki
  version: ~5.8.9
  repository: https://grafana.github.io/helm-charts
All the default values are here>>>; I took mine from my old config – everything worked:
...
loki:
  loki:
    auth_enabled: false
    commonConfig:
      path_prefix: /var/loki
      replication_factor: 1
    storage:
      bucketNames:
        chunks: dev-grafana-loki
      type: s3
    schema_config:
      configs:
      - from: "2023-07-20"
        index:
          period: 24h
          prefix: loki_index_
        store: boltdb-shipper
        object_store: s3
        schema: v12
    storage_config:
      aws:
        s3: s3://us-east-1/dev-grafana-loki
        insecure: false
        s3forcepathstyle: true
      boltdb_shipper:
        active_index_directory: /var/loki/index
        shared_store: s3
    rulerConfig:
      storage:
        type: local
        local:
          directory: /var/loki/rules
  serviceAccount:
    create: true
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::492***148:role/atlas-monitoring-dev-GrafanaLokiRole27EECE19-1HLODQFKFLDNK"
  write:
    replicas: 1
  read:
    replicas: 1
  backend:
    replicas: 1
  test:
    enabled: false
  monitoring:
    dashboards:
      enabled: false
    rules:
      enabled: false
      alerts:
        enabled: false
    serviceMonitor:
      enabled: false
    selfMonitoring:
      enabled: false
      grafanaAgent:
        installOperator: false
    lokiCanary:
      enabled: false
...
We will also need to add Loki alerts, but that will be done another time (see Grafana Loki: alerts from the Loki Ruler and labels from logs).
Promtail Helm chart installation
Let’s run Promtail in the cluster to check Loki, and to have logs from the cluster.
Find versions of the chart:
[simterm]
$ helm search repo grafana/promtail -l | head
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
grafana/promtail        6.11.7          2.8.2           Promtail is an agent which ships the contents o...
grafana/promtail        6.11.6          2.8.2           Promtail is an agent which ships the contents o...
grafana/promtail        6.11.5          2.8.2           Promtail is an agent which ships the contents o...
...
[/simterm]
Add it as a subchart to the dependencies of our Chart.yaml:
apiVersion: v2
name: atlas-victoriametrics
description: A Helm chart for Atlas Victoria Metrics kubernetes monitoring stack
type: application
version: 0.1.0
appVersion: "1.16.0"
dependencies:
- name: victoria-metrics-k8s-stack
  version: ~0.17.0
  repository: https://victoriametrics.github.io/helm-charts/
- name: prometheus-cloudwatch-exporter
  version: ~0.25.1
  repository: https://prometheus-community.github.io/helm-charts
- name: loki
  version: ~5.8.9
  repository: https://grafana.github.io/helm-charts
- name: promtail
  version: ~6.11.7
  repository: https://grafana.github.io/helm-charts
Find a Service for the Loki:
[simterm]
$ kk -n dev-monitoring-ns get svc | grep loki-gateway
loki-gateway   ClusterIP   172.20.102.186   <none>   80/TCP   160m
[/simterm]
Add values for the Promtail with the loki.serviceName:
...
promtail:
  loki:
    serviceName: "loki-gateway"
Deploy, and check the Pods:
[simterm]
$ kk -n dev-monitoring-ns get pod | grep 'loki\|promtail'
atlas-victoriametrics-promtail-cxwpz    0/1     Running   0          17m
atlas-victoriametrics-promtail-hv94f    1/1     Running   0          17m
loki-backend-0                          0/1     Running   0          9m55s
loki-gateway-749dcc85b6-5d26n           1/1     Running   0          3h4m
loki-read-6cf6bc7654-df82j              1/1     Running   0          57s
loki-write-0                            0/1     Running   0          52s
[/simterm]
Add a new Grafana Data Source for Loki via the additionalDataSources (see Provision the data source):
...
  grafana:
    enabled: true
    ...
    additionalDataSources:
      - name: Loki
        type: loki
        access: proxy
        url: http://loki-gateway:80
        jsonData:
          maxLines: 1000
...
Deploy, and check data sources:
And we should see our logs in Grafana:
Now let's see how alerts are handled in VictoriaMetrics.
Configuring alerts with VMAlert
We already have the VMAlert && Alertmanager Pods running from the chart, described in its vmalert and alertmanager values blocks:
[simterm]
$ kk -n dev-monitoring-ns get pod | grep alert
vmalert-vm-k8s-stack-dff5bf755-57rxd   2/2     Running   0          6d19h
vmalertmanager-vm-k8s-stack-0          2/2     Running   0          6d19h
[/simterm]
First, let’s look at how Alertmanager is configured because alerts will be sent through it.
Alertmanager configuration
Documentation – VMAlertmanagerSpec.
Let’s find its config file:
[simterm]
$ kk -n dev-monitoring-ns describe pod vmalertmanager-vm-k8s-stack-0
...
    Args:
      ....
      --config.file=/etc/alertmanager/config/alertmanager.yaml
    ...
    Mounts:
      ...
      /etc/alertmanager/config from config-volume (rw)
      ...
Volumes:
  config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vmalertmanager-vm-k8s-stack-config
...
[/simterm]
That is, the /etc/alertmanager/config/alertmanager.yaml file is mounted from the vmalertmanager-vm-k8s-stack-config Kubernetes Secret:
[simterm]
$ kk -n dev-monitoring-ns get secret vmalertmanager-vm-k8s-stack-config -o yaml | yq '.data'
{
  "alertmanager.yaml": "Z2x***GwK"
}
[/simterm]
Check the content with base64 -d or on the www.base64decode.org website.
Now let’s add our own config.
Here, again, we will have to think about a secret, because the slack_api_url contains a token. I think we'll do the same as with the Sentry token – just pass it via --set.
Update our values/dev/atlas-monitoring-dev-values.yaml:
...
  alertmanager:
    enabled: true
    config:
      global:
        resolve_timeout: 5m
        slack_api_url: ""
      route:
        repeat_interval: 12h
        group_by: ["alertname"]
        receiver: 'slack-default'
        routes: []
      receivers:
        - name: "slack-default"
          slack_configs:
            - channel: "#alerts-devops"
              send_resolved: true
              title: '{{ template "slack.monzo.title" . }}'
              icon_emoji: '{{ template "slack.monzo.icon_emoji" . }}'
              color: '{{ template "slack.monzo.color" . }}'
              text: '{{ template "slack.monzo.text" . }}'
              actions:
                # self
                - type: button
                  text: ':grafana: overview'
                  url: '{{ (index .Alerts 0).Annotations.grafana_url }}'
                - type: button
                  text: ':grafana: Loki Logs'
                  url: '{{ (index .Alerts 0).Annotations.logs_url }}'
                - type: button
                  text: ':mag: Alert query'
                  url: '{{ (index .Alerts 0).GeneratorURL }}'
                - type: button
                  text: ':aws: AWS dashboard'
                  url: '{{ (index .Alerts 0).Annotations.aws_dashboard_url }}'
                - type: button
                  text: ':aws-cloudwatch: AWS CloudWatch Metrics'
                  url: '{{ (index .Alerts 0).Annotations.aws_cloudwatch_url }}'
                - type: button
                  text: ':aws-cloudwatch: AWS CloudWatch Logs'
                  url: '{{ (index .Alerts 0).Annotations.aws_logs_url }}'
...
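The routes list is left empty for now; when per-severity or per-team routing is needed, it might look something like this – a sketch, where the extra receivers and the matching labels are assumptions and would also have to be defined in the receivers block:

      route:
        receiver: 'slack-default'
        routes:
          - matchers:
              - severity = "critical"
            receiver: 'slack-critical'
            repeat_interval: 1h
          - matchers:
              - team = "backend"
            receiver: 'slack-backend'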
Although in my current monitoring I have my own nice Slack template, for now let's see what this Monzo template looks like.
Deploy the chart with --set victoria-metrics-k8s-stack.alertmanager.config.global.slack_api_url=$slack_url:
[simterm]
$ slack_url="https://hooks.slack.com/services/T02***37X"
$ helm -n dev-monitoring-ns upgrade --install atlas-victoriametrics . -f values/dev/atlas-monitoring-dev-values.yaml --set grafana.sentry_token=$sentry_token --set victoria-metrics-k8s-stack.alertmanager.config.global.slack_api_url=$slack_url --debug
[/simterm]
And let’s check.
Find an Alertmanager Service:
[simterm]
$ kk -n dev-monitoring-ns get svc | grep alert
vmalert-vm-k8s-stack          ClusterIP   172.20.251.179   <none>   8080/TCP                     6d20h
vmalertmanager-vm-k8s-stack   ClusterIP   None             <none>   9093/TCP,9094/TCP,9094/UDP   6d20h
[/simterm]
Run port-forward:
[simterm]
$ kk -n dev-monitoring-ns port-forward svc/vmalertmanager-vm-k8s-stack 9093
[/simterm]
And send an alert with cURL:
[simterm]
$ curl -H 'Content-Type: application/json' -d '[{"labels":{"alertname":"testalert"}}]' http://127.0.0.1:9093/api/v1/alerts
{"status":"success"}
[/simterm]
Check in the Slack:
Wait, what?)))
Okay. It works, but I don't like these Slack message templates, so it's better to take the ones I already have in the old monitoring.
Custom Slack messages template
The Monzo template is mounted into the vmalertmanager-vm-k8s-stack-0 Pod via a ConfigMap:
[simterm]
...
Volumes:
  ...
  templates-vm-k8s-stack-alertmanager-monzo-tpl:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      vm-k8s-stack-alertmanager-monzo-tpl
    Optional:  false
...
[/simterm]
And it is enabled with the monzoTemplate.enabled=true parameter.
Let's add a templateFiles block, where we can describe our own templates:
...
  alertmanager:
    enabled: true
    monzoTemplate:
      enabled: false
    templateFiles:
      slack.tmpl: |-
        {{/* Title of the Slack alert */}}
        {{ define "slack.title" -}}
          {{ if eq .Status "firing" }} :scream:
          {{- else -}} :relaxed:
          {{- end -}}
          [{{ .Status | toUpper -}}
          {{- if eq .Status "firing" -}}:{{ .Alerts.Firing | len }}
          {{- end }}] {{ (index .Alerts 0).Annotations.summary }}
        {{ end }}

        {{ define "slack.text" -}}
          {{ range .Alerts }}
            {{- if .Annotations.description -}}
              {{- "\n\n" -}}
              *Description*: {{ .Annotations.description }}
            {{- end }}
          {{- end }}
        {{- end }}
...
Deploy and check the ConfigMap, which is described in the custom-templates.yaml:
[simterm]
$ kk -n dev-monitoring-ns get cm | grep extra
vm-k8s-stack-alertmanager-extra-tpl   1     2m4s
[/simterm]
Check volumes in the Pod:
[simterm]
$ kk -n dev-monitoring-ns exec -ti vmalertmanager-vm-k8s-stack-0 -- ls -l /etc/vm/templates/
Defaulted container "alertmanager" out of: alertmanager, config-reloader
total 0
drwxrwxrwx    3 root     root            78 Jul 20 10:06 vm-k8s-stack-alertmanager-extra-tpl
[/simterm]
And wait for an alert:
Now everything is beautiful. Let's move on to creating our own alerts.
VMAlert alerts with VMRules
Documentation – VMAlert.
So how to add our alerts to VMAlert?
VMAlert uses VMRules, which it selects by the ruleSelector:
[simterm]
$ kk -n dev-monitoring-ns get vmrule
NAME                              AGE
vm-k8s-stack-alertmanager.rules   6d19h
vm-k8s-stack-etcd                 6d19h
vm-k8s-stack-general.rules        6d19h
vm-k8s-stack-k8s.rules            6d19h
...
[/simterm]
That is, we can describe the necessary alerts in the VMRules manifests, deploy them, and VMAlert will pick them up.
Let’s take a look at VMAlert itself – we have only one here, and it will be enough for us for now:
[simterm]
$ kk -n dev-monitoring-ns get vmalert vm-k8s-stack -o yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
...
spec:
  datasource:
    url: http://vmsingle-vm-k8s-stack.dev-monitoring-ns.svc:8429/
  evaluationInterval: 15s
  extraArgs:
    remoteWrite.disablePathAppend: "true"
  image:
    tag: v1.91.3
  notifiers:
  - url: http://vmalertmanager-vm-k8s-stack.dev-monitoring-ns.svc:9093
  remoteRead:
    url: http://vmsingle-vm-k8s-stack.dev-monitoring-ns.svc:8429/
  remoteWrite:
    url: http://vmsingle-vm-k8s-stack.dev-monitoring-ns.svc:8429/api/v1/write
  resources: {}
  selectAllByDefault: true
[/simterm]
Let's try to add a test alert – create a victoriametrics/templates/vmalert-vmrules-test.yaml file with the kind: VMRule:
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: vmrule-test
  # no need now, as we have one VMAlert with selectAllByDefault
  #labels:
  #  project: devops
spec:
  groups:
    - name: testing-rule
      rules:
        - alert: TestAlert
          expr: up == 1
          for: 1s
          labels:
            severity: test
            job: '{{ "{{" }} $labels.job }}'
            summary: Testing VMRule
          annotations:
            value: 'Value: {{ "{{" }} $value }}'
            description: 'Monitoring job {{ "{{" }} $labels.job }} failed'
Here we add a crutch in the form of {{ "{{" }}, because {{ }} is used both by Helm itself and by the alert templates.
Deploy, and check the vmrule-test VMRule:
[simterm]
$ kk -n dev-monitoring-ns get vmrule vmrule-test -o yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
...
spec:
  groups:
  - name: testing-rule
    rules:
    - alert: TestAlert
      annotations:
        description: Monitoring job {{ $labels.job }} failed
        value: 'Value: {{ $value }}'
        summary: Testing VMRule
      expr: up == 1
      for: 1s
      labels:
        job: '{{ $labels.job }}'
        severity: test
[/simterm]
Wait for an alert in Slack:
“It works!”
Actually, that's all – it looks like I've covered the main points of the VictoriaMetrics Kubernetes Monitoring Stack installation.