Category Archives: Monitoring

Hardware, services and network monitoring systems

Karpenter: its monitoring, and Grafana dashboard for Kubernetes WorkerNodes

18 February 2024

We have an AWS Elastic Kubernetes Service cluster with Karpenter, which is responsible for EC2 auto-scaling, see AWS: Getting started with Karpenter for autoscaling in EKS, and its installation with Helm. There have been no problems with it so far, but we still need to monitor it. For its monitoring, Karpenter provides… Read More »

AWS: CloudWatch – Multi source query: collecting metrics from an external Prometheus

13 December 2023

Another interesting announcement from the last re:Invent is that CloudWatch has added the ability to collect metrics from external resources (see a very interesting report AWS re:Invent 2023 – Cloud operations for today, tomorrow, and beyond (COP227)). That is, we can now create graphs and/or alerts not only from the default metrics of CloudWatch itself,… Read More »

Grafana Loki: collecting AWS LoadBalancer logs from S3 with Promtail Lambda

25 November 2023

Currently, we are able to collect our API Gateway logs from CloudWatch Logs to Grafana Loki, see Loki: collecting logs from CloudWatch Logs using Lambda Promtail. But in the process of migrating to Kubernetes, we have Application Load Balancers that can only write logs to S3, and we need to learn how to collect… Read More »

VictoriaMetrics: pushing metrics without Prometheus Pushgateway

18 November 2023

In the Prometheus: running Pushgateway on Kubernetes with Helm and Terraform post I wrote about how to add Pushgateway to Prometheus, which allows using the Push model instead of Pull, that is, an Exporter can send metrics directly to the database instead of waiting for Prometheus or VMAgent to come to it. With VictoriaMetrics, it’s… Read More »

VictoriaMetrics: VMAuth – Proxy, Authentication, and Authorization

27 August 2023

We continue to develop our monitoring stack. See the first part – VictoriaMetrics: creating a Kubernetes monitoring stack with its own Helm chart. What we want to do next: give developers access so that they can set a Silence for alerts themselves in Alertmanager to avoid spamming Slack, see Prometheus: Alertmanager Web UI alerts Silence.… Read More »

Grafana: values from records in Loki logs, and dual-Y-axes panels in Grafana

19 August 2023

We have an AWS Lambda function that writes logs to CloudWatch Logs, from where lambda-promtail ships them to a Grafana Loki instance so we can use them in Grafana graphs. The task: in the logs, we have records about “Init duration” and “Max Memory Used” by Lambdas. There are no… Read More »

Grafana Loki: performance optimization with Recording Rules, caching, and parallel queries

19 August 2023

So, we have Loki installed from the chart in simple-scale mode, see Grafana Loki: architecture and running in Kubernetes with AWS S3 storage and boltdb-shipper. Loki is running on an AWS Elastic Kubernetes Service cluster, installed with the Loki Helm chart, AWS S3 is used as a long-term store, and BoltDB Shipper is used to work… Read More »

AWS: Grafana Loki, InterZone traffic in AWS, and Kubernetes nodeAffinity

19 August 2023

Traffic in AWS is generally quite an interesting and sometimes complicated thing. I once wrote about it in the AWS: Cost optimization – services expenses overview and traffic costs in AWS post. Now, it’s time to return to this topic again. So, what’s the problem: in AWS Cost Explorer, I’ve noticed that we have an increase… Read More »

VictoriaMetrics: deploying a Kubernetes monitoring stack

23 July 2023

Now we have VictoriaMetrics + Grafana on a regular EC2 instance, launched with Docker Compose, see the VictoriaMetrics: an overview and its use instead of Prometheus post. It was kind of a Proof of Concept, and it’s time to launch it “in an adult way” – in Kubernetes, with all the configurations stored in a GitHub… Read More »
