Category Archives: Monitoring

Hardware, services and network monitoring systems

VictoriaLogs: an overview, run in Kubernetes, LogsQL, and Grafana

9 September 2024
 

 VictoriaLogs is a relatively new system for collecting and analyzing logs, similar to Grafana Loki, but – like VictoriaMetrics compared to vanilla Prometheus – less demanding on CPU/Memory resources. Personally, I’ve been using Grafana Loki for about 5 years, but sometimes I have concerns about it – both in terms of documentation and the overall… Read More »

EcoFlow: monitoring with Prometheus and Grafana

7 July 2024
 

Continuing the topic from Підготовка до зими 2024-2025: ДБЖ, інвертори, та акумулятори (in Ukrainian; "Preparing for winter 2024-2025: UPS, inverters, and batteries"). Surprise – there’s even a Prometheus exporter for the EcoFlow – berezhinskiy/ecoflow_exporter! It looks really cool. I launched it, looked at it, and ran to write this post. It can be run in a couple of clicks with Docker… Read More »

AWS: Cost optimization – an overview of Bills, Cost Explorer, and the costs control

23 June 2024
 

Let’s continue our series on cost optimization in AWS. Previous posts: “AWS: cost optimization – purchasing RDS Reserved Instances”; “AWS: Cost Explorer – costs checking on the CloudWatch Logs example”; “AWS: Cost optimization – services expenses overview and traffic costs in AWS”. Now that we understand what we pay for in AWS, let’s see what… Read More »

Kubernetes: monitoring Events with kubectl and Grafana Loki

23 June 2024
 

In Kubernetes, in addition to metrics and logs from containers, we can get information about how components are operating from Kubernetes Events. Events usually store information about the status of Pods (creation, eviction, kill, ready or not-ready status), WorkerNodes (status of servers), and the Kubernetes Scheduler (inability to start a Pod, etc.). Kubernetes Events… Read More »

AWS: VPC Flow Logs, NAT Gateways, and Kubernetes Pods – a detailed overview

5 May 2024
 

We spend a relatively large amount on AWS NAT Gateway Processed Bytes, and we wanted to know what exactly is processed through it. It would seem simple – just turn on VPC Flow Logs and see what’s what. But when it comes to AWS Elastic Kubernetes Service and NAT Gateways, things… Read More »

Kubernetes: tracing requests with AWS X-Ray, and Grafana data source

2 March 2024
 

Tracing lets you track requests between components. For example, with AWS and Kubernetes we can trace the entire path of a request from an AWS Load Balancer to a Kubernetes Pod and on to DynamoDB or RDS. This helps us both to track performance issues – where and which requests are taking a long… Read More »

Terraform: creating a module for collecting AWS ALB logs in Grafana Loki

24 February 2024
 

An example of creating a Terraform module to automate log collection from AWS Load Balancers in Grafana Loki. See how the scheme works in the blog post Grafana Loki: collecting AWS LoadBalancer logs from S3 with Promtail Lambda. In short, ALB writes logs to an S3 bucket, from where they are picked up by a Lambda… Read More »

Grafana Loki: LogQL and Recording Rules for metrics from AWS Load Balancer logs

24 February 2024
 

I didn’t plan this post at all, as I thought the task would be quick, but it wasn’t, and I need to dig a little deeper into this topic. So, what we are talking about: we have AWS Load Balancers whose logs are collected in Grafana Loki, see Grafana Loki: collecting… Read More »

Karpenter: its monitoring, and Grafana dashboard for Kubernetes WorkerNodes

18 February 2024
 

We have an AWS Elastic Kubernetes Service cluster with Karpenter, which is responsible for EC2 auto-scaling; see AWS: Getting started with Karpenter for autoscaling in EKS, and its installation with Helm. In general, there are no problems with it so far, but in any case we need to monitor it. For its monitoring, Karpenter provides… Read More »

AWS: CloudWatch – Multi source query: collecting metrics from an external Prometheus

13 December 2023
 

Another interesting announcement from the last re:Invent is that CloudWatch has added the ability to collect metrics from external resources (see the very interesting talk AWS re:Invent 2023 – Cloud operations for today, tomorrow, and beyond (COP227)). That is, we can now create graphs and/or alerts not only from the default metrics of CloudWatch itself,… Read More »