Tag Archives: monitoring

Kubernetes: HorizontalPodAutoscaler – an overview with examples
12 August 2020

Kubernetes HorizontalPodAutoscaler automatically scales Kubernetes Pods managed by a ReplicationController, Deployment, or ReplicaSet based on their CPU, memory, or other metrics. It was briefly discussed in the Kubernetes: running metrics-server in AWS EKS for a Kubernetes Pod AutoScaler post; now let’s go deeper and check all the options available for scaling. For HPA you can use three…

Kubernetes: monitoring with Prometheus – exporters, a Service Discovery, and its roles
26 April 2020

The next task with our Kubernetes cluster is to set up its monitoring with Prometheus. This task is complicated by the fact that a whole bunch of resources needs to be monitored: from the infrastructure side – the EC2 WorkerNodes instances, their CPU, memory, network, disks, etc.; the key services of Kubernetes itself – its…

Kubernetes: running metrics-server in AWS EKS for a Kubernetes Pod AutoScaler
15 February 2020

Let’s assume we already have an AWS EKS cluster with worker nodes. In this post we will connect to a newly created cluster, create a test deployment with an HPA (Kubernetes Horizontal Pod AutoScaler), and try to get information about resource usage using kubectl top. Kubernetes cluster. Create a test cluster using…

Grafana: Loki – the LogQL’s Prometheus-like counters, aggregation functions and dnsmasq’s requests graphs
17 November 2019

The last time I configured Loki for log collection and monitoring was in February 2019, almost a year ago (see the Grafana Labs: Loki – logs collecting and monitoring system post), when Loki was still in Beta. Now we are faced with outgoing traffic issues in our Production environments and can’t find who is guilty for…

Debian: logrotate won’t rotate logs with an “unknown group ‘syslog'” error
9 October 2019

We have an AWS EC2 instance with Debian and logrotate. One day its root partition filled up, and when I started investigating I found that we had a bunch of files like /var/log/syslog.N.gz. At the same time, by default logrotate creates a config file to rotate syslog log files: [simterm] root@monitoring-dev:~# cat /etc/logrotate.d/syslog # Ansible…

OpsGenie: Uptrends integration
24 September 2019

Uptrends is a simple ping-based monitoring service already used for the RTFM blog (see the Prometheus: RTFM blog monitoring set up with Ansible – Grafana, Loki, and promtail post for more details). Now I’d like to add it as an additional alerting service for the project’s API endpoints and configure its alerts to be sent…

Sentry: running self-hosted errors tracking system on an AWS EC2
18 May 2019

Previously we used the cloud-based version of Sentry, but then we reached its email limit and our backend team was left without those notifications, which are critical for their work. A self-hosted version had been planned for a long time, so now we have a chance to spin it up. The post below describes how to start self-hosted Sentry on an AWS…

Prometheus: Alertmanager’s alerts receivers and routing based on severity level and tags
26 March 2019

We have three working environments – Dev, Stage, and Production. There is also a bunch of alerts with different severities – info, warning, and critical. For example: … - name: SSLexpiry.rules rules: - alert: SSLCertExpiring30days expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 30 for: 10m labels: severity: info annotations: summary: "SSL certificate warning" description: "SSL certificate…

Prometheus: Alertmanager – send alerts to a “/dev/null”
26 March 2019

This is a follow-up to the Prometheus: Alertmanager’s alerts receivers and routing based on severity level and tags post. We have an Alertmanager config with routes. The task is to send all alerts from the Dev environment to “/dev/null”. To do this, create an empty receiver: … receivers: - name: 'blackhole' - name: 'default' slack_configs: - send_resolved:…

Monit: email alerting on an SSH logins
18 March 2019

The task is to send an email alert when an SSH login is made from a non-whitelisted IP. We will use Monit here. Install it: [simterm] root@jenkins-dev:/home/admin# apt update && apt -y install monit [/simterm] Configure the email settings: set the mail server to localhost (we have a local exim here), the email format, and the email receiver. Edit the /etc/monit/monitrc file: … set mailserver localhost…
