So, the task is to spin up a Prometheus server and all the necessary exporters in an AWS Elastic Kubernetes Service (EKS) cluster and then, via /federation, pass metrics to our “central” Prometheus server with its Alertmanager alerts and Grafana dashboards.
The whole set of related Helm charts is a bit confusing – there is a “simple” Prometheus chart, kube-prometheus, and prometheus-operator:
description: Provides easy monitoring definitions for Kubernetes services, and deployment
and management of Prometheus instances.
name: stable/prometheus-operator
version: 8.14.0
The difference between stable/prometheus and stable/prometheus-operator is that the Operator comes with a built-in Grafana with a set of ready-to-use dashboards and a set of ServiceMonitors to collect metrics from the cluster’s services such as CoreDNS, the API Server, the Scheduler, etc.
The Prometheus resource’s serviceMonitorSelector defines which ServiceMonitors will be added under the observation of the Prometheus server in the cluster.
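The Operator installation itself is a one-liner; assuming Helm 3 and the prometheus release name that shows up in the examples below, it boils down to something like:

kk create ns monitoring
helm install prometheus stable/prometheus-operator --namespace monitoring

With the defaults, the chart also creates the Prometheus custom resource whose serviceMonitorSelector we will check a bit later.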
Adding a new application under monitoring
Now, let’s try to add one more service under monitoring. We will:
spin up a Redis server
and a redis_exporter for it
add a ServiceMonitor
and finally configure Prometheus Operator to use that ServiceMonitor to collect metrics from the redis_exporter
Redis server launch
Create a namespace to make this setup more realistic – a monitored application lives in one namespace, while the monitoring services live in another:
kk create ns redis-test
namespace/redis-test created
Run Redis:
kk -n redis-test run redis --image=redis
deployment.apps/redis created
Create a Service object for Redis network communication:
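A minimal sketch of such a Service, assuming the Deployment created above got the default run: redis label from kubectl run (the redis-svc name here is just an example):

# redis-svc.yaml – Service in front of the Redis Deployment created above
apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  namespace: redis-test
spec:
  selector:
    run: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379

The redis_exporter itself, judging by its Deployment labels shown later, was installed from the stable/prometheus-redis-exporter Helm chart into the monitoring namespace under the redis-exporter release name and pointed at this Service’s address (redis://redis-svc.redis-test.svc.cluster.local:6379). Once it is up, its /metrics endpoint returns the Redis status: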
# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1
# HELP redis_uptime_in_seconds uptime_in_seconds metric
# TYPE redis_uptime_in_seconds gauge
redis_uptime_in_seconds 2793
Or without an additional port – just by running a port-forward to the exporter’s Service:
kk -n monitoring port-forward svc/redis-exporter-prometheus-redis-exporter 9121:9121
Forwarding from 127.0.0.1:9121 -> 9121
Forwarding from [::1]:9121 -> 9121
And run from your local PC:
curl localhost:9121/metrics
...
redis_up 1
# HELP redis_uptime_in_seconds uptime_in_seconds metric
# TYPE redis_uptime_in_seconds gauge
redis_uptime_in_seconds 8818
Okay – we’ve got our Redis Server application and its Prometheus exporter with metrics available via the /metrics URI on port 9121.
The next thing is to configure Prometheus Operator to collect those metrics into its database, so that later they can be pulled by the “central” monitoring via the Prometheus federation.
Adding a Kubernetes ServiceMonitor
Check our redis_exporter's labels:
kk -n monitoring get deploy redis-exporter-prometheus-redis-exporter -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  ...
  generation: 1
  labels:
    app: prometheus-redis-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-redis-exporter-3.4.1
    heritage: Helm
    release: redis-exporter
  ...
And check the serviceMonitorSelector of the Prometheus resource created above:
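With the stable/prometheus-operator chart installed under the prometheus release name, it should contain a selector like this – by default the chart selects ServiceMonitors labeled with the Helm release name:

kk -n monitoring get prometheus -o yaml
...
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
...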
Now, describe a ServiceMonitor for the exporter: in its labels we set release: prometheus so that Prometheus can find it, and in its spec.selector.matchLabels we tell it to look for any Service labeled release: redis-exporter.
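A minimal sketch of such a redis-service-monitor.yaml (the endpoint port name and the scrape interval are assumptions, the rest follows the labels described above and the object name from the output below):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-servicemonitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      release: redis-exporter
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    # the port name has to match the port name in the exporter's Service
    - port: redis-exporter
      interval: 15s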
Apply it:
kk -n monitoring apply -f redis-service-monitor.yaml
servicemonitor.monitoring.coreos.com/redis-servicemonitor created
Check Prometheus’ Targets:
Redis Server metrics:
Great – “It works!” (c)
What is next?
And the next thing to do is to remove Alertmanager and Grafana from our Prometheus Operator stack.
Prometheus Operator Helm deployment and its services configuration
Actually, why remove Grafana at all? It has really useful dashboards out of the box, so let’s leave it there.
The better approach seems to be:
on each EKS cluster – Prometheus Operator with Grafana, but without Alertmanager
on the cluster-local Prometheus servers – keep the default metrics retention period of 2 weeks
and on the central monitoring server – keep metrics for one year and host some Grafana dashboards there
So, we need to add two LoadBalancers: an internet-facing one for the Grafana service, and an internal one for Prometheus, since Prometheus will communicate with our central monitoring host over AWS VPC Peering.
Error: UPGRADE FAILED: failed to create resource: Ingress.extensions "prometheus-grafana" is invalid: spec: Invalid value: []networking.IngressRule(nil): either `backend` or `rules` must be specified
Er…
The documentation says nothing about the LoadBalancers configuration, but I googled up some examples in the Jupyter docs here.
Let’s try it – this time via the values.yaml file, to avoid a bunch of --set options.
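Judging by the errors below, the fragment responsible for Grafana was something along these lines (the hostname is just a placeholder; the keys are those of the Grafana subchart bundled with stable/prometheus-operator):

grafana:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internet-facing
    hosts:
      - monitoring.example.com

After a helm upgrade of the prometheus release with this file, the ALB Ingress Controller logs the following: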
I0617 11:37:48.272749 1 tags.go:43] monitoring/prometheus-grafana: modifying tags { ingress.k8s.aws/cluster: "bttrm-eks-dev-0", ingress.k8s.aws/stack: "monitoring/prometheus-grafana", kubernetes.io/service-name: "prometheus-grafana", kubernetes.io/service-port: "80", ingress.k8s.aws/resource: "monitoring/prometheus-grafana-prometheus-grafana:80", kubernetes.io/cluster/bttrm-eks-dev-0: "owned", kubernetes.io/namespace: "monitoring", kubernetes.io/ingress-name: "prometheus-grafana"} on arn:aws:elasticloadbalancing:us-east-2:534***385:targetgroup/96759da8-e0b8253ac04c7ceacd7/35de144cca011059
E0617 11:37:48.310083 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to reconcile targetGroups due to failed to reconcile targetGroup targets due to prometheus-grafana service is not of type NodePort or LoadBalancer and target-type is instance" "controller"="alb-ingress-controller" "request"={"Namespace":"monitoring","Name":"prometheus-grafana"}
Okay – add the alb.ingress.kubernetes.io/target-type: "ip" annotation, so the AWS ALB will send traffic directly to the Grafana pod instead of a WorkerNode’s NodePort, and let’s also add the valid HTTP response codes:
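A sketch of the resulting values.yaml fragment, with the internal LoadBalancer for Prometheus added as well (treating Grafana’s 302 redirect as a valid response code is an assumption, as are the exact chart keys – worth verifying against the chart version in use):

grafana:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internet-facing
      # send traffic straight to the pod IP instead of a NodePort on the WorkerNodes
      alb.ingress.kubernetes.io/target-type: "ip"
      # Grafana answers with a 302 redirect on /, so count it as a healthy response
      alb.ingress.kubernetes.io/success-codes: "200,302"
    hosts:
      - monitoring.example.com

prometheus:
  service:
    type: LoadBalancer
    annotations:
      # internal-only LoadBalancer - reachable from the central monitoring host via VPC Peering
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"

After another helm upgrade with this file the Ingress reconciles and Grafana becomes reachable through the ALB.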
And what is more – now we are able to see all the graphs in the Lens utility used by our developers; earlier they got the “Metrics are not available due to missing or invalid Prometheus configuration” and “Metrics not available at the moment” error messages:
Actually, that’s all you need to start using the Prometheus Operator to monitor your Kubernetes cluster.