Kubernetes: a cluster’s monitoring with the Prometheus Operator

By | 08/13/2020

This post continues Kubernetes: monitoring with Prometheus – exporters, a Service Discovery, and its roles, where we configured Prometheus manually to see how it works. Now, let’s try the Prometheus Operator installed from a Helm chart.

So, the task is to spin up a Prometheus server and all necessary exporters in an AWS Elastic Kubernetes Service (EKS) cluster, and then pass its metrics via Prometheus federation (the /federate endpoint) to our “central” Prometheus server with Alertmanager alerts and Grafana dashboards.

The whole set of related Helm charts is a bit confusing: there is a “simple” Prometheus chart, kube-prometheus, and prometheus-operator.

However, if you look for it with helm search, it returns only one chart, prometheus-operator:

helm search repo stable/prometheus-operator -o yaml
- app_version: 0.38.1
  description: Provides easy monitoring definitions for Kubernetes services, and deployment
    and management of Prometheus instances.
  name: stable/prometheus-operator
  version: 8.14.0

The difference between stable/prometheus and stable/prometheus-operator is that the Operator chart has a built-in Grafana with a set of ready-to-use dashboards, plus a set of ServiceMonitors that collect metrics from the cluster’s services such as CoreDNS, the API Server, the Scheduler, etc.

So, as mentioned – we will use the stable/prometheus-operator.

Prometheus Operator deployment

Deploy it with Helm:

helm install --namespace monitoring --create-namespace prometheus stable/prometheus-operator
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
NAME: prometheus
LAST DEPLOYED: Mon Jun 15 17:54:27 2020
NAMESPACE: monitoring
STATUS: deployed
The Prometheus Operator has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=prometheus"
Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.

Check pods:

kk -n monitoring get pod
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running   0          41s
prometheus-grafana-85c9fbc85c-ll58c                      2/2     Running   0          46s
prometheus-kube-state-metrics-66d969ff69-6b7t8           1/1     Running   0          46s
prometheus-prometheus-node-exporter-89mf4                1/1     Running   0          46s
prometheus-prometheus-node-exporter-bpn67                1/1     Running   0          46s
prometheus-prometheus-node-exporter-l9wjm                1/1     Running   0          46s
prometheus-prometheus-node-exporter-zk4cm                1/1     Running   0          46s
prometheus-prometheus-oper-operator-7d5f8ff449-fl6x4     2/2     Running   0          46s
prometheus-prometheus-prometheus-oper-prometheus-0       3/3     Running   1          31

Note: kk is an alias for kubectl, added with echo 'alias kk="kubectl"' >> ~/.bashrc

So, the Prometheus Operator’s Helm chart created a whole bunch of services – Prometheus itself, Alertmanager, Grafana, plus a set of ServiceMonitors:

kk -n monitoring get servicemonitor
NAME                                                 AGE
prometheus-prometheus-oper-alertmanager              3m53s
prometheus-prometheus-oper-apiserver                 3m53s
prometheus-prometheus-oper-coredns                   3m53s
prometheus-prometheus-oper-grafana                   3m53s
prometheus-prometheus-oper-kube-controller-manager   3m53s
prometheus-prometheus-oper-kube-etcd                 3m53s
prometheus-prometheus-oper-kube-proxy                3m53s
prometheus-prometheus-oper-kube-scheduler            3m53s
prometheus-prometheus-oper-kube-state-metrics        3m53s
prometheus-prometheus-oper-kubelet                   3m53s
prometheus-prometheus-oper-node-exporter             3m53s
prometheus-prometheus-oper-operator                  3m53s
prometheus-prometheus-oper-prometheus                3m53s

The role of ServiceMonitors is covered later in this post, in the Adding Kubernetes ServiceMonitor part.

Grafana access

Let’s see which dashboards are shipped with Grafana.

Find Grafana’s pod:

kk -n monitoring get pod
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running   0          103s
prometheus-grafana-85c9fbc85c-wl856                      2/2     Running   0          107s

Run port-forward:

kk -n monitoring port-forward prometheus-grafana-85c9fbc85c-wl856 3000:3000
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000

Open localhost:3000, log in with the admin username and the prom-operator password, and you’ll see a lot of ready-to-use graphs.

At the time of writing Prometheus Operator is shipped with Grafana version 7.0.3.

Actually, we don’t need Grafana and Alertmanager here, as they are used on our “central” monitoring server, so let’s remove them from here.

Prometheus Operator configuration

Prometheus Operator uses Custom Resource Definitions which describe all its components:

kk -n monitoring get crd
NAME                                    CREATED AT
alertmanagers.monitoring.coreos.com     2020-06-15T14:47:44Z
eniconfigs.crd.k8s.amazonaws.com        2020-04-10T07:21:20Z
podmonitors.monitoring.coreos.com       2020-06-15T14:47:45Z
prometheuses.monitoring.coreos.com      2020-06-15T14:47:46Z
prometheusrules.monitoring.coreos.com   2020-06-15T14:47:47Z
servicemonitors.monitoring.coreos.com   2020-06-15T14:47:47Z
thanosrulers.monitoring.coreos.com      2020-06-15T14:47:48Z

For example, the prometheuses.monitoring.coreos.com CRD describes a Custom Resource named Prometheus:

kk -n monitoring get crd prometheuses.monitoring.coreos.com -o yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
...
spec:
...
  names:
    kind: Prometheus
    listKind: PrometheusList
    plural: prometheuses
    singular: prometheus
...

You can then work with this resource like any common Kubernetes object such as pods, nodes, or volumes, using one of the names from the names field:

kk -n monitoring get prometheus -o yaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      meta.helm.sh/release-name: prometheus
      meta.helm.sh/release-namespace: monitoring
...
  spec:
    alerting:
      alertmanagers:
      - apiVersion: v2
        name: prometheus-prometheus-oper-alertmanager
        namespace: monitoring
        pathPrefix: /
        port: web
    baseImage: quay.io/prometheus/prometheus
    enableAdminAPI: false
    externalUrl: http://prometheus-prometheus-oper-prometheus.monitoring:9090
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    paused: false
    podMonitorNamespaceSelector: {}
    podMonitorSelector:
      matchLabels:
        release: prometheus
    portName: web
    replicas: 1
    retention: 10d
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: prometheus-operator
        release: prometheus
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-prometheus-oper-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
    version: v2.18.1

For now, we are interested in the serviceMonitorSelector record:

    serviceMonitorSelector:
      matchLabels:
        release: prometheus

It defines which ServiceMonitors the Prometheus server in the cluster will pick up.

Adding a new application to monitoring

Now, let’s add one more service to monitoring:

  1. spin up a Redis server
  2. run redis_exporter
  3. add a ServiceMonitor
  4. finally, configure Prometheus Operator to use the ServiceMonitor to collect metrics from redis_exporter

Redis server launch

Create a namespace to make this setup more realistic: the monitored application lives in one namespace, while the monitoring services live in another:

kk create ns redis-test
namespace/redis-test created

Run Redis:

kk -n redis-test run redis --image=redis
deployment.apps/redis created

Create a Service object for Redis network communication:

kk -n redis-test expose deploy redis --type=ClusterIP --name redis-svc --port 6379
service/redis-svc exposed

Check it:

kk -n redis-test get svc redis-svc
redis-svc   ClusterIP   <none>        6379/TCP   25s
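A side note for newer clients: starting with kubectl 1.18, kubectl run creates a bare Pod instead of a Deployment, so the expose deploy command above would have nothing to attach to. A declarative equivalent could look like the sketch below. It only writes a manifest to redis.yaml; the file name and the app: redis label are my assumptions for this example, not something the original commands set.

```shell
# Write a Deployment and a ClusterIP Service for Redis to redis.yaml.
# The file name and the "app: redis" label are assumptions for this sketch.
cat > redis.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: redis-test
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis
          ports:
            - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-svc
  namespace: redis-test
spec:
  type: ClusterIP
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
EOF

# Apply it to the cluster with:
#   kubectl apply -f redis.yaml
echo "manifest written to redis.yaml"
```

The Service name and port are the same as in the imperative commands, so the rest of the post works unchanged.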

Okay – Redis is working, now let’s add its Prometheus exporter.

redis_exporter launch

Install it with Helm:

helm install -n monitoring redis-exporter --set "redisAddress=redis://redis-svc.redis-test.svc.cluster.local:6379" stable/prometheus-redis-exporter

Check its Service:

kk -n monitoring get svc
NAME                                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                      ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   89m
redis-exporter-prometheus-redis-exporter   ClusterIP    <none>        9121/TCP                     84s

And its endpoint:

kk -n monitoring get endpoints redis-exporter-prometheus-redis-exporter -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: prometheus-redis-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-redis-exporter-3.4.1
    heritage: Helm
    release: redis-exporter
  name: redis-exporter-prometheus-redis-exporter
  namespace: monitoring
...
subsets:
- addresses:
...
  ports:
  - name: redis-exporter
    port: 9121
    protocol: TCP

To check the metrics, spin up a new pod, for example with Debian, and install curl:

kk -n monitoring run --rm -ti debug --image=debian --restart=Never bash
If you don't see a command prompt, try pressing enter.
root@debug:/# apt update && apt -y install curl

And make a request to the redis_exporter:

root@debug:/# curl redis-exporter-prometheus-redis-exporter:9121/metrics
# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1
# HELP redis_uptime_in_seconds uptime_in_seconds metric
# TYPE redis_uptime_in_seconds gauge
redis_uptime_in_seconds 2793

Or, without an additional pod, just run port-forward to the exporter’s Service:

kk -n monitoring port-forward svc/redis-exporter-prometheus-redis-exporter 9121:9121
Forwarding from 127.0.0.1:9121 -> 9121
Forwarding from [::1]:9121 -> 9121

And run from your local PC:

curl localhost:9121/metrics
redis_up 1
# HELP redis_uptime_in_seconds uptime_in_seconds metric
# TYPE redis_uptime_in_seconds gauge
redis_uptime_in_seconds 8818

Okay, we’ve got our Redis server and its Prometheus exporter, with metrics available at the /metrics URI on port 9121.

The next thing is to configure Prometheus Operator to collect those metrics into its database, so that later they can be pulled by the “central” monitoring via Prometheus federation.
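To make the federation part less abstract: on the central server, the pull could be described with a scrape job like the one below. This is only a sketch. The job name, the file name, and the target address are placeholders I made up, and the match[] selector assumes we only want the redis_* series. The honor_labels, metrics_path, and params keys are regular Prometheus scrape options.

```shell
# Write a federation scrape job for the central Prometheus to federate-job.yml.
# Job name, file name, and target host are hypothetical placeholders.
cat > federate-job.yml <<'EOF'
scrape_configs:
  - job_name: "federate-eks-dev-0"
    scrape_interval: 15s
    # keep the labels exactly as the cluster-local Prometheus exposes them
    honor_labels: true
    metrics_path: "/federate"
    params:
      "match[]":
        - '{__name__=~"redis_.*"}'
    static_configs:
      - targets:
          - "prometheus.dev-0.internal.example.com:9090"
EOF

echo "merge federate-job.yml into prometheus.yml on the central server"
```

The /federate endpoint returns only the series matched by match[], so a narrower selector keeps the central server’s load down.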

Adding Kubernetes ServiceMonitor

Check our redis_exporter’s labels:

kk -n monitoring get deploy redis-exporter-prometheus-redis-exporter -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
...
  generation: 1
  labels:
    app: prometheus-redis-exporter
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-redis-exporter-3.4.1
    heritage: Helm
    release: redis-exporter
...

And check the serviceMonitorSelector of the prometheus resource created above:

kk -n monitoring get prometheus -o yaml

At the end of the output, find the following lines:

    serviceMonitorSelector:
      matchLabels:
        release: prometheus

So, Prometheus will look for ServiceMonitors that have the release label set to prometheus.

Create an additional ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    serviceapp: redis-servicemonitor
    release: prometheus
  name: redis-servicemonitor
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: redis-exporter
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      release: redis-exporter
In its labels we’ve set release: prometheus so that Prometheus can find it, and in selector.matchLabels we’ve told it to look for any Service labeled release: redis-exporter.

Apply it:

kk -n monitoring apply -f redis-service-monitor.yaml
servicemonitor.monitoring.coreos.com/redis-servicemonitor created
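Before opening the Targets page, the two label matches can be checked from the command line. The sketch below relies only on the -l label selector of kubectl get. The run wrapper is a helper I added for this example, so that the commands are printed rather than executed unless APPLY=1 is set.

```shell
# Print each command by default; set APPLY=1 to run it against a real cluster.
# "run" is a helper made up for this sketch, not part of kubectl.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# ServiceMonitors that match the serviceMonitorSelector of the Prometheus resource.
run kubectl -n monitoring get servicemonitor -l release=prometheus

# Services that match selector.matchLabels of redis-servicemonitor.
run kubectl -n monitoring get svc -l release=redis-exporter
```

If redis-servicemonitor is missing from the first list, or the exporter’s Service is missing from the second one, the target will not appear in Prometheus.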

Check Prometheus’ Targets:

Redis Server metrics:

Great – “It works!” (c)

What is next?

And the next thing to do is to remove Alertmanager and Grafana from our Prometheus Operator stack.

Prometheus Operator Helm deployment and its services configuration

Actually, why remove Grafana? It has really useful dashboards ready, so let’s leave it there.

The better way seems to be:

  • on each EKS cluster we will have Prometheus Operator with Grafana but without Alertmanager
  • on the clusters’ local Prometheus servers, let’s keep a short retention period of about 2 weeks (the chart’s default, shown above, is 10d)
  • remove Alertmanager from there and use our “central” monitoring’s Alertmanager with its already defined routes and alerts (see the Prometheus: Alertmanager’s alerts receivers and routing based on severity level and tags post for details about routing)
  • on the central monitoring server, we will keep our metrics for one year and will have some Grafana dashboards there

So, we need to add two LoadBalancers: one of the internet-facing type for the Grafana service, and another one, internal, for Prometheus, because it will communicate with our central monitoring host via AWS VPC Peering.
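The internal LoadBalancer for Prometheus could be described with values like the ones in the sketch below. Treat it as an untested assumption: the prometheus.ingress.* keys are taken from the chart’s documentation, the annotations are the same ALB Ingress Controller annotations that are worked out for Grafana later in this post with the scheme switched to internal, and the file name is a placeholder.

```shell
# Write values for an internal ALB in front of Prometheus.
# The file name is hypothetical; merge the content into the values file you deploy with.
cat > prometheus-ingress.yaml <<'EOF'
prometheus:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "alb"
      alb.ingress.kubernetes.io/scheme: "internal"
      alb.ingress.kubernetes.io/target-type: "ip"
      alb.ingress.kubernetes.io/success-codes: "200,302"
    hosts:
      - ""
    paths:
      - /*
EOF

echo "values written to prometheus-ingress.yaml"
```

An internal ALB gets only private addresses, so it is reachable from the central monitoring host over the VPC Peering connection but not from the Internet.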

And later we’ll need to add this to our automation, see AWS Elastic Kubernetes Service: a cluster creation automation, part 2 – Ansible, eksctl.

But for now, let’s do it manually.

Well, what do we need to change in Prometheus Operator’s default deployment?

  • drop Alertmanager
  • add settings for:
    • Prometheus and Grafana – they must be deployed behind an AWS LoadBalancer
    • set a login:pass for Grafana

All available options can be found in the documentation – https://github.com/helm/charts/tree/master/stable/prometheus-operator.

For now, let’s start by removing Alertmanager.

To do so, we need to set the alertmanager.enabled parameter to false.

Check its pod now:

kk -n monitoring get pod
NAME                                                        READY   STATUS             RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0      2/2     Running            0          24h

Redeploy it, passing alertmanager.enabled=false via --set:

helm upgrade --install --namespace monitoring --create-namespace prometheus stable/prometheus-operator --set "alertmanager.enabled=false"

Check the pods again: no Alertmanager pod should be present in the stack now.
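The same setting can live in a values file, which is also a convenient place for the Grafana password from the list above. A minimal sketch, assuming the file is called oper.yaml (the name passed to -f further down) and that change-me is replaced with a real secret. grafana.adminPassword is the Grafana subchart option whose default in this chart is prom-operator.

```shell
# Write the overrides to a values file instead of passing several --set flags.
# "change-me" is a placeholder password, not a recommended value.
cat > oper.yaml <<'EOF'
alertmanager:
  enabled: false

grafana:
  adminPassword: "change-me"
EOF

# Deploy with:
#   helm upgrade --install --namespace monitoring --create-namespace prometheus stable/prometheus-operator -f oper.yaml
echo "values written to oper.yaml"
```

Keeping the overrides in one file makes each following helm upgrade repeat all of them, so a setting is not lost because a --set flag was forgotten.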

An AWS LoadBalancer configuration

To make our Grafana available from the Internet, we need to add an Ingress resource for Grafana and a dedicated Ingress for Prometheus.

What do we have in the documentation:

Parameter                 Description                              Default
grafana.ingress.enabled   Enables Ingress for Grafana              false
grafana.ingress.hosts     Ingress accepted hostnames for Grafana   []

Add the Ingress for Grafana:

helm upgrade --install --namespace monitoring --create-namespace prometheus stable/prometheus-operator --set "alertmanager.enabled=false" --set grafana.ingress.enabled=true
Error: UPGRADE FAILED: failed to create resource: Ingress.extensions "prometheus-grafana" is invalid: spec: Invalid value: []networking.IngressRule(nil): either `backend` or `rules` must be specified


The documentation says nothing about LoadBalancer configuration, but I found some examples in the Jupyter docs.

Let’s try it – now via the values.yaml file to avoid a bunch of --set.

Add the hosts to the file:

grafana:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "alb"
      alb.ingress.kubernetes.io/scheme: "internet-facing"
    hosts:
      - "dev-0.eks.monitor.example.com"


helm upgrade --install --namespace monitoring --create-namespace prometheus stable/prometheus-operator -f oper.yaml

Check the Kubernetes ALB Controller logs:

I0617 11:37:48.272749       1 tags.go:43] monitoring/prometheus-grafana: modifying tags {  ingress.k8s.aws/cluster: "bttrm-eks-dev-0",  ingress.k8s.aws/stack: "monitoring/prometheus-grafana",  kubernetes.io/service-name: "prometheus-grafana",  kubernetes.io/service-port: "80",  ingress.k8s.aws/resource: "monitoring/prometheus-grafana-prometheus-grafana:80",  kubernetes.io/cluster/bttrm-eks-dev-0: "owned",  kubernetes.io/namespace: "monitoring",  kubernetes.io/ingress-name: "prometheus-grafana"} on arn:aws:elasticloadbalancing:us-east-2:534***385:targetgroup/96759da8-e0b8253ac04c7ceacd7/35de144cca011059
E0617 11:37:48.310083       1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to reconcile targetGroups due to failed to reconcile targetGroup targets due to prometheus-grafana service is not of type NodePort or LoadBalancer and target-type is instance"  "controller"="alb-ingress-controller" "request"={"Namespace":"monitoring","Name":"prometheus-grafana"}

Okay, add the alb.ingress.kubernetes.io/target-type: "ip" annotation so the AWS ALB sends traffic directly to Grafana’s pod instead of a WorkerNode port, and also add the HTTP codes that should count as a successful health check:

grafana:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "alb"
      alb.ingress.kubernetes.io/scheme: "internet-facing"
      alb.ingress.kubernetes.io/target-type: "ip"
      alb.ingress.kubernetes.io/success-codes: "200,302"
    hosts:
      - "dev-0.eks.monitor.example.com"

Or another way – reconfigure the Grafana’s Service to use the NodePort:

grafana:
  service:
    type: NodePort
    port: 80
    annotations: {}
    labels: {}
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "alb"
      alb.ingress.kubernetes.io/scheme: "internet-facing"
      alb.ingress.kubernetes.io/success-codes: "200,302"
    hosts:
      - "dev-0.eks.monitor.example.com"

Redeploy Operator’s stack:

helm upgrade --install --namespace monitoring --create-namespace prometheus stable/prometheus-operator -f oper.yaml

Wait for a minute and check AWS LoadBalancer again:

kk -n monitoring get ingress
NAME                 HOSTS                              ADDRESS                                                                 PORTS   AGE
prometheus-grafana   dev-0.eks.monitor.example.com   96759da8-monitoring-promet-***.us-east-2.elb.amazonaws.com   80      30m

But now dev-0.eks.monitor.example.com will return the 404 code from our LoadBalancer:

curl -vL dev-0.eks.monitor.example.com
*   Trying 3.***.***.247:80...
* Connected to dev-0.eks.monitor.example.com (3.***.***.247) port 80 (#0)
> GET / HTTP/1.1
> Host: dev-0.eks.monitor.example.com
> User-Agent: curl/7.70.0
> Accept: */*
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< Location: /login
* Connection #0 to host dev-0.eks.monitor.example.com left intact
* Issue another request to this URL: 'http://dev-0.eks.monitor.example.com/login'
> GET /login HTTP/1.1
< HTTP/1.1 404 Not Found
< Server: awselb/2.0

Why does this happen?

  1. the ALB accepts a request to the dev-0.eks.monitor.example.com URL and forwards it to the TargetGroup with Grafana’s pod(s)
  2. Grafana returns the 302 redirect code to the /login URI
  3. the client follows the redirect and sends a new request to the ALB, this time with the /login URI

Check the ALB’s Listener rules:

Well, here it is: the ALB returns the 404 code for a request to any URI except /. Accordingly, the /login request is also dropped with a 404.

I’d like to see any comments from developers, who set this as the default behavior for the path.

Go back to our values.yaml and change the path from / to /*.
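The difference between the two patterns can be shown with a few lines of shell. This is only an illustration: the matches function is made up for this example, and shell globbing stands in for the ALB’s wildcard matching, which is an approximation of how listener rules behave.

```shell
# Return 0 when a request path matches an ALB-style path pattern.
# Shell globbing is used as a stand-in for the ALB wildcard rules.
matches() {
  pattern=$1
  path=$2
  case "$path" in
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the old pattern "/" and the new pattern "/*" for two request paths.
for pattern in '/' '/*'; do
  for path in '/' '/login'; do
    if matches "$pattern" "$path"; then
      echo "pattern $pattern, path $path: forwarded to Grafana"
    else
      echo "pattern $pattern, path $path: 404 from the default rule"
    fi
  done
done
```

With the / pattern only the root path reaches Grafana, while /* also covers /login and everything below it.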

And update the hosts field to avoid the “Invalid value: []networking.IngressRule(nil): either `backend` or `rules` must be specified” error.

To avoid tying yourself to a specific domain name, set it to the "" value:

grafana:
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: "alb"
      alb.ingress.kubernetes.io/target-type: "ip"
      alb.ingress.kubernetes.io/scheme: "internet-facing"
      alb.ingress.kubernetes.io/success-codes: "200,302"
    hosts:
      - ""
    path: /*

Redeploy, check:

kk -n monitoring get ingress
NAME                 HOSTS   ADDRESS                                                                  PORTS   AGE
prometheus-grafana   *       96759da8-monitoring-promet-bb6c-2076436438.us-east-2.elb.amazonaws.com   80      15m

Open in a browser:

And even more: now we are able to see all graphs in the Lens utility used by our developers. Earlier they got the “Metrics are not available due to missing or invalid Prometheus configuration” and “Metrics not available at the moment” error messages.

Actually, that’s all you need to start using Prometheus Operator to monitor your Kubernetes cluster.

Also published on Medium.