Kubernetes: tracing requests with AWS X-Ray, and Grafana data source

By | 03/02/2024

Tracing allows you to track requests between components, that is, for example, when using AWS and Kubernetes we can trace the entire path of a request from AWS Load Balancer to Kubernetes Pod and to DynamoDB or RDS.

This helps us both to track performance issues – where and which requests are taking a long time to execute – and to have additional information when problems arise, for example, when our API returns 500 errors to clients, and we need to find out which component of the system is causing the problem.

AWS has a service for tracing called X-Ray, where we can send data using AWS X-Ray SDK for Python or AWS Distro for OpenTelemetry Python (or other languages, but we’ll talk about Python here).

AWS X-Ray adds a unique X-Ray ID to each request and allows you to build a picture of the full “route” of the request.

Also, in Kubernetes we can trace with tools like Jaeger or Zipkin, and then build the picture in Grafana Tempo.

Another way is to use the X-Ray Daemon, which we can run in Kubernetes, and add the X-Ray plugin to Grafana. See Introducing the AWS X-Ray integration with Grafana for examples.

Additionally, AWS Distro for OpenTelemetry also works with AWS X-Ray-compliant Trace IDs – see AWS Distro for OpenTelemetry and AWS X-Ray and Collecting traces from EKS with ADOT.

Today, however, we will be adding an X-Ray collector that will create a Kubernetes DaemonSet and a Kubernetes Service to which Kubernetes Pods can send data that we can then see either in the AWS X-Ray Console or in Grafana.

AWS IAM

IAM Policy

To access AWS API from X-Ray daemon Pods, we need to create an IAM Role, which we will then use in the ServiceAccount for X-Ray.

We still use the old way of adding IAM Role via ServiceAccounts, see Kubernetes: ServiceAccount from AWS IAM Role for Kubernetes Pod, although AWS recently announced the Amazon EKS Pod Identity Agent add-on – see AWS: EKS Pod Identities – a replacement for IRSA? Simplifying IAM access management.

So, create an IAM Policy with permissions to write to X-Ray:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "xray:PutTraceSegments",
                "xray:PutTelemetryRecords"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

Save it:

IAM Role

Next, add an IAM Role that the Kubernetes ServiceAccount can use.

Find the Identity provider of our EKS cluster:

Go to the IAM Roles, add a new role.

In the Trusted entity type, select Web Identity, and in Web identity select the Identity provider of our EKS, and in the Audience field – set the AWS STS endpoint:

Attach the IAM Policy created above:

Save it:

Running X-Ray Daemon in Kubernetes

Let’s use the okgolove/aws-xray Helm chart.

Create an  x-ray-values.yaml file, see the default values in values.yaml:

serviceAccount:
  annotations: 
    eks.amazonaws.com/role-arn: arn:aws:iam::492***148:role/XRayAccessRole-test
xray:
  region: us-east-1
  loglevel: prod

Add a repository:

$ helm repo add okgolove https://okgolove.github.io/helm-charts/

Install the chart into the cluster, this will create a DaemonSet and a Service:

$ helm -n ops-monitoring-ns install aws-xray okgolove/aws-xray -f x-ray-values.yaml

Check the Pods:

$ kk get pod -l app.kubernetes.io/name=aws-xray
NAME             READY   STATUS    RESTARTS   AGE
aws-xray-5n2kt   0/1     Pending   0          41s
aws-xray-6cwwf   1/1     Running   0          41s
aws-xray-7dk67   1/1     Running   0          41s
aws-xray-cq7xc   1/1     Running   0          41s
aws-xray-cs54v   1/1     Running   0          41s
aws-xray-mjxlm   0/1     Pending   0          41s
aws-xray-rzcsz   1/1     Running   0          41s
aws-xray-x5kb4   1/1     Running   0          41s
aws-xray-xm9fk   1/1     Running   0          41s

And Kubernetes Service:

$ kk get svc -l app.kubernetes.io/name=aws-xray
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
aws-xray   ClusterIP   None         <none>        2000/UDP,2000/TCP   77s

Checking and working with X-Ray

Create a Python Flask HTTP App with X-Ray

Let’s create a service on Python Flask that will respond to HTTP requests and log X-ray IDs (ChatGPT promt – “Create a simple Python App with AWS X-Ray SDK for Python to run in Kubernetes. Add X-Ray ID output to requests“):

from flask import Flask
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.ext.flask.middleware import XRayMiddleware
import logging

app = Flask(__name__)

# Configure AWS X-Ray
xray_recorder.configure(service='SimpleApp')
XRayMiddleware(app, xray_recorder)

# Set up basic logging
logging.basicConfig(level=logging.INFO)

@app.route('/')
def hello():
    # Retrieve the current X-Ray segment
    segment = xray_recorder.current_segment()
    # Get the trace ID from the current segment
    trace_id = segment.trace_id if segment else 'No segment'
    # Log the trace ID
    logging.info(f"Responding to request with X-Ray trace ID: {trace_id}")
    
    return f"Hello, X-Ray! Trace ID: {trace_id}\n"

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)

Create requirements.txt:

flask==2.0.1
werkzeug==2.0.0
aws-xray-sdk==2.7.0

Add Dockerfile:

FROM python:3.8-slim

COPY requirements.txt .
RUN pip install --force-reinstall -r requirements.txt

COPY app.py .

CMD ["python", "app.py"]

Build a Docker image – here we use a repository in AWS ECR:

$ docker build -t 492***148.dkr.ecr.us-east-1.amazonaws.com/x-ray-test .

Log in to the ECR:

$ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 492***148.dkr.ecr.us-east-1.amazonaws.com

Push the image:

$ docker push 492***148.dkr.ecr.us-east-1.amazonaws.com/x-ray-test

Run Flask App in Kubernetes

Create a manifest with Kubernetes Deployment, Service, and Ingress.

For Ingress, enable logging into an AWS S3 bucket – logs will be collected from it to Grafana Loki, see Grafana Loki: collecting AWS LoadBalancer logs from S3 with Promtail Lambda.

For Deployment, set the AWS_XRAY_DAEMON_ADDRESS environment variable, with the URL of the Kubernetes Service of our X-Ray Daemon:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
      - name: flask-app
        image: 492***148.dkr.ecr.us-east-1.amazonaws.com/x-ray-test
        ports:
        - containerPort: 5000
        env: 
          - name: AWS_XRAY_DAEMON_ADDRESS
            value: "aws-xray.ops-monitoring-ns.svc.cluster.local:2000"
          - name: AWS_REGION
            value: "us-east-1"
---
apiVersion: v1 
kind: Service
metadata:
  name: flask-app-service
spec:
  selector:
    app: flask-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flask-app-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: "internet-facing"
    alb.ingress.kubernetes.io/target-type: "ip"
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=ops-1-28-devops-monitoring-ops-alb-logs
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: flask-app-service
            port:
              number: 80

Deploy it and check Ingress/ALB:

$ kk get ingress
NAME                CLASS   HOSTS   ADDRESS                                                                 PORTS   AGE
flask-app-ingress   alb     *       k8s-default-flaskapp-25042181e0-298318111.us-east-1.elb.amazonaws.com   80      10m

Make a request to the endpoint:

$ curl k8s-default-flaskapp-25042181e0-298318111.us-east-1.elb.amazonaws.com
Hello, X-Ray! Trace ID: 1-65e1d287-5fc6f0f34b4fb2120da8bbec

And we see the X-Ray ID.

We can also see it in the Load Balancer Access Logs:

And in the X-Ray itself:

Although, I expected the Load Balancer to be in the request map too, but it wasn’t.

Grafana X-Ray data source

Add a new Data source:

Configure access to AWS – here it’s simple with ACCESS and SECRET keys (see X-Ray documentation):

And now we have a new data source in Explore:

And a new type of visualization – Traces:

And somewhere in another post I will probably describe the creation of a real dashboard with X-Ray.