In the first part, “LiteLLM: AI Gateway for LLMs – features overview”, we got familiar with what LiteLLM can do in general. Now we can run it in Kubernetes and connect clients.

At the same time we’ll check the integration with our existing monitoring stack – for now just metrics to VictoriaMetrics. Logs to VictoriaLogs will be there by default, and VictoriaTraces we’ll hook up in the next part – though that’s really easy to do.

So, what are we doing today?

deploy to Kubernetes – with our own Helm chart to create resources for the External Secrets Operator (ESO) (see AWS: Kubernetes and the External Secrets Operator for AWS Secrets Manager)
we’ll run it with two Kubernetes Pods and a PodDisruptionBudget – since the service is important
connect AWS PostgreSQL RDS and Redis
pass API keys and passwords from AWS Secrets Manager via the External Secrets Operator
set up Ingress on AWS ALB
create a couple of users and check metrics collection to VictoriaMetrics

Worth a look before we start:

BerriAI/litellm/example_config_yaml: config examples for various use cases
config_settings: all config file parameters

One note before we start: even though LiteLLM in our project is already in production, we’re a small pre-market startup, so “production” is fairly relative. LiteLLM is still more of a test setup or PoC for us, and I didn’t focus much on security at this stage.

Planning the deployment

See the docs Deployment Options and High Availability Setup (Resolve DB Deadlocks).

What we have – AWS Elastic Kubernetes Service, an existing AWS RDS server with PostgreSQL, an existing AWS Application Load Balancer.

Besides PostgreSQL, LiteLLM recommends having Redis for caching and for syncing TPM and RPM limits. I’ll add it too, to see how it works and what it gives us.

There are Terraform providers, quite a few actually, for example scalepad/litellm, but I’ll do it without Terraform – a bit of clickops in AWS and pure Helm for the deploy.

Ideally – manage all the keys with Terraform via ephemeral resources – but again, in my specific case I’ll skip that for now (see Terraform: using Ephemeral resources and Write-only attributes).

LiteLLM Helm chart

There’s an official Helm chart, though it’s “[BETA] Helm Chart is BETA” – but pretty convenient, looks readable enough, so let’s give it a try.

I also found somewhere (or someone dropped it in the RTFM Telegram chat) a HelmRelease for Flux CD – you can look there for various interesting parameters as an example.

The first thing we’ll do is look at what’s in the chart’s values, to understand what we can configure out of the box – and what we’ll have to define ourselves.

You can just do it in the console with helm show values:

$ helm show values oci://ghcr.io/berriai/litellm-helm

But it’s more convenient to download it and look in an IDE.

We look for the latest version, at the time of writing it was 1.87.1 (updates come out very often, so it’s worth setting up Renovate right away – see Renovate: GitHub and Helm Charts versions management):

$ helm show chart oci://ghcr.io/berriai/litellm-helm
Pulled: ghcr.io/berriai/litellm-helm:1.87.1
...
version: 1.87.1

Pull the chart, unpack it:

$ helm pull oci://ghcr.io/berriai/litellm-helm --version 1.87.1 --untar
$ cd litellm-helm/

Let’s look at what’s in the chart and what we should change for ourselves.

Useful Helm values

What we’ll need to change:

replicaCount: for Production it’s worth setting 2 or 3
image.tag: set a specific version instead of using latest
serviceAccount.name: if you use AWS RDS with IAM Database Authentication (see AWS: RDS with IAM database authentication, EKS Pod Identities and Terraform) or need to grant access to AWS Secrets Manager, you can pass your own ServiceAccount – but in my case RDS is without IAM, and AWS Secrets Manager is accessed by the External Secrets Operator, which has its own permissions configured
environmentSecrets: we can create our own Kubernetes Secret and pass it here – that’s how it’ll be with the secrets from ESO
environmentConfigMaps: we can create a separate ConfigMap with environment variables – handy, we can pass parameters like max_requests_before_restart, see CLI Arguments
ingress: we’ll set up Ingress with AWS ALB
masterkeySecretName and masterkeySecretKey: pass the parameters to retrieve $LITELLM_MASTER_KEY
proxy_config: you can define LiteLLM’s parameters right in values – but I did it via a separate ConfigMap and passed it in values through proxyConfigMap
autoscaling and keda: nice that it’s there – but not relevant for us yet
tolerations, affinity: we need it, since critical services run on a dedicated WorkerNodes group
db: we’ll describe the connection to PostgreSQL
- since our server is external – we’ll disable deployStandalone
- we’ll pass login/password via a Secret that ESO will create
redis: we’ll enable the deploy of the default one from the sub-chart, though you could connect an external one

So, besides the chart, in our own templates/ we’ll only need to define two resources – a ConfigMap with LiteLLM parameters and an ExternalSecret for the External Secrets Operator.

Preparing for the deployment

We need the values of all the keys and passwords before deploying the Helm chart – so we start with those.

Creating the LLM Providers API Keys

For testing we’ll have Anthropic and OpenAI, so create an API key in each provider’s console.

Generate $LITELLM_MASTER_KEY:

$ echo "sk-$(openssl rand -hex 16)"
sk-b75***630

Save them, later we’ll add them to AWS Secrets Manager together with the PostgreSQL data.

Creating the PostgreSQL User && Database

Generate the user password:

$ pwgen 12 1
bai***vah

Create the user and the database:

ops_grafana_db=> CREATE USER ops_litellm_user WITH PASSWORD 'bai***vah';
CREATE ROLE
ops_grafana_db=> CREATE DATABASE ops_litellm_db OWNER ops_litellm_user;
CREATE DATABASE
ops_grafana_db=> GRANT ALL PRIVILEGES ON DATABASE ops_litellm_db TO ops_litellm_user;
GRANT

Check the connection:

$ export PGPASSWORD='bai***vah'; psql -h db.monitoring.ops.example.co -U ops_litellm_user -d ops_litellm_db
psql (18.4, server 16.8)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off, ALPN: none)
Type "help" for help.

ops_litellm_db=> \l ops_litellm_db
                                                                    List of databases
      Name      |      Owner       | Encoding | Locale Provider |   Collate   |    Ctype    | Locale | ICU Rules |           Access privileges           
----------------+------------------+----------+-----------------+-------------+-------------+--------+-----------+---------------------------------------
 ops_litellm_db | ops_litellm_user | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           | =Tc/ops_litellm_user                 +
                |                  |          |                 |             |             |        |           | ops_litellm_user=CTc/ops_litellm_user

Now we have 4 secrets – two API keys for OpenAI and Anthropic, LiteLLM’s own master key, and the PostgreSQL password.

Creating a secret in AWS Secrets Manager

Create a new secret with the type “Other type of secret”, set the values in JSON:

{
  "LITELLM_MASTER_KEY": "sk-***",
  "OPENAI_API_KEY": "sk-***",
  "ANTHROPIC_API_KEY": "sk-***",
  "DATABASE_USERNAME": "ops_litellm_user",
  "DATABASE_PASSWORD": "***"
}

Save it as /ops/litellm-prod-secrets; we’ll use this name later in the External Secrets Operator.
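
If you’d rather skip the console, the same secret can probably be created from the AWS CLI. A minimal sketch, assuming the JSON above is saved locally as litellm-secrets.json (a file name made up for this example) and that AWS credentials and region are already configured. The create_litellm_secret wrapper is hypothetical; only the aws secretsmanager create-secret call is the real CLI command:

```shell
# Sketch: create the secret from a local JSON file instead of the console.
# Assumptions: litellm-secrets.json is a hypothetical local file holding the
# JSON shown above; AWS credentials and region are already configured.
create_litellm_secret() {
  local name="${1:-/ops/litellm-prod-secrets}"
  local file="${2:-litellm-secrets.json}"
  aws secretsmanager create-secret \
    --name "$name" \
    --secret-string "file://${file}"
}

# Usage (not executed here):
#   create_litellm_secret /ops/litellm-prod-secrets litellm-secrets.json
```

Don’t commit that JSON file to Git, and delete it once the secret is created.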

Our own Helm chart for LiteLLM

Create our own chart and pull in the BerriAI chart via dependencies. Write Chart.yaml:

apiVersion: v2
name: litellm
description: Helm chart for LiteLLM proxy
type: application
version: 0.1.0
appVersion: "1.0.0"
dependencies:
  - name: litellm-helm
    version: "1.87.1"
    repository: "oci://ghcr.io/berriai"

On to the secrets.

External Secrets Operator for AWS Secrets Manager

In the file templates/secretstore.yml we describe the SecretStore itself and an ExternalSecret that uses it.

In the ExternalSecret we specify dataFrom.extract and the AWS Secret /ops/litellm-prod-secrets created above with its variables:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: litellm-secret-store
spec:
  provider:
    aws:
      service: SecretsManager
      region: {{ .Values.aws.region }}
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: litellm-external-secret
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: litellm-secret-store
    kind: SecretStore
  target:
    name: litellm-secrets
    creationPolicy: Owner
    deletionPolicy: Delete
  dataFrom:
    - extract:
        key: /ops/litellm-prod-secrets

With dataFrom.extract, ESO pulls the JSON from AWS Secrets Manager and writes each field into the data of the Kubernetes Secret litellm-secrets as a KEY: VALUE pair. The Pods then load this Secret via envFrom.secretRef, so every key becomes an environment variable in the LiteLLM containers.
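
To check that ESO really wrote the expected keys, you can read one of them back from the Kubernetes Secret. A minimal sketch: secret_value is a hypothetical helper, and the usage line assumes the ops-litellm-ns namespace used later in this post. It prints the decoded value to the terminal, so try it on something harmless like DATABASE_USERNAME:

```shell
# Sketch: print one decoded value from a Kubernetes Secret.
# secret_value is a hypothetical helper name; kubectl's jsonpath output and
# base64 -d are the only real tools used.
secret_value() {
  local ns="$1" secret="$2" key="$3"
  # Secret data is stored base64-encoded, so decode it after extraction.
  kubectl -n "$ns" get secret "$secret" -o "jsonpath={.data.${key}}" | base64 -d
}

# Usage (not executed here):
#   secret_value ops-litellm-ns litellm-secrets DATABASE_USERNAME
```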

ConfigMap for the LiteLLM Proxy Config

We make it a separate resource – easier to read the values, easier to manage and update.

Minimal for now – we just need to get the service running, we’ll tune it later.

Create the file templates/proxy-config.yaml with two models – I already have a few monitoring parameters here, more on that in the next post:

apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-proxy-config
data:
  config.yaml: |

    general_settings:
      master_key: os.environ/LITELLM_MASTER_KEY
      store_prompts_in_spend_logs: true

    litellm_settings:

      # Monitoring settings
      require_auth_for_metrics_endpoint: false
      prometheus_initialize_budget_metrics: true
      # Enable the 'stream' label to split requests by streaming vs. non-streaming
      prometheus_emit_stream_label: true
      # Enable the 'end_user' label for cost tracking
      enable_end_user_cost_tracking_prometheus_only: true
      callbacks:
      - prometheus
      - otel
      - arize
      service_callback: 
      - prometheus_system

      # Redis: Cache settings
      cache: true
      cache_params:
        type: redis

    model_list:

      - model_name: gpt-5-5
        litellm_params:
          model: openai/gpt-5-5
          api_key: os.environ/OPENAI_API_KEY

      - model_name: claude-sonnet-4-6
        litellm_params:
          model: anthropic/claude-sonnet-4-6
          api_key: os.environ/ANTHROPIC_API_KEY

Kubernetes WorkerNode Taints

In my setup, for the Kubernetes Pods with LiteLLM we need to specify tolerations, since there’s a separate WorkerNode Group for critical services, see Kubernetes: Pods and WorkerNodes – controlling pod placement on nodes.

I’d already forgotten which taints I set, so check with kubectl describe node (kk below is a shell alias for kubectl):

$ kk describe node ip-10-0-59-76.ec2.internal | grep -A5 Taints
Taints:             CriticalAddonsOnly=true:NoExecute
                    CriticalAddonsOnly=true:NoSchedule

Docker image tag for LiteLLM

Versions are listed in Recent Releases, but for some reason the latest there is 1.87.0 – even though 1.87.1 is already out (the latest at the time of writing this post).

You can check the appVersion in the chart itself for version 1.87.1:

$ helm show chart oci://ghcr.io/berriai/litellm-helm --version 1.87.1 | grep appVersion
...
appVersion: 1.87.1

Creating our own Values

The full file values/ops/litellm-ops-1-33-values.yaml came out like this for now:

# AWS region for External Secrets Operator (SecretStore pulls from Secrets Manager).
aws:
  region: us-east-1

# Values for the upstream litellm-helm subchart (wrapper chart nests everything here).
litellm-helm:
  # Number of LiteLLM proxy replicas.
  replicaCount: 2

  deploymentAnnotations:
    # Restart pods automatically when mounted ConfigMaps or Secrets change.
    reloader.stakater.com/auto: "true"

  image:
    # LiteLLM proxy image tag; should match the vendored chart version.
    tag: "1.87.1"

  serviceAccount:
    # Use the namespace default ServiceAccount (no dedicated SA created by the chart).
    create: false
    name: ""

  # K8s Secrets mounted as env vars (synced from AWS Secrets Manager via ExternalSecret).
  environmentSecrets:
    - litellm-secrets

  # Master key for LiteLLM admin API and proxy authentication.
  masterkeySecretName: litellm-secrets
  masterkeySecretKey: LITELLM_MASTER_KEY

  proxyConfigMap:
    # Proxy config is provided by the wrapper chart (helm/templates/proxy-config.yaml).
    create: false
    name: litellm-proxy-config

  db:
    # Use existing Postgres instead of deploying a chart-managed database.
    useExisting: true
    deployStandalone: false
    endpoint: db.monitoring.ops.example.co
    database: ops_litellm_db
    secret:
      name: litellm-secrets
      usernameKey: DATABASE_USERNAME
      passwordKey: DATABASE_PASSWORD

  redis:
    enabled: true
    architecture: standalone
    image:
      registry: docker.io
      repository: bitnami/redis
      tag: "latest"

  ingress:
    enabled: true
    className: alb
    annotations:
      # Share an internal ALB with other ops-1-33 services.
      alb.ingress.kubernetes.io/group.name: ops-1-33-internal-alb
      # Route traffic directly to pod IPs (needed when the Service is of type ClusterIP).
      alb.ingress.kubernetes.io/target-type: ip
      # TLS certificate for aigw.ops.example.co (ACM, us-east-1).
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:492***148:certificate/5fe4cb67-5af5-49d6-99e0-eb2145b66390
    hosts:
      - host: aigw.ops.example.co
        paths:
          - path: /*
            pathType: ImplementationSpecific

  # Schedule on CriticalAddonsOnly nodes (system/add-on node group).
  tolerations:
    - key: CriticalAddonsOnly
      value: "true"
      operator: Equal
      effect: NoSchedule
    - key: CriticalAddonsOnly
      value: "true"
      operator: Equal
      effect: NoExecute

  pdb:
    # Keep at least one pod available during voluntary disruptions (2 replicas total).
    enabled: true
    minAvailable: 1

Here, as planned above:

we create two Kubernetes Pods
in deploymentAnnotations with Reloader we set automatic restart on changes in the ConfigMap (where proxy_config will be), see Kubernetes: ConfigMap and Secrets – auto-reload of data in pods
via environmentSecrets we pass the name of the Kubernetes Secret that the External Secrets Operator will create
in masterkeySecretName and masterkeySecretKey we point to the Secret and the key in it that hold $LITELLM_MASTER_KEY
in proxyConfigMap we pass our own Kubernetes ConfigMap with LiteLLM parameters
in db we describe the connection to the existing AWS RDS and its credentials from the same Kubernetes Secret created by ESO
in redis we add running Redis from the Bitnami sub-chart (yeah, yeah)
in ingress – my project uses a shared ALB, we pass it via alb.ingress.kubernetes.io/group.name, see Kubernetes: a single AWS Load Balancer for different Kubernetes Ingresses
we describe tolerations
and a PodDisruptionBudget, definitely – see Kubernetes: ensuring High Availability for Pods

Redis and the Docker image tag

I have no idea what the cool, hip way to run Redis in Kubernetes is these days: the last time I did it was about 5 years ago, back before Bitnami ~~sold out~~ made its changes.

But with the default minimal “redis.enabled=true” everything started up fine, the only catch was with the Docker tag, since by default it pulled docker.io/bitnami/redis:7.2.4-debian-12-r9, which either doesn’t exist at all or is “behind a paywall” – I haven’t used Bitnami in a long time, so I’m not really up to date on what exactly changed there.

So for now I just grabbed the latest tag. While the system is more of a PoC, that’s fine, and once we go to full production I’ll dig into Redis separately, or just take AWS ElastiCache.

Creating a Makefile

Make our local life easier (you can also add CI/CD targets here) – add a Makefile so we don’t have to type the commands every time:

# Recipe lines must start with a tab character, not spaces.
.PHONY: helm-pull-local helm-dependency-update helm-template-ops-1-33 helm-diff-ops-1-33 helm-install-ops-1-33

helm-pull-local:
	helm pull oci://ghcr.io/berriai/litellm-helm --version 1.87.1 -d charts/

helm-dependency-update:
	helm dependency update

helm-template-ops-1-33:
	helm -n ops-litellm-ns template litellm \
	  . -f values/ops/litellm-ops-1-33-values.yaml

helm-diff-ops-1-33:
	helm -n ops-litellm-ns diff upgrade --install litellm \
	  . -f values/ops/litellm-ops-1-33-values.yaml \
	  --dry-run=server

helm-install-ops-1-33:
	helm -n ops-litellm-ns upgrade --install litellm \
	  . -f values/ops/litellm-ops-1-33-values.yaml \
	  --debug

And all together it now looks like this:

$ tree .
.
├── CLAUDE.md
├── helm
│   ├── Chart.lock
│   ├── charts
│   │   └── litellm-helm-1.87.1.tgz
│   ├── Chart.yaml
│   ├── Makefile
│   ├── templates
│   │   ├── proxy-config.yaml
│   │   └── secretstore.yml
│   └── values
│       └── ops
│           └── litellm-ops-1-33-values.yaml
└── README.md

Deploying to Kubernetes

Create the Kubernetes Namespace – manually for now, normally we do this with Terraform when creating the cluster:

$ kk create ns ops-litellm-ns

Check that the chart renders fine:

$ make helm-template-ops-1-33 
helm -n ops-litellm-ns template litellm \
. -f values/ops/litellm-ops-1-33-values.yaml
...
---
# Source: litellm/templates/proxy-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-proxy-config
data:
  config.yaml: |
    general_settings:
      master_key: os.environ/LITELLM_MASTER_KEY
    model_list:
      - model_name: gpt-5-5
        litellm_params:
          model: openai/gpt-5-5
          api_key: os.environ/OPENAI_API_KEY
...

Install:

$ make helm-install-ops-1-33

Check the pods:

$ kk get pod
NAME                       READY   STATUS              RESTARTS   AGE
litellm-6d9bdbd689-6fl7q   0/1     ContainerCreating   0          22s
litellm-6d9bdbd689-gh4dw   0/1     ContainerCreating   0          22s

And the Ingress:

$ kk get ingress
NAME      CLASS   HOSTS                 ADDRESS                                        PORTS   AGE
litellm   alb     aigw.ops.example.co   internal-k8s-***.us-east-1.elb.amazonaws.com   80      33s

Wait a couple of minutes, check that the DNS has updated:

$ dig aigw.ops.example.co +short
10.0.41.213
10.0.53.252
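
Instead of refreshing the browser, you can poll LiteLLM’s readiness endpoint until the Pods are up. A rough sketch: wait_for_litellm, the number of attempts and the delay are my own choices for the example; /health/readiness is the LiteLLM health endpoint, which as far as I know doesn’t require a key:

```shell
# Sketch: poll the readiness endpoint until it answers HTTP 200
# or the attempts run out. wait_for_litellm is a hypothetical helper.
wait_for_litellm() {
  local base_url="$1" attempts="${2:-30}" delay="${3:-10}" code i=1
  while [ "$i" -le "$attempts" ]; do
    # -w '%{http_code}' prints only the status code; the body is discarded.
    code="$(curl -s -o /dev/null -w '%{http_code}' "${base_url}/health/readiness" || true)"
    if [ "$code" = "200" ]; then
      echo "ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempts" >&2
  return 1
}

# Usage (not executed here):
#   wait_for_litellm https://aigw.ops.example.co 30 10
```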

And open the LiteLLM dashboard at https://aigw.ops.example.co/ui.

Log in with admin and $LITELLM_MASTER_KEY and check the available models. Both models defined in proxy-config.yaml are there.

Checking LiteLLM with Claude Code

It was interesting to see how to proxy access to Anthropic through LiteLLM in Claude Code.

Maybe somewhere down the line I’ll describe how I configured our Backend API and other services, though it’s all pretty simple there – you just need to add a base_url and swap the API keys.

Creating a LiteLLM User and Virtual API Key via the LiteLLM API

Create a new user and their API Key:

$ curl -X POST https://aigw.ops.example.co/user/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_email": "<user-email>",
    "user_role": "internal_user"
  }'

Save the key from the response: "key":"sk-CUn***wVg".

Check the users in LiteLLM:

Running Claude Code through LiteLLM

Here you just need to set a couple of variables – the key and the endpoint:

$ export ANTHROPIC_API_KEY=sk-CUn***wVg
$ export ANTHROPIC_BASE_URL=https://aigw.ops.example.co

And run claude.

Pick a model that’s available in LiteLLM:

❯ /model
  ⎿  Set model to Sonnet 4.6 and saved as your default for new sessions

Make a request like “How are you?” – et voilà! We’ve got a trace in LiteLLM:

Granted, the project’s Anthropic account had zero USD 🙂

But then we topped up the balance – everything works.
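
Other clients work the same way: point them at the gateway and use a virtual key instead of the provider key. A minimal sketch with plain curl against the OpenAI-compatible /v1/chat/completions endpoint; litellm_chat is a hypothetical helper, and $LITELLM_USER_KEY is a variable name I made up for the virtual key created above. The JSON body is built by simple string interpolation, so it will break on prompts containing double quotes:

```shell
# Sketch: send one chat completion through the LiteLLM gateway.
# litellm_chat is a hypothetical helper; the model name must be one of the
# model_name entries from the proxy config.
litellm_chat() {
  local base_url="$1" api_key="$2" model="$3" prompt="$4"
  curl -sS "${base_url}/v1/chat/completions" \
    -H "Authorization: Bearer ${api_key}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${model}\", \"messages\": [{\"role\": \"user\", \"content\": \"${prompt}\"}]}"
}

# Usage (not executed here):
#   litellm_chat https://aigw.ops.example.co "$LITELLM_USER_KEY" claude-sonnet-4-6 "How are you?"
```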

Monitoring and metrics to VictoriaMetrics

Very briefly here, just to confirm the data is there. We’ll dig into it in more detail in the next post – I was going to cover it in this one, but it came out long enough as it is.

Docs – Prometheus metrics and OpenTelemetry – Tracing LLMs with any observability tool, plus I wrote a bit in the previous post in the Monitoring, OpenTelemetry and Traces section.

Enable the /metrics endpoint for VictoriaMetrics:

litellm_settings:
  callbacks: 
  - prometheus

LiteLLM /metrics authentication

The docs say that “By default /metrics endpoint is unauthenticated” – but when deployed from this chart, accessing the metrics gives “Unauthorized access to metrics endpoint” – even though I didn’t see any such parameter anywhere in the values.

You can disable it explicitly with require_auth_for_metrics_endpoint: false, which is fine since it’s an internal ALB anyway. But we can take the opportunity to create some more users.

Create a new read-only admin user – the proxy_admin_viewer role (see User Roles), since a regular user wasn’t given access to the metrics anyway:

$ curl -X POST https://aigw.ops.example.co/user/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_email": "<user-email>",
    "user_role": "proxy_admin_viewer"
  }'

Save the obtained key as LITELLM_RO_ADMIN_KEY in AWS Secrets Manager, update the deploy, and check the metrics with the header "Authorization: Bearer $LITELLM_RO_ADMIN_KEY":

$ curl -s -H "Authorization: Bearer $LITELLM_RO_ADMIN_KEY" https://aigw.ops.example.co/metrics/ | tail
# HELP litellm_check_batch_cost_jobs_processed_total Total number of batches successfully cost-tracked by CheckBatchCost
# TYPE litellm_check_batch_cost_jobs_processed_total counter
# HELP litellm_check_batch_cost_errors_total Total number of errors in CheckBatchCost by error type
# TYPE litellm_check_batch_cost_errors_total counter
# HELP litellm_check_batch_cost_last_run_timestamp Unix timestamp of the last CheckBatchCost job run
# TYPE litellm_check_batch_cost_last_run_timestamp gauge
litellm_check_batch_cost_last_run_timestamp 0.0
# HELP litellm_in_flight_requests Number of HTTP requests currently in-flight on this uvicorn worker
# TYPE litellm_in_flight_requests gauge
litellm_in_flight_requests 1.0

Now we can collect them to VictoriaMetrics.
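
A quick way to see which metrics already have samples (and not just HELP/TYPE lines) is to filter the output. A small sketch: metric_names is a hypothetical helper that reads the Prometheus text format from stdin and prints the unique metric names:

```shell
# Sketch: list unique metric names that have at least one sample.
# metric_names is a hypothetical helper; it only uses grep, sed and sort.
metric_names() {
  grep -v '^#' \
    | grep -v '^[[:space:]]*$' \
    | sed -e 's/[{ ].*$//' \
    | sort -u
}

# Usage (not executed here):
#   curl -s -H "Authorization: Bearer $LITELLM_RO_ADMIN_KEY" \
#     https://aigw.ops.example.co/metrics/ | metric_names
```

The sed expression cuts everything from the first "{" or space, so a labeled series and an unlabeled one with the same name collapse into a single line.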

vmagent and VMPodScrape

There are a few options:

enable serviceMonitor in the LiteLLM chart values – it’ll create a Prometheus ServiceMonitor, and the VictoriaMetrics Operator, if operator.disable_prometheus_converter=false is set, will create a VMServiceScrape (I wrote about it in VMServiceScrape with ServiceMonitor and VictoriaMetrics Prometheus Converter)
we can just set inlineScrapeConfig for vmagent and specify static_configs or kubernetes_sd_configs
or we can just create a VMPodScrape

With inlineScrapeConfig, static_configs isn’t a great fit for us – since LiteLLM has several pods, and we need to collect metrics from all of them, but just for a check we can do it this way for now.

Add a new job_name and specify the bearer_token explicitly for now. The plan was to move it to a secret later, but in the end I set require_auth_for_metrics_endpoint: false in the LiteLLM config instead:

...

        - job_name: litellm
          metrics_path: /metrics/
          bearer_token: sk-0H0***Dyg
          static_configs:
            - targets: ["litellm.ops-litellm-ns.svc.cluster.local:4000"]

Deploy and check the metrics in VictoriaMetrics:

The famous 40,000-token system prompt in Claude Code 🙂

If you make your own VMPodScrape – it looks like this:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMPodScrape
metadata:
  name: litellm
  namespace: ops-litellm-ns
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: litellm
  podMetricsEndpoints:
    - port: http
      path: /metrics/
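
The selector and the port name in this VMPodScrape have to match what the chart actually sets on the Pods, so it’s worth checking before blaming vmagent. A sketch, where list_scrape_pods is a hypothetical helper and the label is the same one assumed in the manifest above:

```shell
# Sketch: print each matching Pod with the names of its container ports.
# list_scrape_pods is a hypothetical helper around kubectl's jsonpath output.
list_scrape_pods() {
  local ns="$1" selector="$2"
  kubectl -n "$ns" get pods -l "$selector" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].ports[*].name}{"\n"}{end}'
}

# Usage (not executed here):
#   list_scrape_pods ops-litellm-ns app.kubernetes.io/name=litellm
```

If the list is empty or the port isn’t called http, adjust matchLabels or port in the VMPodScrape accordingly.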

And that’s it – the service is ready for configuration.

It’s been running for about a week now, we’re gradually switching over our production services – so far so good.

Next we’ll need to add more models, set up metrics and traces, look at what interesting alerts and Grafana dashboards we can create – the next post is already in drafts.
