Kubernetes: 503 errors with AWS ALB possible causes and solutions

07/09/2025

After migrating to a new EKS cluster, we occasionally started getting alerts about 503 errors.

The errors happened in three cases:

  • sometimes without any deployment, when all Pods were Running && Ready
  • sometimes during deployment – but only on Dev, because there is only one Pod for API
  • and sometimes during Karpenter Consolidation.

Let’s dig into the possible reasons.

A little context: our setup

We have AWS EKS running Pods with the Backend API.

To access them, we have a Kubernetes Service of the ClusterIP type and an Ingress resource with the alb.ingress.kubernetes.io/target-type: ip annotation.

We have an AWS LoadBalancer Controller that creates an AWS Application LoadBalancer.

When creating a Kubernetes Service, the Endpoints controller creates an Endpoints resource with a list of Pod IPs, which is then used by the ALB Controller to add targets to the Target Group.

When the Backend API Ingress is deployed, the ALB Controller creates an ALB with a Listener for each hostname from the Ingress (each with its own SSL certificate), and a Target Group for each Listener, which holds the Pod IPs from the Endpoints address list to which client traffic should be sent.

That is, the requests flow looks like this: client => ALB => Listener => Target Group => Pod IP.
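A minimal sketch of what such a Service and Ingress pair can look like – the Service name, app label, port, and hostname are taken from examples later in this post, while the Ingress name and the rest of the fields are illustrative (the real manifests have more annotations, including the group.name one mentioned below):

apiVersion: v1
kind: Service
metadata:
  name: backend-api-service
spec:
  type: ClusterIP
  selector:
    app: backend-api
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-api-ingress
  annotations:
    # register Pod IPs directly as ALB targets instead of going through NodePorts
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - host: dev.api.example.co
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-api-service
                port:
                  number: 8000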

More on this in the posts Kubernetes: what are Endpoints and Kubernetes: Service, load balancing, kube-proxy and iptables.

However, as of Kubernetes 1.33 the Endpoints API is deprecated, see Kubernetes v1.33: Continuing the transition from Endpoints to EndpointSlices.
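On 1.33 you can look at the EndpointSlices of a Service directly by the kubernetes.io/service-name label – for example, for the Backend API Service used later in this post:

$ kubectl -n dev-backend-api-ns get endpointslices -l kubernetes.io/service-name=backend-api-service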

ALB and EKS: 502 vs 503 vs 504

First, let’s look at the difference between errors.

502 occurs if ALB could not receive a correct response from the backend, i.e. an error at the level of the service (application) itself, the application layer (“I made a phone call, but the other end answered something incomprehensible or hung up in the middle of the conversation”):

  • Pod has no open port
  • Pod crashed, but Readiness probe says it’s alive, and Kubernetes doesn’t disconnect it from traffic (doesn’t remove IP from the list in the Endpoints resource)
  • Pod returns an error from the service when connecting

A few more real-world cases of 502s are described in my blog.

503 occurs if ALB does not have any Healthy targets, or Ingress could not find a Pod (“the remote cannot accept your phone call”):

  • Pods pass the readinessProbe, are added to the Endpoints list and to ALB targets, but do not pass the Health checks in the Target Group – then ALB sends traffic to all Pods (targets) anyway, see Health checks for Application Load Balancer target groups
  • Pods fail the readinessProbe, Kubernetes removes them from Endpoints, and the Target Group becomes empty – ALB has nowhere to send requests
  • Kubernetes Service has configuration errors (for example, a wrong Pod selector)
  • Kubernetes Service is configured correctly, but the number of running Pods == 0
  • ALB has established a connection to the Pod, but the connection is broken at the TCP level (for example, due to different keep-alive timeouts on the backend and ALB), and ALB received a TCP RST (reset)

504 occurs when ALB sent a request but did not receive a response within the configured timeout (60 seconds by default on ALB) (“I made a phone call, but there was no answer for too long, so I hung up”):

  • a process in Pod takes too long to process a request
  • network issues in the AWS VPC or EKS cluster, and packets from ALB to Pod are taking too long

See also Troubleshoot your Application Load Balancers.
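A quick way to see which side generates the errors is to compare the ALB's own error metrics with the targets' ones in CloudWatch – HTTPCode_ELB_503_Count vs HTTPCode_Target_5XX_Count. A sketch with the AWS CLI, assuming an illustrative LoadBalancer dimension value:

$ aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name HTTPCode_ELB_503_Count \
    --dimensions Name=LoadBalancer,Value=app/ops-1-33-external-alb/0123456789abcdef \
    --start-time 2025-06-20T00:00:00Z \
    --end-time 2025-06-20T23:59:59Z \
    --period 300 \
    --statistics Sum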

Possible causes of 503

Other possible causes of 503, and whether they are related to our case:

  • incorrect Security Groups rules:
    • when creating Target Groups, AWS Load Balancer Controller has to create or update SecurityGroups, and may, for example, not have access to the AWS API for editing SecurityGroups
    • but this is not our case, because the error occurs periodically, while with broken Security Group rules it would be immediate and permanent
  • unhealthy targets in Target Group:
    • if all Pods (ALB targets) are periodically unhealthy, we will get a 503, because ALB starts sending traffic to all available targets
    • which is also not our case: I checked the UnHealthyHostCount metric in CloudWatch, and it shows that there were no problems with the targets
  • incorrect tags on VPC Subnets:
    • the ALB Controller searches for Subnets by the kubernetes.io/cluster/<cluster-name>: owned tag to find Pods in them and register targets (a quick way to check the tags is shown after this list)
    • again, not our case, because the error occurs periodically, not permanently
  • connection draining delay:
    • when deploying or scaling, old Pods are deleted, and their IPs are removed from the TargetGroup, but this can be done with a delay – that is, the Pod is already dead, and its IP is still in Targets
    • but in the first case there were no restarts or deployments, see below
  • packets per second or traffic per second limits on the WorkerNode
  • inconsistency of Keep-Alive timeouts:
    • idle timeout on the Load Balancer is higher than on the backend – and ALB can send a request on a connection that is already closed on the backend
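For the Subnets tags item above – a quick check with the AWS CLI to see which Subnets carry the cluster tag (the cluster name here is illustrative):

$ aws ec2 describe-subnets \
    --filters "Name=tag:kubernetes.io/cluster/ops-1-33-eks,Values=owned" \
    --query 'Subnets[].{ID:SubnetId,AZ:AvailabilityZone}' \
    --output table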

And now, let’s move to our real causes.

Issue #1: different keep-alive timeouts

ALB idle timeout 600 seconds, Backend – 75 seconds

On our ALB, the Connection idle timeout is set to 600 seconds:

 

And on the Backend API it is 75 seconds:

...
CMD ["gunicorn", "challenge_backend.run_api:app", "--bind", "0.0.0.0:8000", "-w", "3", "-k", "uvicorn.workers.UvicornWorker", "--log-level", "debug", "--keep-alive", "75"] 
...

That is, after 75 seconds, the Backend sends a TCP FIN signal and closes the connection – but ALB can still send a request at this time.

How can you check it?

I didn’t do it this time, but for future reference, you can check the traffic with tcpdump:

$ sudo tcpdump -i any -nn -C 100 -W 5 -w /tmp/alb_capture.pcap 'port 8000 and (host <ALB_IP_1> or host <ALB_IP_2>)'

What can happen:

  1. if the connection is inactive for 75 seconds, Pod closes the connection – sends a packet with [FIN, ACK]
  2. ALB tries to send a packet over the same connection: <ALB_IP> => <POD_IP> | TCP | [PSH, ACK] Seq=1 Ack=2 Len=...
  3. and Pod responds with a packet with [RST] (reset): <POD_IP> => <ALB_IP> | TCP | [RST] Seq=2

In this case, ALB will return a 503 error to the client.

Therefore, to begin with, just set the ALB back to the default 60 seconds (less than on the backend), or increase the keep-alive on the backend instead.
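A sketch of both options. Either pin the idle timeout on the “main” Ingress of the group via the load-balancer-attributes annotation (the value must be the same across all Ingresses of the group, otherwise the controller reports the conflict shown below):

...
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
...

Or raise gunicorn's keep-alive above the ALB idle timeout, for example to 620 seconds:

...
CMD ["gunicorn", "challenge_backend.run_api:app", "--bind", "0.0.0.0:8000", "-w", "3", "-k", "uvicorn.workers.UvicornWorker", "--log-level", "debug", "--keep-alive", "620"]
...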

Where did the 600 seconds on Ingress come from?

I did a little digging here, because I didn’t immediately find where the 600 seconds for ALB came from.

The default timeout in AWS ALB is 60 seconds, see Configure the idle connection timeout for your Application Load Balancer.

We use a single Ingress that creates the AWS ALB, and to which other Ingresses connect via the alb.ingress.kubernetes.io/group.name annotation (see Kubernetes: a single AWS Load Balancer for different Kubernetes Ingresses).

In this main Ingress, we don’t have any annotations that change the idle timeout:

...
    alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket={{ .Values.alb_aws_logs_s3_bucket }}
    alb.ingress.kubernetes.io/actions.default-action: >
      {"Type":"fixed-response","FixedResponseConfig":{"ContentType":"text/plain","StatusCode":"200","MessageBody":"It works!"}}
...

Although this can be done through custom attributes – the documentation has an example of setting 600 seconds – that is a custom setting, not the default for the ALB Ingress Controller.

That is, the default should be 60.

I tried to simply set 60 seconds through this custom attribute – and got the error “conflicting load balancer attributes idle_timeout.timeout_seconds”:

...
aws-load-balancer-controller-6f7576c58b-5nmp7:aws-load-balancer-controller {"level":"error","ts":"2025-06-20T10:39:51Z","msg":"Reconciler error","controller":"ingress","object":{"name":"ops-1-33-external-alb"},"namespace":"","name":"ops-1-33-external-alb","reconcileID":"24bf8a2e-72ca-4008-8308-0f3b5595649c","error":"conflicting load balancer attributes idle_timeout.timeout_seconds: 60 | 600"}
...

Then I decided to simply check all Ingresses in all Namespaces:

$ kk get ingress -A -o yaml | grep idle_timeout
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=600
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=600

And then, without the grep, I found the name of the Ingress with this attribute:

...
- apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    annotations:
      ...
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=600
      ...
    name: atlas-victoriametrics-victoria-metrics-auth
...

I don’t remember why I added it, but it came out a little sideways.

Well, it was changed, and the 503s became much less frequent, but they still occurred sometimes.

Issue #2: 503 during deployments

The 503s have arrived again.

Now we can see that there was indeed a deployment at this time:

Although the Deployment has maxSurge: 100% and maxUnavailable: 0 – see Rolling Update:

...
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 100% # start all new replicas before stopping the old Pods
      maxUnavailable: 0 # keep all old Pods until the new Pods are Running (have passed the readinessProbe)
...

That is, even if there is only one Pod on the Dev environment, the new one has to start first and pass all the probes, and only then will the old one be deleted.

But we still get a 503 when deploying to Dev.

Kubernetes, AWS TargetGroup and targets registration

What does the process of adding to Target Groups look like?

  • create a new Pod
  • kubelet checks the readinessProbe
    • until the readinessProbe passes, the Pod’s Ready condition is False
  • when readinessProbe passes – Kubernetes updates Endpoints and adds Pod IP to the address list
  • ALB Controller sees a new IP in Endpoints, and starts the process of adding this IP to the TargetGroup
  • after registration in the TargetGroup, this IP does not immediately receive traffic, but switches to the Initial status
  • ALB starts performing its Health checks
  • when the Health check passes, the target becomes Healthy and traffic is routed to it
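To watch these state transitions during a deployment, the Target Group can be polled with the AWS CLI (the Target Group ARN here is illustrative):

$ aws elbv2 describe-target-health \
    --target-group-arn arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/k8s-backend/0123456789abcdef \
    --query 'TargetHealthDescriptions[].{IP:Target.Id,Port:Target.Port,State:TargetHealth.State}'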

Kubernetes, AWS TargetGroup, and targets deregistration

But there is a nuance in how targets are removed from the Target Group:

  • during a deployment from Rolling Update – Kubernetes creates a new Pod, and waits until it becomes Ready (passes readinessProbe)
  • after that, Kubernetes starts deleting the old Pod – it goes into Terminating status
  • at the same time, the new Pod’s IP that was added to the ALB Target Group may still be passing Health checks, that is, still be in the Initial status
  • and the target of the Pod that has already started Terminating goes to the Draining status and is then removed from the Target Group

And this is where we can catch 503s.

Testing.

Let’s try to catch this problem:

  • make a deploy
  • watch the Pods status:
    • a new Pod becomes Ready, and the old Pod starts to be killed
    • at this time, check the ALB to see if the new target has passed the Initial and what is the status of the old target

What we have before deploying:

The Pod itself:

$ kk get pod -l app=backend-api -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP           NODE                          NOMINATED NODE   READINESS GATES
backend-api-deployment-849557c54b-jmkz4   1/1     Running   0          44m   10.0.47.48   ip-10-0-38-184.ec2.internal   <none>           <none>

And Endpoints:

$ kk get endpoints backend-api-service
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
NAME                  ENDPOINTS         AGE
backend-api-service   10.0.47.48:8000   7d3h

There is now one target in the TargetGroup:

Run the deploy, and we have the new Pod in Ready and Running, and the old one in Terminating:

And in the corresponding TargetGroup we have a new target in the Initial status, and the old one is already in Draining:

However, it was not possible to catch 503s with curl in a 1-second loop – the health checks passed too quickly.

Solution: Pod readiness gate

The Pod readiness gate adds one more condition to the Pod: until the corresponding target in the ALB passes its Health check, the new Pod is not considered fully Ready, and so the old Pod will not be deleted:

The condition status on a pod will be set to True only when the corresponding target in the ALB/NLB target group shows a health state of “Healthy”.

But this only works if you have target-type: ip.

Enable the Readiness gate for the Kubernetes Namespace:

$ kubectl label namespace dev-backend-api-ns elbv2.k8s.aws/pod-readiness-gate-inject=enabled
namespace/dev-backend-api-ns labeled
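After the label is set and the Pods are recreated, the ALB Controller’s webhook injects a readiness gate into the Pod spec – roughly like this, with the condition type suffix depending on the generated TargetGroupBinding name:

...
spec:
  readinessGates:
    - conditionType: target-health.elbv2.k8s.aws/k8s-devbacke-backenda-0123456789ab
...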

Let’s deploy it again and see.

Pod has moved to Ready, but it is not yet ready in Readiness Gates – 0/1:

The old target is not draining, while the new one is in Initial:

Because the old Pod hasn’t started terminating yet:

As soon as the new target became Healthy:

The old Pod started to be deleted as well:

Issue #3: Karpenter consolidation

A few more times, 503s occurred when Karpenter was deleting a WorkerNode.

Let’s check the pods and nodes:

sum(kube_pod_info{namespace="dev-backend-api-ns", pod=~"backend-api.*"}) by (pod, node)

  • old Pod on the ip-10-0-32-209.ec2.internal
  • new Pod on the ip-10-0-36-145.ec2.internal

And at this time the WorkerNodes were rebalanced – the ip-10-0-32-209.ec2.internal instance was deleted due to "reason": "underutilized":

Let’s recall what the Karpenter consolidation process looks like – see Karpenter Disruption Flow in the post Kubernetes: Ensuring High Availability for Pods:

  • Karpenter sees an underutilized EC2
  • it adds a Node taint with NoSchedule to prevent new Pods from being scheduled on this WorkerNode (see Kubernetes: Pods and WorkerNodes – controlling the placement of pods on nodes)
  • executes Pod Eviction to remove existing Pods from this node
  • Kubernetes receives the Eviction event and starts deleting the Pods – puts them into the Terminating state and sends the SIGTERM signal to the containers
  • ALB Controller sees that the Pod is in Terminating status and starts the process of deleting the target – puts it in the Draining state
  • at the same time, Kubernetes sees that the number of replicas in Deployment is not equal to the desired state, and creates a new Pod to replace the old one
    • the new Pod cannot be run on the old WorkerNode, because it has a NoSchedule taint, and if there is no space on the existing WorkerNodes – Karpenter creates a new EC2 instance, which takes a couple of minutes – this will also increase the window for 503
  • the new Pod is in Pending or ContainerCreating status at this time, and is not added to the LoadBalancer TargetGroup – while the old target is already being deleted
    • in fact, it is not deleted immediately, because Draining state is when ALB stops creating new connections to the target, but gives time to complete the existing ones – see Registered targets
  • respectively, at this time ALB simply has nowhere to send traffic
  • we are getting 503

Verification

We can try to reproduce the error:

  • drain the WorkerNode where the Dev Pod is running
  • look at the status of the Pods
  • look at the ALB Targets

Find the Pod’s Node of the Backend API in the Dev Namespace:

$ kk -n dev-backend-api-ns get pod -l app=backend-api -o wide
NAME                                      READY   STATUS    RESTARTS   AGE     IP          NODE                          NOMINATED NODE   READINESS GATES
backend-api-deployment-7b5fd6bb9b-mrm2m   1/1     Running   0          7m56s   10.0.42.9   ip-10-0-40-204.ec2.internal   <none>           1/1

Let’s drain this WorkerNode:

$ kubectl drain --ignore-daemonsets ip-10-0-40-204.ec2.internal

We see that the old Pod is in Terminating status, and the new one is in ContainerCreating status:

And at this time, we have only one target in the TargetGroup, and it is already in the Draining status – because the old Pod is in Terminating, and the new one has not yet been added to the TargetGroup – because Endpoints will be updated only when the new Pod is Ready:

And at this time we caught a bunch of 503s:

$ while true; do curl -X GET -s -o /dev/null -w "%{http_code}\n" https://dev.api.example.co/ping; sleep 1; done | grep 503
503
503
503
...

Solution: PodDisruptionBudget

How to prevent it?

Have a PodDisruptionBudget, more details in the same post Kubernetes: Ensuring High Availability for Pods.

But in this particular case, the errors occur on the Dev environment, where there is only one Pod with the API and no PDB – while on Staging and Production PDBs are in place, so we no longer get 503 errors there:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-api-pdb
spec:
  minAvailable: {{ .Values.deployment_api.poddisruptionbudget.min_avail }}
  selector:
    matchLabels:
      app: backend-api
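After the PDB is applied, it is easy to check whether an eviction would currently be allowed – the ALLOWED DISRUPTIONS column must be greater than zero for Karpenter to be able to drain the node gracefully:

$ kubectl -n <namespace> get pdb backend-api-pdb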

And with that, the last cause of the 503 errors is solved as well.