After migrating to a new EKS cluster, we started occasionally getting alerts about 503 errors.
The errors happened in three cases:
- sometimes without any deployment, when all Pods were Running && Ready
- sometimes during deployment – but only on Dev, because there is only one Pod for API
- and sometimes during Karpenter Consolidation.
Let’s dig into the possible reasons.
A little context: our setup
We have AWS EKS running Pods with the Backend API.
To access them, we have a Kubernetes Service of the ClusterIP type and an Ingress resource with the alb.ingress.kubernetes.io/target-type: ip annotation.
We have an AWS LoadBalancer Controller that creates an AWS Application LoadBalancer.
When creating a Kubernetes Service, the Endpoints controller creates an Endpoints resource with a list of Pod IPs, which is then used by the ALB Controller to add targets to the Target Group.
When the Backend API Ingress is deployed, the ALB Controller creates an ALB with a Listener for each hostname from the Ingress (each with its own SSL certificate), and a Target Group for each Listener, which contains the Pod IPs from the Endpoints address list to which client traffic should be sent.
That is, the requests flow looks like this: client => ALB => Listener => Target Group => Pod IP.
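For illustration, a simplified sketch of such a Service and Ingress pair (a minimal example based on our setup; names, host, and the set of annotations are trimmed, so treat it as a sketch rather than our actual manifests):

apiVersion: v1
kind: Service
metadata:
  name: backend-api-service
spec:
  type: ClusterIP
  selector:
    app: backend-api
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-api-ingress
  annotations:
    # register Pod IPs directly as ALB targets
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - host: dev.api.example.co
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-api-service
                port:
                  number: 8000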
More details – Kubernetes: what are Endpoints and Kubernetes: Service, load balancing, kube-proxy and iptables.
However, as of Kubernetes 1.33 the Endpoints API is deprecated – see Kubernetes v1.33: Continuing the transition from Endpoints to EndpointSlices.
ALB and EKS: 502 vs 503 vs 504
First, let’s look at the difference between errors.
502 occurs if ALB could not receive a correct response from the backend, i.e. an error at the level of the service (application) itself, the application layer (“I made a phone call, but the other end answered something incomprehensible or hung up in the middle of the conversation”):
- Pod has no open port
- Pod crashed, but Readiness probe says it’s alive, and Kubernetes doesn’t disconnect it from traffic (doesn’t remove IP from the list in the Endpoints resource)
- Pod returns an error from the service when connecting
A few more real-world 502 cases from my blog:
- Kubernetes: NGINX/PHP-FPM graceful shutdown – getting rid of 502 errors: how NGINX and the SIGTERM signal led to 502
- Kubernetes: Ingress, error 502, readinessProbe and livenessProbe: how a panic call in a Golang service led to 502
503 occurs if ALB does not have any Healthy targets, or Ingress could not find a Pod (“the remote cannot accept your phone call”):
- Pods pass the readinessProbe, are added to the Endpoints list and to ALB targets, but do not pass the Health checks in the Target Group – then ALB sends traffic to all Pods (targets), see Health checks for Application Load Balancer target groups
- Pods fail the readinessProbe, Kubernetes removes them from Endpoints, and the Target Group becomes empty – ALB has nowhere to send requests
- Kubernetes Service has configuration errors (for example, a wrong Pod selector)
- Kubernetes Service is configured correctly, but the number of running Pods == 0
- ALB has established a connection to the Pod, but the connection is broken at the TCP level (for example, due to different keep-alive timeouts on the backend and ALB), and ALB received a TCP RST (reset)
504 occurs when ALB sent a request but did not receive a response within the configured timeout (60 seconds by default on ALB) (“I made a phone call, but nobody answered for too long, so I hung up”):
- a process in Pod takes too long to process a request
- network issues in the AWS VPC or EKS cluster, and packets from ALB to Pod are taking too long
See also Troubleshoot your Application Load Balancers.
Possible causes of 503
Other possible causes of 503, and whether they are related to our case:
- incorrect Security Group rules:
  - when creating Target Groups, the AWS Load Balancer Controller has to create or update Security Groups, and may, for example, not have access to the AWS API for editing Security Groups
  - but this is not our case, because the error occurs periodically, while in that case it would occur immediately and permanently
- unhealthy targets in the Target Group:
  - if all Pods (ALB targets) are periodically unhealthy, we will get 503s, because ALB starts sending traffic to all available targets
  - which is also not our case: I checked the UnHealthyHostCount metric in CloudWatch, and it shows there were no problems with the targets (a CLI sketch for checking this metric is shown after this list)
- incorrect tags on VPC Subnets:
  - the ALB Controller searches for Subnets by the kubernetes.io/cluster/<cluster-name>: owned tag to find Pods in them and register targets
  - again, not our case, because the error occurs periodically, not permanently
- connection draining delay:
  - when deploying or scaling, old Pods are deleted and their IPs are removed from the Target Group, but this can happen with a delay – that is, the Pod is already dead, but its IP is still in the Targets
  - but in the first case there were no restarts or deployments, see below
- packets-per-second or traffic-per-second limits on a Worker Node:
  - EC2 has limits depending on the instance type (see Amazon EC2 instance network bandwidth and Packets-per-second limits in EC2)
  - also not our case, because there is very little traffic
- inconsistent Keep-Alive timeouts:
  - the idle timeout on the Load Balancer is higher than on the backend – and ALB can send a request over a connection that is already closed on the backend
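As a side note on the unhealthy targets check: the same UnHealthyHostCount metric can be pulled with the AWS CLI instead of the CloudWatch console – a rough sketch, where the TargetGroup and LoadBalancer dimension values and the time range are placeholders taken from your own Target Group and ALB ARNs:

$ aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name UnHealthyHostCount \
    --dimensions Name=TargetGroup,Value=targetgroup/<TG_NAME>/<TG_ID> Name=LoadBalancer,Value=app/<ALB_NAME>/<ALB_ID> \
    --statistics Maximum \
    --period 300 \
    --start-time <START_TIME> \
    --end-time <END_TIME>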
And now, let’s move to our real causes.
Issue #1: different keep-alive timeouts
ALB idle timeout 600 seconds, Backend – 75 seconds
On our ALB, we have the Connection idle timeout set to 600 seconds:
And on the Backend API it is 75 seconds:
...
CMD ["gunicorn", "challenge_backend.run_api:app", "--bind", "0.0.0.0:8000", "-w", "3", "-k", "uvicorn.workers.UvicornWorker", "--log-level", "debug", "--keep-alive", "75"]
...
That is, after 75 seconds of inactivity the Backend sends a TCP FIN and closes the connection – but ALB can still send a request over it at this time.
How can you check it?
I didn’t do it this time, but for future reference, you can check the traffic with tcpdump:
$ sudo tcpdump -i any -nn -C 100 -W 5 -w /tmp/alb_capture.pcap 'port 8080 and (host <ALB_IP_1> or host <ALB_IP_2>)'
What can happen:
- if the connection is inactive for 75 seconds, the Pod closes the connection – sends a packet with [FIN, ACK]
- ALB tries to send a packet over the same connection:
  <ALB_IP> => <POD_IP> | TCP | [PSH, ACK] Seq=1 Ack=2 Len=...
- and the Pod responds with a packet with [RST] (reset):
  <POD_IP> => <ALB_IP> | TCP | [RST] Seq=2
In this case, ALB will return a 503 error to the client.
Therefore, to begin with, just set the ALB idle timeout back to the default 60 seconds (less than on the backend), or increase the keep-alive on the backend.
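If you go the second way and keep the 600 seconds on the ALB, the backend keep-alive just has to be longer than the ALB idle timeout. A sketch of the same gunicorn CMD with the value raised above 600 seconds (the 620 here is an assumption for illustration, not what we actually run):

...
# keep-alive must be greater than the ALB idle_timeout.timeout_seconds (600)
CMD ["gunicorn", "challenge_backend.run_api:app", "--bind", "0.0.0.0:8000", "-w", "3", "-k", "uvicorn.workers.UvicornWorker", "--log-level", "debug", "--keep-alive", "620"]
...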
Where did the 600 seconds on Ingress come from?
I did a little digging here, because I didn’t immediately find where the 600 seconds for ALB came from.
The default timeout in AWS ALB is 60 seconds, see Configure the idle connection timeout for your Classic Load Balancer.
We use a single Ingress that creates the AWS ALB, and to which other Ingresses connect via the alb.ingress.kubernetes.io/group.name annotation (see Kubernetes: a single AWS Load Balancer for different Kubernetes Ingresses).
In this main Ingress, we don’t have any attributes to change the idle timeout:
...
alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket={{ .Values.alb_aws_logs_s3_bucket }}
alb.ingress.kubernetes.io/actions.default-action: >
  {"Type":"fixed-response","FixedResponseConfig":{"ContentType":"text/plain","StatusCode":"200","MessageBody":"It works!"}}
...
Although this can be done through Custom attributes – there is an example there of how to set 600 seconds, but that is a custom setting, not the default for the ALB Ingress Controller.
That is, the default should be 60.
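For reference, setting the idle timeout through that custom attribute on an Ingress looks something like this (just a sketch of the annotation, not a manifest from our repo):

...
annotations:
  alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
...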
I tried to set 60 seconds through this custom attribute – and got the error “conflicting load balancer attributes idle_timeout.timeout_seconds”:
...
aws-load-balancer-controller-6f7576c58b-5nmp7:aws-load-balancer-controller {"level":"error","ts":"2025-06-20T10:39:51Z","msg":"Reconciler error","controller":"ingress","object":{"name":"ops-1-33-external-alb"},"namespace":"","name":"ops-1-33-external-alb","reconcileID":"24bf8a2e-72ca-4008-8308-0f3b5595649c","error":"conflicting load balancer attributes idle_timeout.timeout_seconds: 60 | 600"}
...
Then I decided to just check all Ingresses in all Namespaces:
$ kk get ingress -A -o yaml | grep idle_timeout
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=600
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=600
And then, without the grep, I found the name of the Ingress with this attribute:
...
- apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    annotations:
      ...
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=600
      ...
    name: atlas-victoriametrics-victoria-metrics-auth
...
I don’t remember why I added it, but it came out a little sideways.
Well, the value was changed, and 503s became much less frequent, but still occurred sometimes.
Issue #2: 503 during deployments
The 503s have arrived again.
Now we can see that there was indeed a deployment at this time:
Although the Deployment has maxSurge: 100% and maxUnavailable: 0 – see Rolling Update:
...
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 100%       # start all new replicas before stopping old Pods
    maxUnavailable: 0    # keep all old Pods until new Pods are Running (have passed the readinessProbe)
...
That is, even if there is only one Pod on the Dev environment, the new one has to start first, pass all the probes, and only then the old one will be deleted.
But we still get a 503 when deploying to Dev.
Kubernetes, AWS TargetGroup and targets registration
What does the process of adding to Target Groups look like?
- a new Pod is created
- kubelet checks the readinessProbe:
  - until the readinessProbe is passed, the Pod has the Ready condition == False
- when the readinessProbe passes – Kubernetes updates Endpoints and adds the Pod IP to the address list
- the ALB Controller sees a new IP in Endpoints, and starts the process of adding this IP to the Target Group
- after registration in the TargetGroup, this IP does not immediately receive traffic, but switches to the Initial status
- ALB starts performing its Health checks
- when the Health check passes, the target becomes Healthy and traffic is routed to it
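How quickly a target goes from Initial to Healthy is defined by the Target Group Health check settings, which the ALB Controller takes from annotations – roughly like this (the path and values here are illustrative, not our configuration):

...
annotations:
  alb.ingress.kubernetes.io/healthcheck-path: /ping
  alb.ingress.kubernetes.io/healthcheck-interval-seconds: "10"
  alb.ingress.kubernetes.io/healthy-threshold-count: "2"
...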
Kubernetes, AWS TargetGroup, and targets deregistration
But there is a nuance in how targets are removed from the Target Group:
- during a Rolling Update deployment, Kubernetes creates a new Pod and waits until it becomes Ready (passes the readinessProbe)
- after that, Kubernetes starts deleting the old Pod – it goes into the Terminating status
- at the same time, the new Pod’s IP, which was added to the ALB Target Group, can still be passing the Health checks, that is, be in the Initial status
- and the target of the Pod that has already started Terminating goes into the Draining status and is then removed from the Target Group
And this is where we can catch 503s.
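As a side note, the time a target spends in the Draining status is the Target Group’s deregistration delay (300 seconds by default), which can also be tuned via an annotation – a sketch, not something we changed here:

...
annotations:
  alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
...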
Testing.
Let’s try to catch this problem:
- make a deploy
- watch the Pods status:
  - a new Pod becomes Ready, and the old Pod starts to be killed
- at this time, check the ALB to see whether the new target has passed the Initial status, and what the status of the old target is
What we have before deploying:
The Pod itself:
$ kk get pod -l app=backend-api -o wide
NAME                                      READY   STATUS    RESTARTS   AGE   IP           NODE                          NOMINATED NODE   READINESS GATES
backend-api-deployment-849557c54b-jmkz4   1/1     Running   0          44m   10.0.47.48   ip-10-0-38-184.ec2.internal   <none>           <none>
And Endpoints:
$ kk get endpoints backend-api-service
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
NAME                  ENDPOINTS         AGE
backend-api-service   10.0.47.48:8000   7d3h
There is now one target in the TargetGroup:
Run the deploy, and we have a new Pod in Ready and Running, and the old one in Terminating:
And in the corresponding TargetGroup we have a new target in the Initial status, and the old one is already in Draining:
However, it was not possible to catch 503s with curl in a 1-second loop – the health checks passed too quickly.
Solution: Pod readiness gate
The Pod readiness gate adds another condition to the Pod: the new Pod is not considered Ready until the corresponding target in the ALB passes the Health check, and therefore the old Pod will not be deleted:
The condition status on a pod will be set to True only when the corresponding target in the ALB/NLB target group shows a health state of “Healthy”.
But this only works if you have the target_type == IP.
Enable the Readiness gate in the Kubernetes Namespace:
$ kubectl label namespace dev-backend-api-ns elbv2.k8s.aws/pod-readiness-gate-inject=enabled
namespace/dev-backend-api-ns labeled
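With this label in place, the ALB Controller’s webhook injects a readiness gate into new Pods of Services exposed with target-type ip – in the Pod spec it looks roughly like this (the condition suffix below is illustrative; in reality it is the generated TargetGroupBinding name):

...
spec:
  readinessGates:
    - conditionType: target-health.elbv2.k8s.aws/k8s-devbacke-backendap-0123456789
...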
Let’s deploy it again and see.
The Pod has moved to Ready, but it is not yet ready in Readiness Gates – 0/1:
The old target is not draining, while the new one is in Initial:
Because the old Pod hasn’t started deleting yet:
As soon as the new target became Healthy:
The old Pod started to be deleted as well:
Issue #3: Karpenter Consolidation
A few more times, 503 occurred when Karpenter deleted a WorkerNode.
Let’s check the pods and nodes:
sum(kube_pod_info{namespace="dev-backend-api-ns", pod=~"backend-api.*"}) by (pod, node)
- old Pod on the ip-10-0-32-209.ec2.internal
- new Pod on the ip-10-0-36-145.ec2.internal
And at this time the WorkerNodes were rebalanced – the ip-10-0-32-209.ec2.internal instance was deleted due to "reason": "underutilized":
Let’s recall what the Karpenter consolidation process looks like – see Karpenter Disruption Flow in the post Kubernetes: Ensuring High Availability for Pods:
- Karpenter sees an underutilized EC2 instance
- it adds a NoSchedule taint to the Node to prevent new Pods from being scheduled on this WorkerNode (see Kubernetes: Pods and WorkerNodes – controlling the placement of pods on nodes)
- it executes Pod Eviction to remove existing Pods from this node
- Kubernetes receives the Eviction event and starts the process of deleting containers – puts them in the Terminating state by sending the SIGTERM signal
- the ALB Controller sees that the Pod is in the Terminating status and starts the process of deleting the target – puts it in the Draining state
- at the same time, Kubernetes sees that the number of replicas in the Deployment is not equal to the desired state, and creates a new Pod to replace the old one
  - the new Pod cannot be run on the old WorkerNode, because it has the NoSchedule taint, and if there is no space on the existing WorkerNodes, Karpenter creates a new EC2 instance, which takes a couple of minutes – this further increases the window for 503s
- the new Pod is in Pending or ContainerCreating status at this time, and is not added to the LoadBalancer Target Group – while the old target is already being deleted
  - in fact, it is not deleted immediately, because the Draining state is when ALB stops creating new connections to the target but gives time to complete the existing ones – see Registered targets
- accordingly, at this time ALB simply has nowhere to send traffic
- …
- we are getting 503
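By the way, the NoSchedule taint that Karpenter puts on the Node before draining can be seen directly on the Node object while it is still there (a quick sketch; the node name is taken from the example above):

$ kubectl get node ip-10-0-32-209.ec2.internal -o jsonpath='{.spec.taints}'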
Verification
We can try to reproduce the error:
- perform a node drain on the WorkerNode where a Dev Pod is running
- look at the status of the Pods
- look at the ALB Targets
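To watch both sides at the same time, something like this can be used (a sketch; the Target Group ARN is a placeholder):

# Pods in the Dev namespace
$ kubectl -n dev-backend-api-ns get pod -l app=backend-api -w

# targets and their health in the ALB Target Group
$ watch -n 1 aws elbv2 describe-target-health --target-group-arn <TG_ARN>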
Find the WorkerNode of the Backend API Pod in the Dev Namespace:
$ kk -n dev-backend-api-ns get pod -l app=backend-api -o wide
NAME                                      READY   STATUS    RESTARTS   AGE     IP          NODE                          NOMINATED NODE   READINESS GATES
backend-api-deployment-7b5fd6bb9b-mrm2m   1/1     Running   0          7m56s   10.0.42.9   ip-10-0-40-204.ec2.internal   <none>           1/1
Let’s drain this WorkerNode:
$ kubectl drain --ignore-daemonsets ip-10-0-40-204.ec2.internal
We see that the old Pod is in the Terminating status, and the new one is in ContainerCreating:
And at this time, we have only one target in the TargetGroup, and it is already in the Draining status – because the old Pod is in Terminating, and the new one has not yet been added to the TargetGroup – because Endpoints will be updated only when the new Pod is Ready:
And at this time we caught a bunch of 503s:
$ while true; curl -X GET -s -o /dev/null -w "%{http_code}\n" https://dev.api.example.co/ping; do sleep 1; done | grep 503
503
503
503
...
Solution: PodDisruptionBudget
How to prevent it?
Have a PodDisruptionBudget – more details in the same post Kubernetes: Ensuring High Availability for Pods.
But in this particular case, the errors occur in the Dev environment, where there is only one Pod with the API and no PDB – on Staging and Production the PDBs are in place, so we no longer get 503 errors there:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-api-pdb
spec:
  minAvailable: {{ .Values.deployment_api.poddisruptionbudget.min_avail }}
  selector:
    matchLabels:
      app: backend-api
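And a quick way to check that the PDB is actually in place after the deploy (the Staging namespace name here is just an assumption for the example):

$ kubectl -n staging-backend-api-ns get pdb backend-api-pdb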
And with that, the last 503 cause is solved as well.