Disruption budgets were introduced in Karpenter version 0.36, and they look like a very useful tool for limiting when Karpenter is allowed to recreate WorkerNodes.
For example, in my case we don’t want EC2 instances to be killed during business hours in the US, because we have customers there, so we currently have consolidationPolicy=WhenEmpty to prevent “unnecessary” deletion of servers and the Pods running on them.
Instead, with Disruption budgets, we can configure policies in such a way that operations with WhenEmpty are allowed in one period of time, and WhenEmptyOrUnderutilized in another.
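For illustration, here is a minimal sketch of the idea (a simplified preview; the schedule and window here are hypothetical, and the real example is built step by step below):

budgets:
# hypothetical: block Underutilized disruptions during a daytime window (UTC)
- nodes: "0"
  reasons:
  - "Underutilized"
  schedule: "0 15 * * mon-fri"
  duration: 9h
# disruptions for other reasons (e.g. Empty) are not restricted by this budget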
See also Kubernetes: ensuring High Availability for Pods – because when using Karpenter, even with Disruption budgets configured, you still need to have Pods with Topology Spread Constraints and a PodDisruptionBudget configured accordingly.
Karpenter Disruption types
Documentation – Automated Graceful Methods.
First, let’s see in which cases Disruption occurs at all:
- Drift: occurs when there is a difference between the created NodePool or EC2NodeClass configurations and the existing WorkerNodes – then Karpenter will start recreating the EC2 instances to bring them in line with the specified parameters
- Interruption: occurs if Karpenter receives an AWS event saying that an instance is about to be terminated, for example, if it is a Spot Instance
- Consolidation: if we have consolidationPolicy set to WhenEmptyOrUnderutilized or WhenEmpty, Karpenter moves our Pods to other WorkerNodes (see the fragment after this list)
  - Note: we have Karpenter v1.0, so the policy is called WhenEmptyOrUnderutilized; for v0.36 or v0.37 it’s WhenUnderutilized
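For reference, the consolidation policy itself is set in the NodePool’s disruption block; a minimal fragment (the same values are used in the full example later in this post):

disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 600s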
Karpenter Disruption Budgets
With the help of Disruption budgets, we can very flexibly configure when and what operations Karpenter can perform, and set a limit on how many WorkerNodes will be deleted at the same time.
Documentation – NodePool Disruption Budgets.
The configuration format is quite simple:
budgets:
- nodes: "20%"
  reasons:
  - "Empty"
  schedule: "@daily"
  duration: 10m
Here we set:
- allow deletion of WorkerNodes for up to 20% of the total number
- only for operations where Disruption is triggered by the Empty reason (the WhenEmpty condition)
- every day, via the @daily schedule (see the note after this list)
- for 10 minutes
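Note that @daily is just a cron shorthand; the same schedule can be written in plain cron syntax (an equivalent form, using the standard cron macros):

schedule: "0 0 * * *" # the same as "@daily": every day at 00:00 UTC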
Parameters here can have the following values (a combined example follows this list):
- nodes: as a percentage or a number of nodes
- reasons: Drifted, Underutilized, or Empty
- schedule: the schedule by which the rule is applied, in UTC (other timezones are not yet supported), see Kubernetes Schedule syntax
- duration: how long the rule is in effect, for example 1h15m
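To make the field syntax concrete, here is a hypothetical budget that uses every field at once (all values are made up for illustration):

budgets:
- nodes: "3" # an absolute number of nodes instead of a percentage
  reasons:
  - "Drifted"
  schedule: "30 8 * * sat,sun" # hypothetical: weekend mornings, UTC
  duration: 1h15m # the example duration from the list above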
Also, it is not necessary to set all the parameters.
For example, we can describe two such budgets:
- nodes: "25%" - nodes: "10"
Then both rules work all the time: the first limits disruptions to 25% of the total number of nodes, and the second limits them to no more than 10 instances – which becomes the stricter limit once we have more than 40 servers (25% of 40 is 10).
Also, budgets can be combined: if you set several of them, Karpenter applies the most restrictive one.
In the first example, we apply the rule for 20% of nodes and the WhenEmpty condition, and the rest of the time the default disruption rules will work – that is, 10% of the total number of servers with the specified consolidationPolicy.
Therefore, we can write the rule as follows:
budgets:
- nodes: "20%"
  reasons:
  - "Empty"
  schedule: "@daily"
  duration: 10m
- nodes: 0
Here, the last rule works all the time and acts as a kind of fuse: we prohibit everything, but allow disruptions to be executed for the WhenEmpty policy for 10 minutes once a day, starting from 00:00 UTC.
Disruption Budgets example
Going back to my task:
- we have a Backend API in Kubernetes on a dedicated NodePool, and our customers are mostly from the USA, so we want to minimize the down-scaling of WorkerNodes during US business hours
- to do this, we want to block all WhenUnderutilized operations during working hours in US Central Time
  - Karpenter’s schedule uses the UTC zone, so the start of the working day in US Central Time, 9:00, is 15:00 UTC
- operations with WhenEmpty are allowed at any time, but only 1 WorkerNode at a time
- Drift – similarly, because when I deploy changes, I want to see the result immediately
So, in fact, we need to set two budgets:
- for Underutilized – we prohibit everything from Monday to Friday for 9 hours, starting from 15:00 UTC
- for Empty and Drifted – allow at any time, but only 1 node at a time, not the default 10%
Then our NodePool will look like this:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: backend1a
spec:
  template:
    metadata:
      labels:
        created-by: karpenter
        component: devops
    spec:
      taints:
      - key: BackendOnly
        effect: NoSchedule
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: defaultv1a
      requirements:
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: ["c5"]
      - key: karpenter.k8s.aws/instance-size
        operator: In
        values: ["large", "xlarge"]
      - key: topology.kubernetes.io/zone
        operator: In
        values: ["us-east-1a"]
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]
  # total cluster limits
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 600s
    budgets:
    - nodes: "0" # block all
      reasons:
      - "Underutilized" # if reason == underutilized
      schedule: "0 15 * * mon-fri" # starting at 15:00 UTC during weekdays
      duration: 9h # during 9 hours
    - nodes: "1" # allow by 1 WorkerNode at a time
      reasons:
      - "Empty"
      - "Drifted"
Deploy it, and check the NodePool:
$ kk describe nodepool backend1a
Name:         backend1a
...
API Version:  karpenter.sh/v1
Kind:         NodePool
...
Spec:
  Disruption:
    Budgets:
      Duration:  9h
      Nodes:     0
      Reasons:
        Underutilized
      Schedule:  0 15 * * mon-fri
      Nodes:     1
      Reasons:
        Empty
        Drifted
    Consolidate After:     600s
    Consolidation Policy:  WhenEmptyOrUnderutilized
...
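If you only need to check the budgets, an alternative is a jsonpath query (kk here is an alias for kubectl, as above):

$ kk get nodepool backend1a -o jsonpath='{.spec.disruption.budgets}'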
And we can see in Karpenter’s logs that a Disruption was triggered with the reason underutilized:
karpenter-55b845dd4c-tlrdr:controller {"level":"INFO","time":"2024-09-16T10:48:26.777Z","logger":"controller","message":"disrupting nodeclaim(s) via delete, terminating 1 nodes (2 pods) ip-10-0-42-250.ec2.internal/t3.small/spot","commit":"62a726c","controller":"disruption","namespace":"","name":"","reconcileID":"db2233c3-c64b-41f2-a656-d6a5addeda8a","command-id":"1cd3a8d8-57e9-4107-a701-bd167ed23686","reason":"underutilized"}
karpenter-55b845dd4c-tlrdr:controller {"level":"INFO","time":"2024-09-16T10:48:27.016Z","logger":"controller","message":"tainted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-42-250.ec2.internal"},"namespace":"","name":"ip-10-0-42-250.ec2.internal","reconcileID":"f0815e43-94fb-4546-9663-377441677028","taint.Key":"karpenter.sh/disrupted","taint.Value":"","taint.Effect":"NoSchedule"}
karpenter-55b845dd4c-tlrdr:controller {"level":"INFO","time":"2024-09-16T10:50:35.212Z","logger":"controller","message":"deleted node","commit":"62a726c","controller":"node.termination","controllerGroup":"","controllerKind":"Node","Node":{"name":"ip-10-0-42-250.ec2.internal"},"namespace":"","name":"ip-10-0-42-250.ec2.internal","reconcileID":"208e5ff7-8371-442a-9c02-919e3525001b"}
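To watch the same process from the Kubernetes side, you can also follow the NodeClaims while a disruption is in progress (the output will, of course, differ per cluster):

$ kk get nodeclaims -w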
Done.