How to Scale Down AWS EKS Clusters Nightly to Lower EC2 Costs

Are you running non-production EKS clusters on EC2 instances? In this guide, we'll show you how you can scale them down nightly to lower your cloud costs.

Patrick Londa
Author
Oct 4, 2022 • 4 min read

Amazon Elastic Kubernetes Service (EKS) enables organizations to run and scale Kubernetes applications. You can run EKS nodes on EC2 instances, on AWS Fargate, or on premises with AWS Outposts. This post only covers the scale-down process for EKS nodes running on EC2.

For each cluster, you pay a flat hourly fee ($0.10 per hour), plus the cost of the EC2 instances that run your nodes and any associated volumes. Hourly EC2 costs vary by instance type.

If you have non-production clusters for testing or QA purposes, you might not need them to be available for 24 hours per day. By setting up a process to scale down these clusters on a nightly basis, you can reduce your hourly EC2 computing costs.
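For a rough sense of the savings, the EC2 portion works out to node count × hourly price per node × hours switched off × number of nights. The sketch below does that arithmetic with awk. The function names are hypothetical, the inputs are placeholders rather than real EC2 prices, and the per-cluster fee is left out because it is billed whether or not any nodes are running.

```shell
#!/bin/sh
set -eu

# Estimate EC2 savings from switching nodes off overnight.
# Args: node count, hourly price per node (USD), hours off per night, nights.
# Prints the estimate with two decimals; the EKS per-cluster fee is excluded.
nightly_savings() {
  LC_ALL=C awk -v n="$1" -v p="$2" -v h="$3" -v d="$4" \
    'BEGIN { printf "%.2f\n", n * p * h * d }'
}

# Share of each day's node-hours saved, as a percentage with one decimal.
savings_share() {
  LC_ALL=C awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 24 * 100 }'
}
```

For example, nightly_savings 3 0.10 10 30 prints 90.00 (three nodes at a hypothetical $0.10 per node-hour, off 10 hours a night for 30 nights), and savings_share 10 prints 41.7.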

In this guide, we’ll show you how you can reduce your cloud costs by using the AWS CLI to scale down your EKS clusters nightly.


Step 1: Send an alert to the cluster owner

You can't pause an EKS cluster: the control plane is managed by AWS and has no concept of pausing or stopping. Instead, you get a similar result by setting the size of each node group to 0 while the workload is low, which removes the EC2 worker nodes you pay for. The per-cluster hourly fee still applies.

Scaling node groups to 0 removes every worker node and disrupts any running jobs, so it's important to alert anyone working with the cluster about the scale-down scheduled for that night. If a team is working past normal hours, the alert gives them the opportunity to intervene and prevent the scale-down.
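One way to send that alert is a Slack incoming webhook, which accepts an HTTP POST with a JSON body containing a text field. This is a minimal sketch, assuming you have created a webhook for the team's channel and exported its URL as SLACK_WEBHOOK_URL (a hypothetical variable name); the function names and message wording are illustrative as well.

```shell
#!/bin/sh
set -eu

# Build the Slack message payload. This sketch does no JSON escaping,
# so keep double quotes and backslashes out of the arguments.
build_alert_payload() {
  local cluster="$1" when="$2"
  printf '{"text": "Heads up: EKS cluster %s will be scaled down to 0 nodes at %s. Reply in this channel to prevent it."}' \
    "$cluster" "$when"
}

# POST the payload to the incoming webhook; --fail makes curl exit non-zero
# on an HTTP error so a wrapper script can notice the alert was not sent.
send_scale_down_alert() {
  local cluster="$1" when="$2"
  curl --silent --show-error --fail \
    -X POST -H 'Content-Type: application/json' \
    --data "$(build_alert_payload "$cluster" "$when")" \
    "${SLACK_WEBHOOK_URL:?set SLACK_WEBHOOK_URL first}"
}
```

For example: send_scale_down_alert my-cluster "20:00 UTC".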

Step 2: List all node pools

After you receive approval to scale down, list all node pools (node groups) in each cluster. EKS clusters can have both managed and unmanaged (self-managed) node groups, and the following eksctl command lists both types:

eksctl get nodegroup --cluster my-cluster
  • eksctl get nodegroup — shows all node pools in the EKS cluster
  • my-cluster — cluster name
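Because the scale-down has to cover every cluster in every region you use, it can help to enumerate them first. This sketch uses the AWS CLI's aws eks list-clusters and aws eks list-nodegroups. Note that list-nodegroups returns only managed node groups, so unmanaged groups still need the eksctl command above. The function name is hypothetical, and which regions to pass in is up to you.

```shell
#!/bin/sh
set -eu

# Print "<region> <cluster> <nodegroup>" for every managed node group
# in the regions given as arguments. With --output text, the CLI prints
# the names tab-separated, which the for loops split on.
list_managed_nodegroups() {
  local region cluster ng
  for region in "$@"; do
    for cluster in $(aws eks list-clusters --region "$region" \
        --query 'clusters[]' --output text); do
      for ng in $(aws eks list-nodegroups --region "$region" \
          --cluster-name "$cluster" --query 'nodegroups[]' --output text); do
        printf '%s %s %s\n' "$region" "$cluster" "$ng"
      done
    done
  done
}
```

For example: list_managed_nodegroups us-east-1 us-west-2.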

Step 3: Scale all node groups to 0

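Next, set the size of every node group to 0 to shut down the worker nodes. EKS does not install the Cluster Autoscaler by default. If you deploy it, it scales node groups through the EC2 Auto Scaling groups (ASGs) behind them and never shrinks a group below its minimum size, so a group can only stay at 0 if its minimum is also 0.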

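By default, the Cluster Autoscaler doesn't discover the ASGs behind unmanaged node groups. If you want it to scale an unmanaged node group to and from 0, look up the group's ASG and add the auto-discovery tags before you deploy the Cluster Autoscaler. The first tag key below follows the k8s.io/cluster-autoscaler/<cluster-name> pattern, with scale-to-from-zero as the example cluster name: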

# Find the ASG behind the unmanaged node group named unmanaged-ml-nodegroup
export ASG_NAME=$(aws autoscaling describe-auto-scaling-groups \
  --query "AutoScalingGroups[? Tags[? (Key=='alpha.eksctl.io/nodegroup-name') && Value=='unmanaged-ml-nodegroup']].[AutoScalingGroupName]" \
  --output text)

echo "${ASG_NAME}"

# Cluster-specific auto-discovery tag: k8s.io/cluster-autoscaler/<cluster-name>
aws autoscaling create-or-update-tags \
  --tags "ResourceId=${ASG_NAME},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/scale-to-from-zero,Value=owned,PropagateAtLaunch=true"

# Auto-discovery tag that marks the ASG as managed by the Cluster Autoscaler
aws autoscaling create-or-update-tags \
  --tags "ResourceId=${ASG_NAME},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true"
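Before deploying the Cluster Autoscaler, you could confirm that both tags landed on the ASG. This sketch uses aws autoscaling describe-tags with an auto-scaling-group filter. The function name is hypothetical, and it assumes the cluster-specific tag key is k8s.io/cluster-autoscaler/ followed by the cluster name, as with scale-to-from-zero in the commands above.

```shell
#!/bin/sh
set -eu

# Return 0 if the ASG carries both auto-discovery tag keys.
# Args: ASG name, cluster name.
has_autodiscovery_tags() {
  local asg="$1" cluster="$2" keys
  # One tag key per line; text output separates them with tabs.
  keys=$(aws autoscaling describe-tags \
    --filters "Name=auto-scaling-group,Values=${asg}" \
    --query 'Tags[].Key' --output text | tr '\t' '\n')
  # -x matches whole lines, -F treats the dots in the keys literally.
  printf '%s\n' "$keys" | grep -qxF 'k8s.io/cluster-autoscaler/enabled' &&
    printf '%s\n' "$keys" | grep -qxF "k8s.io/cluster-autoscaler/${cluster}"
}
```

For example: has_autodiscovery_tags "${ASG_NAME}" scale-to-from-zero && echo "tags OK".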

With this CLI command, you can scale node groups to 0. The syntax is the same for both managed and unmanaged node groups:

eksctl scale nodegroup --cluster=<clusterName> --nodes=<desiredCount> --name=<nodegroupName> [ --nodes-min=<minSize> ] [ --nodes-max=<maxSize> ]
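For managed node groups, you could record the current sizes before scaling to 0 so the morning scale-up has values to restore. This sketch reads minSize, maxSize, and desiredSize with aws eks describe-nodegroup, appends them to a state file, and then runs eksctl scale nodegroup with --nodes=0 and --nodes-min=0. The state-file path, its "cluster nodegroup min max desired" line format, and the function name are hypothetical choices; unmanaged groups would need their sizes read from the ASG instead.

```shell
#!/bin/sh
set -eu

# Hypothetical location for saved sizes; one line per node group:
# <cluster> <nodegroup> <minSize> <maxSize> <desiredSize>
STATE_FILE="${STATE_FILE:-./nodegroup-sizes.txt}"

# Save a managed node group's current scaling config, then scale it to 0.
# The minimum drops to 0 too, since a desired count of 0 must fall within
# the min/max range.
scale_down_nodegroup() {
  local cluster="$1" ng="$2" sizes
  sizes=$(aws eks describe-nodegroup --cluster-name "$cluster" --nodegroup-name "$ng" \
    --query 'nodegroup.scalingConfig.[minSize,maxSize,desiredSize]' --output text)
  # Text output separates the three numbers with tabs; store them space-separated.
  printf '%s %s %s\n' "$cluster" "$ng" "$(printf '%s' "$sizes" | tr '\t' ' ')" >>"$STATE_FILE"
  eksctl scale nodegroup --cluster="$cluster" --name="$ng" --nodes=0 --nodes-min=0
}
```

For example: scale_down_nodegroup my-cluster ng1-public. The maximum is left unchanged so the saved value still applies in the morning.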

You can also scale a node group by passing a config file with --config-file and naming the node group to scale with --name. eksctl searches the config file for that node group and reads its scaling configuration values.

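An error will occur if the desired number of nodes is outside the node group's current minimum and maximum. You can pass new bounds with the --nodes-min and --nodes-max flags; for example, scaling to 0 requires --nodes=0 together with --nodes-min=0 unless the minimum is already 0.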
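Since eksctl rejects a desired count outside the min/max range, a small pre-check can catch bad values before any call is made. This is a hypothetical helper, not part of eksctl, and it assumes whole-number sizes.

```shell
#!/bin/sh
set -eu

# Return 0 if min <= desired <= max and all three are non-negative integers.
valid_scaling_range() {
  local min="$1" desired="$2" max="$3" v
  for v in "$min" "$desired" "$max"; do
    case "$v" in
      ''|*[!0-9]*) return 1 ;;  # empty or contains a non-digit
    esac
  done
  [ "$min" -le "$desired" ] && [ "$desired" -le "$max" ]
}
```

For example, valid_scaling_range 0 0 3 succeeds, while valid_scaling_range 1 0 3 fails because a desired count of 0 is below a minimum of 1.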

eksctl can also discover and scale all the node groups found in a config file passed with --config-file. The same validation rules apply to each node group as when you scale a single one.

Step 4: Scale back all node groups to default

To restart worker nodes in the morning, scale all node groups back to their default sizes. If you rely on the Cluster Autoscaler to scale a group up from 0 and the group defines labels or taints, you'll need matching ASG tags, because the autoscaler has no running node to read them from. For instance, a node group has the following labels and taints:

nodeGroups:
  - name: ng1-public
    ...    
    labels:      
      my-cool-label: pizza
    taints:
      feaster: "true:NoSchedule"

You need to add the following ASG tags:

nodeGroups:
  - name: ng1-public
    ...
    labels:
      my-cool-label: pizza
    taints:
      feaster: "true:NoSchedule"
    tags:
      k8s.io/cluster-autoscaler/node-template/label/my-cool-label: pizza
      k8s.io/cluster-autoscaler/node-template/taint/feaster: "true:NoSchedule"
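If a self-managed node group's ASG already exists outside an eksctl config, you could apply the same node-template tags directly with aws autoscaling create-or-update-tags, as in Step 3. This sketch builds the tag entries from label and taint key=value pairs. The helper names and the "label:" / "taint:" argument prefixes are hypothetical, and it assumes keys and values contain no commas.

```shell
#!/bin/sh
set -eu

# Print one --tags shorthand entry per node-template tag.
# Args: ASG name, then items like "label:<key>=<value>" or "taint:<key>=<value>".
node_template_tags() {
  local asg="$1" item kind kv key value
  shift
  for item in "$@"; do
    kind="${item%%:*}"   # "label" or "taint"
    kv="${item#*:}"      # "<key>=<value>"
    key="${kv%%=*}"
    value="${kv#*=}"     # may itself contain ":", e.g. true:NoSchedule
    printf 'ResourceId=%s,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/%s/%s,Value=%s,PropagateAtLaunch=true\n' \
      "$asg" "$kind" "$key" "$value"
  done
}

# Apply the tags, one create-or-update-tags call per entry.
apply_node_template_tags() {
  local asg="$1" item
  shift
  for item in "$@"; do
    aws autoscaling create-or-update-tags --tags "$(node_template_tags "$asg" "$item")"
  done
}
```

For example: apply_node_template_tags "${ASG_NAME}" label:my-cool-label=pizza taint:feaster=true:NoSchedule.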

The Cluster Autoscaler assumes that all nodes in a group are equivalent. So, for zone-specific workloads, create a separate node group for each zone so the autoscaler can scale the right one back up.

To scale back to the default sizes, use the same eksctl scale nodegroup command from Step 3, but set the desired, minimum, and maximum values back to their defaults.
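If you saved each node group's sizes before scaling down, the morning scale-up can replay them. This sketch assumes a hypothetical state file with one "cluster nodegroup minSize maxSize desiredSize" line per node group and passes the saved values back through the same eksctl scale nodegroup flags; the function name is also hypothetical.

```shell
#!/bin/sh
set -eu

# Restore every node group listed in the state file to its saved sizes.
# Expected line format (hypothetical): <cluster> <nodegroup> <min> <max> <desired>
restore_nodegroups() {
  local state_file="$1" cluster ng min max desired
  while read -r cluster ng min max desired; do
    [ -n "$cluster" ] || continue  # skip blank lines
    # </dev/null keeps eksctl from reading the rest of the state file.
    eksctl scale nodegroup --cluster="$cluster" --name="$ng" \
      --nodes="$desired" --nodes-min="$min" --nodes-max="$max" </dev/null
  done <"$state_file"
}
```

For example: restore_nodegroups ./nodegroup-sizes.txt.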

Automate a Cluster Scale-Down Process with Blink Copilot

You can consistently lower your costs by pausing non-production clusters at night, but only if the process isn’t time-intensive. In the method described above, you need to run these commands for each cluster and node pool for each region. If pausing your clusters takes too much time, it might not be worth doing it every day.

With Blink, you can either import this pre-built automation from the Blink Library, or easily generate a custom workflow by typing a prompt into Blink Copilot.

 

When this automation runs, it executes the following actions:

  1. Asks the DevOps team in a Slack channel for approval to scale down for the night.
  2. If the request is denied, no action is taken.
  3. If it is approved, or if no one responds within 2 hours, it scales down the non-production clusters.
  4. After 10 hours, it scales the paused EKS clusters back up.
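The approval rule in steps 1 to 3 amounts to a small decision function, and step 4 to a fixed 10-hour offset. This is a hypothetical shell sketch of that logic only, not how Blink implements it; the response values "approved", "denied", and anything else meaning "no answer yet" are assumptions.

```shell
#!/bin/sh
set -eu

# Decide what to do given the Slack response so far and minutes since asking.
# Prints "scale-down", "skip", or "wait".
decide_action() {
  local response="$1" elapsed_min="$2"
  case "$response" in
    denied) echo skip ;;
    approved) echo scale-down ;;
    *)
      # No answer yet: go ahead once the 2-hour (120-minute) window has passed.
      if [ "$elapsed_min" -ge 120 ]; then echo scale-down; else echo wait; fi
      ;;
  esac
}

# Hour of day (0-23) at which to scale back up, 10 hours after scale-down.
restore_hour() {
  echo $(( ($1 + 10) % 24 ))
}
```

For example, decide_action none 120 prints scale-down, and restore_hour 20 prints 6.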

With one simple automation, you could start saving on your cloud costs.

You can try typing your own prompts in Blink Copilot. Automation has never been easier.

Get started with Blink today to see how easy automation can be.
