How to Pause Your AKS Clusters Nightly

If you are running non-production AKS clusters, you might not need them operating 24 hours per day. In this guide, we'll show you how to stop your clusters at night and restart them in the morning to lower your Azure costs.

Patrick Londa, Author
Sep 23, 2022 • 5 min read

Azure Kubernetes Service (AKS) enables organizations to run and scale Kubernetes applications, either in the cloud or on-premises. 

The cost of running AKS clusters depends on the machine type and size of the cluster nodes. For example, an A2 instance with only 2 cores is billed at $0.12/hr. Running that instance for a full month (730 hours) would cost $87.60.
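The figures above imply a 730-hour month ($87.60 / $0.12). As a rough sketch, the snippet below estimates what a nightly stop could save on that one instance. The 12 stopped hours per day is a hypothetical schedule, and the estimate covers node compute only, not disks or other resources that may still be billed.

```shell
# Rough estimate of monthly node cost with and without a nightly stop.
# RATE comes from the A2 example above; STOPPED_HOURS_PER_DAY is a hypothetical schedule.
RATE=0.12                 # USD per hour
HOURS_PER_MONTH=730       # 87.60 / 0.12, the month length implied above
STOPPED_HOURS_PER_DAY=12  # assumption: cluster is stopped 12 hours each night

estimate_costs() {
  awk -v rate="$RATE" -v hours="$HOURS_PER_MONTH" -v off="$STOPPED_HOURS_PER_DAY" 'BEGIN {
    full = rate * hours                 # always-on monthly cost
    paused = full * (24 - off) / 24     # cost when stopped "off" hours per day
    printf "always-on: $%.2f\nwith nightly stop: $%.2f\nsaved: $%.2f\n", full, paused, full - paused
  }'
}

estimate_costs
```

Adjust the three variables to match your own node pricing and working hours.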

If you have non-production clusters for testing or QA purposes, you might not need them to be available for 24 hours per day. If your team has somewhat predictable work hours, you can set up a process to stop these clusters on a nightly basis and start them back up in the morning to save costs.

In this guide, we’ll show you how you can reduce your cloud costs by using the Azure CLI to stop your AKS clusters nightly and start them again in the morning.


Step 1: Send an alert to the cluster owner

Before you stop AKS clusters, it’s important to alert any stakeholder who may be working with that cluster so they have a chance to intervene.

If your team is working unusual hours and has important jobs running, stopping the cluster could be disruptive. If no one objects to stopping the cluster for the night, then you can move on to the next step.
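One way to send such an alert is a Slack incoming webhook. The following is a minimal sketch, not a prescribed method: SLACK_WEBHOOK_URL, STOP_TIME, and the message wording are all hypothetical values you would supply yourself.

```shell
# Sketch: notify the cluster owner before a scheduled stop.
# SLACK_WEBHOOK_URL is a hypothetical Slack incoming-webhook URL supplied by you.
CLUSTER_NAME="${CLUSTER_NAME:-myAKSCluster}"
STOP_TIME="${STOP_TIME:-20:00}"   # hypothetical planned stop time

build_alert_payload() {
  # Prints the JSON body for the webhook; names are assumed to contain no quotes.
  printf '{"text":"AKS cluster %s will be stopped at %s. Reply in this channel to keep it running."}\n' "$1" "$2"
}

send_alert() {
  curl --silent --show-error --fail \
    --request POST \
    --header 'Content-Type: application/json' \
    --data "$(build_alert_payload "$1" "$2")" \
    "$SLACK_WEBHOOK_URL"
}

# Only send when a webhook is configured, so the script is safe to dry-run.
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  send_alert "$CLUSTER_NAME" "$STOP_TIME"
else
  build_alert_payload "$CLUSTER_NAME" "$STOP_TIME"
fi
```

Without a webhook configured, the script only prints the message it would have sent, which is a convenient way to check the wording first.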

Step 2: Stop the cluster

To stop your AKS cluster, run the “az aks stop” command and specify which cluster you want to stop. This stops both the cluster’s control plane and its nodes.

az aks stop --name myAKSCluster --resource-group myResourceGroup

To verify that the cluster has successfully stopped, you can run the following command:

az aks show --name myAKSCluster --resource-group myResourceGroup

In the output, it should show a “powerState” of “Stopped”:

{
[...]
  "nodeResourceGroup": "MC_myResourceGroup_myAKSCluster_westus2",
  "powerState":{    
    "code":"Stopped"  
  },
  "privateFqdn": null,
  "provisioningState": "Succeeded",
  "resourceGroup": "myResourceGroup",
[...]
}
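If you only need the power state, for example in a script, you can ask the CLI for that single field instead of reading the full JSON. The sketch below uses the CLI's global --query and --output options. The function names, the retry count, and the POLL_SECONDS variable are hypothetical. Polling is mostly useful if you issue the stop with --no-wait.

```shell
# Sketch: read only the power state instead of scanning the full JSON output.
get_power_state() {
  # $1 = cluster name, $2 = resource group
  az aks show --name "$1" --resource-group "$2" \
    --query "powerState.code" --output tsv
}

wait_for_state() {
  # $1 = cluster, $2 = resource group, $3 = desired state,
  # $4 = max attempts (hypothetical default of 30)
  attempts="${4:-30}"
  while [ "$attempts" -gt 0 ]; do
    [ "$(get_power_state "$1" "$2")" = "$3" ] && return 0
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-10}"   # hypothetical polling interval
  done
  return 1   # state was not reached within the allowed attempts
}

# Usage: wait_for_state myAKSCluster myResourceGroup Stopped
```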

With the cluster stopped, you’ll have lower costs since you won’t be paying for node compute time during your team’s off hours.

Step 3: Start the cluster

The next morning, or whenever you want to resume using the cluster, you can start it up again using this command: 

az aks start --name myAKSCluster --resource-group myResourceGroup

Just like in the last step, you can use the “az aks show” command to validate that the “powerState” is now “Running”. That output looks like this:

{
[...]
  "nodeResourceGroup": "MC_myResourceGroup_myAKSCluster_westus2",
  "powerState":{
    "code":"Running"
  },
  "privateFqdn": null,
  "provisioningState": "Succeeded",
  "resourceGroup": "myResourceGroup",
[...]
}
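If the start command runs unattended each morning, it may help to check the current state first so that an already-running cluster is left alone. This is a minimal sketch under that assumption. The function name start_if_stopped is hypothetical.

```shell
# Sketch: start the cluster only when it is currently stopped.
start_if_stopped() {
  # $1 = cluster name, $2 = resource group
  state="$(az aks show --name "$1" --resource-group "$2" \
    --query "powerState.code" --output tsv)"
  if [ "$state" = "Stopped" ]; then
    az aks start --name "$1" --resource-group "$2"
  else
    echo "Cluster $1 is in state '$state'; nothing to start."
  fi
}

# Usage: start_if_stopped myAKSCluster myResourceGroup
```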

Stopping and starting your AKS cluster on a nightly basis is a good way to lower your Azure costs. To avoid errors, don’t stop and start your cluster repeatedly in quick succession (more than once within 30 minutes).
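One way to respect the 30-minute guidance in a script is to record when the cluster's state was last changed and refuse to act again too soon. The sketch below does this with a marker file. STATE_DIR and the marker-file scheme are hypothetical and not part of the Azure CLI.

```shell
# Sketch: refuse to stop/start a cluster again within 30 minutes of the last change.
# STATE_DIR and the marker-file scheme are hypothetical, not part of the Azure CLI.
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}/aks-pause}"
MIN_INTERVAL_SECONDS=1800   # 30 minutes, per the guidance above

cooldown_elapsed() {
  # $1 = cluster name; succeeds if no change was recorded in the last 30 minutes.
  marker="$STATE_DIR/$1.last-change"
  [ -f "$marker" ] || return 0
  last="$(cat "$marker")"
  now="$(date +%s)"
  [ $((now - last)) -ge "$MIN_INTERVAL_SECONDS" ]
}

record_change() {
  # $1 = cluster name; stores the current time as the last state change.
  mkdir -p "$STATE_DIR"
  date +%s > "$STATE_DIR/$1.last-change"
}

# Usage:
#   cooldown_elapsed myAKSCluster \
#     && az aks stop --name myAKSCluster --resource-group myResourceGroup \
#     && record_change myAKSCluster
```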

Automating AKS Cluster Pauses with Blink

You can consistently lower your costs by pausing non-production clusters at night, but only if the process isn’t time-intensive. 

With the method described above, you need to run these commands for each cluster. If pausing your clusters takes too much time or depends on someone remembering to do it, it is unlikely to become a habit, and the savings won’t be consistent.
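For a scripted middle ground, you could loop over every cluster that carries a particular tag and schedule the script with cron. This is a sketch under assumptions: the environment=nonprod tag is a hypothetical convention, and the script paths in the crontab comments are placeholders.

```shell
# Sketch: stop every cluster carrying a given tag.
# The tag key/value (environment=nonprod) is a hypothetical convention; adjust to your own.
list_tagged_clusters() {
  az aks list \
    --query "[?tags.environment=='nonprod'].[name, resourceGroup]" \
    --output tsv
}

stop_tagged_clusters() {
  # Each tsv line holds "<name><TAB><resourceGroup>".
  list_tagged_clusters | while read -r name group; do
    echo "Stopping $name in $group"
    # --no-wait returns immediately so clusters are stopped in parallel.
    az aks stop --name "$name" --resource-group "$group" --no-wait < /dev/null
  done
}

# Example crontab entries (hypothetical script paths), weekdays only:
#   0 20 * * 1-5  /opt/scripts/stop-nonprod-aks.sh
#   0 7  * * 1-5  /opt/scripts/start-nonprod-aks.sh
```

A matching start script would follow the same loop with “az aks start” in place of “az aks stop”.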

With Blink, you can run this automation to kick off the scale-down at a scheduled time, send notifications, and wait for approvals.

Blink Automation: Stop AKS Cluster

When you run this automation, it executes the following steps:

  1. Sends an approval request to a designated Slack channel or email address to shut down the cluster for the night.
  2. If the request is denied, no action is taken.
  3. If the request is approved or times out, it scales the node groups to zero.
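The decision logic in these steps can be sketched in shell as follows. This is an illustration only, not Blink's implementation: the response labels (approved, denied, timeout) and function names are hypothetical. Note that scaling to zero generally applies to user node pools, since a system node pool must keep at least one node.

```shell
# Sketch of the approval logic described above; this is not Blink's implementation.
# The response values (approved, denied, timeout) are hypothetical labels.
decide_action() {
  case "$1" in
    approved|timeout) echo "scale-down" ;;
    denied)           echo "none" ;;
    *)                echo "none" ;;   # assumption: unknown responses count as a denial
  esac
}

scale_pool_to_zero() {
  # $1 = cluster, $2 = resource group, $3 = node pool name
  az aks nodepool scale --cluster-name "$1" --resource-group "$2" \
    --name "$3" --node-count 0
}

run_nightly() {
  # $1 = response, $2 = cluster, $3 = resource group, $4 = node pool
  if [ "$(decide_action "$1")" = "scale-down" ]; then
    scale_pool_to_zero "$2" "$3" "$4"
  else
    echo "Request denied; leaving $2 running."
  fi
}

# Usage: run_nightly timeout myAKSCluster myResourceGroup userpool
```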

You can easily customize this automation to add other steps. For example, you could add steps to scale the cluster back up after a certain amount of time.

There are over 5K automations in the Blink library that you can use right away, or you can build new automations from scratch with drag-and-drop actions to fit your unique needs.

Get started with Blink today to see how easy automation can be.