How to Pause Your GKE Cluster Nightly

If you have non-production GKE clusters, you might be able to reduce your cloud costs by pausing them at night and restarting them the next morning. In this guide, we show you the steps to do it.

Patrick Londa, Author
Jul 21, 2022 • 4 min read

Google Kubernetes Engine (GKE) clusters consist of multiple nodes and at least one control plane. This lets you deploy, manage, and scale your containerized applications on Google infrastructure for a cluster management fee of $0.10 per cluster per hour, billed in one-second increments. You also pay for the compute and storage resources running on that cluster.

If your organization has non-production clusters, for testing or QA for example, you can pause them every night and restart them in the morning to save costs.

In this guide, we’ll walk you through pausing and restarting your GKE cluster to reduce your cloud costs, with steps for both the gcloud CLI and the Google Cloud Console.

Step 1: Send a Notification to Your Team Members 

The first step in pausing a cluster for the night is notifying the cluster owner of the upcoming shutdown. This gives the owner, and anyone else who relies on the cluster, a chance to cancel or postpone it.

Pausing a GKE cluster scales its node pools to zero, which removes all of the cluster's Compute Engine VMs, so any jobs still running on those nodes will fail. Sending a notification first ensures that you are not disrupting anyone’s work by pausing the cluster for the night.
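
One way to send this notification from a script is a Slack incoming webhook. The sketch below is a minimal example under that assumption: the function names, the message wording, and the SLACK_WEBHOOK_URL environment variable are hypothetical and should be adapted to whatever channel your team actually uses.

```shell
# Hypothetical helper: build the JSON body for a Slack incoming webhook.
# Usage: build_pause_notice CLUSTER PAUSE_TIME
build_pause_notice() {
  printf '{"text":"Heads up: GKE cluster %s will be paused at %s. Contact the cluster owner to postpone."}\n' \
    "$1" "$2"
}

# Hypothetical helper: post the notice to Slack.
# Assumes SLACK_WEBHOOK_URL holds your incoming webhook URL.
send_pause_notice() {
  curl --silent --show-error --fail \
    -X POST -H 'Content-Type: application/json' \
    --data "$(build_pause_notice "$1" "$2")" \
    "${SLACK_WEBHOOK_URL:?set SLACK_WEBHOOK_URL to your incoming webhook URL}"
}

# Example:
#   send_pause_notice non-prod-cluster 20:00
```

Keeping the message builder separate from the sender makes it easy to preview the text before anything is posted.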

Step 2: Get Your Cluster’s Node Pools

After you have received the "go ahead" to pause your GKE cluster, you need to list all of its node pools. A node pool is a group of nodes within a cluster that share the same configuration, which is defined by a NodeConfig specification.

Using the gcloud CLI:

You can view your node pools in gcloud using the "gcloud container node-pools list" command followed by the cluster name ("non-prod-cluster" in this example):

gcloud container node-pools list --cluster=non-prod-cluster
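
If you plan to script the later steps, it can help to print only the pool names. The following is a sketch that assumes a zonal cluster; the helper name list_node_pools and the zone us-central1-a are placeholders, and a regional cluster would need --region instead of --zone.

```shell
# Hypothetical helper: print one node pool name per line for a zonal cluster.
# Usage: list_node_pools CLUSTER ZONE
list_node_pools() {
  gcloud container node-pools list \
    --cluster="$1" \
    --zone="$2" \
    --format="value(name)"
}

# Example (cluster name from this guide; the zone is a placeholder):
#   list_node_pools non-prod-cluster us-central1-a
```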

Using the Google Cloud Console: 

You can view a cluster's node pools from the Google Kubernetes Engine page using these steps:

  1. Go to the cluster list and select the name of the cluster you wish to stop.
  2. Click the "Nodes" tab.
  3. Under "Node Pools," click the name of the node pool you wish to view.

Step 3: Pause Your GKE Cluster

Next, you need to set all of the node pool sizes to 0.

Using the gcloud CLI:

You can resize a cluster's node pools by running the "gcloud container clusters resize" command. Pass the name of the cluster, the name of the node pool, and the number of nodes to run in each zone the pool is in. Set the number of nodes to "0":

gcloud container clusters resize non-prod-cluster --node-pool non-prod-pool \
    --num-nodes 0

You will need to repeat this command for each node pool. If your cluster has only one node pool, you don’t need to specify which pool in the command.
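
To avoid typing the command once per pool, the listing and resizing can be combined in a loop. This is a sketch rather than a definitive script: pause_cluster is a hypothetical helper, the cluster is assumed to be zonal, and --quiet is used to skip gcloud's confirmation prompt.

```shell
# Hypothetical helper: scale every node pool in a zonal cluster to 0 nodes.
# Usage: pause_cluster CLUSTER ZONE
pause_cluster() {
  local cluster="$1" zone="$2"
  gcloud container node-pools list \
    --cluster="$cluster" --zone="$zone" \
    --format="value(name)" |
  while IFS= read -r pool; do
    # Skip blank lines in the listing.
    [ -n "$pool" ] || continue
    # Redirect stdin so the resize cannot consume the pool list.
    gcloud container clusters resize "$cluster" \
      --node-pool "$pool" --num-nodes 0 \
      --zone "$zone" --quiet < /dev/null
  done
}

# Example (the zone is a placeholder):
#   pause_cluster non-prod-cluster us-central1-a
```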

Using the Google Cloud Console: 

These are the steps in the Google Cloud Console:

  1. Go to the GKE page and select the name of the cluster you want to pause.
  2. Click on the "Nodes" tab.
  3. In the "Node Pools" section, click the name of the pool you wish to resize.
  4. Click on "Resize."
  5. In the "Number of Nodes" field, enter "0" and click "Resize."

Step 4: Restart Your GKE Cluster

To restart your GKE cluster the following morning, you simply have to resize all of your node pools back to their original sizes. Repeat Steps 2 and 3 at the beginning of the day to list your cluster's node pools and resize them to their original node counts instead of "0." Do this for every node pool in every GKE cluster you paused.
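
Restoring the original sizes requires knowing what they were. One option, sketched here as an assumption rather than a prescribed method, is to keep a small text file with one "pool count" pair per line and replay it in the morning. The helper name resume_cluster, the file format, and the zone are all hypothetical.

```shell
# Hypothetical helper: resize node pools back to the counts listed in a file.
# Each line of SIZES_FILE is "POOL_NAME NODE_COUNT"; lines starting with # are ignored.
# Usage: resume_cluster CLUSTER ZONE SIZES_FILE
resume_cluster() {
  local cluster="$1" zone="$2" sizes_file="$3"
  while read -r pool count; do
    # Skip blank lines and comments.
    case "$pool" in ''|'#'*) continue ;; esac
    # Redirect stdin so the resize cannot consume the rest of the file.
    gcloud container clusters resize "$cluster" \
      --node-pool "$pool" --num-nodes "$count" \
      --zone "$zone" --quiet < /dev/null
  done < "$sizes_file"
}

# Example, with a file containing the line "non-prod-pool 3":
#   resume_cluster non-prod-cluster us-central1-a pool-sizes.txt
```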

Automating Cluster Pauses with Blink

By pausing non-production clusters at night, you can consistently lower your costs, but only if the process isn’t time-intensive. In the method described above, you need to run these commands for each cluster and node pool for each region. At a certain point, it can feel like it’s too time-consuming to be worth doing.

With Blink, you can automate this process so it kicks off at a scheduled time, automatically sends notifications, waits for approvals, and restarts everything in the morning.

This automation in the Blink library does just that:

Blink Automation: Pause GKE Cluster

When this automation runs, it performs the following steps:

  1. Asks via Slack whether to proceed with pausing the GKE cluster.
  2. If the answer is no, it stops running.
  3. If the answer is yes, then it lists all node pools in the GKE cluster.
  4. Next, it checks if the node pools have autoscaling disabled.
  5. If autoscaling is enabled, then it skips scaling down those node pools.
  6. If autoscaling is disabled, then it scales down all node pools to the minimum node size specified.

This simple automation can run nightly. You can also customize it to scale back up after a certain time or with conditional logic.
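
For readers who want to see what the autoscaling check might look like with plain gcloud commands, here is a rough sketch. It is not Blink's implementation: pause_fixed_pools is a hypothetical helper, the cluster is assumed to be zonal, and the code assumes gcloud prints "True" for the autoscaling.enabled field when autoscaling is on and leaves it empty otherwise.

```shell
# Hypothetical helper: scale down only node pools that do not use autoscaling.
# Usage: pause_fixed_pools CLUSTER ZONE [TARGET_NODES]   (TARGET_NODES defaults to 0)
pause_fixed_pools() {
  local cluster="$1" zone="$2" target="${3:-0}"
  gcloud container node-pools list \
    --cluster="$cluster" --zone="$zone" \
    --format="value(name,autoscaling.enabled)" |
  while read -r pool autoscaling; do
    [ -n "$pool" ] || continue
    case "$autoscaling" in
      [Tt]rue)
        # Leave autoscaled pools alone and report the skip on stderr.
        echo "Skipping $pool: autoscaling is enabled" >&2
        ;;
      *)
        gcloud container clusters resize "$cluster" \
          --node-pool "$pool" --num-nodes "$target" \
          --zone "$zone" --quiet < /dev/null
        ;;
    esac
  done
}

# Example (the zone is a placeholder):
#   pause_fixed_pools non-prod-cluster us-central1-a
```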

With over 5K automations in the Blink library, it's easy to start automating, or you can create automations from scratch to fit your unique needs.

Start a free trial of Blink today and see how easy automation can be.
