How to Stop a Kubernetes Cluster Nightly to Save Costs

Reduce cloud costs by pausing non-production GKE clusters at night. Learn how to pause and restart them to save money.

Patrick Londa
Author
Jul 21, 2022

Google Kubernetes Engine (GKE) clusters consist of at least one control plane and multiple nodes. GKE lets you deploy, manage, and scale your containerized applications on Google’s infrastructure for a recurring charge of $0.10 per cluster per hour, billed in one-second increments. You also pay for the compute and storage resources running on that cluster.

If your organization has non-production clusters, for example for testing or QA, you can pause them every night and restart them in the morning to save costs.

In this guide, we’ll walk you through the steps of pausing and restarting your GKE cluster to reduce your cloud costs, showing the steps for both the gcloud CLI and the Console options.

Step 1: Send a Notification to Your Team Members 

The first step to pausing a cluster for the night is sending a notification to the cluster owner alerting them of the shutdown. This gives the cluster owner and anyone who relies on the cluster an opportunity to cancel or postpone the shutdown.

Pausing a GKE cluster shuts down all of the cluster’s Compute Engine VMs, so any jobs running at that time will fail. Sending a notification first helps ensure that you don’t disrupt anyone’s work when you pause the cluster for the night.
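
As an illustration, the notification could be posted to a Slack channel through an incoming webhook. This is a minimal sketch, not a definitive implementation: the function name notify_shutdown and the SLACK_WEBHOOK_URL environment variable are hypothetical names chosen for this example, and the message wording is only a suggestion.

```shell
# Sketch: post a shutdown warning to Slack through an incoming webhook.
# SLACK_WEBHOOK_URL is a hypothetical environment variable that holds your webhook URL.
notify_shutdown() (
  cluster="$1"   # cluster name, for example non-prod-cluster
  minutes="$2"   # minutes until the cluster is paused
  # Note: this sketch does not escape JSON, so avoid double quotes in the arguments.
  payload=$(printf '{"text":"Cluster %s will be paused in %s minutes. Reply in this channel to postpone."}' \
    "$cluster" "$minutes")
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data "$payload" "$SLACK_WEBHOOK_URL"
)

# Example call (commented out so that sourcing this file sends nothing):
# notify_shutdown non-prod-cluster 30
```

The function only sends the message. Collecting the "go ahead" from your team still happens in Slack or through whatever approval process you already use.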

Step 2: Get Your Cluster’s Node Pools

After you have received the "go ahead" to pause your GKE cluster, you need to list all of its node pools. A node pool is a group of nodes within a cluster that share the same configuration, which is defined by a NodeConfig specification.

Using the gcloud CLI:

You can view your node pools in gcloud using the "gcloud container node-pools list" command followed by the cluster name ("non-prod-cluster" in this example):

gcloud container node-pools list --cluster=non-prod-cluster
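
If you plan to script the later steps, it can help to print only the pool names, one per line. The sketch below wraps the same command in a small function and uses gcloud's --format flag to do that. The function name list_pool_names is hypothetical, and the sketch assumes your default project and zone or region are already set in your gcloud configuration.

```shell
# Sketch: print one node pool name per line for the given cluster.
# Assumes the default project and location are already configured in gcloud.
list_pool_names() (
  cluster="$1"
  gcloud container node-pools list --cluster="$cluster" --format="value(name)"
)

# Example call (commented out so that sourcing this file runs nothing):
# list_pool_names non-prod-cluster
```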

Using the Google Cloud Console: 

You can view your node pools from the Google Kubernetes Engine page using these steps:

  1. Go to the cluster list and select the name of the cluster you wish to stop.
  2. Click the "Nodes" tab.
  3. Under "Node Pools," click the name of the node pool you wish to view.

Step 3: Pause Your GKE Cluster

Next, you need to set all of the node pool sizes to 0.

Using the gcloud CLI:

You can resize a cluster's node pools by running the "gcloud container clusters resize" command. Follow this command with the name of the cluster, then the name of the pool, and the number of nodes for each zone the pool is in. Set the number of nodes to "0":

gcloud container clusters resize non-prod-cluster --node-pool non-prod-pool \
    --num-nodes 0

You will need to repeat this command for each node pool. If your cluster has only one node pool, you don’t need to specify which pool in the command.
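
Repeating the command for each pool can be scripted as a loop. The following is a minimal sketch under a few assumptions: the function name pause_cluster is hypothetical, your default project and location are already set in your gcloud configuration, and you are comfortable skipping the confirmation prompt with gcloud's --quiet flag.

```shell
# Sketch: resize every node pool in a cluster to 0 nodes.
pause_cluster() (
  cluster="$1"
  gcloud container node-pools list --cluster="$cluster" --format="value(name)" |
  while IFS= read -r pool; do
    [ -n "$pool" ] || continue   # skip empty lines
    # --quiet skips the interactive confirmation prompt.
    # </dev/null keeps the resize command from reading the pool list on stdin.
    gcloud container clusters resize "$cluster" \
      --node-pool "$pool" --num-nodes 0 --quiet </dev/null
  done
)

# Example call (commented out so that sourcing this file changes nothing):
# pause_cluster non-prod-cluster
```

Before running something like this, write down the current size of each pool, because you will need those values to restart the cluster.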

Using the Google Cloud Console: 

These are the steps in the Google Cloud Console:

  1. Go to the GKE page and select the name of the cluster you want to pause.
  2. Click on the "Nodes" tab.
  3. In the "Node Pools" section, click the name of the pool you wish to resize.
  4. Click on "Resize."
  5. In the "Number of Nodes" field, enter "0" and click "Resize."

Step 4: Restart Your GKE Cluster

To restart your GKE cluster the following morning, you simply have to resize all of its node pools back to their original sizes. Repeat steps 2 and 3 at the beginning of the day to find your GKE cluster’s node pools and resize them to their original values instead of "0." Do this for every node pool in every GKE cluster you paused.
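
One way to make the morning restart repeatable is to keep the original pool sizes in a small text file and read them back. This is only a sketch: the function name restore_cluster and the file format (one "pool-name size" pair per line) are assumptions made for this example, not something gcloud produces for you, and the size is the per-zone node count you would pass to --num-nodes.

```shell
# Sketch: resize node pools back to sizes stored in a file you maintain.
# Hypothetical file format, one pool per line:
#   non-prod-pool 3
restore_cluster() (
  cluster="$1"
  sizes_file="$2"
  while read -r pool size; do
    # Skip blank lines and lines starting with '#'.
    case "$pool" in ''|'#'*) continue ;; esac
    # </dev/null keeps the resize command from reading the sizes file on stdin.
    gcloud container clusters resize "$cluster" \
      --node-pool "$pool" --num-nodes "$size" --quiet </dev/null
  done < "$sizes_file"
)

# Example call (commented out so that sourcing this file changes nothing):
# restore_cluster non-prod-cluster pool_sizes.txt
```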

Automating Cluster Pauses with Blink

By pausing non-production clusters at night, you can consistently lower your costs, but only if the process isn’t time-intensive. In the method described above, you need to run these commands for every node pool in every cluster, every evening and every morning. At a certain point, it can feel too time-consuming to be worth doing.

With Blink Copilot, you can easily automate this process so it kicks off at a scheduled time, automatically sends notifications, waits for approvals, and restarts everything in the morning.

 

Just type a prompt to create this automated workflow. It executes the following steps:

  1. Every night, send a question via Slack to the DevOps team asking if the non-production GKE clusters can be paused.
  2. If the answer is no, it stops running.
  3. If the answer is yes, or if there is no response in 2 hours, it pauses the clusters for 10 hours.
  4. After 10 hours, it restarts the clusters.
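
To make the decision logic in these steps concrete, here is a rough shell sketch of the same flow. It is not how Blink implements the workflow. The function names are hypothetical, an empty answer stands in for "no response in 2 hours," and the two hooks are placeholders that only print a message where your own pause and restart commands would go.

```shell
# Sketch of the nightly flow: "no" stops, "yes" or no response pauses for 10 hours.
PAUSE_SECONDS=$((10 * 60 * 60))   # 10 hours

# Returns success when the cluster may be paused.
# An empty answer represents "no response within 2 hours" (an assumption of this sketch).
should_pause() {
  case "$1" in
    yes|'') return 0 ;;
    *) return 1 ;;   # "no" and anything unrecognized cancels the pause
  esac
}

# Placeholder hooks; replace the echo lines with your own resize commands.
pause_hook()   { echo "pausing clusters"; }
restore_hook() { echo "restarting clusters"; }

run_nightly() (
  answer="$1"
  if should_pause "$answer"; then
    pause_hook
    sleep "$PAUSE_SECONDS"
    restore_hook
  else
    echo "pause skipped"
  fi
)

# Example call (commented out so that sourcing this file runs nothing):
# run_nightly yes
```

A ten-hour sleep keeps the sketch short, but two scheduled jobs (one in the evening, one in the morning) would likely be sturdier in practice.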

You can find this workflow pre-built in the Blink Library, or you can generate it by typing a prompt.

Get started with Blink today and see how easy automation can be.