How to Detect AWS DynamoDB Tables with Stale Data

Learn how to identify and manage stale data in Amazon DynamoDB to save costs and optimize your database in our guide.

Patrick Londa
Author
May 19, 2022 • 6 min read

With AWS and public clouds in general, organizations have been able to reduce their total cost of ownership (TCO) and shift their focus from operational activities to their core business. Managed cloud services that scale up and down on demand offer a consistent quality of service and more flexible cost controls.

Amazon DynamoDB is a popular managed database service that delivers consistent read/write performance while storing and retrieving any amount of data and serving any level of request traffic.

With a service like DynamoDB, your team doesn’t have to handle hardware provisioning, setup and configuration, replication, or cluster scaling. Still, there are some best practices for optimizing your usage and keeping your costs under control.

In this guide, we’ll explain a bit about DynamoDB and then look at how you can handle stale data and ensure that it doesn’t turn into wasted spend.


What Is the Structure of DynamoDB?

Your data in DynamoDB is stored on solid-state disks (SSDs) and is automatically replicated across multiple Availability Zones in an AWS Region. This results in data durability, consistent performance, and high availability. It also means that when your application saves data to a DynamoDB table, it can take a second or so for that write to become consistent across every Availability Zone.

Like other AWS services, Amazon DynamoDB is a regional service. This makes it possible to have one table called Accounts in an East Coast Region and another table also called Accounts in a West Coast Region that are treated as separate entities. If you want to keep DynamoDB tables in sync across AWS Regions, you need to use global tables, which replicate data across Regions.
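To make that Region scoping concrete, here is a minimal boto3 sketch; the Accounts table name and the two Regions are placeholder assumptions. The same table name in two Regions refers to two completely independent tables:

import boto3

# DynamoDB is a regional service: each client below talks to a
# different Region, and each "Accounts" table is a separate entity.
east = boto3.client("dynamodb", region_name="us-east-1")
west = boto3.client("dynamodb", region_name="us-west-2")

east_table = east.describe_table(TableName="Accounts")["Table"]
west_table = west.describe_table(TableName="Accounts")["Table"]

# The ARNs differ, confirming these are two independent tables.
print(east_table["TableArn"])
print(west_table["TableArn"])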

Let’s look at the two types of read consistency you can request from a DynamoDB table.

Eventually Consistent Reads

Eventually consistent reads performed on DynamoDB tables might not return the most current results. A query issued shortly after a write can return data that doesn’t yet reflect that update. How quickly reads become consistent depends on how fast the update propagates to every replica of the table.

Strongly Consistent Reads

A strongly consistent read returns the most recently updated data from DynamoDB, reflecting all updates from every prior successful write operation. But there are some disadvantages to using strongly consistent reads.

A network delay or outage may prevent a strongly consistent read from completing and give you a server error instead. Strongly consistent reads may also have higher latency than eventually consistent reads, since the response must reflect all prior successful writes.

It should be noted that global secondary indexes do not support strongly consistent reads. Strongly consistent reads also consume twice as much read throughput capacity as eventually consistent reads. DynamoDB uses eventually consistent reads by default unless you specify otherwise.

When your application writes data to a DynamoDB table and receives an HTTP 200 response (OK), the write has occurred and is durable. The data is eventually consistent across all storage locations, usually within one second or less.
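As a sketch of how this looks in practice, here is how you would request each read type with boto3; the Accounts table name and its AccountId key are placeholder assumptions:

import boto3

dynamodb = boto3.client("dynamodb")

# Eventually consistent read (the default): cheaper, but may not
# reflect a write completed in the last second or so.
eventual = dynamodb.get_item(
    TableName="Accounts",
    Key={"AccountId": {"S": "12345"}},
)

# Strongly consistent read: reflects all prior successful writes,
# at twice the read-capacity cost and potentially higher latency.
strong = dynamodb.get_item(
    TableName="Accounts",
    Key={"AccountId": {"S": "12345"}},
    ConsistentRead=True,
)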

Stale Data in DynamoDB Tables

The autoscalability of DynamoDB is what enables organizations to closely match spending on resources with their required data capacity. If your tables include stale data, meaning data which has not changed over a certain period, this can be a sign that you are still spending more than you need to.

If you can instead move or remove this old data, you’ll lower your required data capacity and, subsequently, your costs. Before we discuss what you can do with stale data, let’s locate it first.

Finding Stale Data in DynamoDB Tables

When DynamoDB Streams is enabled, any modification made to a DynamoDB table gets captured in a stream record. The following query provides a way for you to review stream activity and determine whether a specific table contains stale data. First, make sure you enable streams on the relevant tables, for example through the AWS console. Note that the query below uses PostgreSQL-style syntax of the kind exposed by cloud-metadata SQL tools such as Steampipe; it is not valid PartiQL for DynamoDB, which doesn’t support date functions or timestamp casts. Modify it to locate old data within a table.

SELECT partition, region, account, name, account_id
FROM dynamodbtable
WHERE date_part('day', now() - (latest_stream_label::timestamptz)) > NumberOfDays

"NumberOfDays" is a threshold you can set to consider data as stale data, meaning that there have been no changes during the specified “NumberOfDays”

Relocating or Removing Stale Data

Once you have located stale data in your database, you have a few options.

  1. Relocate Data to S3 Storage

Since this data has not changed recently, you might not need the high level of performance you get with DynamoDB. Moving this old data to another storage solution, such as S3, offers high data durability at a lower cost (see the export sketch after this list).

  2. Delete Stale Data

If you don’t need to archive or keep this old data, then you can simply delete it. If this approach fits your use case, you can even set up Time to Live (TTL) to automatically remove old data from DynamoDB tables (see the TTL sketch after this list).

  3. Do Nothing and Keep the Data

If for some reason you want to keep this stale data in AWS DynamoDB, then you are accepting the added costs going forward. You might also see some performance degradation over time.
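For the first option, one way to archive a table’s data to S3 is DynamoDB’s built-in export feature. Here is a minimal boto3 sketch; it assumes point-in-time recovery (PITR) is enabled on the table, and the table ARN, bucket name, and prefix are all placeholders:

import boto3

dynamodb = boto3.client("dynamodb")

# Export requires point-in-time recovery (PITR) to be enabled on the table.
response = dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/Accounts",
    S3Bucket="my-archive-bucket",
    S3Prefix="dynamodb-archive/accounts/",
    ExportFormat="DYNAMODB_JSON",
)

# The export runs asynchronously; poll describe_export to track its progress.
print(response["ExportDescription"]["ExportStatus"])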
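For the second option, here is a sketch of enabling TTL with boto3. The attribute name expires_at is a placeholder assumption; DynamoDB expects it to hold a Unix epoch timestamp in seconds and removes each item shortly after that time passes:

import time

import boto3

dynamodb = boto3.client("dynamodb")

# Tell DynamoDB which item attribute holds the expiry timestamp.
dynamodb.update_time_to_live(
    TableName="Accounts",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# When writing items, set the expiry; here, 90 days from now.
dynamodb.put_item(
    TableName="Accounts",
    Item={
        "AccountId": {"S": "12345"},
        "expires_at": {"N": str(int(time.time()) + 90 * 24 * 60 * 60)},
    },
)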

Handle Stale Data Regularly with Blink

You can run this check for stale data manually, but it can be hard to build it into your regular routine.

With Blink, you can schedule automated checks like this one to run regularly so you get the benefit without the context-switching.

Blink Automation: Ensure Tables with Stale Data are Reviewed in AWS

When this cost optimization automation runs, it executes the following steps:

  1. Checks for any tables with stale data and compiles a report.
  2. Sends results via email.

You can use and customize any of the 5K automations in the Blink library, or build automations from scratch to fit your unique needs.

Get started with Blink today to see how easy automation can be.