Thursday 16 November 2017

Extracting S3 bucket sizes using the AWS CLI

A quick one-liner (using bash) for printing out the size (in bytes) of the StandardStorage in each S3 bucket in your account:

for name in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do size=$(aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes --start-time $(date --date="yesterday" +%Y-%m-%d) --end-time $(date +%Y-%m-%d) --period 86400 --statistics Maximum --dimensions Name=BucketName,Value=$name Name=StorageType,Value=StandardStorage --query 'Datapoints[0].Maximum' | sed 's/null/0.0/' | cut -d. -f1); echo "$name,$size"; done
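The real command needs configured AWS credentials, but the plumbing of the loop can be seen locally by stubbing the aws command with a shell function (the bucket names and size below are made up for illustration):

```shell
# Stub out the AWS CLI so the loop can run without credentials.
# The real `aws` calls take the same shape as in the one-liner above.
aws() {
  case "$1" in
    s3api)      echo "bucket-a bucket-b" ;;  # stand-in for list-buckets output
    cloudwatch) echo '12345678.0' ;;         # stand-in for one datapoint value
  esac
}

# Same structure as the one-liner: list buckets, fetch each size, print CSV.
for name in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do
  size=$(aws cloudwatch get-metric-statistics | sed 's/null/0.0/' | cut -d. -f1)
  echo "$name,$size"
done
```

This prints one bucket-name,size line per bucket, which is easy to redirect into a CSV file.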


Some of the individual components may be independently useful, starting off with listing buckets:

aws s3api list-buckets --query 'Buckets[*].Name' --output text

Pretty straightforward: this uses the s3api commands in the CLI to list the buckets, returning only their names in text format.


To get the bucket sizes we actually query CloudWatch (using get-metric-statistics), which provides bucket size and object count metrics for S3:

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes --start-time $(date --date="yesterday" +%Y-%m-%d) --end-time $(date +%Y-%m-%d) --period 86400 --statistics Maximum --dimensions Name=BucketName,Value=$name Name=StorageType,Value=StandardStorage --query 'Datapoints[0].Maximum' | sed 's/null/0.0/' | cut -d. -f1

Most of this is fairly straightforward, but some of the parameters are worth explaining further:
--start-time: the start date is set to yesterday, formatted as yyyy-mm-dd; this ensures we get at least one data point containing the bucket size
--period: 86400 seconds = 1 day, as the bucket size metric is only published once a day (at 00:00:00)
--query: we are only interested in the actual value for one of the metric datapoints
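The date arithmetic above can be checked on its own. Note that --date="yesterday" is GNU date syntax (standard on Linux); on macOS/BSD the equivalent is date -v-1d:

```shell
# Build the same start/end window the one-liner passes to CloudWatch
# (GNU date syntax; assumes a Linux environment).
start=$(date --date="yesterday" +%Y-%m-%d)
end=$(date +%Y-%m-%d)
echo "$start $end"
```

A one-day window with a period of 86400 guarantees at most one datapoint, which is why the query can safely pick Datapoints[0].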

The sed and cut commands just clean up the formatting: if the bucket is empty, the CloudWatch metric request returns null, so we replace null with 0.0. The bucket size metric value will always end in .0, so we truncate it using cut (yes, there are other ways of doing this).
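The cleanup pipeline can be exercised locally against the two shapes the CloudWatch query can return (the byte count here is an arbitrary example value):

```shell
# A non-empty bucket: CloudWatch returns a float ending in .0
echo '12345678.0' | sed 's/null/0.0/' | cut -d. -f1   # prints 12345678

# An empty bucket: CloudWatch returns null, which becomes 0
echo 'null' | sed 's/null/0.0/' | cut -d. -f1         # prints 0
```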

If you only want to check the size of a single bucket, or the bucket size on a specific date, you can use a simpler version:

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes --start-time 2017-11-15 --end-time 2017-11-16 --period 86400 --statistics Maximum --dimensions Name=BucketName,Value=MY_BUCKET_NAME Name=StorageType,Value=StandardStorage --query 'Datapoints[0].Maximum'