Saturday, 25 August 2018

AWS S3 event aggregation with Lambda and DynamoDB

Introduction

S3 has had event notifications since 2014 and for individual object notifications these events work well with Lambda, allowing you to perform an action on every object event in a bucket. It is harder to use this approach when you want to perform an action a limited number of times or at an aggregated bucket level. An example use case would be refreshing a dependency (like Storage Gateway RefreshCache) when you are expecting a large number of object events in a bucket. Performing a relatively expensive action for every event is not practical or efficient in this case. This post provides a solution for aggregating these events using Lambda, DynamoDB, and SQS.

The problem

We want to call RefreshCache on our Storage Gateway (SGW) whenever the contents of the S3 bucket it exposes are updated by an external process. If the external process is updating a large number of (small) S3 objects then a large number of S3 events will be triggered. We don't want to overload our SGW with refresh requests so we need a way to aggregate these events to only send occasional refresh requests.

The solution

The solution is fairly simple and uses DynamoDB's Conditional Writes for synchronisation and SQS Message Timers to enable aggregation. When the Lambda function processes a new object event it first checks whether the event falls within the window of the currently active refresh request. If the event is within the window it will automatically be covered when the refresh executes and can be ignored. If the event falls outside the current window then a new refresh request is sent to an SQS queue with a message timer equal to the refresh window period. This allows all events received within a refresh window to be covered by a single refresh operation.
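
To make the flow concrete, below is a minimal sketch of that aggregation logic (illustrative only; the actual s3_aggregator.py in the repository may differ in naming and detail). It records the last refresh timestamp per bucket with a DynamoDB conditional update and only enqueues a refresh request when no refresh is pending for the current window:

import json
import os
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client('dynamodb')
sqs = boto3.client('sqs')

TABLE = 'S3EventAggregator'
QUEUE_URL = os.environ['QUEUE_URL']
REFRESH_DELAY_SECONDS = int(os.environ.get('REFRESH_DELAY_SECONDS', '30'))

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        now_ms = int(time.time() * 1000)  # simplification: wall clock rather than the S3 event time
        cutoff_ms = now_ms - REFRESH_DELAY_SECONDS * 1000
        try:
            # Conditional write: only succeeds if there is no refresh pending for
            # this bucket, i.e. the stored timestamp is older than the refresh
            # window (or the item does not exist yet).
            dynamodb.update_item(
                TableName=TABLE,
                Key={'BucketName': {'S': bucket}},
                UpdateExpression='SET LastRefresh = :now',
                ConditionExpression='attribute_not_exists(LastRefresh) OR LastRefresh < :cutoff',
                ExpressionAttributeValues={
                    ':now': {'N': str(now_ms)},
                    ':cutoff': {'N': str(cutoff_ms)},
                },
            )
        except ClientError as error:
            if error.response['Error']['Code'] == 'ConditionalCheckFailedException':
                # A refresh is already queued for this window; skip the event.
                continue
            raise

        # No refresh pending: queue one, delayed by the aggregation window so that
        # later events in the same window are covered by this single refresh.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            DelaySeconds=REFRESH_DELAY_SECONDS,
            MessageBody=json.dumps({}),
            MessageAttributes={
                'bucket-name': {'DataType': 'String', 'StringValue': bucket},
                'timestamp': {'DataType': 'Number', 'StringValue': str(now_ms)},
            },
        )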

Implementation

At a high level we need to create resources (SQS queue, DynamoDB table, Lambda functions), set up permissions (create and assign IAM roles), and apply some configuration (linking Lambda to S3 event notifications and SQS queues). This implementation really belongs in a CloudFormation template (and I may actually create one) but I was interested in trying to do this entirely via the AWS CLI, masochistic as that may be. If you are not interested in the gory implementation details then skip ahead to the 'Creation and deletion scripts' section.

Let's start with the S3 event aggregation piece. We need:
  1. A DynamoDB table to track state
  2. An SQS queue as a destination for aggregated actions
  3. A Lambda function for processing and aggregating the S3 events
  4. IAM permissions for all of the above
As the DynamoDB table and SQS queue are independent we can create these first:
aws dynamodb create-table --table-name S3EventAggregator --attribute-definitions AttributeName=BucketName,AttributeType=S --key-schema AttributeName=BucketName,KeyType=HASH --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=5

aws sqs create-queue --queue-name S3EventAggregatorActionQueue


Naturally this needs to be done by a user with sufficient permissions, and it assumes your default region is set.

The Lambda function is a bit trickier as it requires a role to be created before the function can be created. So let's start with the IAM permissions. First let's create a policy allowing DynamoDB GetItem and UpdateItem to be performed on the DynamoDB table we created earlier. To do this we need a JSON file containing the necessary permissions. The dynamo-writer.json file looks like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
   "dynamodb:UpdateItem",
   "dynamodb:GetItem"
   ],
            "Resource": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/S3EventAggregator"
        }
    ]
}

We need to replace REGION and ACCOUNT_ID with the relevant values. As we are aiming to use the command line for this exercise, let's use STS to retrieve our account ID, set our region, and then use sed to substitute both variables:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) 
AWS_DEFAULT_REGION="eu-west-1" 
wget -O dynamo-writer.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/dynamo-writer.json 
sed -i "s/ACCOUNT_ID/$ACCOUNT_ID/g" dynamo-writer.json
sed -i "s/REGION/$AWS_DEFAULT_REGION/g" dynamo-writer.json

aws iam create-policy --policy-name S3EventAggregatorDynamo --policy-document file://dynamo-writer.json

We now have a policy that allows the caller (our soon-to-be-created Lambda function in this case) to update items in the S3EventAggregator DynamoDB table. Next we need to create a policy to allow the function to write messages to SQS. The sqs-writer.json policy file contents are similar to the DynamoDB policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sqs:SendMessage",
            "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:S3EventAggregatorActionQueue"
        }
    ]
}

Retrieving the file and substituting the ACCOUNT_ID and REGION using the environment variables we created for the DynamoDB policy:
wget -O sqs-writer.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/sqs-writer.json 
sed -i "s/ACCOUNT_ID/$ACCOUNT_ID/g" sqs-writer.json
sed -i "s/REGION/$AWS_DEFAULT_REGION/g" sqs-writer.json
aws iam create-policy --policy-name S3EventAggregatorSqsWriter --policy-document file://sqs-writer.json

Having defined the two resource access policies, let's create an IAM role for the Lambda function.
wget -O lambda-trust.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/lambda-trust.json
aws iam create-role --role-name S3EventAggregatorLambdaRole --assume-role-policy-document file://lambda-trust.json

The lambda-trust.json trust policy allows the Lambda service to assume the role via STS and looks like this (no substitutions required for this one):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

We can now attach the SQS and DynamoDB policies to the newly created Lambda role. We also need the AWSLambdaBasicExecutionRole, an AWS managed policy that allows the function to write its logs to CloudWatch Logs:
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorDynamo --role-name S3EventAggregatorLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorSqsWriter --role-name S3EventAggregatorLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole --role-name S3EventAggregatorLambdaRole

We are now finally ready to create the S3EventAggregator Lambda function, starting with the Python deployment package:
wget -O s3_aggregator.py https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/src/s3_aggregator.py
zip function.zip s3_aggregator.py

And then creating the function (using the AWS_DEFAULT_REGION and ACCOUNT_ID environment variables again):
aws lambda create-function --function-name S3EventAggregator --runtime python3.6 --role arn:aws:iam::$ACCOUNT_ID:role/S3EventAggregatorLambdaRole --zip-file fileb://function.zip --handler s3_aggregator.lambda_handler --timeout 10 --environment "Variables={QUEUE_URL=https://sqs.$AWS_DEFAULT_REGION.amazonaws.com/$ACCOUNT_ID/S3EventAggregatorActionQueue,REFRESH_DELAY_SECONDS=30,LOG_LEVEL=INFO}"
aws lambda put-function-concurrency --function-name S3EventAggregator --reserved-concurrent-executions 1

The function concurrency is set to 1 as there is no benefit in processing S3 events concurrently; 'single threading' the function also limits the maximum concurrent DynamoDB request rate, reducing DynamoDB capacity usage and costs.

All that is left now is to give S3 permission to execute the Lambda function and link the bucket notification events to the S3EventAggregator function. Giving S3 permission on the specific bucket:
BUCKET=my-bucket
aws lambda add-permission --function-name S3EventAggregator --statement-id SID_$BUCKET --action lambda:InvokeFunction --principal s3.amazonaws.com --source-account $ACCOUNT_ID --source-arn arn:aws:s3:::$BUCKET

Interestingly, the --source-arn can be omitted to avoid needing to add permissions for each bucket you want the function to operate on, but it is required (and must match a specific bucket) for the Lambda Console to display the function and trigger correctly. The S3 event.json configuration subscribes to all object creation and removal events:
{
    "LambdaFunctionConfigurations": [
        {
            "Id": "s3-event-aggregator",
            "LambdaFunctionArn": "arn:aws:lambda:REGION:ACCOUNT_ID:function:S3EventAggregator",
            "Events": [
                "s3:ObjectCreated:*",
                "s3:ObjectRemoved:*"
            ]
        }
    ]
}

Once again substituting the relevant region and account IDs:
wget -O event.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/s3/event.json
sed -i "s/ACCOUNT_ID/$ACCOUNT_ID/g" event.json
sed -i "s/REGION/$AWS_DEFAULT_REGION/g" event.json

And linking the event configuration to a bucket:
aws s3api put-bucket-notification-configuration --bucket $BUCKET --notification-configuration file://event.json

That concludes the event aggregation part of the solution. A quick test confirms it is working as expected:

time for i in $(seq 1 5); do aws s3 cp test.txt s3://$BUCKET/test$i.txt; done
upload: ./test.txt to s3://s3-test-net/test1.txt                  
upload: ./test.txt to s3://s3-test-net/test2.txt                  
upload: ./test.txt to s3://s3-test-net/test3.txt                  
upload: ./test.txt to s3://s3-test-net/test4.txt                  
upload: ./test.txt to s3://s3-test-net/test5.txt                  

real 0m2.106s
user 0m1.227s
sys 0m0.140s

STREAM=$(aws logs describe-log-streams --log-group-name /aws/lambda/S3EventAggregator --order-by LastEventTime --descending --query 'logStreams[0].logStreamName' --output text); aws logs get-log-events --log-group-name /aws/lambda/S3EventAggregator --log-stream-name $STREAM --query 'events[*].{msg:message}' --output text | grep "^\[" | sed 's/\t/ /g'
[INFO] 2018-08-12T18:14:03.647Z 7da07415-9e5b-11e8-ab6d-8f962149ce24 Sending refresh request for bucket: s3-test-net, timestamp: 1534097642149
[INFO] 2018-08-12T18:14:04.207Z 7e1d6ca2-9e5b-11e8-ac9d-e1f0f9729f66 Refresh for bucket: s3-test-net within refresh window, skipping. S3 Event timestamp: 1534097642938
[INFO] 2018-08-12T18:14:04.426Z 7eefb013-9e5b-11e8-ab6d-8f962149ce24 Refresh for bucket: s3-test-net within refresh window, skipping. S3 Event timestamp: 1534097643812
[INFO] 2018-08-12T18:14:04.635Z 7e5b5feb-9e5b-11e8-8aa9-7908c99c450a Refresh for bucket: s3-test-net within refresh window, skipping. S3 Event timestamp: 1534097643371
[INFO] 2018-08-12T18:14:05.915Z 7ddb5a72-9e5b-11e8-80de-0dd6a15c3f62 Refresh for bucket: s3-test-net within refresh window, skipping. S3 Event timestamp: 1534097642517

From the 'within refresh window' log messages we can see 4 of the 5 events were skipped as they fell within the refresh aggregation window. Checking the SQS queue we can see the refresh request event:

aws sqs receive-message --queue-url https://$AWS_DEFAULT_REGION.queue.amazonaws.com/$ACCOUNT_ID/S3EventAggregatorActionQueue --attribute-names All --message-attribute-names All
{
    "Messages": [
        {
            "MessageId": "c0027dd2-30bc-48bc-b622-b5c85d862c92",
            "ReceiptHandle": "AQEB9DQXkIWsWn...5XU2a13Q8=",
            "MD5OfBody": "99914b932bd37a50b983c5e7c90ae93b",
            "Body": "{}",
            "Attributes": {
                "SenderId": "AROAI55PXBF63XVSEBNYM:S3EventAggregator",
                "ApproximateFirstReceiveTimestamp": "1534097653846",
                "ApproximateReceiveCount": "1",
                "SentTimestamp": "1534097642728"
            },
            "MD5OfMessageAttributes": "6f6eaf397811cbece985f3e8d87546c3",
            "MessageAttributes": {
                "bucket-name": {
                    "StringValue": "s3-test-net",
                    "DataType": "String"
                },
                "timestamp": {
                    "StringValue": "1534097642149",
                    "DataType": "Number"
                }
            }
        }
    ]
}

Moving on to the final part of the solution, we need a Lambda function that processes the events the S3EventAggregator function sends to SQS. For the function's permissions we can reuse the S3EventAggregatorDynamo policy for DynamoDB access, but we will need new policies for reading and deleting SQS messages and for refreshing the Storage Gateway cache.
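
As a rough illustration of what that function has to do (the real s3_sgw_refresh.py in the repository may differ in detail), it reads the bucket name from the SQS message attributes, finds the file share backed by that bucket, and calls RefreshCache:

import boto3

sgw = boto3.client('storagegateway')

def find_share_arn(bucket):
    # Look for the NFS or SMB file share whose LocationARN points at the bucket
    target = 'arn:aws:s3:::' + bucket
    for share in sgw.list_file_shares()['FileShareInfoList']:
        arn = share['FileShareARN']
        if share.get('FileShareType') == 'SMB':
            # Requires a boto3/botocore version that exposes the SMB APIs
            info = sgw.describe_smb_file_shares(FileShareARNList=[arn])['SMBFileShareInfoList'][0]
        else:
            info = sgw.describe_nfs_file_shares(FileShareARNList=[arn])['NFSFileShareInfoList'][0]
        if info['LocationARN'].startswith(target):
            return arn
    return None

def lambda_handler(event, context):
    # One SQS message per invocation (batch size 1 in the event source mapping);
    # Lambda deletes the message automatically when the invocation succeeds.
    for record in event['Records']:
        bucket = record['messageAttributes']['bucket-name']['stringValue']
        share_arn = find_share_arn(bucket)
        if share_arn:
            sgw.refresh_cache(FileShareARN=share_arn)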

The sgw-refresh.json policy is as follows. Note that the SMB file share actions are included even though the current Lambda execution environment only ships boto3 1.7.30, which does not expose the SMB APIs (more on working around this later):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "storagegateway:RefreshCache",
                "storagegateway:ListFileShares",
                "storagegateway:DescribeNFSFileShares",
                "storagegateway:DescribeSMBFileShares"
            ],
            "Resource": "*"
        }
    ]
}

Creating the policy:
wget -O sgw-refresh.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/sgw-refresh.json
aws iam create-policy --policy-name StorageGatewayRefreshPolicy --policy-document file://sgw-refresh.json

The sqs-reader.json policy gives the necessary SQS read and delete permissions on the S3EventAggregatorActionQueue:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes",
                "sqs:ReceiveMessage"
            ],
            "Resource": "arn:aws:sqs:REGION:ACCOUNT_ID:S3EventAggregatorActionQueue"
        }
    ]
}

Substituting the variables and creating the policy:
wget -O sqs-reader.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/sqs-reader.json
sed -i "s/ACCOUNT_ID/$ACCOUNT_ID/g" sqs-reader.json 
sed -i "s/REGION/$AWS_DEFAULT_REGION/g" sqs-reader.json
aws iam create-policy --policy-name S3EventAggregatorSqsReader --policy-document file://sqs-reader.json

And then creating the role and adding the relevant policies:
wget -O lambda-trust.json https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/iam/lambda-trust.json
aws iam create-role --role-name S3AggregatorActionLambdaRole --assume-role-policy-document file://lambda-trust.json
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole --role-name S3AggregatorActionLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorSqsReader --role-name S3AggregatorActionLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/S3EventAggregatorDynamo --role-name S3AggregatorActionLambdaRole
aws iam attach-role-policy --policy-arn arn:aws:iam::$ACCOUNT_ID:policy/StorageGatewayRefreshPolicy --role-name S3AggregatorActionLambdaRole

Next we will create the Lambda function; how we package it depends on whether you require SMB file share support. As mentioned earlier, the current Lambda execution environment does not expose the new SMB file share APIs, so if you have SMB shares mapped on your Storage Gateway you will have to bundle the latest botocore and boto3 libraries with your deployment. The disadvantage is that you can no longer view the code in the Lambda console (due to the deployment package size limitation). If you are only using NFS shares then you can deploy the code without the bundled libraries, but it will break if you add an SMB share before the Lambda execution environment catches up. Including the dependencies in the deployment is the preferred option, so that is what we are going to do:

mkdir deploy
pip install boto3 botocore -t deploy
wget -O deploy/s3_sgw_refresh.py https://raw.githubusercontent.com/watchamcb/s3-event-aggregator/master/src/s3_sgw_refresh.py
cd deploy
zip -r function.zip *

aws lambda create-function --function-name S3StorageGatewayRefresh --runtime python3.6 --role arn:aws:iam::$ACCOUNT_ID:role/S3AggregatorActionLambdaRole --zip-file fileb://function.zip --handler s3_sgw_refresh.lambda_handler --timeout 5 --environment "Variables={LOG_LEVEL=INFO}"
aws lambda put-function-concurrency --function-name S3StorageGatewayRefresh --reserved-concurrent-executions 1

And finally create the event mapping to execute the S3StorageGatewayRefresh function when messages are received on the queue:

aws lambda create-event-source-mapping --function-name S3StorageGatewayRefresh --event-source arn:aws:sqs:$AWS_DEFAULT_REGION:$ACCOUNT_ID:S3EventAggregatorActionQueue --batch-size 1

And that is the final piece of the solution. To verify it works as expected, let's mount the NFS share, upload some files via the CLI, and confirm the share is refreshed. Mounting the share:

$ sudo mount -t nfs -o nolock 172.31.2.13:/s3-test-net share
$ cd share/sgw
$ ls -l
total 0

Uploading the files through the CLI (from a different machine):

$ for i in $(seq 1 20); do aws s3 cp test.txt s3://$BUCKET/sgw/test$i.txt; done
upload: ./test.txt to s3://s3-test-net/test1.txt                
upload: ./test.txt to s3://s3-test-net/test2.txt                
...
upload: ./test.txt to s3://s3-test-net/test19.txt              
upload: ./test.txt to s3://s3-test-net/test20.txt    

And confirming the refresh of the share:

$ ls -l
total 10
-rw-rw-rw- 1 nobody nogroup 19 Aug 25 07:46 test10.txt
-rw-rw-rw- 1 nobody nogroup 19 Aug 25 07:46 test11.txt
...
-rw-rw-rw- 1 nobody nogroup 19 Aug 25 07:46 test8.txt
-rw-rw-rw- 1 nobody nogroup 19 Aug 25 07:46 test9.txt

Creation and deletion scripts

For convenience, scripts to create (and remove) this stack are provided on GitHub. Clone the s3-event-aggregator repository and run the create-stack.sh and delete-stack.sh scripts respectively. You need to have the AWS CLI installed and configured, and sed and zip must be available. Be sure to edit the BUCKET variable in the script to match your bucket name and change the REGION if appropriate.

Note that the delete stack script will not remove the S3 event notification configuration by default. There is no convenient way to remove only the s3-event-aggregator entry via the CLI, short of overwriting the entire notification configuration, which may result in unintended loss of other event configuration. If you have other events configured on a bucket it is best to use the AWS Console to remove the s3-event-aggregator event configuration. If there are no other events configured on your bucket you can safely uncomment the relevant line in the deletion script.
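
If you do want to script the selective removal, a sketch along these lines (not part of the repository scripts) fetches the current configuration, drops only the s3-event-aggregator entry, and writes the rest back:

import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'  # replace with your bucket name

config = s3.get_bucket_notification_configuration(Bucket=bucket)
config.pop('ResponseMetadata', None)  # not part of the configuration itself

# Keep every Lambda notification except the s3-event-aggregator one
config['LambdaFunctionConfigurations'] = [
    c for c in config.get('LambdaFunctionConfigurations', [])
    if c.get('Id') != 's3-event-aggregator'
]

s3.put_bucket_notification_configuration(Bucket=bucket, NotificationConfiguration=config)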

Configuration

Both Lambda functions have a LOG_LEVEL environment variable controlling how much detail is logged to CloudWatch Logs. The functions were created with the level set to INFO, but DEBUG may be useful for troubleshooting and WARN is probably more appropriate for production.

The S3EventAggregator function also has an environment variable called REFRESH_DELAY_SECONDS for controlling the event aggregation window. It was initialised to 30 seconds when the function was created, but it may be appropriate to change it depending on your S3 upload pattern. If the uploads are mostly small and complete quickly, or if you need the Storage Gateway to reflect changes promptly, then 30 seconds is a reasonable value. If you are performing larger uploads, or the total upload process takes significantly longer, then the refresh window should be increased to exceed the total expected upload time.
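
The variable can be changed after deployment without touching the code, either with the update-function-configuration CLI command or via boto3. A sketch of the latter, bumping the window to a hypothetical 120 seconds while preserving the other variables:

import boto3

lam = boto3.client('lambda')

# update-function-configuration replaces the whole environment, so fetch the
# existing variables first to preserve QUEUE_URL and LOG_LEVEL
current = lam.get_function_configuration(FunctionName='S3EventAggregator')
env = current.get('Environment', {}).get('Variables', {})
env['REFRESH_DELAY_SECONDS'] = '120'

lam.update_function_configuration(
    FunctionName='S3EventAggregator',
    Environment={'Variables': env},
)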

The DynamoDB table was created with 5 write capacity units; as the entries are less than 1 KB this should be sufficient as long as you are not writing more than 5 objects a second to the Storage Gateway S3 bucket. Writing more than this will require additional write capacity to be provisioned (or auto scaling to be enabled).
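
If you expect sustained higher write rates, enabling write auto scaling on the table is one option. A rough sketch (the minimum, maximum, and target utilisation values here are illustrative, not part of the provided scripts):

import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the table's write capacity as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/S3EventAggregator',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    MinCapacity=5,
    MaxCapacity=50,
)

# Target-tracking policy keeping write capacity utilisation around 70%
autoscaling.put_scaling_policy(
    PolicyName='S3EventAggregatorWriteScaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/S3EventAggregator',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization',
        },
    },
)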

The same code can be used for multiple buckets by simply adding additional bucket event configurations via the CLI put-bucket-notification-configuration as above or using the AWS Console.

Cost

There are three cost components in this solution: the two Lambda functions, DynamoDB, and SQS. The Lambda and DynamoDB costs will scale fairly linearly with usage, as both the S3EventAggregator function and DynamoDB are charged for each S3 event that is triggered. To get an idea of the number of events to expect you can enable S3 request metrics on the bucket and check the PUT and DELETE counts. The S3StorageGatewayRefresh function and SQS messages will be a fraction of the total S3 event count and dependent on the REFRESH_DELAY_SECONDS configuration. A longer refresh delay will result in fewer SQS messages and S3StorageGatewayRefresh function executions.

As an example, let's assume 1000 objects are uploaded a day and that these are aggregated into 50 refresh events. For simplicity we will also assume that the free tier has been exhausted and that there are 30 days in the month. The total Lambda request count will then be:
(30 days x 1000 S3 events) + (30 days x 50 refresh events) = 31,500 requests
As Lambda requests are priced at $0.20 per million, the request charge for the month is at most $0.20.

The compute charges are based on duration, with the S3EventAggregator executing in less than 100 ms for the aggregated (skipped) events and around 300-600 ms for the refresh events. The S3StorageGatewayRefresh function takes between 400 ms and 800 ms. Giving us:
(30 days x 950 S3 requests x 0.1s) + (30 days x 50 S3 requests x 0.5s) + (30 days x 50 refresh events x 0.7s) = 4650 seconds
Lambda compute is charged in GB-s, so:
4650 seconds x 128MB/1024 = 581.25 GB-s at $0.00001667 = $0.0096894
Bringing the total Lambda charges for the month to $0.21
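
The same estimate as a quick sanity check in Python, using the figures above (and treating the request charge as the full per-million price to stay conservative):

# Monthly Lambda usage from the example above
requests = 30 * 1000 + 30 * 50                                    # 31,500 requests

# Duration: aggregated events, refresh sends, and the refresh function itself
seconds = (30 * 950 * 0.1) + (30 * 50 * 0.5) + (30 * 50 * 0.7)    # 4,650 seconds
gb_seconds = seconds * 128 / 1024                                 # 581.25 GB-s

request_cost = 0.20                       # conservative upper bound at $0.20 per million
compute_cost = gb_seconds * 0.00001667    # ~$0.0097

print(f"{requests} requests, {gb_seconds} GB-s, ~${request_cost + compute_cost:.2f} a month")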

For DynamoDB we have provisioned 5 WCU at $0.000735/WCU/hour and 1 RCU at $0.000147/RCU/hour working out as:
(5 WCU x 0.000735 per hour x 24 hours a day x 30 days) + (1 RCU x 0.000147 per hour x 24 hours x 30 days) = $2.75 a month

SQS is charged per million requests, with the 1,500 send message requests and a further 1,500 receive and delete requests falling well under this limit (and thus costing at most $0.40 for the month). It is worth noting that Lambda polls the SQS queue roughly 4 times a minute and this will contribute to your total SQS request count, adding around 172,800 requests a month.

There are some other costs associated with CloudWatch Logs and DynamoDB storage but these should be fairly small compared to the request costs and I would not expect the total cost of the stack to be more than $10 - $15 a month.

Conclusion

And so ends this post, well done for reading to the end. I quite enjoyed building this solution and will look at converting it to a CloudFormation template at a later stage. Feel free to log issues or pull requests against the GitHub repo.

Friday, 18 May 2018

Working with 20+ node ElastiCache Memcached clusters

ElastiCache Memcached limits the number of nodes in a cluster to 20 by default. This limit can be increased, although ElastiCache recommends against clusters larger than 50 nodes.

The increased node limit is not reflected in the console, so even after the limit has been increased the maximum number of nodes you can select when creating a new cluster is still 20. Fortunately the CLI allows you to work around this, and you can tweak the examples in the ElastiCache cluster creation documentation to achieve your desired outcome. Creating a cluster with more than 20 nodes using the CLI:

$ aws elasticache create-cache-cluster --cache-cluster-id my-super-cluster --cache-node-type cache.t2.micro --engine memcached --engine-version 1.4.34 --cache-parameter-group default.memcached1.4 --num-cache-nodes 30
{
    "CacheCluster": {
        "CacheClusterId": "my-super-cluster",
        "ClientDownloadLandingPage": "https://console.aws.amazon.com/elasticache/home#client-download:",
        "CacheNodeType": "cache.t2.micro",
        "Engine": "memcached",
        "EngineVersion": "1.4.34",
        "CacheClusterStatus": "creating",
        "NumCacheNodes": 30,
        "PreferredMaintenanceWindow": "mon:02:30-mon:03:30",
        "PendingModifiedValues": {},
        "CacheSecurityGroups": [],
        "CacheParameterGroup": {
            "CacheParameterGroupName": "default.memcached1.4",
            "ParameterApplyStatus": "in-sync",
            "CacheNodeIdsToReboot": []
        },
        "CacheSubnetGroupName": "default",
        "AutoMinorVersionUpgrade": true,
        "TransitEncryptionEnabled": false,
        "AtRestEncryptionEnabled": false
    }
}

Adding nodes can be done via the console as it allows you to specify the number of nodes rather than selecting from a drop-down. For completeness this can also be done via the CLI:

$ aws elasticache modify-cache-cluster --cache-cluster-id test-limits --num-cache-nodes 26 --apply-immediately
{
    "CacheCluster": {
        "CacheClusterId": "test-limits",
        "ConfigurationEndpoint": {
            "Address": "test-limits.rftr8g.cfg.euw1.cache.amazonaws.com",
            "Port": 11211
        },
        "ClientDownloadLandingPage": "https://console.aws.amazon.com/elasticache/home#client-download:",
        "CacheNodeType": "cache.t2.micro",
        "Engine": "memcached",
        "EngineVersion": "1.4.34",
        "CacheClusterStatus": "modifying",
        "NumCacheNodes": 6,
        "PreferredAvailabilityZone": "Multiple",
        "CacheClusterCreateTime": "2018-05-18T07:10:22.530Z",
        "PreferredMaintenanceWindow": "sat:23:30-sun:00:30",
        "PendingModifiedValues": {
            "NumCacheNodes": 26
        },
        "CacheSecurityGroups": [],
        "CacheParameterGroup": {
            "CacheParameterGroupName": "default.memcached1.4",
            "ParameterApplyStatus": "in-sync",
            "CacheNodeIdsToReboot": []
        },
        "CacheSubnetGroupName": "default",
        "AutoMinorVersionUpgrade": true,
        "SecurityGroups": [
            {
                "SecurityGroupId": "sg-5814fd37",
                "Status": "active"
            }
        ],
        "TransitEncryptionEnabled": false,
        "AtRestEncryptionEnabled": false
    }
}


And finally removing nodes via the CLI (note the spaces rather than commas between the node IDs):

$ aws elasticache modify-cache-cluster --cache-cluster-id test-limits --num-cache-nodes 6 --cache-node-ids-to-remove 0007 0008 0009 0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 0020 0021 0022 0023 0024 0025 0026 --apply-immediately  
{
    "CacheCluster": {
        "CacheClusterId": "test-limits",
        "ConfigurationEndpoint": {
            "Address": "test-limits.rftr8g.cfg.euw1.cache.amazonaws.com",
            "Port": 11211
        },
        "ClientDownloadLandingPage": "https://console.aws.amazon.com/elasticache/home#client-download:",
        "CacheNodeType": "cache.t2.micro",
        "Engine": "memcached",
        "EngineVersion": "1.4.34",
        "CacheClusterStatus": "modifying",
        "NumCacheNodes": 26,
        "PreferredAvailabilityZone": "Multiple",
        "CacheClusterCreateTime": "2018-05-18T07:10:22.530Z",
        "PreferredMaintenanceWindow": "sat:23:30-sun:00:30",
        "PendingModifiedValues": {
            "NumCacheNodes": 6,
            "CacheNodeIdsToRemove": [
                "0007",
                "0008",
                "0009",
                "0010",
                "0011",
                "0012",
                "0013",
                "0014",
                "0015",
                "0016",
                "0017",
                "0018",
                "0019",
                "0020",
                "0021",
                "0022",
                "0023",
                "0024",
                "0025",
                "0026"
            ]
        },
        "CacheSecurityGroups": [],
        "CacheParameterGroup": {
            "CacheParameterGroupName": "default.memcached1.4",
            "ParameterApplyStatus": "in-sync",
            "CacheNodeIdsToReboot": []
        },
        "CacheSubnetGroupName": "default",
        "AutoMinorVersionUpgrade": true,
        "SecurityGroups": [
            {
                "SecurityGroupId": "sg-5814fd37",
                "Status": "active"
            }
        ],
        "TransitEncryptionEnabled": false,
        "AtRestEncryptionEnabled": false
    }
}

Thursday, 16 November 2017

Extracting S3 bucket sizes using the AWS CLI

A quick one-liner for printing out the StandardStorage size (in bytes) of each S3 bucket in your account (using bash):

for name in $(aws s3api list-buckets --query 'Buckets[*].Name' --output text); do size=$(aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes --start-time $(date --date="yesterday" +%Y-%m-%d) --end-time $(date +%Y-%m-%d) --period 86400 --statistics Maximum --dimensions Name=BucketName,Value=$name Name=StorageType,Value=StandardStorage --query 'Datapoints[0].Maximum' | sed 's/null/0.0/' | cut -d. -f1); echo "$name,$size"; done


Some of the individual components may be independently useful, starting off with listing buckets:

aws s3api list-buckets --query 'Buckets[*].Name' --output text

Pretty straightforward: it uses the s3api in the CLI to list buckets, returning only their names in text format.


To get the bucket sizes we are actually querying CloudWatch (using get-metric-statistics) which provides bucket size and object count metrics for S3:

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes --start-time $(date --date="yesterday" +%Y-%m-%d) --end-time $(date +%Y-%m-%d) --period 86400 --statistics Maximum --dimensions Name=BucketName,Value=$name Name=StorageType,Value=StandardStorage --query 'Datapoints[0].Maximum' | sed 's/null/0.0/' | cut -d. -f1

Most of this is fairly straightforward but some of the parameters are worth explaining further:
--start-time: we set the start date to yesterday and format it as yyyy-mm-dd; this ensures we get at least one data point containing the bucket size
--period: 86400 = 1 day as the bucket size metric is only published once a day (at 00:00:00)
--query: we are only interested in the actual value for one of the metric datapoints

The sed and cut commands just clean up the formatting: if the bucket is empty the CloudWatch metric request returns null, so we replace null with 0.0. The bucket size metric value will always end in .0, so we truncate it using cut (yes, there are other ways of doing this).

If you only want to check the size of a single bucket, or the bucket size on a specific date, you can use a simpler version:

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes --start-time 2017-11-15 --end-time 2017-11-16 --period 86400 --statistics Maximum --dimensions Name=BucketName,Value=MY_BUCKET_NAME Name=StorageType,Value=StandardStorage --query 'Datapoints[0].Maximum'
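
For reference, a rough boto3 equivalent of the one-liner (assuming credentials and region are configured as for the CLI):

from datetime import datetime, timedelta

import boto3

s3 = boto3.client('s3')
cloudwatch = boto3.client('cloudwatch')

end = datetime.utcnow()
start = end - timedelta(days=1)

for bucket in s3.list_buckets()['Buckets']:
    stats = cloudwatch.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {'Name': 'BucketName', 'Value': bucket['Name']},
            {'Name': 'StorageType', 'Value': 'StandardStorage'},
        ],
        StartTime=start,
        EndTime=end,
        Period=86400,
        Statistics=['Maximum'],
    )
    datapoints = stats['Datapoints']
    # An empty bucket has no StandardStorage datapoint, so report it as 0
    size = int(datapoints[0]['Maximum']) if datapoints else 0
    print(f"{bucket['Name']},{size}")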

Sunday, 20 August 2017

Understanding EC2 "Up to 10 Gigabit" network performance for R4 instances

This post investigates the network performance of AWS R4 instances with a focus on the "Up to 10 Gigabit" networking expected from the smaller (r4.large - r4.4xlarge) instance types. Before starting it should be noted that this post is based on observation and as such is prone to imprecision and variance; it is intended as a guide to what can be expected, not a comprehensive or scientific review.


The R4 instance documentation states "The smaller R4 instance sizes offer peak throughput of 10 Gbps. These instances use a network I/O credit mechanism to allocate network bandwidth to instances based on average bandwidth utilization. These instances accrue credits when their network throughput is below their baseline limits, and can use these credits when they perform network data transfers." This is not particularly helpful in understanding the lower bounds on network performance and gives no indication of the baseline limits; AWS recommends that customers benchmark the networking performance of various instances to evaluate whether the instance type and size will meet their application's network performance requirements.
Logically we would expect the r4.large to have a fraction of the total 20 Gbps available on an r4.16xlarge. From the instance size normalisation table in the reserved instance modification documentation, a *.large instance (factor of 4) should expect 1/32 of the resources available on a *.16xlarge instance (factor of 128), which works out to 0.625 Gbps (20 Gbps / 32) or 625 Mbps.


Testing r4.large baseline network performance

Using iperf3 between two newly launched Amazon Linux r4.large instances in the same availability zone in eu-west-1, we run into the first interesting anomaly with the network stream maxing out at 5 Gbps rather than the expected 10 Gbps:


$ iperf3 -p 5201 -c 172.31.7.67 -i 1 -t 3600 -f m -V 
iperf 3-CURRENT
Linux ip-172-31-10-235 4.9.32-15.41.amzn1.x86_64 #1 SMP Thu Jun 22 06:20:54 UTC 2017 x86_64
Control connection MSS 8949
Time: Sun, 20 Aug 2017 07:35:48 GMT
Connecting to host 172.31.7.67, port 5201
      Cookie: p2v6ry2kzjo2udittrzgmxotz7we3in5etmv
      TCP MSS: 8949 (default)
[  5] local 172.31.10.235 port 41270 connected to 172.31.7.67 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 3600 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   598 MBytes  5015 Mbits/sec    9    664 KBytes       
[  5]   1.00-2.00   sec   596 MBytes  4999 Mbits/sec    3    559 KBytes       
[  5]   2.00-3.00   sec   595 MBytes  4992 Mbits/sec    9    586 KBytes       
[  5]   3.00-4.00   sec   595 MBytes  4989 Mbits/sec    0    638 KBytes       
[  5]   4.00-5.00   sec   596 MBytes  5000 Mbits/sec    0    638 KBytes       
[  5]   5.00-6.00   sec   595 MBytes  4989 Mbits/sec    0    638 KBytes       
[  5]   6.00-7.00   sec   595 MBytes  4990 Mbits/sec    6    638 KBytes       
[  5]   7.00-8.00   sec   595 MBytes  4990 Mbits/sec    3    524 KBytes       
[  5]   8.00-9.00   sec   596 MBytes  4997 Mbits/sec    0    586 KBytes       
[  5]   9.00-10.00  sec   596 MBytes  4997 Mbits/sec    0    603 KBytes       
[  5]  10.00-11.00  sec   595 MBytes  4990 Mbits/sec    0    638 KBytes       

Interestingly, using 2 parallel streams results in us (mostly) reaching the advertised 10 Gbps:

$ iperf3 -p 5201 -c 172.31.7.67 -i 1 -t 3600 -f m -V -P 2
iperf 3-CURRENT
Linux ip-172-31-10-235 4.9.32-15.41.amzn1.x86_64 #1 SMP Thu Jun 22 06:20:54 UTC 2017 x86_64
Control connection MSS 8949
Time: Sun, 20 Aug 2017 07:37:38 GMT
Connecting to host 172.31.7.67, port 5201
      Cookie: q343avscwpva5uyg2ayeinboxi5pllvw5l7r
      TCP MSS: 8949 (default)
[  5] local 172.31.10.235 port 41274 connected to 172.31.7.67 port 5201
[  7] local 172.31.10.235 port 41276 connected to 172.31.7.67 port 5201
Starting Test: protocol: TCP, 2 streams, 131072 byte blocks, omitting 0 seconds, 3600 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   597 MBytes  5010 Mbits/sec    0    690 KBytes       
[  7]   0.00-1.00   sec   592 MBytes  4968 Mbits/sec    0    717 KBytes       
[SUM]   0.00-1.00   sec  1.16 GBytes  9979 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   595 MBytes  4994 Mbits/sec    0    690 KBytes       
[  7]   1.00-2.00   sec   592 MBytes  4962 Mbits/sec   18    638 KBytes       
[SUM]   1.00-2.00   sec  1.16 GBytes  9956 Mbits/sec   18             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   591 MBytes  4957 Mbits/sec  137    463 KBytes       
[  7]   2.00-3.00   sec   587 MBytes  4924 Mbits/sec   41    725 KBytes       
[SUM]   2.00-3.00   sec  1.15 GBytes  9881 Mbits/sec  178             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   593 MBytes  4973 Mbits/sec   46    367 KBytes       
[  7]   3.00-4.00   sec   591 MBytes  4956 Mbits/sec   40    419 KBytes       
[SUM]   3.00-4.00   sec  1.16 GBytes  9929 Mbits/sec   86             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   592 MBytes  4968 Mbits/sec  141    542 KBytes       
[  7]   4.00-5.00   sec   591 MBytes  4960 Mbits/sec   36    559 KBytes       
[SUM]   4.00-5.00   sec  1.16 GBytes  9928 Mbits/sec  177             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   595 MBytes  4995 Mbits/sec   30    664 KBytes       
[  7]   5.00-6.00   sec   588 MBytes  4934 Mbits/sec    8    568 KBytes       
[SUM]   5.00-6.00   sec  1.16 GBytes  9929 Mbits/sec   38             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   596 MBytes  5000 Mbits/sec    0    664 KBytes       
[  7]   6.00-7.00   sec   589 MBytes  4945 Mbits/sec    0    629 KBytes       
[SUM]   6.00-7.00   sec  1.16 GBytes  9945 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   588 MBytes  4935 Mbits/sec    7    655 KBytes       
[  7]   7.00-8.00   sec   594 MBytes  4982 Mbits/sec    0    682 KBytes       
[SUM]   7.00-8.00   sec  1.15 GBytes  9917 Mbits/sec    7             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   593 MBytes  4974 Mbits/sec    8    620 KBytes       
[  7]   8.00-9.00   sec   593 MBytes  4978 Mbits/sec   12    717 KBytes       
[SUM]   8.00-9.00   sec  1.16 GBytes  9952 Mbits/sec   20             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   596 MBytes  4999 Mbits/sec    0    638 KBytes       
[  7]   9.00-10.00  sec   590 MBytes  4951 Mbits/sec    0    717 KBytes       
[SUM]   9.00-10.00  sec  1.16 GBytes  9950 Mbits/sec    0             


This behaviour is not consistent; stopping and restarting the instances often resulted in the full 10 Gbps on a single stream, suggesting the issue relates to instance placement. This appears to be supported by the placement group documentation, which states: "Network traffic to and from resources outside the placement group is limited to 5 Gbps." It is also possible that the streams are incorrectly being treated as placement group or public internet flows, which have different limits. For consistency I have used two parallel streams to avoid this issue in the rest of the article.

The CloudWatch graph shows us reaching a steady baseline around eight minutes after starting iperf3:

[CloudWatch NetworkOut graph: throughput peaks shortly after iperf3 starts and then drops to a steady baseline]

A quick word on the graph above: firstly, it is in bytes and, having enabled detailed monitoring, at one-minute granularity. For conversion purposes this means we need to divide the value of the metric by 60 to get bytes per second and then multiply by 8 to get bits per second. Looking at the actual data from the graph above:

$ aws cloudwatch get-metric-statistics --metric-name NetworkOut --start-time 2017-08-20T08:23:00 --end-time 2017-08-20T08:35:00 --period 60 --namespace AWS/EC2 --statistics Average --dimensions Name=InstanceId,Value=i-0a7e009e7c0bf8fa8  --query 'Datapoints[*].[Timestamp,Average]' --output=text | sort
2017-08-20T08:23:00Z 486.0
2017-08-20T08:24:00Z 5726.0
2017-08-20T08:25:00Z 22711496136.0
2017-08-20T08:26:00Z 76376122845.0
2017-08-20T08:27:00Z 76403033046.0
2017-08-20T08:28:00Z 76357957564.0
2017-08-20T08:29:00Z 76304994405.0
2017-08-20T08:30:00Z 48667898310.0
2017-08-20T08:31:00Z 5776989873.0
2017-08-20T08:32:00Z 5816890095.0
2017-08-20T08:33:00Z 5692555065.0
2017-08-20T08:34:00Z 5692014471.0

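The conversion described above as a tiny helper (pure arithmetic, no AWS calls), applied to a peak and a baseline minute from the output:

def gbit_per_second(bytes_per_minute):
    # One-minute CloudWatch NetworkOut average (bytes) to Gbit/s
    return bytes_per_minute / 60 * 8 / 1e9

print(gbit_per_second(76376122845.0))  # peak minute     -> ~10.2 Gbit/s
print(gbit_per_second(5692555065.0))   # baseline minute -> ~0.76 Gbit/s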

The maximum average throughput (between 08:26 and 08:29) is around 76 GByte/minute, which works out at around 1.2 GByte/second or approximately 10.1 Gbit/second. Similarly the baseline (from 08:31 onwards) is in the region of 5.7 GByte/minute, which translates to around 94 MByte/second or around 750 Mbit/second. These numbers are naturally averages but are fairly close to the actual iperf3 results, with a peak throughput of just over 10 Gbit/second:


[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec   604 MBytes  5065 Mbits/sec    0    551 KBytes       
[  8]   0.00-1.00   sec   604 MBytes  5062 Mbits/sec    0    524 KBytes       
[SUM]   0.00-1.00   sec  1.18 GBytes  10127 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   1.00-2.00   sec   601 MBytes  5046 Mbits/sec    0    551 KBytes       
[  8]   1.00-2.00   sec   602 MBytes  5048 Mbits/sec    0    551 KBytes       
[SUM]   1.00-2.00   sec  1.18 GBytes  10094 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   2.00-3.00   sec   602 MBytes  5046 Mbits/sec    0    577 KBytes       
[  8]   2.00-3.00   sec   602 MBytes  5047 Mbits/sec    0    577 KBytes       
[SUM]   2.00-3.00   sec  1.17 GBytes  10093 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   3.00-4.00   sec   601 MBytes  5045 Mbits/sec    0    577 KBytes       
[  8]   3.00-4.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[SUM]   3.00-4.00   sec  1.18 GBytes  10095 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   4.00-5.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[  8]   4.00-5.00   sec   601 MBytes  5045 Mbits/sec    0    577 KBytes       
[SUM]   4.00-5.00   sec  1.18 GBytes  10094 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   5.00-6.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[  8]   5.00-6.00   sec   601 MBytes  5042 Mbits/sec    0    577 KBytes       
[SUM]   5.00-6.00   sec  1.17 GBytes  10092 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   6.00-7.00   sec   602 MBytes  5046 Mbits/sec    0    577 KBytes       
[  8]   6.00-7.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[SUM]   6.00-7.00   sec  1.18 GBytes  10095 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   7.00-8.00   sec   603 MBytes  5063 Mbits/sec    0   1.44 MBytes       
[  8]   7.00-8.00   sec   601 MBytes  5041 Mbits/sec   66    524 KBytes       
[SUM]   7.00-8.00   sec  1.18 GBytes  10104 Mbits/sec  66             
- - - - - - - - - - - - - - - - - - - - - - - - -

And the baseline of around 750 Mbit/second:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6] 376.00-377.00 sec  43.8 MBytes   367 Mbits/sec  157    114 KBytes       
[  8] 376.00-377.00 sec  43.8 MBytes   367 Mbits/sec  157   78.7 KBytes       
[SUM] 376.00-377.00 sec  87.5 MBytes   734 Mbits/sec  314             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 377.00-378.00 sec  45.0 MBytes   377 Mbits/sec  161   69.9 KBytes       
[  8] 377.00-378.00 sec  45.0 MBytes   377 Mbits/sec  167   78.7 KBytes       
[SUM] 377.00-378.00 sec  90.0 MBytes   755 Mbits/sec  328             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 378.00-379.00 sec  43.8 MBytes   367 Mbits/sec  182   69.9 KBytes       
[  8] 378.00-379.00 sec  45.0 MBytes   377 Mbits/sec  168    105 KBytes       
[SUM] 378.00-379.00 sec  88.8 MBytes   744 Mbits/sec  350             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 379.00-380.00 sec  42.5 MBytes   357 Mbits/sec  150   61.2 KBytes       
[  8] 379.00-380.00 sec  46.2 MBytes   388 Mbits/sec  165   96.1 KBytes       
[SUM] 379.00-380.00 sec  88.8 MBytes   744 Mbits/sec  315             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 380.00-381.00 sec  36.2 MBytes   304 Mbits/sec  129   78.7 KBytes       
[  8] 380.00-381.00 sec  52.5 MBytes   440 Mbits/sec  203    105 KBytes       
[SUM] 380.00-381.00 sec  88.8 MBytes   744 Mbits/sec  332             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 381.00-382.00 sec  36.2 MBytes   304 Mbits/sec  147   96.1 KBytes       
[  8] 381.00-382.00 sec  52.5 MBytes   440 Mbits/sec  220   87.4 KBytes       
[SUM] 381.00-382.00 sec  88.8 MBytes   744 Mbits/sec  367             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 382.00-383.00 sec  46.2 MBytes   388 Mbits/sec  175   52.4 KBytes       
[  8] 382.00-383.00 sec  42.5 MBytes   357 Mbits/sec  167    114 KBytes       
[SUM] 382.00-383.00 sec  88.8 MBytes   744 Mbits/sec  342             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 383.00-384.00 sec  41.2 MBytes   346 Mbits/sec  165   61.2 KBytes       
[  8] 383.00-384.00 sec  47.5 MBytes   398 Mbits/sec  170   96.1 KBytes       
[SUM] 383.00-384.00 sec  88.8 MBytes   744 Mbits/sec  335             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 384.00-385.00 sec  50.0 MBytes   419 Mbits/sec  195   87.4 KBytes       
[  8] 384.00-385.00 sec  38.8 MBytes   325 Mbits/sec  157   52.4 KBytes       
[SUM] 384.00-385.00 sec  88.8 MBytes   744 Mbits/sec  352             
- - - - - - - - - - - - - - - - - - - - - - - - -

Calculating network credit rates

Using the baseline network performance we can draw some inferences about the rate at which network credits are accrued. For simplicity I am going to define a network credit as 1 Gbps for 1 second, so an instance with 10 network credits could transmit at 10 Gbps for 1 second. Naturally, if the instance network limit is 10 Gbps, the maximum rate can't be exceeded even if the credit balance is sufficient (20 credits allow 2 seconds at 10 Gbps rather than 1 second at 20 Gbps). Given the baseline performance measured in the previous section, we can assume that an r4.large accrues around 0.75 credits per second. We can also assume a starting balance of around 2700, as we were able to maintain 10 Gbps for around 295 seconds ((10 - 0.75) * 295) at the start of the iperf3 run. Finally, it appears the maximum credit balance on the r4.large is the same as the initial balance: leaving the instances idle for 3 hours should have resulted in a credit balance of around 8100 (0.75 credits per second * 3600 seconds * 3 hours), which should theoretically have allowed around 810 seconds at 10 Gbps, but instead provided only around 295 seconds.

R4 network performance table

Below is a table of the expected performance for R4 instance sizes.

Instance     Baseline Gbps (approx.)   Initial/max credits (approx.)   Max time at 10 Gbps (approx. seconds)
r4.large     0.75                      2700                            295
r4.xlarge    1.25                      5145                            589
r4.2xlarge   2.5                       8925                            1191
r4.4xlarge   5                         11950                           2390

To work out whether an instance will meet your network throughput requirements, take the difference between the instance's baseline rate and your application's base network utilisation (this is the rate at which spare credits accrue) and compare it against your required burst rate.

For example, if your application requires a baseline of 0.6 Gbps, you would accrue credits at around 0.15 per second allowing you to burst for 10 Gbps for approximately one second every 66 seconds (10 / 0.15) or for 10 seconds every 660 seconds.