Sunday, 14 October 2018

Finding S3 API requests from previous versions of the AWS CLI and SDKs

Introduction

Earlier this year the S3 team announced that S3 will stop accepting API requests signed using AWS Signature Version 2 after June 24th, 2019. Customers will need to update their SDKs, CLIs, and custom implementations to use AWS Signature Version 4 to avoid impact after this date. Because it can be difficult to find older applications or instances that use outdated versions of the AWS CLI or SDKs, this post explains how AWS CloudTrail data events and Amazon Athena can be used to help identify applications that may need to be updated. We will cover setting up the CloudTrail data events, creating the Athena table, and writing some Athena queries that filter and refine the results.


Setting up CloudTrail data events in the AWS console

The first step is to create a trail to capture S3 data events. Create it in the region where you plan to run your Athena queries, to avoid unnecessary data transfer charges. In the CloudTrail console for that region, create a new trail and specify the trail name. Leave the ‘Apply trail to all regions’ option set to ‘Yes’ unless you plan to run separate analyses for each region. Because we are creating a data events trail, select ‘None’ under the Management Events section and check the ‘Select all S3 buckets in your account’ checkbox. Finally, select the S3 location where CloudTrail will write the data; for simplicity, we will create a new bucket.



Setting up CloudTrail data events using the AWS CLI

If you prefer to create the trail using the AWS CLI, you can use the create-subscription command to create the S3 bucket and the trail with the correct permissions, then update the trail to be a global (multi-region) trail and add the S3 data event configuration.
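The data event configuration corresponds to the console's ‘Select all S3 buckets in your account’ option. As a hedged sketch, the Python below builds the event selector JSON that `aws cloudtrail put-event-selectors --event-selectors` accepts for that setup: read and write data events for every bucket, with management events excluded. The helper name is hypothetical, and the "arn:aws:s3" prefix meaning "all buckets" reflects our understanding of the CloudTrail event selector format.

```python
import json


def s3_data_event_selectors(include_management_events: bool = False) -> str:
    """Return a JSON string for `--event-selectors` (hypothetical helper).

    Assumption: the prefix "arn:aws:s3" selects data events for every S3
    bucket in the account, mirroring the console's "all S3 buckets" option.
    """
    selectors = [
        {
            # Record both read (e.g. GetObject) and write (e.g. PutObject) events.
            "ReadWriteType": "All",
            # False matches selecting 'None' under Management Events.
            "IncludeManagementEvents": include_management_events,
            "DataResources": [
                {"Type": "AWS::S3::Object", "Values": ["arn:aws:s3"]}
            ],
        }
    ]
    return json.dumps(selectors)


if __name__ == "__main__":
    # Pass the output to, for example:
    #   aws cloudtrail put-event-selectors --trail-name <trail> --event-selectors '<json>'
    print(s3_data_event_selectors())
```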



A word on cost

Once the trail has been created, CloudTrail will start recording S3 data events and delivering them to the configured S3 bucket. Data events are currently priced at $0.10 per 100,000 events, and storage of the (compressed) events is billed at standard S3 rates; see the CloudTrail pricing page for details. We recommend disabling the data event trail once you are satisfied that you have gathered sufficient request data; it can be re-enabled later if further analysis is required.
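The data event charge is simple arithmetic on the event count. The helper below is a hypothetical sketch using the $0.10 per 100,000 events price quoted above; it ignores S3 storage charges, and the price may have changed since this was written.

```python
PRICE_PER_100K_EVENTS_USD = 0.10  # data event price quoted in this post


def data_event_cost_usd(events_per_day: int, days: int) -> float:
    """Estimate CloudTrail data event charges only (S3 storage excluded)."""
    total_events = events_per_day * days
    return total_events / 100_000 * PRICE_PER_100K_EVENTS_USD


if __name__ == "__main__":
    # Hypothetical workload: 2 million S3 object requests per day for one week.
    print(f"${data_event_cost_usd(2_000_000, 7):.2f}")
```

For example, 2 million object requests a day for a week is 14 million events, or about $14 in data event charges.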

Creating the Athena table

The CloudTrail team has simplified using Athena to analyse CloudTrail logs: you can create an Athena table directly from the event history page of the CloudTrail console by clicking the ‘Run advanced queries in Amazon Athena’ link and selecting the S3 bucket that holds the CloudTrail logs.





An explanation of how to create the Athena table manually can be found in the Athena CloudTrail documentation.

Analysing the data events with Athena

We now have all the components needed to begin searching for clients that may need to be updated. We start with a basic query that filters out most of the requests made by AWS itself (for example, the AWS Console, CloudTrail, Athena, Storage Gateway, and CloudFront).
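The basic query runs in Athena; the same idea can be sketched in Python against CloudTrail log files downloaded from the trail bucket (gzipped JSON with a top-level "Records" list). This is a minimal sketch, not the post's query, and the exclusion rules are assumptions: a user agent ending in ".amazonaws.com" is treated as an AWS service call, and agents containing any of a hypothetical list of markers are dropped. Adjust the markers after inspecting the user agents in your own logs.

```python
import gzip
import json
from typing import Iterable, Iterator, List

# Assumption: calls made by AWS services on your behalf carry the service host
# name (for example "athena.amazonaws.com") as the user agent.
SERVICE_SUFFIX = ".amazonaws.com"
# Hypothetical markers for console and service traffic; adjust after
# inspecting the user agents that actually appear in your logs.
EXCLUDED_MARKERS = ("s3console", "awsconsole", "storagegateway", "cloudfront")


def is_client_request(record: dict) -> bool:
    """Return True if a CloudTrail record looks like a CLI or SDK call."""
    # S3 data events may wrap the user agent in square brackets.
    agent = record.get("userAgent", "").strip("[]")
    if agent.endswith(SERVICE_SUFFIX):
        return False
    lowered = agent.lower()
    return not any(marker in lowered for marker in EXCLUDED_MARKERS)


def load_records(path: str) -> Iterator[dict]:
    """Yield the records from one gzipped CloudTrail log file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from json.load(fh).get("Records", [])


def client_requests(records: Iterable[dict]) -> List[dict]:
    """Keep only the records that look like client requests."""
    return [r for r in records if is_client_request(r)]
```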





These results should mostly be client API/CLI requests, but the large number of requests can be reduced further by including only the regions that actually support AWS Signature Version 2. According to the S3 region and endpoint documentation, only eight regions need to be checked. We can safely exclude the SigV4-only regions, because clients without SigV4 support would not work correctly against them in the first place. Let’s also select the distinct user agents and extract the version from the user agent string.
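The region filter and version extraction can also be sketched in Python over the same record dictionaries. The region set lists the eight S3 regions we understand to have accepted SigV2: US East (N. Virginia), US West (N. California and Oregon), EU (Ireland), Asia Pacific (Singapore, Sydney, and Tokyo), and South America (São Paulo). Treat it as an assumption and confirm it against the S3 region and endpoint documentation. The helper names are hypothetical, and the regex assumes the version immediately follows "<agent>/" in the user agent string.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: the eight S3 regions that accepted SigV2 requests when this post
# was written; confirm against the S3 region and endpoint documentation.
SIGV2_REGIONS = {
    "us-east-1", "us-west-1", "us-west-2", "eu-west-1",
    "ap-southeast-1", "ap-southeast-2", "ap-northeast-1", "sa-east-1",
}


def agent_version(user_agent: str, agent: str) -> Optional[str]:
    """Return the version that follows '<agent>/' in a user agent, or None."""
    match = re.search(re.escape(agent) + r"/([0-9]+(?:\.[0-9]+)*)", user_agent)
    return match.group(1) if match else None


def distinct_agent_versions(records: Iterable[dict], agent: str) -> List[Tuple[str, str]]:
    """Distinct (userAgent, version) pairs for one agent, SigV2 regions only."""
    pairs = set()
    for record in records:
        if record.get("awsRegion") not in SIGV2_REGIONS:
            continue  # SigV4-only regions cannot be serving SigV2 clients
        user_agent = record.get("userAgent", "")
        version = agent_version(user_agent, agent)
        if version is not None:
            pairs.add((user_agent, version))
    return sorted(pairs)
```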




Unfortunately, we cannot filter on the calculated ‘version’ column, and because it is a string, direct numerical version comparison is also difficult. Instead, we can use some arithmetic to create a version number that can be compared. For now, we use the AWS CLI requests as an example and add back the source IP address and user identity.
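Comparing version strings directly goes wrong because the comparison is character by character: "1.11.13" sorts after "1.11.108". The comparator values in this post are consistent with encoding major.minor.patch as major × 10,000,000 + minor × 10,000 + patch. The Python sketch below implements that encoding; it is a reconstruction from the table values rather than the post's Athena query, and it assumes the minor version stays below 1,000 and the patch below 10,000 so the fields cannot overlap.

```python
def version_comparator(version: str) -> int:
    """Encode 'major.minor.patch' as major*10_000_000 + minor*10_000 + patch.

    Components beyond the third (e.g. the trailing .0 in 3.1.10.0) are
    ignored and missing components count as 0. Assumes minor < 1000 and
    patch < 10000 so the encoded fields do not overlap.
    """
    parts = [int(p) for p in version.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad "3" to [3, 0, 0]
    major, minor, patch = parts
    return major * 10_000_000 + minor * 10_000 + patch


if __name__ == "__main__":
    # String comparison says 1.11.13 is newer; the comparator says it is older.
    print("1.11.13" > "1.11.108", version_comparator("1.11.13") < version_comparator("1.11.108"))
```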

The version comparison number 10110108 (1 × 10,000,000 + 11 × 10,000 + 108) corresponds to the version string 1.11.108, the first version of the AWS CLI that uses SigV4 by default. The result is a list of clients accessing S3 objects in this account with a version of the AWS CLI that needs to be updated.


The same query can be applied to the other AWS CLI and SDK user agent strings by substituting the corresponding user agent string and the version comparator for the first version that uses SigV4 by default:

AWS Client        SigV4 default version   User agent string   Version comparator
Java              1.11.x                  aws-sdk-java        10110000
.NET              3.1.10.0                aws-sdk-dotnet      30010010
Node.js           2.68.0                  aws-sdk-nodejs      20680000
PHP               3                       aws-sdk-php         30000000
Python Botocore   1.5.71                  Botocore            10050071
Python Boto3      1.4.6                   Boto3               10040006
Ruby              2.2.0                   aws-sdk-ruby        20020000
AWS CLI           1.11.108                aws-cli             10110108
PowerShell        3.1.10.0                AWSPowerShell       30010010

Note:
.NET: the version above applies to the .NET35, .NET45, and CoreCLR platforms only; the PCL, Xamarin, and UWP platforms do not support SigV4 at all.
All versions of the Go and C++ SDKs use SigV4 by default.
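To check every client in one pass, the table can be turned into a lookup from user agent name to comparator. The sketch below is hypothetical and assumes, like the CLI example, that each version immediately follows "<agent>/"; agents whose name carries an extra suffix would need a looser pattern. Matching is case-sensitive, so a lowercase "botocore/" token, which we assume the AWS CLI appends, is not counted as a separate Botocore client.

```python
import re
from typing import List

# Comparator values from the table: first version that uses SigV4 by default.
SIGV4_DEFAULT = {
    "aws-sdk-java": 10110000,
    "aws-sdk-dotnet": 30010010,
    "aws-sdk-nodejs": 20680000,
    "aws-sdk-php": 30000000,
    "Botocore": 10050071,
    "Boto3": 10040006,
    "aws-sdk-ruby": 20020000,
    "aws-cli": 10110108,
    "AWSPowerShell": 30010010,
}


def _comparator(version: str) -> int:
    # major*10^7 + minor*10^4 + patch, padding missing components with 0.
    parts = [int(p) for p in version.split(".")[:3]] + [0, 0, 0]
    return parts[0] * 10_000_000 + parts[1] * 10_000 + parts[2]


def outdated_agents(user_agent: str) -> List[str]:
    """Return the table entries in user_agent whose version predates SigV4."""
    found = []
    for agent, threshold in SIGV4_DEFAULT.items():
        match = re.search(re.escape(agent) + r"/([0-9]+(?:\.[0-9]+)*)", user_agent)
        if match and _comparator(match.group(1)) < threshold:
            found.append(agent)
    return found
```

A single user agent can match more than one entry: if a Boto3 user agent also carries a "Botocore/" token, both can be flagged, which is useful because both packages would need updating.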

Tracing the source of the requests

The source IP address will be the private IP address of the EC2 instance when it accesses S3 through a VPC endpoint, or its public IP address when it accesses S3 directly. You can search for either address in the EC2 console for the corresponding region. For requests that do not originate from EC2, or that pass through NAT, you should be able to use the user identity ARN to track down the source of the requests.
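The search by IP address can be scripted. A minimal sketch, assuming that private (RFC 1918) addresses come from VPC endpoint traffic as described above: it picks the EC2 DescribeInstances filter "private-ip-address" for private addresses and "ip-address" (the public IPv4 address) otherwise, and skips source values that are not IP addresses, such as AWS service names. The helper name is hypothetical.

```python
import ipaddress
from typing import Optional


def ec2_ip_filter(source_ip: str) -> Optional[dict]:
    """Build a DescribeInstances filter for a CloudTrail sourceIPAddress."""
    try:
        ip = ipaddress.ip_address(source_ip)
    except ValueError:
        return None  # e.g. "athena.amazonaws.com" rather than an IP address
    # Private address: VPC endpoint traffic, so match the instance's private IP.
    name = "private-ip-address" if ip.is_private else "ip-address"
    return {"Name": name, "Values": [str(ip)]}
```

The returned dictionary can be passed as `Filters=[f]` to boto3's `describe_instances`, or expressed on the AWS CLI as `aws ec2 describe-instances --filters Name=private-ip-address,Values=10.0.1.5`, run in the region where the requests were recorded.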