Sunday 20 August 2017

Understanding EC2 "Up to 10 Gigabit" network performance for R4 instances

This post investigates the network performance of AWS R4 instances, with a focus on the "Up to 10 Gigabit" networking expected from the smaller (r4.large - r4.4xlarge) instance types. Before starting, it should be noted that this post is based on observation and as such is prone to imprecision and variance. It is intended as a guide to what can be expected, not as a comprehensive or scientific review.


The R4 instance documentation states "The smaller R4 instance sizes offer peak throughput of 10 Gbps. These instances use a network I/O credit mechanism to allocate network bandwidth to instances based on average bandwidth utilization. These instances accrue credits when their network throughput is below their baseline limits, and can use these credits when they perform network data transfers." This is not particularly helpful in understanding the lower bounds on network performance, and it gives no indication of the baseline limits. Instead, AWS recommends that customers benchmark the networking performance of various instances to evaluate whether the instance type and size will meet the application's network performance requirements.
Logically, we would expect the r4.large to have a fraction of the total 20 Gbps available on an r4.16xlarge. From the instance size normalisation table in the reserved instance modification documentation, a *.large instance (factor of 4) should expect 1/32 of the resources available on a *.16xlarge instance (factor of 128), which works out at 0.625 Gbps (20 Gbps / 32) or 625 Mbps.
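
The proportional estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name and the dictionary are illustrative, and the idea that bandwidth scales linearly with the normalisation factor is the naive assumption being tested in this post, not documented AWS behaviour.

```python
# Normalisation factors from the reserved instance modification documentation
# (large = 4 and 16xlarge = 128 are quoted in the text; the sizes in between
# follow the same doubling pattern).
NORMALISATION_FACTOR = {
    "large": 4,
    "xlarge": 8,
    "2xlarge": 16,
    "4xlarge": 32,
    "8xlarge": 64,
    "16xlarge": 128,
}


def proportional_gbps(size, top_size="16xlarge", top_gbps=20.0):
    """Naive share of the largest instance's bandwidth for a given size.

    Assumes bandwidth scales linearly with the normalisation factor,
    which is a hypothesis rather than a documented guarantee.
    """
    share = NORMALISATION_FACTOR[size] / NORMALISATION_FACTOR[top_size]
    return top_gbps * share


print(proportional_gbps("large"))    # 0.625 Gbps, i.e. 625 Mbps
print(proportional_gbps("4xlarge"))  # 5.0 Gbps
```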


Testing r4.large baseline network performance

Using iperf3 between two newly launched Amazon Linux r4.large instances in the same availability zone in eu-west-1, we run into the first interesting anomaly with the network stream maxing out at 5 Gbps rather than the expected 10 Gbps:


$ iperf3 -p 5201 -c 172.31.7.67 -i 1 -t 3600 -f m -V 
iperf 3-CURRENT
Linux ip-172-31-10-235 4.9.32-15.41.amzn1.x86_64 #1 SMP Thu Jun 22 06:20:54 UTC 2017 x86_64
Control connection MSS 8949
Time: Sun, 20 Aug 2017 07:35:48 GMT
Connecting to host 172.31.7.67, port 5201
      Cookie: p2v6ry2kzjo2udittrzgmxotz7we3in5etmv
      TCP MSS: 8949 (default)
[  5] local 172.31.10.235 port 41270 connected to 172.31.7.67 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 3600 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   598 MBytes  5015 Mbits/sec    9    664 KBytes       
[  5]   1.00-2.00   sec   596 MBytes  4999 Mbits/sec    3    559 KBytes       
[  5]   2.00-3.00   sec   595 MBytes  4992 Mbits/sec    9    586 KBytes       
[  5]   3.00-4.00   sec   595 MBytes  4989 Mbits/sec    0    638 KBytes       
[  5]   4.00-5.00   sec   596 MBytes  5000 Mbits/sec    0    638 KBytes       
[  5]   5.00-6.00   sec   595 MBytes  4989 Mbits/sec    0    638 KBytes       
[  5]   6.00-7.00   sec   595 MBytes  4990 Mbits/sec    6    638 KBytes       
[  5]   7.00-8.00   sec   595 MBytes  4990 Mbits/sec    3    524 KBytes       
[  5]   8.00-9.00   sec   596 MBytes  4997 Mbits/sec    0    586 KBytes       
[  5]   9.00-10.00  sec   596 MBytes  4997 Mbits/sec    0    603 KBytes       
[  5]  10.00-11.00  sec   595 MBytes  4990 Mbits/sec    0    638 KBytes       

Interestingly, using 2 parallel streams results in us (mostly) reaching the advertised 10 Gbps:

$ iperf3 -p 5201 -c 172.31.7.67 -i 1 -t 3600 -f m -V -P 2
iperf 3-CURRENT
Linux ip-172-31-10-235 4.9.32-15.41.amzn1.x86_64 #1 SMP Thu Jun 22 06:20:54 UTC 2017 x86_64
Control connection MSS 8949
Time: Sun, 20 Aug 2017 07:37:38 GMT
Connecting to host 172.31.7.67, port 5201
      Cookie: q343avscwpva5uyg2ayeinboxi5pllvw5l7r
      TCP MSS: 8949 (default)
[  5] local 172.31.10.235 port 41274 connected to 172.31.7.67 port 5201
[  7] local 172.31.10.235 port 41276 connected to 172.31.7.67 port 5201
Starting Test: protocol: TCP, 2 streams, 131072 byte blocks, omitting 0 seconds, 3600 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   597 MBytes  5010 Mbits/sec    0    690 KBytes       
[  7]   0.00-1.00   sec   592 MBytes  4968 Mbits/sec    0    717 KBytes       
[SUM]   0.00-1.00   sec  1.16 GBytes  9979 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec   595 MBytes  4994 Mbits/sec    0    690 KBytes       
[  7]   1.00-2.00   sec   592 MBytes  4962 Mbits/sec   18    638 KBytes       
[SUM]   1.00-2.00   sec  1.16 GBytes  9956 Mbits/sec   18             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec   591 MBytes  4957 Mbits/sec  137    463 KBytes       
[  7]   2.00-3.00   sec   587 MBytes  4924 Mbits/sec   41    725 KBytes       
[SUM]   2.00-3.00   sec  1.15 GBytes  9881 Mbits/sec  178             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec   593 MBytes  4973 Mbits/sec   46    367 KBytes       
[  7]   3.00-4.00   sec   591 MBytes  4956 Mbits/sec   40    419 KBytes       
[SUM]   3.00-4.00   sec  1.16 GBytes  9929 Mbits/sec   86             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec   592 MBytes  4968 Mbits/sec  141    542 KBytes       
[  7]   4.00-5.00   sec   591 MBytes  4960 Mbits/sec   36    559 KBytes       
[SUM]   4.00-5.00   sec  1.16 GBytes  9928 Mbits/sec  177             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec   595 MBytes  4995 Mbits/sec   30    664 KBytes       
[  7]   5.00-6.00   sec   588 MBytes  4934 Mbits/sec    8    568 KBytes       
[SUM]   5.00-6.00   sec  1.16 GBytes  9929 Mbits/sec   38             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec   596 MBytes  5000 Mbits/sec    0    664 KBytes       
[  7]   6.00-7.00   sec   589 MBytes  4945 Mbits/sec    0    629 KBytes       
[SUM]   6.00-7.00   sec  1.16 GBytes  9945 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec   588 MBytes  4935 Mbits/sec    7    655 KBytes       
[  7]   7.00-8.00   sec   594 MBytes  4982 Mbits/sec    0    682 KBytes       
[SUM]   7.00-8.00   sec  1.15 GBytes  9917 Mbits/sec    7             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   593 MBytes  4974 Mbits/sec    8    620 KBytes       
[  7]   8.00-9.00   sec   593 MBytes  4978 Mbits/sec   12    717 KBytes       
[SUM]   8.00-9.00   sec  1.16 GBytes  9952 Mbits/sec   20             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec   596 MBytes  4999 Mbits/sec    0    638 KBytes       
[  7]   9.00-10.00  sec   590 MBytes  4951 Mbits/sec    0    717 KBytes       
[SUM]   9.00-10.00  sec  1.16 GBytes  9950 Mbits/sec    0             


This behaviour is not consistent. Stopping and restarting the instances often resulted in the full 10 Gbps on a single stream, suggesting the issue relates to instance placement. This appears to be supported by the placement group documentation, which states: "Network traffic to and from resources outside the placement group is limited to 5 Gbps." It is also possible that the streams are incorrectly being treated as placement group or public internet flows with different limits. For consistency, I have used two parallel streams in the rest of the article to avoid this issue.

The CloudWatch graph shows us reaching a steady baseline around five to six minutes after starting iperf3:




A quick word on the graph above: it is in bytes and, with detailed monitoring enabled, has one-minute granularity. To convert, we need to divide the value of the metric by 60 to get bytes per second and then multiply by 8 to get bits per second. Looking at the actual data from the graph above:

$ aws cloudwatch get-metric-statistics --metric-name NetworkOut --start-time 2017-08-20T08:23:00 --end-time 2017-08-20T08:35:00 --period 60 --namespace AWS/EC2 --statistics Average --dimensions Name=InstanceId,Value=i-0a7e009e7c0bf8fa8  --query 'Datapoints[*].[Timestamp,Average]' --output=text | sort
2017-08-20T08:23:00Z 486.0
2017-08-20T08:24:00Z 5726.0
2017-08-20T08:25:00Z 22711496136.0
2017-08-20T08:26:00Z 76376122845.0
2017-08-20T08:27:00Z 76403033046.0
2017-08-20T08:28:00Z 76357957564.0
2017-08-20T08:29:00Z 76304994405.0
2017-08-20T08:30:00Z 48667898310.0
2017-08-20T08:31:00Z 5776989873.0
2017-08-20T08:32:00Z 5816890095.0
2017-08-20T08:33:00Z 5692555065.0
2017-08-20T08:34:00Z 5692014471.0
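
The conversion described above (divide by 60, then multiply by 8) can be sketched in Python as follows. The function name is illustrative, the two datapoints are copied from the output above, and decimal units (1 Gbit = 10^9 bits) are assumed.

```python
def bytes_per_minute_to_gbps(bytes_per_minute):
    """Convert a one-minute NetworkOut datapoint (bytes) to Gbit/second."""
    bytes_per_second = bytes_per_minute / 60
    bits_per_second = bytes_per_second * 8
    return bits_per_second / 1e9  # decimal gigabits (assumption)


# Datapoints copied from the get-metric-statistics output above.
datapoints = {
    "2017-08-20T08:26:00Z": 76376122845.0,  # during the burst
    "2017-08-20T08:31:00Z": 5776989873.0,   # after dropping to the baseline
}

for timestamp, value in datapoints.items():
    print(timestamp, round(bytes_per_minute_to_gbps(value), 2), "Gbit/s")
```

For these two datapoints the result is roughly 10.18 Gbit/s during the burst and roughly 0.77 Gbit/s at the baseline.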


The maximum average throughput (between 08:26 and 08:29) is around 76 GByte/minute, which works out at around 1.27 GByte/second or approximately 10.2 Gbit/second. Similarly, the baseline (from 08:31 onwards) is in the region of 5.7 GByte/minute, which translates to around 95 MByte/second or around 760 Mbit/second. These numbers are naturally averages but are fairly close to the actual iperf3 results, with the peak throughput of just over 10 Gbit/second:


[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec   604 MBytes  5065 Mbits/sec    0    551 KBytes       
[  8]   0.00-1.00   sec   604 MBytes  5062 Mbits/sec    0    524 KBytes       
[SUM]   0.00-1.00   sec  1.18 GBytes  10127 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   1.00-2.00   sec   601 MBytes  5046 Mbits/sec    0    551 KBytes       
[  8]   1.00-2.00   sec   602 MBytes  5048 Mbits/sec    0    551 KBytes       
[SUM]   1.00-2.00   sec  1.18 GBytes  10094 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   2.00-3.00   sec   602 MBytes  5046 Mbits/sec    0    577 KBytes       
[  8]   2.00-3.00   sec   602 MBytes  5047 Mbits/sec    0    577 KBytes       
[SUM]   2.00-3.00   sec  1.17 GBytes  10093 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   3.00-4.00   sec   601 MBytes  5045 Mbits/sec    0    577 KBytes       
[  8]   3.00-4.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[SUM]   3.00-4.00   sec  1.18 GBytes  10095 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   4.00-5.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[  8]   4.00-5.00   sec   601 MBytes  5045 Mbits/sec    0    577 KBytes       
[SUM]   4.00-5.00   sec  1.18 GBytes  10094 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   5.00-6.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[  8]   5.00-6.00   sec   601 MBytes  5042 Mbits/sec    0    577 KBytes       
[SUM]   5.00-6.00   sec  1.17 GBytes  10092 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   6.00-7.00   sec   602 MBytes  5046 Mbits/sec    0    577 KBytes       
[  8]   6.00-7.00   sec   602 MBytes  5049 Mbits/sec    0    577 KBytes       
[SUM]   6.00-7.00   sec  1.18 GBytes  10095 Mbits/sec   0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6]   7.00-8.00   sec   603 MBytes  5063 Mbits/sec    0   1.44 MBytes       
[  8]   7.00-8.00   sec   601 MBytes  5041 Mbits/sec   66    524 KBytes       
[SUM]   7.00-8.00   sec  1.18 GBytes  10104 Mbits/sec  66             
- - - - - - - - - - - - - - - - - - - - - - - - -

And the baseline of around 750Mbit/second:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6] 376.00-377.00 sec  43.8 MBytes   367 Mbits/sec  157    114 KBytes       
[  8] 376.00-377.00 sec  43.8 MBytes   367 Mbits/sec  157   78.7 KBytes       
[SUM] 376.00-377.00 sec  87.5 MBytes   734 Mbits/sec  314             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 377.00-378.00 sec  45.0 MBytes   377 Mbits/sec  161   69.9 KBytes       
[  8] 377.00-378.00 sec  45.0 MBytes   377 Mbits/sec  167   78.7 KBytes       
[SUM] 377.00-378.00 sec  90.0 MBytes   755 Mbits/sec  328             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 378.00-379.00 sec  43.8 MBytes   367 Mbits/sec  182   69.9 KBytes       
[  8] 378.00-379.00 sec  45.0 MBytes   377 Mbits/sec  168    105 KBytes       
[SUM] 378.00-379.00 sec  88.8 MBytes   744 Mbits/sec  350             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 379.00-380.00 sec  42.5 MBytes   357 Mbits/sec  150   61.2 KBytes       
[  8] 379.00-380.00 sec  46.2 MBytes   388 Mbits/sec  165   96.1 KBytes       
[SUM] 379.00-380.00 sec  88.8 MBytes   744 Mbits/sec  315             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 380.00-381.00 sec  36.2 MBytes   304 Mbits/sec  129   78.7 KBytes       
[  8] 380.00-381.00 sec  52.5 MBytes   440 Mbits/sec  203    105 KBytes       
[SUM] 380.00-381.00 sec  88.8 MBytes   744 Mbits/sec  332             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 381.00-382.00 sec  36.2 MBytes   304 Mbits/sec  147   96.1 KBytes       
[  8] 381.00-382.00 sec  52.5 MBytes   440 Mbits/sec  220   87.4 KBytes       
[SUM] 381.00-382.00 sec  88.8 MBytes   744 Mbits/sec  367             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 382.00-383.00 sec  46.2 MBytes   388 Mbits/sec  175   52.4 KBytes       
[  8] 382.00-383.00 sec  42.5 MBytes   357 Mbits/sec  167    114 KBytes       
[SUM] 382.00-383.00 sec  88.8 MBytes   744 Mbits/sec  342             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 383.00-384.00 sec  41.2 MBytes   346 Mbits/sec  165   61.2 KBytes       
[  8] 383.00-384.00 sec  47.5 MBytes   398 Mbits/sec  170   96.1 KBytes       
[SUM] 383.00-384.00 sec  88.8 MBytes   744 Mbits/sec  335             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  6] 384.00-385.00 sec  50.0 MBytes   419 Mbits/sec  195   87.4 KBytes       
[  8] 384.00-385.00 sec  38.8 MBytes   325 Mbits/sec  157   52.4 KBytes       
[SUM] 384.00-385.00 sec  88.8 MBytes   744 Mbits/sec  352             
- - - - - - - - - - - - - - - - - - - - - - - - -

Calculating network credit rates

Using the baseline for network performance, we can draw some inferences about the rate at which network credits are accrued. For simplicity I am going to define a network credit as the ability to transmit at 1 Gbps for 1 second (1 Gbit), so an instance with 10 network credits could transmit at 10 Gbps for 1 second. Naturally, if the instance network limit is 10 Gbps, the maximum rate can't be exceeded even if the instance credit balance is sufficient (20 credits allows 2 seconds at 10 Gbps rather than 1 second at 20 Gbps). Given the baseline network performance in the previous section, we can assume that an r4.large has a network credit rate of around 0.75 credits per second. We can also assume a starting balance of around 2700, as we were able to maintain 10 Gbps for around 295 seconds at the start of the iperf3 run ((10 - 0.75) * 295 is approximately 2700). Finally, it appears the maximum credit balance on the r4.large is the same as the initial balance. Leaving the instances idle for 3 hours should have resulted in a credit balance of around 8100 (0.75 rate * 3600 seconds in an hour * 3 hours), which should theoretically have allowed at least 810 seconds at 10 Gbps (8100 / 10, ignoring credits accrued during the burst), but instead provided only around 295 seconds.
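
The credit model inferred above can be written down as a small Python sketch. It assumes the simplified definition used in this post (1 credit = 1 Gbit), and the rate and maximum balance are the approximate r4.large values estimated from the test run, not figures published by AWS. The function and constant names are illustrative.

```python
def burst_seconds(balance, accrual_rate, burst_rate=10.0):
    """Seconds a burst can be sustained before the credit balance runs out.

    Credits drain at (burst_rate - accrual_rate) per second because the
    instance keeps accruing credits while it transmits.
    """
    return balance / (burst_rate - accrual_rate)


def balance_after_idle(start, accrual_rate, idle_seconds, max_balance):
    """Credit balance after an idle period, capped at the maximum balance."""
    return min(max_balance, start + accrual_rate * idle_seconds)


# Approximate r4.large values inferred from the observations above.
R4_LARGE_RATE = 0.75   # credits per second (observed baseline in Gbps)
R4_LARGE_MAX = 2700.0  # credits (estimated from the initial burst)

# A full balance lasts a little under 300 seconds at 10 Gbps.
print(round(burst_seconds(R4_LARGE_MAX, R4_LARGE_RATE)))

# Three idle hours would accrue 8100 credits without a cap, but the
# observed burst length suggests the balance stops at the maximum.
print(balance_after_idle(0.0, R4_LARGE_RATE, 3 * 3600, R4_LARGE_MAX))
```

With a cap of 2700 the model reproduces the observed behaviour: the burst after three idle hours is no longer than the burst on a freshly launched instance.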

R4 network performance table

Below is a table of the expected performance for R4 instance sizes.

Instance     Baseline Gbps    Initial/Max Credit    Maximum time at 10 Gbps
             (approximate)    (approximate)         (approximate seconds)
r4.large     0.75             2700                  295
r4.xlarge    1.25             5145                  589
r4.2xlarge   2.5              8925                  1191
r4.4xlarge   5                11950                 2390

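To estimate whether an instance will meet your network throughput requirements, take the difference between the instance baseline rate and your application's base network utilisation, then divide the required burst rate by that difference. The result is the number of seconds of credit accrual needed for each second of burst.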

For example, on an r4.large (baseline of around 0.75 Gbps), if your application requires a baseline of 0.6 Gbps you would accrue credits at around 0.15 per second, allowing you to burst at 10 Gbps for approximately one second every 67 seconds (10 / 0.15), or for 10 seconds roughly every 667 seconds.
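
The same estimate can be expressed as a short Python function. This is a sketch of the simplified calculation used in the example, where a burst costs burst rate times duration in credits and credits accrued during the burst itself are ignored. The function and parameter names are illustrative.

```python
def seconds_to_earn_burst(instance_baseline, app_baseline,
                          burst_rate=10.0, burst_seconds=1.0):
    """Seconds of accrual needed to pay for one burst.

    All rates are in Gbps. Credits accrue at the difference between the
    instance baseline and the application's steady utilisation.
    """
    surplus = instance_baseline - app_baseline
    if surplus <= 0:
        raise ValueError("application baseline leaves no spare credits")
    return burst_rate * burst_seconds / surplus


# r4.large (about 0.75 Gbps baseline) with an application using 0.6 Gbps.
print(round(seconds_to_earn_burst(0.75, 0.6)))                    # 67
print(round(seconds_to_earn_burst(0.75, 0.6, burst_seconds=10)))  # 667
```

If the application's steady utilisation is at or above the instance baseline, no credits accrue at all, which is why the function raises an error in that case rather than returning a number.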