Amazon-ecs-agent: Determine a proper timeout for LoadImage

Created on 14 Jun 2017 · 3 Comments · Source: aws/amazon-ecs-agent

LoadImageTimeout needs to be set based on benchmarks of how long it takes to load a Docker image. See https://github.com/aws/amazon-ecs-agent/pull/841/files#r121754481 for details.

kind/enhancement scope/ECS Agent

Most helpful comment

Currently, LoadImageTimeout is only used when loading the pause container image. I benchmarked the load time of the pause container image with the following script:

#!/bin/bash
# Benchmark: time 100 cold-cache loads of the pause container image.
min=1000.0
max=0.0
total=0.0
for i in {0..99}
do
    # Drop the page cache so every load reads the tarball from disk.
    sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
    start=$(date +%s.%N)
    docker image load < amazon-ecs-pause.tar &> /dev/null
    end=$(date +%s.%N)

    cnt=$(( i + 1 ))
    loadtime=$(echo "$end - $start" | bc -l)
    total=$(echo "$total + $loadtime" | bc -l)
    avg=$(echo "$total / $cnt" | bc -l)

    # bc prints 1 for a true comparison, which (( )) treats as success.
    if (( $(echo "$min > $loadtime" | bc -l) )); then
        min=$loadtime
    fi
    if (( $(echo "$max < $loadtime" | bc -l) )); then
        max=$loadtime
    fi
    printf "Image load time %d: %.4fs\n" "$cnt" "$loadtime"
    printf "Runs so far: %d, Avg: %.4fs, Min: %.4fs, Max: %.4fs\n" "$cnt" "$avg" "$min" "$max"
    # Remove the image so the next iteration performs a full load again.
    docker rmi amazon/amazon-ecs-pause:0.1.0 &> /dev/null
done
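
As a side note, the same min/avg/max figures could also be computed after the fact from a plain list of load times, which avoids calling bc on every iteration. The following is only a sketch: the function name `summarize_times` and the one-time-per-line input format are assumptions for illustration, not part of the original benchmark.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): reads one load time
# in seconds per line on stdin and prints count, average, min and max.
summarize_times() {
    awk '
        {
            if (NR == 1 || $1 < min) min = $1
            if (NR == 1 || $1 > max) max = $1
            sum += $1
        }
        END {
            # Print nothing when no samples were provided.
            if (NR > 0)
                printf "n=%d avg=%.4f min=%.4f max=%.4f\n", NR, sum / NR, min, max
        }
    '
}

# Example: printf '1.0\n2.0\n3.0\n' | summarize_times
```

With this approach the benchmark loop would only need to append each measured time to a file, and the summary is produced once at the end.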

Tested on the latest ECS-optimized AMI (agent version 1.32.1) for AL1, AL2, AL2 GPU and AL2 ARM with the smallest instance types available (a1.medium for ARM, t2.nano for the others), once with the EBS volume burst balance at 100 and once at 0. The image was loaded 100 times in each configuration.
Results:

Burst balance = 100:

| Instance | Avg (s) | Min (s) | Max (s) |
|:---------|:--------|:--------|:--------|
| AL1 (t2.nano) | 1.1994 | 0.6178 | 1.3431 |
| AL2 (t2.nano) | 0.7433 | 0.3992 | 0.7788 |
| AL2/GPU (t2.nano) | 0.8019 | 0.3935 | 1.1164 |
| AL2/ARM (a1.medium) | 0.4261 | 0.4128 | 0.5022 |

Burst balance = 0:

| Instance | Avg (s) | Min (s) | Max (s) |
|:---------|:--------|:--------|:--------|
| AL1 (t2.nano) | 18.5915 | 17.6324 | 22.5326 |
| AL2 (t2.nano) | 18.6168 | 17.0224 | 21.5424 |
| AL2/GPU (t2.nano) | 23.6569 | 20.1130 | 29.4856 |
| AL2/ARM (a1.medium) | 12.5409 | 10.6842 | 14.9941 |

The worst case seems to be around half a minute (29.4856 s, on AL2/GPU with burst balance = 0).
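
To make the step from a measured worst case to a timeout value concrete, here is a minimal sketch that multiplies the worst observed load time by a headroom factor and rounds up to whole seconds. The function name `suggest_timeout` and the factor of 4 used in the example are assumptions for illustration; the benchmark itself does not prescribe a factor.

```shell
#!/bin/bash
# Hypothetical helper: derive a timeout (whole seconds, rounded up) from the
# worst observed load time and a headroom factor chosen by the operator.
suggest_timeout() {
    local worst="$1" factor="$2"
    awk -v w="$worst" -v f="$factor" 'BEGIN {
        t = w * f
        c = int(t)
        if (c < t) c++   # round up to the next whole second
        print c
    }'
}

# Example with the worst case measured above and an assumed factor of 4:
# suggest_timeout 29.4856 4    # prints 118
```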

So, there's scope to tighten the current value (10m). I'd say something like 3m should be good (adding some buffer to account for unexpected delays, etc.). I'd be interested to know what gets picked, though.

The LoadImageTimeout has been updated to 2m in https://github.com/aws/amazon-ecs-agent/pull/2269, which leaves about 1.5m of buffer over the roughly 30s worst case measured above. Closing this now.
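
For anyone who wants to try the 2m limit from a shell, a rough equivalent can be sketched with the GNU coreutils `timeout` command, which exits with status 124 when the limit is hit. This is only an illustration and not how the agent enforces the timeout internally; the wrapper name `load_with_timeout` is an assumption.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command under a time limit and report when the
# limit was hit. Relies on GNU coreutils `timeout` (exit status 124 on timeout).
load_with_timeout() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "command timed out after $limit" >&2
    fi
    return "$status"
}

# Example: load_with_timeout 2m docker image load -i amazon-ecs-pause.tar
```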
