Aws-cli: s3 sync file hash collision?

Created on 11 Apr 2017 · 4 Comments · Source: aws/aws-cli

We generated some certs, synced them to S3, and then needed to resync them. All of the files were updated successfully except one. The two versions were obviously different files, which you could verify quickly with openssl md5 FILE, but sync couldn't tell they were different. Syncing to and from S3, this one file wasn't replaced. I had to manually copy it to S3, and then others had to manually cp it down.

$ aws --version
aws-cli/1.11.36 Python/2.7.10 Darwin/16.4.0 botocore/1.4.93
$ type east
east is aliased to `aws --profile=prod-east'
$ east s3 mb s3://a5y-test
make_bucket: a5y-test
$ openssl md5 /tmp/admin-key-1.pem /tmp/admin-key-2.pem
MD5(/tmp/admin-key-1.pem)= 02e80f255154a809657c65b68eac2832
MD5(/tmp/admin-key-2.pem)= 4871081872064aec627eab0fd9534d7a
$ mkdir -p /tmp/foo{1,2}
$ cp /tmp/admin-key-1.pem /tmp/foo1/admin-key.pem
$ cp /tmp/admin-key-2.pem /tmp/foo2/admin-key.pem
$ east s3 sync /tmp/foo1/ s3://a5y-test
upload: foo/admin-key.pem to s3://a5y-test/admin-key.pem
$ east s3 sync foo2/ s3://a5y-test/ --debug
[output attached]
$ east s3 sync s3://a5y-test /tmp/foo3
download: s3://a5y-test/admin-key.pem to ../../tmp/foo3/admin-key.pem
$ openssl md5 foo*/*
MD5(foo1/admin-key.pem)= 02e80f255154a809657c65b68eac2832
MD5(foo2/admin-key.pem)= 4871081872064aec627eab0fd9534d7a
MD5(foo3/admin-key.pem)= 02e80f255154a809657c65b68eac2832

foo2.txt

closing-soon guidance

All 4 comments

sync doesn't do hash comparisons since there's no reliable way for us to get a hash from an object in S3 and reproduce it locally without downloading the whole thing. Comparisons are based on last modified time and file size.
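
To make the size-and-time rule above concrete, here is a minimal sketch in Python. It is not the CLI's actual code: the helper name needs_upload, the file size, and the timestamps are all made up for illustration. It shows one plausible way the file in the report was skipped: both keys have the same byte length, and the local copy in foo2 was created before the first upload, so the S3 object looks newer.

```python
from datetime import datetime, timezone


def needs_upload(local_size, local_mtime, remote_size, remote_last_modified):
    """Hypothetical helper: should a local file replace an existing S3 object?

    Mirrors the rule described above: only size and last-modified time are
    consulted, never the file contents.
    """
    if local_size != remote_size:
        return True
    # Same size: upload only when the local file is newer than the S3 object.
    return local_mtime > remote_last_modified


# Illustrative values only (assumptions, not taken from the report).
copied_locally = datetime(2017, 4, 11, 12, 0, 0, tzinfo=timezone.utc)
uploaded_first = datetime(2017, 4, 11, 12, 0, 5, tzinfo=timezone.utc)

# Different content, same size, local copy older than the object: skipped.
print(needs_upload(1675, copied_locally, 1675, uploaded_first))
```

Under these assumptions, a content change that preserves the byte length and does not bump the modification time past the object's LastModified is invisible to the comparison.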

Ooooh. Whoops, assumptions and asses. Thanks for the info.

Well, maybe you could still add this as an option, even if it means downloading the files? I, for example, had to switch to s3cmd because this feature is missing.

Doesn't the ETag returned by aws s3api head-object reliably give an MD5 hash?

$ md5 favicon.png 
MD5 (favicon.png) = ae295cbe3efcaf76537818cc11a76b16

$ aws s3api head-object --bucket www.dmoles.net --key favicon.png
{
    "AcceptRanges": "bytes",
    "LastModified": "Thu, 27 Sep 2018 00:29:57 GMT",
    "ContentLength": 2430,
    "ETag": "\"ae295cbe3efcaf76537818cc11a76b16\"",
    "ContentType": "image/png",
    "Metadata": {}
}

Is there a reason why this couldn't be used to sync based on an MD5 comparison?
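
A hedged sketch of what such a comparison could look like, in Python. The helper names md5_hex and etag_matches_file are hypothetical and the sample file is generated on the fly. One caveat matters here: the ETag equals the MD5 of the content only for objects uploaded in a single part without SSE-KMS or SSE-C encryption. Multipart uploads produce an ETag of the form "<hex>-<part count>", which is not an MD5 of the whole file, so the sketch refuses to compare those. It cannot detect the encryption case from the ETag alone.

```python
import hashlib
import os
import tempfile


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def etag_matches_file(path, etag):
    """Hypothetical helper: compare a local file with an S3 ETag string.

    Returns True or False for a plain ETag, or None when the ETag has the
    multipart form and therefore cannot be compared with a simple MD5.
    """
    value = etag.strip('"')  # head-object returns the ETag wrapped in quotes
    if "-" in value:
        return None
    return value.lower() == md5_hex(path)


# Demo with a throwaway file; the ETag strings are constructed locally.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
plain_etag = '"%s"' % hashlib.md5(b"hello").hexdigest()
print(etag_matches_file(tmp.name, plain_etag))          # single-part style ETag
print(etag_matches_file(tmp.name, '"0123abcd-2"'))      # multipart style ETag
os.remove(tmp.name)
```

In practice the ETag string would come from the head-object output shown above. A sync mode built on this would still need a fallback for multipart and encrypted objects.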
