It's very easy for people to miss that `--exact-timestamps` is necessary for `aws s3 sync` to produce correct results, especially since the default settings are the opposite of what common tools like rsync use. `--exact-timestamps` should be enabled by default so that `sync` matches both common English usage and the description in the man page, which is currently incorrect:

> Recursively copies new and updated files from the source directory to the destination.

If someone has an odd case where the current behaviour is an optimization, it could be offered as `--no-exact-timestamps` or, better, as `--size-only`, following rsync's lead to make the implications more obvious.
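For concreteness, the stricter comparison has to be requested explicitly today; a minimal invocation (the bucket name and destination are placeholders):

```sh
# Explicitly re-check same-sized files whose timestamps differ when
# syncing from S3 to local (bucket name is a placeholder):
aws s3 sync s3://mybucket . --exact-timestamps
```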
A newer S3 timestamp does not necessarily mean a newer version of the file, since S3 does not allow setting the timestamp the way a filesystem does. While comparing exact timestamps for same-sized files would not be wrong, it would very often be slow and expensive, since you would always be downloading.
Right now we follow a pattern of spending as few customer resources as possible. In general that is a great philosophy, but for `sync` in particular we could provide a better experience by being more liberal with requests. For example, we could actively set our own timestamps and/or checksums in the object metadata and then compare those to determine whether a sync should happen. That would still be slow and expensive, but depending on the use case, possibly less so than downloading every time.
In any case, we won't be able to change the default without a major version bump. We'll keep this frustration in mind when we get to that time. Thanks for the feedback!
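As a rough sketch of that metadata idea (not something the CLI does today; bucket, key, and file names are placeholders, and GNU `stat` is assumed), the original mtime could be recorded as user metadata at upload time and later compared with a cheap HEAD request instead of trusting S3's `LastModified`:

```sh
# Record the source file's mtime as user metadata when uploading.
mtime=$(stat -c %Y local/file.txt)            # GNU stat; seconds since epoch
aws s3api put-object \
  --bucket mybucket --key file.txt --body local/file.txt \
  --metadata source-mtime="$mtime"

# Later, decide whether a download is needed by comparing the recorded
# mtime (a HEAD request) rather than downloading unconditionally.
remote_mtime=$(aws s3api head-object --bucket mybucket --key file.txt \
                 --query 'Metadata."source-mtime"' --output text)
if [ "$remote_mtime" -gt "$(stat -c %Y local/file.txt)" ]; then
  echo "remote copy is newer; download it"
fi
```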
Correcting the documentation would also help. It specifically mentions handling updated files in the description without noting the caveats.
What about calculating a hash from both the local and S3 files instead of (or in addition to) comparing the file size, and, if the values differ, copying the newer file over the older one?
@froblesmartin This is complicated by the way S3 generates ETags for multipart uploads, so you either need to implement the same hash on the local side (which does at least save recalculating the hash for the S3 object) or add a custom header, which has the advantage of not tying you to S3's implementation and could be compatible with existing hashes if you happen to have them. The systems I've designed typically use that approach, so we can attach something like `X-Original-SHA512: …` and compare it against local manifests.
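A minimal sketch of that approach with the stock CLI, assuming the hash is stored as S3 user metadata (the `original-sha512` key is illustrative; S3 returns it as `x-amz-meta-original-sha512`, i.e. `Metadata["original-sha512"]` in the API response):

```sh
# Store the file's SHA-512 as user metadata at upload time.
aws s3api put-object \
  --bucket mybucket --key file.txt --body local/file.txt \
  --metadata original-sha512="$(sha512sum local/file.txt | awk '{print $1}')"

# Compare the stored hash against the local file without downloading the object.
remote_hash=$(aws s3api head-object --bucket mybucket --key file.txt \
                --query 'Metadata."original-sha512"' --output text)
local_hash=$(sha512sum local/file.txt | awk '{print $1}')
[ "$remote_hash" = "$local_hash" ] && echo "in sync" || echo "copy needed"
```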
This is a huge gotcha for anyone familiar with rsync's behavior. S3's sync and copy functions behave differently enough from standard *nix tools that I have to write wrappers around them to correct their behavior. I don't see any way of changing this now, as the tools are established. When we someday abandon S3 and move on to the next better thing, please maintain uniformity with similar and established tools.
> A s3 object will require downloading if the size of the s3 object differs from the size of the local file, the last modified time of the s3 object is newer than the last modified time of the local file, or the s3 object does not exist in the local directory.

`aws s3 sync s3://mybucket .`

(source: aws sync docs)
As far as I understand, "or" means that any one of the conditions mentioned has to be true for `sync` to require downloading. Am I reading this wrong?
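Taken literally, the "or" does mean any single condition is enough to trigger a download. As an illustration of that reading only (not the CLI's actual code path; bucket, key, and paths are placeholders, and GNU `stat`/`date` are assumed):

```sh
# Illustration of the quoted rule only: download when ANY condition holds:
#   1. the sizes differ
#   2. the S3 object's LastModified is newer than the local file's mtime
#   3. the file does not exist locally
bucket=mybucket; key=file.txt; local_path=./file.txt   # placeholders

needs_download() {
  [ -f "$local_path" ] || return 0                                    # missing locally
  remote_size=$(aws s3api head-object --bucket "$bucket" --key "$key" \
                  --query ContentLength --output text)
  remote_mtime=$(aws s3api head-object --bucket "$bucket" --key "$key" \
                  --query LastModified --output text)
  [ "$remote_size" != "$(stat -c %s "$local_path")" ] && return 0     # sizes differ
  [ "$(date -d "$remote_mtime" +%s)" -gt "$(stat -c %Y "$local_path")" ] && return 0  # S3 newer
  return 1                                                            # otherwise skip
}

needs_download && echo "would download" || echo "would skip"
```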
@JohnLunzer would you be open to open sourcing your wrapper? I (and I suspect the community) would be very interested in your enhancements!