Rclone: Sync unnecessarily re-uploads some files to S3, when a marker filename contains a space

Created on 10 Dec 2019 · 3 comments · Source: rclone/rclone

What is the problem you are having with rclone?

Some files are being unnecessarily re-uploaded to S3 every time I run sync.

My understanding of the cause so far:

  • This happens when the NextMarker – the key of the last object returned in a single S3 ListObjects request – contains a space.
  • e.g. in the example below the object key a 1000.txt is encoded by AWS to a+1000.txt in the S3 ListObjects response XML.
  • When setting it to the next request's marker parameter, rclone encodes it again, so it becomes marker=a%2B1000.txt.
  • AWS decodes this to a+1000.txt rather than the expected a 1000.txt, so any subsequent objects that start with a and a space are omitted.
  • This makes rclone think they aren't on the remote, so it re-uploads them.
  • I've linked a log below (with request and response bodies) demonstrating this, and a minimal reproduction case.
  • By default this can only affect _folders_ containing >1000 files, but it becomes less of an edge case when --fast-list is used, as it could affect any sync of >1000 files in total
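The double-encoding described above can be sketched in a few lines of Python using urllib.parse (a minimal illustration of the mechanism, not rclone's actual Go code):

```python
from urllib.parse import quote, unquote_plus

key = "a 1000.txt"

# With encoding-type=url, S3 returns the key with the space encoded as '+'
listed = key.replace(" ", "+")        # "a+1000.txt"

# Buggy client: encodes the already-encoded key again when setting the
# next request's marker parameter
marker = quote(listed)                # "a%2B1000.txt"

# The server decodes the marker once, yielding the wrong key
decoded = unquote_plus(marker)        # "a+1000.txt", not "a 1000.txt"

assert decoded != key
```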

What is your rclone version (output from rclone version)

v1.50.2-086-ga186284b-beta (also tested with v1.50.2)

Which OS you are using and how many bits (eg Windows 7, 64 bit)

macOS 10.14.6, 64-bit

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

This is my minimal reproduction case:

mkdir files
for i in {0001..1100}; do touch "files/a $i.txt"; done
rclone sync files/ "s3:<bucket name>" --config rclone.conf --use-server-modtime --update --log-level DEBUG --dump headers,bodies
rclone sync files/ "s3:<bucket name>" --config rclone.conf --use-server-modtime --update --log-level DEBUG --dump headers,bodies

rclone.conf:

[s3]
type = s3
provider = AWS
region = us-west-2
env_auth = true

On the second run (and all subsequent runs), the last 100 files are re-uploaded unnecessarily.
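The count of exactly 100 follows from S3's lexicographic key ordering: '+' (0x2B) sorts after ' ' (0x20), so once the marker is corrupted to a+1000.txt, every remaining key compares less than the marker and the second page comes back empty. A rough sketch of the arithmetic (assuming the default page size of 1000):

```python
# 1100 keys sort as "a 0001.txt" .. "a 1100.txt"; the first page of 1000
# ends at "a 1000.txt", which becomes the NextMarker
keys = [f"a {i:04d}.txt" for i in range(1, 1101)]
assert sorted(keys)[999] == "a 1000.txt"

# '+' (0x2B) sorts after ' ' (0x20), so all 100 remaining keys compare
# less than the corrupted marker and are never listed
corrupted = "a+1000.txt"
remaining = keys[1000:]               # "a 1001.txt" .. "a 1100.txt"
assert len(remaining) == 100
assert all(k < corrupted for k in remaining)
```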

In my case, this was causing tens of GBs of photos to be re-uploaded every time I ran a sync.

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

Log from the second run: https://paste.ee/p/FQu37

S3 bug

All 3 comments

Excellent debugging :-) And thank you for the repro and log - both of which were very useful.

I managed to replicate this after setting the provider correctly in my config (yes that is code for a 30 minute trip down a rabbit hole ;-)

This bug was introduced when we added URL encoding to the listings, to fix listings containing characters that can't be represented in XML (eg control characters).
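The shape of the fix can be sketched as follows (a hypothetical Python helper for illustration only; the actual patch is in rclone's Go S3 backend): when encoding-type=url is in effect, decode NextMarker exactly once before reusing it, so that the SDK's single round of encoding produces the original key.

```python
from urllib.parse import unquote_plus

def next_marker(resp):
    """Hypothetical helper: with encoding-type=url, every key in the
    listing response, including NextMarker, arrives URL-encoded, so it
    must be decoded exactly once before being reused as a marker."""
    marker = resp["NextMarker"]
    if resp.get("EncodingType") == "url":
        marker = unquote_plus(marker)
    return marker

# The decoded marker is the real key; the SDK then encodes it once
next_marker({"EncodingType": "url", "NextMarker": "a+1000.txt"})  # "a 1000.txt"
```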

Try this - it should fix it hopefully!

https://beta.rclone.org/branch/v1.50.2-085-g2dddcc52-fix-3799-s3-nextmarker-beta/ (uploaded in 15-30 mins)

@ncw Thanks for looking into this so quickly! I can confirm that your fix resolves the issue for me.

Thanks for testing.

I've merged this to master now which means it will be in the latest beta in 15-30 mins and released in v1.51
