Hi! We are using gsutil rsync to upload our backups to Coldline storage. Since our files grew larger (more than 30 GB), rsync has started hanging on each such file. The same thing happens with gsutil cp.
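For context, the invocation is essentially of this shape (the local path and bucket name here are placeholders, and the exact flags may differ slightly from our real command):
gsutil rsync -r /path/to/backups gs://our-coldline-bucket/backups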
I've already sent a log file from gsutil -D rsync to [email protected].
gsutil version: 4.25
boto version: 2.42.0
python version: 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]
OS: Linux 4.4.0-77-generic
multiprocessing available: True
using cloud sdk: True
config path(s): /etc/boto.cfg
gsutil path: /usr/lib/google-cloud-sdk/platform/gsutil/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False
I've updated gsutil; version 4.27 has the same issue :(
It hangs unexpectedly during gsutil rsync. Here is the tail of the gsutil -D rsync output before the hang (sensitive info such as the bucket name, the path to the file, and 'magic' IDs has been obfuscated):
send: 'POST /resumable/upload/storage/v1/b/our-bucket-name/o?fields=generation%2CcustomerEncryption%2Cmd5Hash%2Ccrc32c%2Cetag%2Csize&alt=json&uploadType=resumable HTTP/1.1\r\nHost: www.googleapis.com\r\ncontent-length: 230\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nuser-agent: apitools gsutil/4.27 Python/2.7.12 (linux2) google-cloud-sdk/163.0.0 analytics/enabled\r\nx-upload-content-length: 45527920\r\nx-upload-content-type: application/octet-stream\r\ncontent-type: application/json\r\nauthorization: Bearer REDACTED\r\n\r\n{"bucket": "REDACTED", "contentType": "application/octet-stream", "metadata": {"goog-reserved-file-mtime": "1500481259"}, "name": "path/to/uploaded/file"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: X-GUploader-UploadID: AEnB2UqtChk6t_UW6Ouxw9Wn8IGytGWixmQ16ty17fYJJsPo_own5-TXMW0RO9DLbBftx7D_pbep8jkC6BjlIiDNy-AgxRQ
header: Location: https://www.googleapis.com/resumable/upload/storage/v1/b/our-bucket-name/o?fields=generation%2CcustomerEncryption%2Cmd5Hash%2Ccrc32c%2Cetag%2Csize&alt=json&uploadType=resumable&upload_id=AEnB2UqtChk6t_UW6Ouxw9Wn8IGytGWixfmQ16ty17fYJJsPo_own5-TXMW0RO9DLbBftx7D_pbep8jkC6BjlIiDNy-AgxRQ
header: Vary: Origin
header: Vary: X-Origin
header: Cache-Control: no-cache, no-store, max-age=0, must-revalidate
header: Pragma: no-cache
header: Expires: Mon, 01 Jan 1990 00:00:00 GMT
header: Date: Fri, 11 Aug 2017 18:40:38 GMT
header: Content-Length: 0
header: Server: UploadServer
header: Content-Type: text/html; charset=UTF-8
As you can see, this time it hung while uploading a fairly small 45 MB file.
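For reference, the exchange above is the start of the JSON API's resumable upload flow: the POST with uploadType=resumable creates an upload session, and the object bytes are then sent to the session URI returned in the Location header, which is apparently where the hang occurs. A rough curl equivalent, with the token, bucket, and file names as placeholders, looks like this:
# Step 1: create a resumable upload session (returns a Location header with the session URI)
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Upload-Content-Type: application/octet-stream" \
  -d '{"name": "path/to/uploaded/file"}' \
  "https://www.googleapis.com/resumable/upload/storage/v1/b/our-bucket-name/o?uploadType=resumable&alt=json"
# Step 2: send the file contents to the session URI from step 1
curl -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  --upload-file ./local-file \
  "SESSION_URI_FROM_LOCATION_HEADER"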
I've redacted both your bucket name and your oauth2 access token from the above request. As some users may have received an email with your original post content, you'll probably want to revoke the token (see the docs on revoking a token) if it hasn't already expired by the time you read this.
For future submissions of debug information, I advise doing a find-all-and-replace for things like your bucket name, auth token strings (generally beginning with "Authorization: Bearer ya29."), and any sensitive file paths.
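For example, something along these lines could be run over a debug log before sharing it (the patterns and the log file name are only a sketch and would need adjusting for your actual bucket names and paths):
# Redact bearer tokens, the bucket name, and object paths in place (rough patterns; adjust as needed)
sed -E -i \
  -e 's/Bearer [A-Za-z0-9._-]+/Bearer REDACTED/g' \
  -e 's/our-bucket-name/REDACTED_BUCKET/g' \
  -e 's#path/to/uploaded/file#REDACTED_PATH#g' \
  gsutil-debug.log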
Also, I'll have to run a continuous rsync, occasionally changing files to induce diffs, and see if I can reproduce the issue. I don't know off the top of my head why we'd hang after creating a resumable upload session (as seen above).
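A minimal sketch of that kind of repro loop could look like the following (bucket name, paths, and sizes are placeholders, just to illustrate the idea rather than the exact harness):
# Rewrite a test file each pass so rsync always has a diff to upload, and keep the debug output
mkdir -p /tmp/rsync-test
while true; do
  dd if=/dev/urandom of=/tmp/rsync-test/bigfile bs=1M count=200 status=none
  gsutil -D rsync -r /tmp/rsync-test gs://test-bucket/rsync-test 2>&1 | tee -a /tmp/rsync-debug.log
  sleep 60
done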
Hi @houglum! Thank you for the quick reply. I can provide the full rsync log if it helps. Is gs-team{at}google.com okay for that?
The bucket name, the path to the file, and the access token IDs had already been redacted before I posted the message above :)
That's not what I saw, even when going to the web UI -- I replaced the relevant text with the string "REDACTED". Regardless, they're gone now, but do be careful in the future :)
@hagen1778 : Sure. If you have logs for this from v4.27, those would be preferable.
Done
@houglum any updates on the issue or the logs?
Apologies - I've been on-call, among other things, and haven't been able to look into this any further :(
This problem still seems to persist. It works fine for small files, but as soon as a file larger than ~100 MB needs to be synced, it hangs forever.
I am using the following command:
gsutil -m rsync -d -r gs://[GS-BUCKET-NAME] s3://[S3-BUCKET-NAME]
gsutil version: 4.38
checksum: 58d3e78c61e7e0e80813a6ebc26085f6 (OK)
boto version: 2.49.0
python version: 2.7.13 (default, Sep 26 2018, 18:42:22) [GCC 6.3.0 20170516]
OS: Linux 4.9.0-8-amd64
multiprocessing available: True
using cloud sdk: True
pass cloud sdk credentials to gsutil: True
config path(s): /etc/boto.cfg, /home/tanerca/.boto
gsutil path: /usr/lib/google-cloud-sdk/bin/gsutil
compiled crcmod: True
installed via package manager: False
editable install: False
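If it helps with debugging, the same command can be re-run with gsutil's top-level -D flag to capture a full trace of where it stalls (the log file name is just an example, and the output should be scrubbed of tokens and bucket names before sharing, as noted earlier in this thread):
gsutil -D -m rsync -d -r gs://[GS-BUCKET-NAME] s3://[S3-BUCKET-NAME] 2>&1 | tee rsync-debug.log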
I am also seeing this same issue with gsutil. Is there any fix or workaround for this?
I just switched to rclone (https://rclone.org/). As a side effect it is also way faster.
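For anyone wanting to try the same route: assuming a GCS remote and an S3 remote have already been set up with rclone config (the remote and bucket names below are placeholders), the equivalent one-way sync is roughly:
# "gcs" and "s3" are remote names from rclone config; bucket names are placeholders
rclone sync gcs:my-gcs-bucket s3:my-s3-bucket --progress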
Hello,
I'm trying to rsync from GCS to S3 buckets, but some files are bigger than 5 GiB.
I receive this error: "exceeds the maximum gsutil-supported size for an S3 upload. S3 objects greater than 5 GiB in size require multipart uploads, which gsutil does not support."
As you can see here https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html, there is a limitation in S3 where you have to use multipart upload for files bigger than 5 GiB.
Please help me: how can I fix this?
Any help would be really appreciated!
Thanks @houglum for your time.
Best Regards
Fabio Rigato
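One possible workaround (only a sketch, not something confirmed in this thread) is to stage the oversized objects locally and push them to S3 with the AWS CLI, which performs multipart uploads automatically for large files; bucket and object names below are placeholders:
# Download the >5 GiB object from GCS, then let the AWS CLI handle the multipart upload to S3
gsutil cp gs://my-gcs-bucket/big-object /tmp/big-object
aws s3 cp /tmp/big-object s3://my-s3-bucket/big-object
rclone, mentioned above, also handles S3 multipart uploads and can copy between the two buckets directly.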
any update/ETA on that?