gsutil rsync is broken in 4.39

Created on 17 Jun 2019 · 24 Comments · Source: GoogleCloudPlatform/gsutil

Command: gsutil rsync -r -j -n /source/path/ gs://$GS_STATIC_BUCKET/DIR/
Here is the error encountered on running it:

Building synchronization state...
Starting synchronization...
Traceback (most recent call last):
  File "/usr/local/bin/gsutil", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/gslib/__main__.py", line 431, in main
    user_project=user_project)
  File "/usr/local/lib/python2.7/site-packages/gslib/__main__.py", line 760, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/usr/local/lib/python2.7/site-packages/gslib/__main__.py", line 626, in _RunNamedCommandAndHandleExceptions
    user_project=user_project)
  File "/usr/local/lib/python2.7/site-packages/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/local/lib/python2.7/site-packages/gslib/commands/rsync.py", line 1658, in RunCommand
    seek_ahead_iterator=seek_ahead_iterator)
  File "/usr/local/lib/python2.7/site-packages/gslib/command.py", line 1515, in Apply
    arg_checker, should_return_results, fail_on_error)
  File "/usr/local/lib/python2.7/site-packages/gslib/command.py", line 1561, in _SequentialApply
    args = next(args_iterator)
  File "/usr/local/lib/python2.7/site-packages/gslib/commands/rsync.py", line 1188, in __iter__
    src_md5) = (self._ParseTmpFileLine(next(self.sorted_src_urls_it)))
  File "/usr/local/lib/python2.7/site-packages/gslib/commands/rsync.py", line 1049, in _ParseTmpFileLine
    md5) = line.split()
ValueError: too many values to unpack

bug

All 24 comments

Thanks for the bug report @anuj-kumar ! This seems to happen particularly with files whose names include spaces; a fix was submitted in https://github.com/GoogleCloudPlatform/gsutil/pull/805 and we will include it in our next release.
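
To illustrate why a space in a file name can produce this exact error, here is a minimal sketch. It assumes a simplified, hypothetical three-field record (URL, size, MD5) split on whitespace; the real temporary-file format used by rsync.py has more fields and is not reproduced here. Only the `line.split()` unpacking pattern is taken from the traceback above.

```python
def parse_line_naive(line):
    """Split a whitespace-delimited record into its fields.

    Hypothetical three-field layout for illustration only; the real
    record in gsutil's rsync.py carries more fields than this.
    """
    url, size, md5 = line.split()
    return url, int(size), md5


# A record whose URL has no spaces parses cleanly.
good = "file:///src/report.pdf 1024 d41d8cd98f00b204e9800998ecf8427e"

# An unescaped space in the URL yields extra fields after split().
bad = "file:///src/Child Age 6+ sign-up form.pdf 1024 d41d8cd98f00b204e9800998ecf8427e"

parsed = parse_line_naive(good)

error_message = None
try:
    parse_line_naive(bad)
except ValueError as err:
    # Same family of error as in the traceback: too many values to unpack.
    error_message = str(err)
```

On Python 3 the message also states the expected count, but the cause is the same: the unescaped name contributes more whitespace-separated tokens than the unpacking expects.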

As a workaround until our next release, please feel free to clone gsutil from the most recent head of our source repository. :slightly_smiling_face: We hope to have the next release ready in the near future!

Thanks @catleeball for solving the issue. Any update on an ETA for the next release?

I am testing with the fix applied now, and hit a few more edge cases of interest.

  • One file name had plus signs: "this+should+get+tested.txt"
  • Another (perhaps sufficiently covered by the first test case) had a space after the plus: "Child Age 6+ sign-up form.pdf"

Still working through a backed up storage bucket restoring it w/ rsync. Will update with other test cases if I find any!

@langboost We actually found the root cause of this in https://github.com/GoogleCloudPlatform/gsutil/issues/833, and published the correct fix (plus tests) in https://github.com/GoogleCloudPlatform/gsutil/issues/835 :)

I would be very happy if you could consider pushing this fix (or revert) soon, as we rely on this for our nightly backups and more, which have been failing since the 27th because of this issue.

Since I need to get gsutil back working, I tried to clone this repo and run it, but I just get this:

$ ./gsutil
Traceback (most recent call last):
  File "./gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/home/fredrik/google-cloud-sdk/bin/gsutil.py", line 123, in RunMain
    import gslib.__main__
  File "/home/fredrik/google-cloud-sdk/bin/gslib/__main__.py", line 75, in <module>
    from gslib.command_runner import CommandRunner
  File "/home/fredrik/google-cloud-sdk/bin/gslib/command_runner.py", line 64, in <module>
    from gslib.tests.util import HAS_NON_DEFAULT_GS_HOST
  File "/home/fredrik/google-cloud-sdk/bin/gslib/tests/util.py", line 38, in <module>
    import mock_storage_service  # From boto/tests/integration/s3
ImportError: No module named mock_storage_service

Do you have any advice on how to proceed?

We're aiming to do a release early this week. Apologies for the delay - we've been working to finish up Python 3 support as well.

As for your question, I'd guess that you need to pull in the submodules. Run git submodule update --init --recursive after cloning the repo and cding into it.

Looking forward to this fix. Thanks.

Ah, that was it. Thanks.

Just released 4.40 an hour or so ago. The gcloud-installed version will be a bit behind, as we have to get our release bundled into their next release cycle, but the pypi and tarball installations of 4.40 are available now.

@houglum What do you recommend in terms of Cloud Build? Our builds are all failing because we use the default gsutil image provided by Cloud Build and it is using the latest and greatest. Ideally, I wouldn't want to create my own new image with the older gsutil tool as a dependency. Is there a way to specify a version of gsutil builder in cloudbuild so I could use 4.38 until the gcloud tool catches up?

Something like this could very likely happen again, and it sort of puts us in a hostage situation if we can't pin specific versions (unless we can; I am not aware of a way). Appreciate your feedback!

Thanks!

I'm not terribly familiar with Cloud Builder, but it looks like the Dockerfile for it uses the "gcloud-slim" image?
https://github.com/GoogleCloudPlatform/cloud-builders/blob/master/gsutil/Dockerfile

If it's possible to update the gcloud version while running that image, you could revert gcloud back to the 251.0.0 version, which was the most recent one up until they added v4.39 of gsutil (in 252.0.0), by running something like gcloud components update --version 251.0.0. If that's not possible, maybe roll your own temporary version of the gcloud slim dockerfile that downloads a specific version of the Cloud SDK? Looks like the gcloud slim dockerfile is here:
https://github.com/GoogleCloudPlatform/cloud-builders/blob/master/gcloud/Dockerfile.slim

Sorry if these are poor suggestions; I don't really use Docker or Cloud Build in my day-to-day workflow :upside_down_face: Other than the suggestions above, I can't think of any alternatives that use the (current version of the) gsutil Cloud Builder image.

And apologies again for the breakage - I understand this is a pretty big pain to deal with. We've added some unit and integration tests to make sure object/file URIs are being properly quote_plus-escaped to prevent this from recurring in the future.
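
As a rough sketch of what quote_plus-escaping buys here: if the URL field is escaped before it is written to a whitespace-delimited record, it can no longer contain a literal space, so splitting on whitespace stays safe. The record layout and the helper names `encode_record` and `decode_record` below are hypothetical and are not gsutil's actual code; the example uses Python 3's `urllib.parse`, whereas the traceback in this thread comes from Python 2.7.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_record(url, size, md5):
    """Write a record with the URL escaped so it contains no whitespace."""
    return "%s %d %s" % (quote_plus(url), size, md5)


def decode_record(line):
    """Split the record on whitespace, then undo the URL escaping."""
    encoded_url, size, md5 = line.split()
    return unquote_plus(encoded_url), int(size), md5


# File names reported earlier in this thread as edge cases.
names = [
    "this+should+get+tested.txt",
    "Child Age 6+ sign-up form.pdf",
]

placeholder_md5 = "0" * 32  # dummy digest, for illustration only
round_tripped = [
    decode_record(encode_record(name, 16384, placeholder_md5))[0]
    for name in names
]
```

Both names should survive the round trip, because a literal '+' is escaped as %2B while a space is written as '+', so the two cannot be confused on decoding.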

Whether you're using gsutil in Docker or not, running gcloud components update --version 251.0.0 is a really simple way to get back to the non-breaking version. That's what I did, and I was able to restore and back up files with spaces and/or '+' characters in their names.
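
For anyone scripting around this, a small guard can flag the affected release before a backup job runs. This is only a sketch: it assumes the version text looks like "gsutil version: 4.39" (the usual first line printed by `gsutil version`), and the set of affected versions is taken from this thread, where 4.38 works and 4.40 carries the fix. The function names are hypothetical.

```python
import re

# Per this thread: 4.39 is broken, 4.38 works, 4.40 includes the fix.
AFFECTED_GSUTIL_VERSIONS = {(4, 39)}


def parse_gsutil_version(version_output):
    """Extract (major, minor) from text such as 'gsutil version: 4.39'."""
    match = re.search(r"gsutil version:\s*(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no gsutil version found in: %r" % version_output)
    return int(match.group(1)), int(match.group(2))


def rsync_is_affected(version_output):
    """Return True when the reported version is one known to break rsync."""
    return parse_gsutil_version(version_output) in AFFECTED_GSUTIL_VERSIONS
```

A job could feed the output of `gsutil version` into `rsync_is_affected` and stop early, rather than failing partway through a sync.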

Hmm seems like something broke with the release, since all our cloud builds are failing today with

- name: gcr.io/cloud-builders/gsutil
  args: ["-m", "rsync", "-r", "-c", "-d", "tempproject/staticfiles", "gs://bucket_eu/develop"]
Step #8: / [0/1 files][ 0.0 B/ 16.0 KiB] 0% Done - CommandException: 1 files/objects could not be copied/removed.
Step #8: Copying mtime from src to dst for gs://bucket_eu/develop/admin/css/base.7ea12481fb59.css
Step #8: too many values to unpack
Step #8: Starting synchronization...

Reproduced. 252.0.0 breaks, 251.0.0 works.

Why is this issue closed if multiple parties have shown that it breaks rsync? 251.0.0 works, as stated above.

I got it to work by following this: https://github.com/GoogleCloudPlatform/cloud-builders/issues/508#issuecomment-507646005. So the solution is basically to stop using this repo for now, and use the gsutil bundled with the google/cloud-sdk:251.0.0-slim image.

@ain the issue is fixed, but hasn't been released with gcloud. So you need to either build up your custom image using 4.38 or below or wait until gcloud releases a new version which could take a week. Or use the solution mentioned in the comment above. Read my comment on that ticket as that is something I think I will do by default going forward.

@houglum appreciate your feedback! Thankfully there was a CloudSDK repo out there already on github which I have been able to use that allows versioned images, so didn't have to create a custom image.

Also broke for me, waiting for a fix! Sucks because this is my first gcloud update and rsync is just about the only command I use. 😅 My site is only small though, but still, what a downer on Google's part. cp still works fine. It's funny, every time I update something, whether it's Macs, or phones, my experience always seems to degrade. :( When I don't update anything, my experience always works. (I like how Windows and macOS separate security and feature updates, usually, wish phones did this too).

Your current Cloud SDK version is: 172.0.1
You will be upgraded to version: 253.0.0

├─────────────────────────────────────────────────────┼────────────┼──────────┤
│ BigQuery Command Line Tool                          │ 2.0.44     │ < 1 MiB  │
│ BigQuery Command Line Tool (Platform Specific)      │ 2.0.34     │ < 1 MiB  │
│ Cloud SDK Core Libraries                            │ 2019.06.28 │ 11.0 MiB │
│ Cloud SDK Core Libraries (Platform Specific)        │ 2018.09.24 │ < 1 MiB  │
│ Cloud Storage Command Line Tool                     │ 4.39       │ 3.6 MiB  │
│ Cloud Storage Command Line Tool (Platform Specific) │ 4.34       │ < 1 MiB  │
│ gcloud cli dependencies                             │ 2019.05.03 │ 2.4 MiB  │
│ gcloud cli dependencies                             │ 2018.08.03 │ 1.5 MiB  │
└─────────────────────────────────────────────────────┴────────────┴──────────┘

A lot has changed since your last upgrade. For the latest full release notes,
please visit:
https://cloud.google.com/sdk/release_notes

A week's turnaround for an urgent fix (rsync completely broken) from a world-leading software vendor? Wow. 🤦‍♂️👨‍💻⚡️

Is there a custom image for gcloud CLI? Where and how do I install it?

In case this helps anyone:

I solved it temporarily by cloning this repository along with its submodules to a shared path on the network:

git clone --recursive ....

Then I symlinked the "gsutil" file from this shared path to each place where it needs to be able to execute.

I feel that 20+ days to deploy a fix (even if it is a rollback per se) for such a crucial feature is completely unacceptable.

I run gsutil rsync from cloud shell and this stops me from updating my website, which I need to do now. It sounds like if I want to keep using the cloud shell to move files over I'll have to wait until the fix is released?

What am I missing -- isn't this a critical feature? I don't have the background to create my own custom image -- not even sure what that would look like.

If it helps anyone - downgrading sdk works for me. gcloud components update --version 251.0.0 --quiet

downgrading sdk works

Wow, fantastic! I wish someone had told me that before! Especially in this thread! Rather than going off other longer complex tangents.

Thank you so much! 🎉

Still, I've never had this issue with AWS S3, such a ball drop by Google tbh. Why wasn't this tested before release if you guys have monthly release cycles? Was it a production only bug?

Also I noticed recently:

GCP Cloud Storage doesn't have bucket redirects built in whereas S3 does, but it worked out fine since I needed two modified website versions for each new domain. :)

v4.40 includes the fix for this bug. The standalone install of v4.40 is now available, and we're waiting on a release of gcloud that includes v4.40 of gsutil. In the meantime, several workarounds have been offered.

That being said, I've locked this thread for the time being, as it's becoming a bit unproductive.
