gsutil rsync throws encoding error while syncing .gz files from S3 with Python3

Created on 10 Jan 2020 · 12 Comments · Source: GoogleCloudPlatform/gsutil

Problem Description

I am running the following command to sync an AWS S3 bucket with a GCS bucket:

gsutil rsync -r s3://source_bucket gs://target_bucket

I have also tried the -J option because there are .gz files.

Here is the error:

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/gslib/boto_translation.py", line 636, in _PerformSimpleDownload
    hash_algs=hash_algs)
TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/gslib/daisy_chain_wrapper.py", line 213, in PerformDownload
    decryption_tuple=self.decryption_tuple)
  File "/usr/local/lib/python3.7/site-packages/gslib/cloud_api_delegator.py", line 353, in GetObjectMedia
    decryption_tuple=decryption_tuple)
  File "/usr/local/lib/python3.7/site-packages/gslib/boto_translation.py", line 582, in GetObjectMedia
    hash_algs=hash_algs)
  File "/usr/local/lib/python3.7/site-packages/gslib/boto_translation.py", line 641, in _PerformSimpleDownload
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/s3/key.py", line 1670, in get_contents_to_file
    response_headers=response_headers)
  File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/s3/key.py", line 1502, in get_file
    query_args=None)
  File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/s3/key.py", line 1556, in _get_file_internal
    print_to_fd(six.ensure_binary(key_bytes), file=fp, end=b'')
  File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/utils.py", line 1206, in print_to_fd
    write_to_fd(file, data)
  File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/utils.py", line 1222, in write_to_fd
    fd.write(six.ensure_text(data))
  File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/vendored/six.py", line 901, in ensure_text
    return s.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
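The rejected byte, 0x8b at position 1, is the second byte of the gzip magic number (1f 8b). That is consistent with the vendored boto code turning raw .gz object bytes back into UTF-8 text (the six.ensure_text call in write_to_fd) before writing them. The minimal sketch below uses only the standard library to reproduce the same decode error and to show that writing the bytes unchanged to a binary stream round-trips cleanly. decode_gzip_as_text and write_payload are hypothetical helpers for illustration, not boto's write_to_fd or the fix proposed in this thread.

```python
import gzip
import io


def decode_gzip_as_text(payload: bytes) -> str:
    # Hypothetical helper mimicking the failing step: treating raw object bytes as UTF-8 text.
    return payload.decode("utf-8")


def write_payload(fd, data: bytes) -> None:
    # Hypothetical helper: write bytes unchanged to binary streams and only
    # decode when the destination is a text stream.
    if isinstance(fd, io.TextIOBase):
        fd.write(data.decode("utf-8"))
    else:
        fd.write(data)


payload = gzip.compress(b"hello, world\n")
print(payload[:2].hex())  # 1f8b: the gzip magic number

try:
    decode_gzip_as_text(payload)
except UnicodeDecodeError as err:
    decode_error = err

# Same failure as the traceback: 0x8b at position 1 is not a valid UTF-8 start byte.
print(decode_error.start, hex(payload[decode_error.start]), decode_error.reason)

buf = io.BytesIO()
write_payload(buf, payload)  # binary destination: bytes are written as-is
print(gzip.decompress(buf.getvalue()))  # b'hello, world\n'
```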

System specs:

gsutil version: 4.47
checksum: PACKAGED_GSUTIL_INSTALLS_DO_NOT_HAVE_CHECKSUMS (!= da2648451f3edb644090ab6c8d57d5f5)
boto version: 2.49.0
python version: 3.7.3 (default, Mar 27 2019, 09:23:15) [Clang 10.0.1 (clang-1001.0.46.3)]
OS: Darwin 18.6.0
multiprocessing available: True
using cloud sdk: False
pass cloud sdk credentials to gsutil: False
config path(s): /Users/username/.boto, /Users/username/.aws/credentials
gsutil path: /usr/local/bin/gsutil
compiled crcmod: True
installed via package manager: True
editable install: False

All 12 comments

The sync utility works with Python 2.7.

Hi @dhananjaymehta

Your issue seems similar to my report #935. I have submitted a PR for the fix (#936), but it hasn't been reviewed by the Google team yet. If it is convenient for you, could you try it? :-)

Thanks.

I'm still seeing TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs' on gsutil 4.59

It seems to be a different code issue. You may want to open another issue; a detailed stack trace would help track it down.

Thanks @maxshine. I only see this in our logs rather than in a repro. I'm confused, since the code explicitly catches TypeError on line 640:

Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 640, in _PerformSimpleDownload
    hash_algs=hash_algs)
TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs'

I'll leave it for now, unless I have any bright ideas or you have any insight.

Hi, from my reading of the code snippet your link points to, it is a guard that handles two situations. Look at the next line, L641: there is a comment that S3 objects will throw a TypeError, and that line of code then retries get_contents_to_file without the hash_algs argument.

So, as far as I can see, the TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs' message you saw comes from the first try at L640. The exception thrown there then leads to the second try at L641. So the message might be confusing, but it should not be an error. :-)
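The guard described above is a "try with the newer keyword, fall back without it" pattern. A minimal sketch, assuming two stand-in download functions: gcs_style_get_contents, s3_style_get_contents and perform_download are hypothetical names, not gsutil's or boto's actual code. It shows that when the signature rejects hash_algs, the first call never runs its body, so the fallback writes the data exactly once.

```python
import io
from typing import Callable, Optional


def gcs_style_get_contents(fp, headers=None, hash_algs=None):
    # Hypothetical stand-in for a download method that accepts hash_algs.
    fp.write(b"gcs data")
    return "with hash_algs"


def s3_style_get_contents(fp, headers=None):
    # Hypothetical stand-in with a fixed signature and no **kwargs:
    # passing hash_algs raises TypeError before the body runs.
    fp.write(b"s3 data")
    return "without hash_algs"


def perform_download(get_contents: Callable, fp, hash_algs: Optional[dict] = None) -> str:
    try:
        return get_contents(fp, headers=None, hash_algs=hash_algs)
    except TypeError:
        # Fallback for implementations that do not accept hash_algs.
        return get_contents(fp, headers=None)


gcs_buf, s3_buf = io.BytesIO(), io.BytesIO()
print(perform_download(gcs_style_get_contents, gcs_buf, hash_algs={}))  # with hash_algs
print(perform_download(s3_style_get_contents, s3_buf, hash_algs={}))  # without hash_algs
print(s3_buf.getvalue())  # b's3 data' (written once, by the fallback call)
```

One caveat of this design: except TypeError also catches a TypeError raised inside the function body, which would trigger an unintended retry. Checking whether "hash_algs" is in inspect.signature(get_contents).parameters before the call is a stricter alternative.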

Yes, that was my understanding too, as far as the code path goes. But why would that log a traceback?

My guess is that the TypeError line comes from the boto package, which is outside the control of the gsutil application.

But it does give the file and line number as File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 640, in _PerformSimpleDownload. Doesn't that mean it's raising there? Or is this something more complicated, like a nested traceback?

(I'm also at peace with leaving it; I don't mean to nerd snipe both of us here!)
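One possible explanation, offered only as a hypothesis: the original report's traceback contains the line "During handling of the above exception, another exception occurred". In Python 3, when code inside an except block raises, the new exception keeps the handled one as its implicit context (__context__), and the default traceback printer (as in the Thread-4 traceback at the top of this issue) prints both. A handled TypeError can therefore still show up in logs if the fallback call then fails for another reason. The sketch below uses hypothetical stand-in functions and does not reproduce gsutil's actual code path.

```python
import traceback


def get_contents(fp, headers=None):
    # Hypothetical stand-in: rejects hash_algs (no such parameter), and its body
    # fails the way the fallback download failed in the original report.
    raise UnicodeDecodeError("utf-8", b"\x1f\x8b", 1, 2, "invalid start byte")


def download_with_fallback(fp):
    try:
        return get_contents(fp, hash_algs=None)
    except TypeError:
        # The TypeError is handled here, but if this retry raises, Python 3
        # records the TypeError as __context__ of the new exception.
        return get_contents(fp)


try:
    download_with_fallback(None)
except UnicodeDecodeError as err:
    chained = err
    report = "".join(traceback.format_exception(type(err), err, err.__traceback__))

print(type(chained.__context__).__name__)  # TypeError
print("During handling of the above exception" in report)  # True
```

If the fallback succeeds, the TypeError is discarded silently; a logged TypeError of this kind would then point at whatever exception follows the "During handling" line.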

That's all right. I love digging out the root cause of technical issues as much as possible :-)
So here is what I found via Google: the boto source code of get_contents_to_file.

Its parameters have default values and there are no variadic arguments (no *args or **kwargs). So if it is invoked with an unknown parameter, hash_algs, the Python interpreter raises a TypeError. That's why we see the TypeError attributed to the gsutil code line.
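The point about the gsutil line can be checked with a small sketch: the unknown keyword is rejected while Python binds the arguments, so the callee's body never starts and no traceback entry is created for it. The innermost entry is the calling line, which matches the hash_algs=hash_algs) line shown in the tracebacks. get_contents_to_file and caller here are hypothetical stand-ins, not boto's Key method.

```python
import traceback


def get_contents_to_file(fp, headers=None, cb=None, num_cb=10):
    # Hypothetical stand-in with a fixed parameter list and no *args/**kwargs.
    body_ran.append(True)
    return fp


def caller():
    # The unknown keyword fails during argument binding, before the body runs.
    return get_contents_to_file(None, hash_algs=None)


body_ran = []
try:
    caller()
except TypeError as err:
    innermost = traceback.extract_tb(err.__traceback__)[-1]
    message = str(err)

print(innermost.name)  # caller
print(body_ran)  # []
print(message)
```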

I'm still confused about why the error isn't caught by the try-except here, though. For example, this raises no error (nor any warnings):

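from boto.s3.key import Key

# A bare Key with no bucket attached; this only exercises the call signatures.
key = Key()

# Mirrors the guard in gslib/boto_translation.py: first try with hash_algs...
try:
    key.get_contents_to_file(
        fp=None,
        # cb=progress_callback,
        # num_cb=num_progress_callbacks,
        # headers=headers,
        hash_algs=None,
    )
except TypeError:  # s3 and mocks do not support hash_algs
    # ...then retry without it; the TypeError from the first call is handled here.
    key.get_contents_to_file(
        fp=None,
        # cb=progress_callback,
        # num_cb=num_progress_callbacks,
        # headers=headers
    )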

Yes, you're right: the TypeError is caught by the try..except block. I can't reproduce the situation where it is thrown locally. My guess is that a different code path is causing it...
