Problem Description
I am running the following command to sync an AWS S3 bucket with a GCS bucket:
gsutil rsync -r s3://source_bucket gs://target_bucket
I have also tried the -J option because there are .gz files.
Here is the error:
Exception in thread Thread-4: B]
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/gslib/boto_translation.py", line 636, in _PerformSimpleDownload
hash_algs=hash_algs)
TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/site-packages/gslib/daisy_chain_wrapper.py", line 213, in PerformDownload
decryption_tuple=self.decryption_tuple)
File "/usr/local/lib/python3.7/site-packages/gslib/cloud_api_delegator.py", line 353, in GetObjectMedia
decryption_tuple=decryption_tuple)
File "/usr/local/lib/python3.7/site-packages/gslib/boto_translation.py", line 582, in GetObjectMedia
hash_algs=hash_algs)
File "/usr/local/lib/python3.7/site-packages/gslib/boto_translation.py", line 641, in _PerformSimpleDownload
headers=headers)
File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/s3/key.py", line 1670, in get_contents_to_file
response_headers=response_headers)
File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/s3/key.py", line 1502, in get_file
query_args=None)
File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/s3/key.py", line 1556, in _get_file_internal
print_to_fd(six.ensure_binary(key_bytes), file=fp, end=b'')
File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/utils.py", line 1206, in print_to_fd
write_to_fd(file, data)
File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/utils.py", line 1222, in write_to_fd
fd.write(six.ensure_text(data))
File "/usr/local/lib/python3.7/site-packages/gslib/vendored/boto/boto/vendored/six.py", line 901, in ensure_text
return s.decode(encoding, errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
System Specs -
gsutil version: 4.47
checksum: PACKAGED_GSUTIL_INSTALLS_DO_NOT_HAVE_CHECKSUMS (!= da2648451f3edb644090ab6c8d57d5f5)
boto version: 2.49.0
python version: 3.7.3 (default, Mar 27 2019, 09:23:15) [Clang 10.0.1 (clang-1001.0.46.3)]
OS: Darwin 18.6.0
multiprocessing available: True
using cloud sdk: False
pass cloud sdk credentials to gsutil: False
config path(s): /Users/username/.boto, /Users/username/.aws/credentials
gsutil path: /usr/local/bin/gsutil
compiled crcmod: True
installed via package manager: True
editable install: False
The sync utility works with Python 2.7.
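For context on the final UnicodeDecodeError in the traceback above: every gzip stream starts with the magic bytes 0x1f 0x8b, so decoding a .gz payload as UTF-8 fails on the second byte. The snippet below is a minimal sketch of that failure using only the standard library. It is not gsutil's code path, and the payload is made up for illustration.

```python
import gzip

# Illustrative payload only; any gzip stream begins with the bytes 0x1f 0x8b.
payload = gzip.compress(b"hello")
magic = payload[:2]

failed_at = None
bad_byte = None
try:
    # Treating compressed bytes as text, as ensure_text() does in the traceback.
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x1f is valid ASCII, so the decoder only fails at index 1.
    failed_at = exc.start
    bad_byte = payload[exc.start]

print(failed_at, hex(bad_byte))
```

This matches the "byte 0x8b in position 1" wording in the report, which suggests the binary object body is being written through a text conversion under Python 3.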
Hi @dhananjaymehta
Your issue seems similar to my report #935. I have submitted a PR for fixing #936, but it hasn't been reviewed by the Google team yet. If it is convenient for you, could you try it? :-)
Thanks.
I'm still seeing TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs'
on gsutil 4.59
It seems to be a different code issue. You may want to open another issue; a detailed stack trace would help track it down.
Thanks @maxshine. I only see this in our logs and don't have a repro. I'm confused, since the code explicitly handles TypeError on line 640:
Traceback (most recent call last):
File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 640, in _PerformSimpleDownload
hash_algs=hash_algs)
TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs'
I'll leave it for now, unless I have any bright ideas or you have any insight.
Hi, as I read the code snippet your link points to, it is a guard that handles two situations. On the next line, L641, there is a comment saying that S3 objects will throw TypeError, and the code then retries get_contents_to_file without the hash_algs argument.
So, as far as I can see, the TypeError: get_contents_to_file() got an unexpected keyword argument 'hash_algs' message you saw comes from the first try at L640. The exception it throws leads to another try at L641. The message might be confusing, but it should not be an error. :-)
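The retry described above can be sketched roughly as follows. This is a simplified illustration, not the actual gsutil source: download_without_hash_algs and perform_simple_download are hypothetical names standing in for the S3 key method and the gsutil wrapper.

```python
def download_without_hash_algs(fp, headers=None):
    """Hypothetical stand-in for a download method that lacks hash_algs."""
    fp.append("downloaded")


def perform_simple_download(download, fp, hash_algs=None):
    """Try the call with hash_algs first, then fall back without it."""
    try:
        download(fp, hash_algs=hash_algs)
    except TypeError:
        # The callee does not accept hash_algs, so retry with the older signature.
        download(fp)


out = []
perform_simple_download(download_without_hash_algs, out)
print(out)
```

One caveat with this pattern: a bare `except TypeError` also catches a TypeError raised deeper inside the call, so the fallback can run for reasons other than the unsupported keyword.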
Yes that was my understanding too, as far as the code path goes. But why would that log a traceback?
My guess is that the TypeError line comes from the boto package, which is outside the control of the gsutil application.
But it does give the file & line number as File "/usr/lib/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 640, in _PerformSimpleDownload
— doesn't that mean it's raising there? Or is this something more complicated like a nested traceback?
(I'm also at peace with leaving it, I don't mean to nerd snipe both of us here!)
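One possible explanation, consistent with the first traceback in this thread (the one containing "During handling of the above exception, another exception occurred"), is Python's implicit exception chaining: if the retry inside the except block raises a new exception, the printed traceback also includes the TypeError that was already caught. Whether that is what happened in the gsutil 4.59 logs is an assumption, since only the first part of that traceback is shown. A minimal sketch with hypothetical fetch functions:

```python
import traceback


def fetch(data):
    """Hypothetical stand-in for a download call that rejects hash_algs."""
    return data.decode("utf-8")


def fetch_with_fallback(data):
    try:
        return fetch(data, hash_algs=None)  # raises TypeError, which is caught
    except TypeError:
        return fetch(data)  # the retry may fail with a different exception


report = ""
context = None
try:
    fetch_with_fallback(b"\x1f\x8b")  # gzip magic bytes, not valid UTF-8
except UnicodeDecodeError as exc:
    # The caught TypeError is attached as __context__ and printed first.
    context = exc.__context__
    report = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))

print(report)
```

In the printed report the TypeError appears with its own file and line number even though it was handled, which is why seeing line 640 in a log does not by itself mean the TypeError escaped the try-except.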
That's all right. I'd love to dig into the root cause as much as possible :-)
Here is what I found via Google: the boto source code of get_contents_to_file.
Its parameter definitions have default values and no variadic parameters. So if it is invoked with an unknown parameter such as hash_algs, the Python interpreter will complain. That's why we see the TypeError reported against the gsutil code line.
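To check this kind of thing without triggering the error, the signature can be inspected directly. The sketch below uses a hypothetical get_contents_to_file with a made-up parameter list (it is not copied from boto) and a hypothetical helper accepts_keyword.

```python
import inspect


def get_contents_to_file(fp, headers=None, cb=None, num_cb=10):
    """Hypothetical signature: defaults only, no **kwargs."""


def accepts_keyword(func, name):
    """Return True if func can be called with name=... as a keyword argument."""
    params = inspect.signature(func).parameters
    named = params.get(name)
    if named is not None and named.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


supports_hash_algs = accepts_keyword(get_contents_to_file, "hash_algs")
supports_headers = accepts_keyword(get_contents_to_file, "headers")
print(supports_hash_algs, supports_headers)
```

Without a `**kwargs` parameter there is nowhere for an unknown keyword to go, so the call fails before the function body runs.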
I'm still confused why the error isn't caught by the try-except here though? e.g. this raises no error (nor warnings):
    import boto
    from boto import s3
    from boto.s3.key import Key

    key = Key()
    try:
        key.get_contents_to_file(
            fp=None,
            # cb=progress_callback,
            # num_cb=num_progress_callbacks,
            # headers=headers,
            hash_algs=None,
        )
    except TypeError:  # s3 and mocks do not support hash_algs
        key.get_contents_to_file(
            fp=None,
            # cb=progress_callback,
            # num_cb=num_progress_callbacks,
            # headers=headers
        )
Yes, you're right. The TypeError is caught by the try..except block. I can't reproduce the uncaught case locally. My guess is that a different code path is causing it...