On my Mac (MacBookPro14,3; macOS High Sierra), gsutil (v4.34) fails sporadically with the following trace:
```
Traceback (most recent call last):
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gsutil", line 22, in <module>
    gsutil.RunMain()
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gsutil.py", line 117, in RunMain
    sys.exit(gslib.__main__.main())
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 250, in main
    command_runner = CommandRunner()
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 146, in __init__
    self.command_map = self._LoadCommandMap()
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 152, in _LoadCommandMap
    __import__('gslib.commands.%s' % module_name)
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 43, in <module>
    from gslib.utils import copy_helper
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 168, in <module>
    if CheckMultiprocessingAvailableAndInit().is_available else None))
  File "/Users/<me>/google-cloud-sdk/platform/gsutil/gslib/utils/parallelism_framework_util.py", line 84, in __init__
    self.lock = manager.Lock()
  File "/Users/<me>/.pyenv/versions/2.7.15/lib/python2.7/multiprocessing/managers.py", line 670, in temp
    authkey=self._authkey, exposed=exp
  File "/Users/<me>/.pyenv/versions/2.7.15/lib/python2.7/multiprocessing/managers.py", line 733, in __init__
    self._incref()
  File "/Users/<me>/.pyenv/versions/2.7.15/lib/python2.7/multiprocessing/managers.py", line 783, in _incref
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/Users/<me>/.pyenv/versions/2.7.15/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/Users/<me>/.pyenv/versions/2.7.15/lib/python2.7/multiprocessing/connection.py", line 437, in answer_challenge
    response = connection.recv_bytes(256)  # reject large message
IOError: bad message length
```
When I modify `gslib/utils/parallelism_framework_util.py` to prohibit multiprocessing, as is done for Windows, everything works fine. I see there is a similar mod for Alpine Linux (https://github.com/GoogleCloudPlatform/gsutil/commit/d7493e711e5b1c1937d0e57a25544bfba2641eb4). Does something similar need to be done for macOS? I raised the maximum number of open files, which seemed to help somewhat but did not entirely address the problem.
To reproduce, run a gsutil command from the command line several times (even `gsutil -v` will break for me, but `ls` and `cp` seem to fail more often). At some point you should see the trace. I only see this problem on macOS; Ubuntu 18.04, for example, is fine.
Thanks for the report. This is the second report I've seen of this in the past couple months -- both were on gsutil 4.34 running on macOS (the internal report confirmed seeing this on versions 10.12 and 10.13). The only common thing I see between the two reports is a call to `manager.Lock()` in the middle of the stack trace, followed by some items in between, then calls to `answer_challenge()` and `connection.recv_bytes`:
```
[...]
  File "<path-to-gsutil>/gslib/utils/parallelism_framework_util.py", line 84, in __init__
    self.lock = manager.Lock()
[...]
  File "<path-to-python-libs>/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "<path-to-python-libs>/python2.7/multiprocessing/connection.py", line 437, in answer_challenge
    response = connection.recv_bytes(256)  # reject large message
```
I don't have any solid clues as to what's causing this, but from investigating so far, I can say that:
Thanks for the response! Yes, I'm guessing it is a problem specific to macOS, which is why I wonder if it needs special handling like Windows and Alpine Linux. Also, it seems like Python's multiprocessing framework itself could be a likely culprit. Will this go away if/when gsutil is upgraded to Python 3.x?
I browsed through some of the issues on bugs.python.org and it seems the multiprocessing module got a lot of (much-needed) love in version 3... so I assume so, but there's only one way to find out :)
That being said, we're working on getting some folks dedicated to continuing the Python 3 support work soon, so my initial reaction is to put this investigation off in hopes that moving to Python 3.4+ makes this a non-issue.
Seems reasonable. My team and I are getting along with a patch that prevents multiprocessing similar to the way it is done with Windows, so we're good to go for now. Looking forward to the python 3 version. Cheers!
I am running into the same issue on a mac running macOS 10.13.6.
I had luck switching over to `python3`. With it as default, the command would succeed w/o issues.
I'm not sure what your `python3` command actually expands to, but there's no way it's truly running under Python 3 -- gsutil does not yet support Python 3, and would crash pretty much immediately if you tried.
When I have a default Python version of `2.7.10` and run a gsutil command such as `gsutil cp ..`, I get the above error. When I change the symbolic link to python3 (`3.7.1`), i.e. `ln -s /usr/local/bin/python3 /usr/local/bin/python`, and run the exact same command, it runs successfully.
Confirming this is still an issue with macOS 10.14.2: Python crashes when using `cp` with the `-m` option; removing this option allows the command to execute without crashing. Really wish this weren't an issue -- I have millions of images to download, and without the multi-threaded option it will take forever.
I've been unable to reproduce this on my MBP :( I also only have one installation of Python set up on that machine (2.7.13), so that might be why.
Would anyone in this thread be willing to download the non-Cloud-SDK version of gsutil and try the same parallelized commands? I'm actually curious about two installation scenarios:
1) Use `pip` to install gsutil in a new, isolated virtualenv running Python 2.7.x (this should prevent any potential mix-ups of Python versions or modules, if that's somehow happening).
2) Install gsutil to your system outside of a virtualenv, either via `pip install --user` or by downloading it from the tarball -- while this may not prevent the mix-ups mentioned in #1, it will help me determine whether or not the potential issue below is happening:
I ask this because all of the errors I've seen for this thus far have been on Cloud SDK installations, and a couple folks have mentioned that fiddling with `python*` aliases seems to fix it. I wonder if, somehow, the logic in the gcloud launcher script (at `<cloud-sdk-root>/bin/gsutil`) is picking the "wrong" Python version based on existing aliases? Alternatively, this might just happen if there's something odd about the Python path in your environment, resulting in a mix of either Python versions or module versions being loaded and trying to communicate with each other.
If this still happens in both scenarios 1 and 2 above, I'm inclined to say this is a problem with the multiprocessing module on macOS that isn't present on other systems, and will just wait and see if it's been fixed in Python 3 (we're currently working on PY3 support for gsutil). But if it still happens at that point, we can invest more time looking into this.
But regardless, in the meantime, a good workaround to get parallelism without multiple processes would be to use multiple threads instead, i.e. setting `parallel_process_count` and `parallel_thread_count` in your .boto file... or inline, e.g.:

```
gsutil -o 'GSUtil:parallel_process_count=1' -o 'GSUtil:parallel_thread_count=16' -m cp <src> <dst>
```
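The equivalent persistent settings in your .boto configuration file (same section and option names as in the inline `-o` flags; the values here are just the example above) would be:

```ini
[GSUtil]
parallel_process_count = 1
parallel_thread_count = 16
```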
@houglum I tried `gsutil` using both options:
[1] Installed from gcloud
[2] Installed by itself from here: https://cloud.google.com/storage/docs/gsutil_install#install
The 2nd option resolved some tracebacks, but I do see them occasionally (with `--version`).
From your response, I aliased `gsutil` to `gsutil -o 'GSUtil:parallel_process_count=1' -o 'GSUtil:parallel_thread_count=16'`. But doing the following, I still see tracebacks: `gsutil getacl gs://<bucket-name>`
@umbs thanks. One more Q: do you have multiple Python installations/versions set up on your Mac, by chance?
The above workaround occasionally failed for me, so I ended up updating the file `google-cloud-sdk/platform/gsutil/gslib/utils/parallelism_framework_util.py`: in the `CheckMultiprocessingAvailableAndInit` definition I set `multiprocessing_is_available = False`. Now it works 100% of the time.
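For anyone applying the same patch: the effect is to short-circuit the availability probe so gsutil falls back to threads only, as it already does on Windows and Alpine Linux. A rough standalone sketch of that pattern (the names mirror gsutil 4.34's `CheckMultiprocessingAvailableAndInit`, but this is an illustration, not the actual gsutil source):

```python
import collections
import multiprocessing

# Mirrors the shape of gsutil's availability result (assumed field names).
MultiprocessingCheckResult = collections.namedtuple(
    'MultiprocessingCheckResult', ['is_available', 'stack_trace'])


def check_multiprocessing_available(force_disable=False):
    """Probe whether manager-backed multiprocessing primitives work.

    Passing force_disable=True reproduces the macOS workaround: report
    multiprocessing as unavailable so callers use threads only.
    """
    if force_disable:
        return MultiprocessingCheckResult(is_available=False, stack_trace=None)
    try:
        # Creating a manager-backed Lock is exactly the step that raises
        # "IOError: bad message length" in the traces above.
        manager = multiprocessing.Manager()
        manager.Lock()
        manager.shutdown()
        return MultiprocessingCheckResult(is_available=True, stack_trace=None)
    except Exception:
        import traceback
        return MultiprocessingCheckResult(
            is_available=False, stack_trace=traceback.format_exc())
```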
Thanks for the confirmation, @sahidurrahman. From everything I've seen, my best guess is that multiple Python installations and/or packages from multiple installations are trying to communicate with each other (i.e. the worker processes that we start up in our multiprocessing setup are somehow using different modules than the parent process). Either that, or multiprocessing is just buggy on macOS for Py2.7. On that note, we're making good progress on the Python 3 compatibility project, so we should be able to get a Py3 version out relatively soon and see if this still happens in your environments when running on Py3 :)
I see the same issue on macOS 10.14.3.
```
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 240, in serve_client
    request = recv()
  File "/Users/z0033qh/Desktop/y/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 43, in <module>
    from gslib.utils import copy_helper
  File "google-cloud-sdk/platform/gsutil/gslib/utils/copy_helper.py", line 267, in <module>
    if CheckMultiprocessingAvailableAndInit().is_available else None))
  File "google-cloud-sdk/platform/gsutil/gslib/utils/parallelism_framework_util.py", line 85, in __init__
    self.dict = manager.dict()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 667, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 565, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 437, in answer_challenge
    response = connection.recv_bytes(256)  # reject large message
IOError: bad message length
```
Setting `multiprocessing_is_available = False` works.
This has started happening in version 255.0.0 for me. `sed` came to the rescue!

```
sed -i '' -e 's/multiprocessing_is_available = True/multiprocessing_is_available = False/g' ~/Downloads/google-cloud-sdk/platform/gsutil/gslib/utils/parallelism_framework_util.py
```
Not sure if this helps, but I just installed SDK version 257.0.0 and noticed that the boto install failed during `gcloud init`. The error message was `Error creating a default .boto configuration file. Please run [gsutil config -n] if you would like to create this file.` When I tried running that command, it failed with a similar error to what is reported here:
```
Traceback (most recent call last):
  File "/Users/timtrentham/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/Users/timtrentham/google-cloud-sdk/platform/gsutil/gsutil.py", line 124, in RunMain
    sys.exit(gslib.__main__.main())
  File "/Users/timtrentham/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 224, in main
    gslib.command.InitializeMultiprocessingVariables()
  File "/Users/timtrentham/google-cloud-sdk/platform/gsutil/gslib/command.py", line 349, in InitializeMultiprocessingVariables
    total_tasks = AtomicDict(manager=manager)
  File "/Users/timtrentham/google-cloud-sdk/platform/gsutil/gslib/utils/parallelism_framework_util.py", line 87, in __init__
    self.dict = manager.dict()
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 667, in temp
    token, exp = self._create(typeid, *args, **kwds)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/managers.py", line 565, in _create
    conn = self._Client(self._address, authkey=self._authkey)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 175, in Client
    answer_challenge(c, authkey)
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/connection.py", line 437, in answer_challenge
    response = connection.recv_bytes(256)  # reject large message
IOError: bad message length
```
After making sure to run `pyenv shell system` prior to running `gsutil`, it succeeds.
That last little bit about cleaning up your python environment/aliases/paths (in this case, which Python interpreter is being invoked via pyenv) is a pretty strong indicator that this bug is caused by different Python environments (either two different interpreters, or the same interpreter with two different sets of libraries being loaded, etc.) being launched and trying to communicate with each other.
Thanks for the response. Just thought I'd add my finding as it supports what you're saying. I just needed to be sure I was actually running the system python. If I had done that originally, I wouldn't have had any issues.
Setting my Python version to the macOS system Python 2.7.15 did not make `gsutil ls` work any better; in my case, it was already using this version anyway.
What did help was @houglum's suggestion, configuring flags for parallelism:
https://github.com/mcandre/dotfiles/blob/master/.bashrc.d/gsutil.sh
Please implement gsutil as a standalone binary, e.g. in Go, or at least update past Python 2, which will soon stop receiving security patches.
Same issue for macOS 10.14.6; it happened for one directory where I paused copying files before trying to delete them. Tried both Python 2.7.16 and 3.7.1.
Update: `gsutil -m rm -rf` gave me the above issue, while `gsutil rm -rf` works fine, for both Python versions.
`export CLOUDSDK_PYTHON=python3` in `~/.zshrc` helped me.
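For reference, that amounts to adding one line to `~/.zshrc` so the Cloud SDK launcher scripts pick Python 3 (this assumes a `python3` binary is on your PATH); you can then verify which major version the launcher will actually get:

```shell
# Tell the Cloud SDK launcher scripts which interpreter to use.
export CLOUDSDK_PYTHON=python3

# Verify that the chosen interpreter really is a Python 3.x.
"$CLOUDSDK_PYTHON" -c 'import sys; print(sys.version_info[0])'
```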
Thanks @weivall! That's the only solution that works for me.
I had to upgrade the `asn1crypto` library: `pip install --upgrade asn1crypto`