Gsutil: TypeError: cannot pickle '_io.TextIOWrapper' object

Created on 14 Feb 2020 · 19 Comments · Source: GoogleCloudPlatform/gsutil

gsutil -m -h "Cache-Control: public, max-age=31536000" cp -r test/** gs://some-bucket
Traceback (most recent call last):
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
    gsutil.RunMain()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gsutil.py", line 124, in RunMain
    sys.exit(gslib.__main__.main())
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 424, in main
    return _RunNamedCommandAndHandleExceptions(
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 762, in _RunNamedCommandAndHandleExceptions
    _HandleUnknownFailure(e)
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/__main__.py", line 620, in _RunNamedCommandAndHandleExceptions
    return command_runner.RunNamedCommand(command_name,
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 411, in RunNamedCommand
    return_code = command_inst.RunCommand()
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/commands/cp.py", line 1201, in RunCommand
    self.Apply(_CopyFuncWrapper,
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1499, in Apply
    self._ParallelApply(
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1719, in _ParallelApply
    self._CreateNewConsumerPool(process_count, thread_count,
  File "/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/platform/gsutil/gslib/command.py", line 1384, in _CreateNewConsumerPool
    p.start()
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_io.TextIOWrapper' object

gsutil version: 4.47

All 19 comments

I ran into the same issue. If you use another interpreter (Python 3.7, for instance), all is well. This is a problem specifically with Python 3.8.

Google Cloud SDK 281.0.0
beta 2019.05.17
bq 2.0.53
cloud-firestore-emulator 1.10.4
core 2020.02.14
gsutil 4.47

The bug is still there:

Google Cloud SDK 283.0.0
alpha 2019.05.17
app-engine-python 1.9.88
beta 2019.05.17
bq 2.0.54
cloud-datastore-emulator 2.1.0
core 2020.02.28
gsutil 4.48

The bug still exists, but only with the multiprocessing flag; it runs fine without -m:

Google Cloud SDK 286.0.0
bq 2.0.55
core 2020.03.24
gsutil 4.48

Tracking this down, this error comes from a change in Python 3.8 in the multiprocessing library:

Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725.

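Since gsutil does not set a start method explicitly, through either get_context or set_start_method, spawn is used by default on macOS with Python 3.8+. Unlike fork, spawn pickles the process object to hand it to the child (the reduction.dump(process_obj, fp) call in the traceback), and that fails because something reachable from that object is an _io.TextIOWrapper.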
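The failure mode described above can be reproduced outside gsutil. The following is a minimal sketch, not gsutil's actual code: the dictionary worker_state and the helper pickle_error are hypothetical stand-ins for whatever state the real process object carries. It shows that pickle refuses any object graph that contains an open text stream, which is what spawn runs into when it serializes the process object.

```python
import io
import pickle


def pickle_error(obj):
    """Return the TypeError message raised while pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical stand-in for the state a worker process carries.
# The open text stream is the part that pickle cannot serialize.
stream = io.TextIOWrapper(io.BytesIO())
worker_state = {"name": "consumer-0", "log": stream}

error = pickle_error(worker_state)
print(error)  # a TypeError message naming _io.TextIOWrapper

# The same state without the stream pickles without complaint.
print(pickle_error({"name": "consumer-0"}))
```

With fork, the child inherits the parent's memory and nothing is pickled, which would explain why the same command works on interpreters where fork is still the default.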

The issue is still present with Cloud SDK 302.0.0 (gsutil 4.52) on macOS 10.15.6 with Python 3.8.5 installed from Homebrew.

One workaround is to use the Python 3 interpreter shipped with macOS, /usr/bin/python3, by setting the Cloud SDK interpreter path: https://cloud.google.com/sdk/gcloud/reference/topic/startup

gsutil does not work with Python 3.8. Force it to use Python 3.7 with something like:

export CLOUDSDK_PYTHON=/usr/bin/python3     # on mac
export CLOUDSDK_PYTHON=/usr/bin/python3.7   # on linux

@aleb I'm not sure if this is specific to macOS Mojave, but the path to python3 for me was /usr/local/bin/python3. I couldn't get it to work with python3 anyway, but forcing it to use 2.7 worked like a charm.

export CLOUDSDK_PYTHON=/usr/local/bin/python3      # did not work
export CLOUDSDK_PYTHON=/usr/bin/python2.7          # worked

From the link @caizixian provided,

Python 3 is preferred over Python 2. Note that gcloud requires Python version 2.7.x or 3.5 and up. Other Python tools shipped in the Cloud SDK do not support Python 3 and require Python 2.7.x,

Another workaround on macOS is to

brew install python@3.7
export CLOUDSDK_PYTHON=/usr/local/opt/python@3.7/bin/python3

@dinvlad That worked for me! Thank you so much

@dinvlad Thank you! Works perfectly!

With such a strange "pickle" error, I didn't expect to find my resolution so quickly. Thank you, @dinvlad!!

export CLOUDSDK_PYTHON=/usr/bin/python2.7 will work! Setting export CLOUDSDK_PYTHON=/usr/bin/python3 or export CLOUDSDK_PYTHON=path/for/python3.7 solves the current issue but then runs into a "module 'sys' has no attribute 'maxint'" error.

While I recognize comments like "Is tHiS fiXEd??" are not helpful — would it be possible for someone on the Google side to acknowledge this is a bug in gsutil and plan to resolve?

Currently, IIUC, gsutil breaks on python 3.8 — a version released a year ago, and the default brew version. Workarounds like installing another version of python are not small adjustments, and difficult for less technical colleagues. There are 49 :+1:s on the issue.

Sorry for the delay. We are aware of this bug and are working on releasing this workaround soon: https://github.com/GoogleCloudPlatform/gsutil/pull/1107

Another workaround is to disable multiprocessing altogether when using Python 3.8. This can be done either by setting parallel_process_count=1 in the boto config file or by passing the option on the command line, like this:

gsutil -o "GSUtil:parallel_process_count=1" -m cp .....

This will be relatively slow because it uses a single process; however, multithreading will still be on.
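For the config-file variant of this workaround, the entry would presumably live in a [GSUtil] section, mirroring the "GSUtil:parallel_process_count=1" form of the -o option. The sketch below only renders such a fragment with Python's configparser; render_boto_fragment is a hypothetical helper, and the location of the boto config file is not stated in this thread, so writing the result to disk is left out.

```python
import configparser
import io


def render_boto_fragment(process_count=1):
    """Render a [GSUtil] section that limits gsutil to `process_count` processes."""
    config = configparser.ConfigParser()
    # Section name assumed from the "GSUtil:parallel_process_count=1" option form.
    config["GSUtil"] = {"parallel_process_count": str(process_count)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


fragment = render_boto_fragment()
print(fragment)
```

If the boto config file already has a [GSUtil] section, the option would go inside it rather than in a second section with the same name.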

That's an excellent workaround, thanks @dilipped !

Updating gsutil solved the issue with Python 3.8.

gsutil does not work with Python 3.8. Force it to use Python 3.7 with something like:

export CLOUDSDK_PYTHON=/usr/bin/python3     # on mac
export CLOUDSDK_PYTHON=/usr/bin/python3.7   # on linux

The workaround quoted above works fine for me.
