DDP cannot start with following error. This happened after I upgraded from 0.7.1 to 0.7.5.
Traceback (most recent call last):
File "train.py", line 365, in <module>
fire.Fire(train)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/fire/core.py", line 138, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/fire/core.py", line 468, in _Fire
target=component.__name__)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "train.py", line 348, in train
trainer.fit(model)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 751, in fit
mp.spawn(self.ddp_train, nprocs=self.num_processes, args=(model,))
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 162, in spawn
process.start()
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/home/jacobz/.conda/envs/lidar/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object
Sorry I don't have a short example to reproduce this yet.
* CUDA:
- GPU:
- GeForce GTX 1080 Ti
- available: True
- version: 10.1
* Packages:
- numpy: 1.18.1
- pyTorch_debug: False
- pyTorch_version: 1.4.0
- pytorch-lightning: 0.7.5
- tensorboard: 2.2.0
- tqdm: 4.45.0
* System:
- OS: Linux
- architecture:
- 64bit
-
- processor: x86_64
- python: 3.7.7
- version: #38~18.04.1-Ubuntu SMP Tue Mar 31 04:17:56 UTC 2020
Find a related issue https://github.com/hyperopt/hyperopt-sklearn/issues/74, but I'm sure there's no logger in my module.
And also if I try to pickle my model:
import pickle
pickle.dumps(model)
There's no error occurred..
@cmpute try pickling the Trainer, that's what usually fails. See #1628 for a similar error I debugged yesterday. It is probably something custom (or just non-default!) that you pass to the Trainer, ie. in args.
I haven't met this problem for a while, it may be fixed by latest commits.. I'll close it for now
I'm also experiencing this now. Not fixed yet!
the same issue, please reopen
pl.__version__='0.9.0rc1'
In my case, it happens when I provide the output_filename parameter to pytorch_lightning.profiler.SimpleProfiler and run in ddp_spawn regime