trainer = pl.Trainer(num_processes=1, progress_bar_refresh_rate=20). Observe that this works.
trainer = pl.Trainer(num_processes=2, progress_bar_refresh_rate=20). Observe that training does not commence.
Could you please share the output and expectations?
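For context, a minimal sketch of the reproduction under discussion (the model and dataloader below are an assumed MNIST hello-world setup, not copied from the linked notebook):

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    # single linear layer, just enough to exercise the Trainer
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

train_loader = DataLoader(
    MNIST(".", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=32,
)

# (2) works: trainer = pl.Trainer(num_processes=1, progress_bar_refresh_rate=20)
# (3) no progress bar, training never starts:
trainer = pl.Trainer(num_processes=2, progress_bar_refresh_rate=20)
trainer.fit(LitClassifier(), train_loader)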
Hi Borda, thanks for your quick response! Did you try reproducing with my instructions? In (2) I see a training bar as expected, but in (3) there is unfortunately no output.
Edit: I'll note that I saw the same error on my own complicated model, but posted this MWE instead.
I'll check it, but it would be much more convenient to describe the problem directly here so that other people can also help, compared to saying "go and try it", which almost no one will do...
Ok, here's the output. Nothing trains and there is no error.
Done!
/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py:469: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/distributed.py:25: UserWarning: num_processes is only used for distributed_backend="ddp_cpu". Ignoring it.
warnings.warn(*args, **kwargs)
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
1
I took care to make sure the bug can be reproduced by anyone in 1 minute or less, as I want to respect your time, so I would encourage you to try it if you're at a desktop! I also used/linked the official pytorch-lightning demo notebook and the first MNIST hello-world example, so you could be sure it wasn't anything unusual in my code. I truly appreciate your work on this wonderful package!
Edit: I wrote this pretty late and cranky, and have edited it for politeness.
Hey Tyler, I followed your suggestions and ran pl.Trainer with num_processes=2. Using pytorch-lightning==0.9.0, it does crash. It appears this is a bug.
In the meantime, it is important to mention that PyTorch already parallelizes work across multiple CPU threads even with num_processes=1, so while we work to fix this I recommend you stick with a single process on CPU, as that will yield the fastest training times.
Guys, read the warning message xD
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/distributed.py:25: UserWarning: num_processes is only used for distributed_backend="ddp_cpu". Ignoring it.
num_processes only applies to ddp_cpu mode, which needs to be explicitly set with distributed_backend="ddp_cpu".
This is expected behavior, given that this case is explicitly handled by Trainer by printing a warning.
@tbenst you won't get any speedup by using num_processes > 1 on CPU.
This is really just meant to simulate distributed training for testing purposes in environments where DDP is not available.
@awaelchli good to know, thanks! No warning was printed with 0.9.0, but perhaps this has since changed.
P.S. I don't mean to press on this since GPU training is what 99% of users need, but depending on how you compile BLAS / which one you choose, linear algebra operations are not automatically parallelized, and thus there would be a substantial speedup. On MKL, users sometimes specify OMP_NUM_THREADS=1 when running hyperparameter searches where the code is mostly bound by single-thread performance.
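A minimal sketch of what that pinning looks like in practice (the exact environment variables depend on the BLAS build; OMP_NUM_THREADS/MKL_NUM_THREADS cover the common OpenMP/MKL cases):

import os
os.environ["OMP_NUM_THREADS"] = "1"   # set before the math libraries spin up their thread pools
os.environ["MKL_NUM_THREADS"] = "1"

import torch
torch.set_num_threads(1)              # also pin PyTorch's own intra-op thread pool
print(torch.get_num_threads())        # -> 1; each hyperparameter-search worker now stays on one thread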
@tbenst you posted the warning yourself in this reply https://github.com/PyTorchLightning/pytorch-lightning/issues/3334#issuecomment-686393731
it is kinda hard to see though, because we get used to ignoring warnings xD
P.S. I don't mean to press on this since GPU training is what 99% of users need, but depending on how you compile BLAS / which one you choose, linear algebra operations are not automatically parallelized, and thus there would be a substantial speedup. On MKL, users sometimes specify OMP_NUM_THREADS=1 when running hyperparameter searches where the code is mostly bound by single-thread performance.
ok, I did not know that. You can get the ddp_cpu mode with these Trainer args:
trainer = pl.Trainer(distributed_backend="ddp_cpu", num_processes=2)
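Applied to the repro above, it would look roughly like this (a sketch; LitClassifier and train_loader stand in for whatever model and dataloader you are actually using):

trainer = pl.Trainer(
    distributed_backend="ddp_cpu",   # required so num_processes is actually honoured (0.9-era API)
    num_processes=2,
    progress_bar_refresh_rate=20,
)
trainer.fit(LitClassifier(), train_loader)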
Doh!! I sure did, thanks for your patience 👍.
Edit: perhaps a short-term solution is to modify the warning message, as "argument will be ignored" is different from the observed behavior of training crashing.