trainer = pl.Trainer(num_processes=1, progress_bar_refresh_rate=20). Observe that this works.
trainer = pl.Trainer(num_processes=2, progress_bar_refresh_rate=20). Observe that training does not commence.
Could you please share the output and expectations?
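For context, a minimal sketch of the reproduction under discussion (the model and dataloader below are an assumed MNIST hello-world setup, not copied from the linked notebook):

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    # single linear layer, just enough to exercise the Trainer
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.layer(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

train_loader = DataLoader(
    MNIST(".", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=32,
)

# (2) works: trainer = pl.Trainer(num_processes=1, progress_bar_refresh_rate=20)
# (3) no progress bar, training never starts:
trainer = pl.Trainer(num_processes=2, progress_bar_refresh_rate=20)
trainer.fit(LitClassifier(), train_loader)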
Hi Borda, thanks for your quick response! Did you try reproducing with my instructions? In (2) I see a training bar as expected, but in (3) there is unfortunately no output.
Edit: I'll note that I saw the same error on my own complicated model, but posted this MWE instead.
I'll check it, but it would be much more convenient to describe the problem directly here so that other people can also help, compared to saying "go and try it", which almost no one will do...
Ok, here's the output. Nothing trains and there is no error.
Done!
/usr/local/lib/python3.6/dist-packages/torchvision/datasets/mnist.py:469: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/distributed.py:25: UserWarning: num_processes is only used for distributed_backend="ddp_cpu". Ignoring it.
warnings.warn(*args, **kwargs)
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
1
I took care to make sure the bug can be reproduced by anyone in 1 minute or less, as I want to respect your time, so I would encourage you to try it if you're at a desktop! I also used/linked the official pytorch-lightning demo notebook and the first MNIST hello-world example, so you could be sure it wasn't anything unusual in my code. I truly appreciate your work on this wonderful package!
Edit: I wrote this pretty late and cranky, and have edited it for politeness.
Hey Tyler, I followed your suggestions and ran pl.Trainer with num_processes=2. Using pytorch-lightning==0.9.0, it does crash. It appears this is a bug.
In the meantime, it is important to mention that PyTorch already parallelizes work across multiple CPU threads even with num_processes=1, so while we work to fix this I recommend you stick with a single process on CPU, as that will yield the fastest training times.
Guys, read the warning message xD
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/distributed.py:25: UserWarning: num_processes is only used for distributed_backend="ddp_cpu". Ignoring it.
num_processes only applies to ddp_cpu mode, which needs to be explicitly set with distributed_backend="ddp_cpu".
This is expected behavior, given that this case is explicitly handled by Trainer by printing a warning.
@tbenst you won't get any speedup by using num_processes > 1 on CPU.
This is really just meant to simulate distributed training for testing purposes in environments where DDP is not available.
@awaelchli good to know, thanks! No warning was printed with 0.9.0, but perhaps this has since changed.
P.S. I don't mean to press on this since GPU training is what 99% of users need, but depending on how you compile BLAS / which one you choose, linear algebra operations are not automatically parallelized, and thus there would be a substantial speedup. On MKL, users sometimes specify OMP_NUM_THREADS=1 when running hyperparameter searches where the code is mostly bound by single-thread performance.
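A minimal sketch of what that pinning looks like in practice (the exact environment variables depend on the BLAS build; OMP_NUM_THREADS/MKL_NUM_THREADS cover the common OpenMP/MKL cases):

import os
os.environ["OMP_NUM_THREADS"] = "1"   # set before the math libraries spin up their thread pools
os.environ["MKL_NUM_THREADS"] = "1"

import torch
torch.set_num_threads(1)              # also pin PyTorch's own intra-op thread pool
print(torch.get_num_threads())        # -> 1; each hyperparameter-search worker now stays on one thread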
@tbenst you posted the warning yourself in this reply https://github.com/PyTorchLightning/pytorch-lightning/issues/3334#issuecomment-686393731
it is kinda hard to see though, because we get used to ignoring warnings xD
P.S. I don't mean to press on this since GPU training is what 99% of users need, but depending on how you compile BLAS / which one you choose, linear algebra operations are not automatically parallelized, and thus there would be a substantial speedup. On MKL, users sometimes specify OMP_NUM_THREADS=1 when running hyperparameter searches where the code is mostly bound by single-thread performance.
ok, I did not know that. You can get the ddp_cpu mode with these Trainer args:
trainer = pl.Trainer(distributed_backend="ddp_cpu", num_processes=2)
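Applied to the repro above, it would look roughly like this (a sketch; LitClassifier and train_loader stand in for whatever model and dataloader you are actually using):

trainer = pl.Trainer(
    distributed_backend="ddp_cpu",   # required so num_processes is actually honoured (0.9-era API)
    num_processes=2,
    progress_bar_refresh_rate=20,
)
trainer.fit(LitClassifier(), train_loader)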
Doh!! I sure did, thanks for your patience 👍.
Edit: perhaps a short-term solution is to modify the warning message, as "argument will be ignored" is different from the observed behavior of training crashing.