This happens in the training loop.
ValueError: only one element tensors can be converted to Python scalars
From my observation, I believe this happens when the batch size is not divisible by the number of GPUs: for example, on the last batch of each epoch, or when you have 4 GPUs but set the batch size to 2.
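For reference, that ValueError is what PyTorch raises when float() or .item() is called on a tensor holding more than one element, so somewhere a multi-element tensor is being treated as a scalar. A minimal illustration:

```python
import torch

t = torch.tensor([0.5, 0.7])  # more than one element
float(t)                      # ValueError: only one element tensors can be converted to Python scalars
# t.mean().item()             # works once the tensor is reduced to a single element
```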
I think it would be nice to use only some of the GPUs the user specified, while printing a message telling them that the GPUs are not configured correctly. The current implementation simply throws an unfriendly error.
Do I understand correctly that this happens with every Lightning training run where batch_size < num_gpus?
Then I would also like to see a warning message like you described, and to automatically set num_gpus to the batch size.
However, we still have the problem that batch_size > num_gpus holds for every batch except the last one when drop_last=False is set on the DataLoader. What do we do then?
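One possible stop-gap, if losing the leftover samples is acceptable, is to let the DataLoader discard the incomplete final batch. A minimal sketch, where the dataset is just placeholder data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_dataset = TensorDataset(torch.randn(100, 16), torch.randn(100, 1))  # placeholder data

# drop_last=True discards the incomplete final batch, so every batch that
# reaches the GPUs has the full batch_size (assumed here to be >= num_gpus).
train_loader = DataLoader(train_dataset, batch_size=12, shuffle=True, drop_last=True)
```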
Would the same error occur in torch.nn.DataParallel (without PL)?
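A quick way to check would be something like the sketch below; it needs a multi-GPU machine, and the tiny module is just a placeholder:

```python
import torch
import torch.nn as nn

class ToyLossModule(nn.Module):
    """Returns a scalar (0-dim) loss per replica, like a training_step would."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc(x).mean()

model = nn.DataParallel(ToyLossModule().cuda(), device_ids=[0, 1, 2, 3])
x = torch.randn(3, 8).cuda()  # batch smaller than the number of GPUs

loss = model(x)      # DataParallel gathers the per-replica scalars into a 1-D tensor
print(loss.shape)    # torch.Size([3]) here, since only 3 replicas received data
float(loss)          # raises the same ValueError unless you reduce first, e.g. loss.mean()
```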
The last batch seems a bit tricky. I'm no expert on this, but I wonder if it's possible to send the data to only some of the specified GPUs (for example, with 4 GPUs in total, send the 3 sub-batches to the first 3 GPUs). I remember there's some send_batch_to_gpu function in PL.
@neggert @jeffling pls ^^
@Ir1d btw, the person in #1236 is getting the same error but doesn't have a GPU.
@Ir1d would you mind sending a PR or providing an example to replicate it?
I can provide an example next week. Currently I'm not sure how to fix this.
Hi @Borda, I've reproduced the same issue here:
https://github.com/Richarizardd/pl_image_classification
It's basic image classification with PL using the MNIST, CIFAR10, and ImageFolder datasets from torchvision. If you run the mnist_gpu1.yaml config file, you will get the same issue as @Ir1d.
Hi @Richarizardd
I looked at your code and found that in your validation_epoch_end you don't reduce the outputs properly. PL does not do this for you. This is intentional, right, @williamFalcon?
So, in your validation_epoch_end, instead of

```python
metric_total = 0
for output in outputs:
    metric_total += output[metric_name]
tqdm_dict[metric_name] = metric_total / len(outputs)
```

you should do the following:

```python
tqdm_dict[metric_name] = torch.stack([output[metric_name] for output in outputs]).mean()
```
I tested this by adding it to your code and it worked (no error).
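For context, the whole hook could look roughly like this (a minimal sketch; the metric names and the returned dict layout are assumptions about your setup):

```python
import torch

def validation_epoch_end(self, outputs):
    tqdm_dict = {}
    for metric_name in ("val_loss", "val_acc"):
        # Stack the per-step tensors (one per GPU/step) and reduce to a single scalar.
        tqdm_dict[metric_name] = torch.stack(
            [output[metric_name] for output in outputs]
        ).mean()
    return {"progress_bar": tqdm_dict, "log": tqdm_dict, "val_loss": tqdm_dict["val_loss"]}
```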
As far as I can tell, this is not a bug in PL. However, we could print a better error message.
@Ir1d you probably had the same mistake.

In @Richarizardd's case, the error was thrown in validation_epoch_end. Could you post your stack trace here so I can check if it's the same?
Your metric reduction looks fine to me.
I tried again and the issue is still there in PL v0.7.3.
Here's the whole log for a recent run. I set batch_size=3 on 4 GPUs and told PL to use all 4 of them, and the error happens on the first batch. (In another case, with drop_last not set and batch_size=12 on 4 GPUs, it happens on the last batch of the epoch, which suggests the error occurs whenever the batch size is smaller than the number of GPUs.)

I'll try to put together a minimal reproduction after I finish my midterm tests. Currently my code is available at https://github.com/ir1d/AFN, but the data is a bit large and might be hard to run.
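In the meantime, a minimal reproduction would probably look something like the sketch below (written against the 0.7-era API; the random dataset and toy model are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(16, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

    def train_dataloader(self):
        ds = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))
        # batch_size=3 on 4 GPUs: every batch is smaller than num_gpus.
        return DataLoader(ds, batch_size=3)


# 4 GPUs with the dp backend, matching the failing configuration above.
trainer = pl.Trainer(gpus=4, distributed_backend="dp", max_epochs=1)
trainer.fit(ToyModel())
```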
@Ir1d I found the bug in the trainer code: it does not reduce the outputs if the output size of the training step does not equal num_gpus. I will make a PR to fix it.
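To illustrate the failure mode (illustrative only, not the actual trainer code): if the reduction is guarded by an exact-length check against num_gpus, it gets skipped whenever fewer GPUs received data, and a multi-element tensor leaks through to code that expects a scalar:

```python
import torch

num_gpus = 4
# Only 3 GPUs received samples (batch_size=3), so only 3 losses were gathered.
gathered = torch.tensor([0.31, 0.28, 0.30])

# A reduction guarded by an exact-length check silently skips the mean():
loss = gathered.mean() if gathered.numel() == num_gpus else gathered

float(loss)  # ValueError: only one element tensors can be converted to Python scalars
```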
@Ir1d The fix got merged. Please verify it against the latest master branch. Closing for now.