Right now it makes everything NaN.
To be clear, for an N x C x H x W tensor, the problem only occurs when all of N, H, and W are 1. (So BatchNorm2d with a batch size of 1 is OK as long as you don't have a 1x1 image.)
What's the desired behavior? The only reasonable behavior I can think of is:
I'm not sure outputting zero is a good idea. I can't think of a case where that's what you want.
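
For anyone trying to picture the degenerate case, here is a rough pure-Python sketch of per-channel normalization. It is not PyTorch's actual implementation: `per_channel_count` and `normalize` are hypothetical helpers, and the `unbiased` switch is only an assumption about one way NaN could arise (dividing by n - 1 when n is 1). With the biased estimate, a single value normalizes to zero instead, which is the "output zero" behavior questioned above.

```python
import math

def per_channel_count(n, h, w):
    # Batch norm on an N x C x H x W tensor reduces over N, H, and W,
    # so each channel's statistics come from n * h * w values.
    return n * h * w

def normalize(values, eps=1e-5, unbiased=False):
    """Normalize one channel's values (flattened over N, H, W).

    Hypothetical helper for illustration only.
    """
    count = len(values)
    mean = sum(values) / count
    sq_dev = sum((v - mean) ** 2 for v in values)
    denom = count - 1 if unbiased else count
    # Float arithmetic would give 0 / 0 = NaN here; Python raises instead,
    # so the NaN is produced explicitly.
    var = sq_dev / denom if denom != 0 else math.nan
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# N = H = W = 1: a single value per channel.
print(per_channel_count(1, 1, 1))       # 1
print(normalize([3.7]))                 # biased variance is 0, output is zero
print(normalize([3.7], unbiased=True))  # divides by n - 1 = 0, output is NaN

# Batch size 1 with a 4x4 image still has 16 values per channel.
print(per_channel_count(1, 4, 4))       # 16
```

Either way, a single value per channel carries no information about the spread, so neither output is meaningful.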