This line does not work; I think it should be modified to use self.modules() instead of self.children().
Hi @cl2227619761, what do you mean by "this line does not work"?
Please post a minimal example where you encounter an issue with this line.
Hi @cl2227619761
That is a very good catch!
@pmeier the issue is that self.children() returns only the immediate child modules, so in this case it yields inner_blocks and layer_blocks, neither of which is an nn.Conv2d.
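As a minimal illustration (a sketch with a simplified module; the real FeaturePyramidNetwork has more layers, but the container structure is analogous):

```python
import torch.nn as nn

class TinyFPN(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv layers live inside ModuleLists, as in FeaturePyramidNetwork
        self.inner_blocks = nn.ModuleList([nn.Conv2d(8, 4, 1)])
        self.layer_blocks = nn.ModuleList([nn.Conv2d(4, 4, 3, padding=1)])

m = TinyFPN()
# children() yields only the two ModuleLists, so an isinstance(m, nn.Conv2d) check never fires
print([type(c).__name__ for c in m.children()])  # ['ModuleList', 'ModuleList']
# modules() recurses into the lists and does reach the Conv2d layers
print([type(c).__name__ for c in m.modules()])   # ['TinyFPN', 'ModuleList', 'Conv2d', 'ModuleList', 'Conv2d']
```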
The fix is what @cl2227619761 mentioned: replace self.children() with self.modules().
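For context, the initialization loop in question looks roughly like this (paraphrased from torchvision's FeaturePyramidNetwork; the fix is the iterator on the first line):

```python
# Before: `for m in self.children():` -- never reaches the Conv2d layers,
# because they are nested inside ModuleLists.
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_uniform_(m.weight, a=1)
        nn.init.constant_(m.bias, 0)
```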
But this raises the question of the impact of this initialization bug. If I remember correctly, we needed the custom initialization to get the best results, so I'm surprised that it was effectively left out.
A PR fixing this would be great, but it would also be great if we could measure the impact of this incorrect initialization in the model performance.
@mthrok if you find the time, it would be great to assess how much this mistake affects performance.
My bad, I misunderstood the comment above the loop:
Can I help with this? If you point me to the model and training scripts you used, I can train with both the correct init and the current one.
@gan3sh500 You can find the training scripts here.
I'll have to train on a single 2080 Ti. I've seen smaller batch sizes lead to worse convergence, but is it fine if I compare the Kaiming init and the current one, both trained under the same conditions?
@gan3sh500 sorry for the delay in replying.
If you change the number of GPUs, you'll need to adapt the learning rate to follow the linear scaling rule: if you divide the global batch size by 8, you should also divide the learning rate by 8.
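A quick worked example of the scaling rule (the numbers below are assumptions for illustration, not the actual reference recipe):

```python
# Linear scaling rule sketch (hypothetical reference setup)
ref_gpus, ref_batch_per_gpu, ref_lr = 8, 2, 0.02  # assumed multi-GPU reference run
new_gpus, new_batch_per_gpu = 1, 2                # single 2080 Ti

ref_global_batch = ref_gpus * ref_batch_per_gpu   # 16
new_global_batch = new_gpus * new_batch_per_gpu   # 2

# Scale the learning rate proportionally to the global batch size
new_lr = ref_lr * new_global_batch / ref_global_batch
print(new_lr)  # 0.0025
```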