This line does not work; I think it should be modified to use self.modules() instead of self.children().
Hi @cl2227619761, what do you mean by "this line does not work"?
Please post a minimal example where you encounter an issue with this line.
Hi @cl2227619761
That is a very good catch!
@pmeier the issue is that self.children() returns only the immediate child modules, so in this case it yields inner_blocks and layer_blocks, neither of which is an nn.Conv2d.
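As a minimal illustration (a sketch with a simplified module; the real FeaturePyramidNetwork has more layers, but the container structure is analogous):

```python
import torch.nn as nn

class TinyFPN(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv layers live inside ModuleLists, as in FeaturePyramidNetwork
        self.inner_blocks = nn.ModuleList([nn.Conv2d(8, 4, 1)])
        self.layer_blocks = nn.ModuleList([nn.Conv2d(4, 4, 3, padding=1)])

m = TinyFPN()
# children() yields only the two ModuleLists, so an isinstance(m, nn.Conv2d) check never fires
print([type(c).__name__ for c in m.children()])  # ['ModuleList', 'ModuleList']
# modules() recurses into the lists and does reach the Conv2d layers
print([type(c).__name__ for c in m.modules()])   # ['TinyFPN', 'ModuleList', 'Conv2d', 'ModuleList', 'Conv2d']
```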
The fix is what @cl2227619761 mentioned: replace self.children() with self.modules().
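For context, the initialization loop in question looks roughly like this (paraphrased from torchvision's FeaturePyramidNetwork; the fix is the iterator on the first line):

```python
# Before: `for m in self.children():` -- never reaches the Conv2d layers,
# because they are nested inside ModuleLists.
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_uniform_(m.weight, a=1)
        nn.init.constant_(m.bias, 0)
```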
But this raises the question of the impact of this initialization bug. If I remember correctly, we needed the custom initialization to get the best results, so I'm surprised that it was effectively left out.
A PR fixing this would be great, but it would also be great if we could measure the impact of this incorrect initialization in the model performance.
@mthrok if you find the time, it would be great to assess how much this mistake affects performance.
My bad, I misunderstood the comment above the loop:
Can I help with this? If you point me to the model and training scripts you used, I can train with both the correct init and the current one.
@gan3sh500 You can find the training scripts here.
I'll have to train on a single 2080 Ti. I've seen smaller batch sizes lead to worse convergence, but is it fine if I compare the Kaiming init and the current one, both trained under the same conditions?
@gan3sh500 sorry for the delay in replying.
If you change the number of GPUs, you'll need to adapt the learning rate to follow the linear scaling rule: if you divide the global batch size by 8, you should also divide the learning rate by 8.
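A quick worked example of the scaling rule (the numbers below are assumptions for illustration, not the actual reference recipe):

```python
# Linear scaling rule sketch (hypothetical reference setup)
ref_gpus, ref_batch_per_gpu, ref_lr = 8, 2, 0.02  # assumed multi-GPU reference run
new_gpus, new_batch_per_gpu = 1, 2                # single 2080 Ti

ref_global_batch = ref_gpus * ref_batch_per_gpu   # 16
new_global_batch = new_gpus * new_batch_per_gpu   # 2

# Scale the learning rate proportionally to the global batch size
new_lr = ref_lr * new_global_batch / ref_global_batch
print(new_lr)  # 0.0025
```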