@khalidw yes, this is possible. I would recommend YOLOv5: the 3 feature maps are simply the inputs to the Detect() layer, so you can examine them there. For YOLOv5s on zidane.jpg these features have shapes:
torch.Size([1, 128, 80, 60])
torch.Size([1, 256, 40, 30])
torch.Size([1, 512, 20, 15])
@glenn-jocher thanks for your timely reply. Is the same possible for YOLOv3? I have spent quite some time on v3 and have trained yolov3, yolov3-tiny and yolov3-spp models for my use case, so moving to YOLOv5 at this point would be a very big hassle for me.
@khalidw you should be able to do the same with YOLOv3
You can add three lines of code in models.py (L287-L289) to collect the feature maps.
For example, I use yolov3spp.cfg with an input size of 512x512:
feature_maps = []  # new line1
for i, module in enumerate(self.module_list):
    name = module.__class__.__name__
    if name in ['WeightedFeatureFusion', 'FeatureConcat']:
        if verbose:
            l = [i - 1] + module.layers  # layers
            sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers]
            str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)])
        x = module(x, out)  # WeightedFeatureFusion(), FeatureConcat()
    elif name == 'YOLOLayer':
        yolo_out.append(module(x, out))
    else:  # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc.
        x = module(x)
        if i in [87, 99, 111]:  # new line2
            feature_maps.append(x)  # new line3
The feature_maps list then contains:
[batch_size, 1024, 16, 16] # feature map1
[batch_size, 512, 32, 32] # feature map2
[batch_size, 256, 64, 64] # feature map3
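As a quick sanity check on shapes like these, the spatial size of each feature map should equal the input size divided by that scale's stride. The sketch below assumes the usual YOLOv3/YOLOv5 detection strides of 8, 16 and 32 and an input whose sides are multiples of 32; feature_map_sizes is a hypothetical helper written for illustration, not a function from the repo.

```python
# Sketch: expected spatial size of each detection-scale feature map.
# Assumption: strides of 8, 16 and 32, and input sides divisible by 32.
STRIDES = (8, 16, 32)


def feature_map_sizes(height, width, strides=STRIDES):
    """Return [(h, w), ...] for each stride, finest scale first."""
    return [(height // s, width // s) for s in strides]


print(feature_map_sizes(512, 512))  # [(64, 64), (32, 32), (16, 16)]
print(feature_map_sizes(192, 320))  # [(24, 40), (12, 20), (6, 10)]
```

If the shapes you collect do not match this arithmetic, the layer indices you picked are probably not the ones feeding the detection heads.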
@WZMIAOMIAO Thank you for showing how to do it in code, it was very helpful. For yolov3 I used layers 36, 61 and 74, as I figured out that the layer count starts from 0 (i from the for loop). My input image size is 192x320, so I get the following output:
layer: 36, torch.Size([1, 256, 24, 40])
layer: 61, torch.Size([1, 512, 12, 20])
layer: 74, torch.Size([1, 1024, 6, 10])
My plan is to locate the detected object in these feature maps (object bounding box * 1 / feature map stride), then take the average across all channels and scales.
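One possible way to do the box scaling described above is sketched here in plain Python. It assumes the box is given as (x1, y1, x2, y2) in input-image pixels and that a feature map is a nested list indexed [channel][row][col]; box_to_cells and mean_activation are hypothetical helper names. With real tensors the same crop and mean could be done with slicing and .mean().

```python
import math


def box_to_cells(box, stride, fmap_h, fmap_w):
    """Scale a pixel box (x1, y1, x2, y2) by 1/stride and clamp it to the map."""
    x1, y1, x2, y2 = box
    cx1 = max(0, min(fmap_w - 1, math.floor(x1 / stride)))
    cy1 = max(0, min(fmap_h - 1, math.floor(y1 / stride)))
    # Upper bounds are exclusive and always cover at least one cell.
    cx2 = max(cx1 + 1, min(fmap_w, math.ceil(x2 / stride)))
    cy2 = max(cy1 + 1, min(fmap_h, math.ceil(y2 / stride)))
    return cx1, cy1, cx2, cy2


def mean_activation(fmap, cells):
    """Average a [C][H][W] nested-list feature map over all channels inside cells."""
    cx1, cy1, cx2, cy2 = cells
    values = [row[x]
              for channel in fmap
              for row in channel[cy1:cy2]
              for x in range(cx1, cx2)]
    return sum(values) / len(values)


# Example box on a 192x320 (height x width) input, mapped to each scale.
box = (64, 32, 128, 96)
for stride in (8, 16, 32):
    print(stride, box_to_cells(box, stride, 192 // stride, 320 // stride))
```

Averaging the per-scale results of mean_activation would then give a single score across channels and scales; whether the scales should be weighted equally is a design choice left open here.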