Yolov3: Feature extraction from intermediate layers

Created on 29 Oct 2020 · 5 Comments · Source: ultralytics/yolov3

I am working on a tracking application and want to use the features generated by YOLO in the track association stage. I know that YOLO performs object detection at three scales. Would it be possible to extract these three feature maps for an input image?

question

khalidw

All 5 comments

@khalidw Yes, this is possible. I would recommend YOLOv5: there the 3 feature maps are simply the inputs to the Detect() layer, so you can examine them at that point. For YOLOv5s on zidane.jpg these features have the following shapes:

torch.Size([1, 128, 80, 60])
torch.Size([1, 256, 40, 30])
torch.Size([1, 512, 20, 15])

See
https://github.com/ultralytics/yolov5/blob/c8c5ef36c9a19c7843993ee8d51aebb685467eca/models/yolo.py#L22

glenn-jocher on 29 Oct 2020

@glenn-jocher Thanks for your timely reply. Is the same possible for YOLOv3? I have spent quite some time on v3 and have trained yolov3, yolov3-tiny and yolov3-spp models for my use case, so moving to YOLOv5 at this point would be a big hassle for me.

khalidw on 29 Oct 2020

@khalidw You should be able to do the same with YOLOv3.

glenn-jocher on 29 Oct 2020

You can add three lines of code to models.py (L287-L289) to collect the feature maps.
For example, with yolov3-spp.cfg and an input size of 512x512:

        # Inside the model's forward pass; x, out, yolo_out and verbose are defined earlier in that method.
        feature_maps = []  # new line 1: holds the selected intermediate outputs
        for i, module in enumerate(self.module_list):
            name = module.__class__.__name__
            if name in ['WeightedFeatureFusion', 'FeatureConcat']:  # sum, concat
                if verbose:
                    l = [i - 1] + module.layers  # layers
                    sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers]  # shapes
                    str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)])
                x = module(x, out)  # WeightedFeatureFusion(), FeatureConcat()
            elif name == 'YOLOLayer':
                yolo_out.append(module(x, out))
            else:  # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc.
                x = module(x)

            if i in [87, 99, 111]:  # new line 2: layer indices used with yolov3-spp.cfg
                feature_maps.append(x)  # new line 3: keep this layer's output

The feature_maps list then holds tensors with these shapes:
[batch_size, 1024, 16, 16] # feature map 1
[batch_size, 512, 32, 32] # feature map 2
[batch_size, 256, 64, 64] # feature map 3
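
An alternative that leaves models.py untouched might be to register PyTorch forward hooks on the chosen entries of module_list. The following is only a minimal sketch and not code from this thread: ToyNet and collect_features are hypothetical names, the toy layers stand in for the real model, and it assumes the model keeps its layers in an nn.ModuleList called module_list as in the snippet above.

```python
import torch
import torch.nn as nn


class ToyNet(nn.Module):
    """Hypothetical stand-in for a model that stores its layers in module_list."""

    def __init__(self):
        super().__init__()
        self.module_list = nn.ModuleList([
            nn.Conv2d(3, 8, 3, stride=2, padding=1),    # index 0
            nn.Conv2d(8, 16, 3, stride=2, padding=1),   # index 1
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # index 2
        ])

    def forward(self, x):
        for module in self.module_list:
            x = module(x)
        return x


def collect_features(model, layer_indices, x):
    """Run model on x and return {layer index: output tensor} for the chosen layers."""
    features = {}
    handles = []
    for idx in layer_indices:
        # Bind idx as a default argument so each hook remembers its own layer index.
        def hook(module, inputs, output, idx=idx):
            features[idx] = output.detach()
        handles.append(model.module_list[idx].register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(x)
    finally:
        # Always remove the hooks so later forward passes are unaffected.
        for handle in handles:
            handle.remove()
    return features


model = ToyNet().eval()
feats = collect_features(model, [0, 2], torch.zeros(1, 3, 64, 64))
for idx, feature in feats.items():
    print(idx, tuple(feature.shape))
```

With the real model, the index list would be replaced by the layer indices that match the cfg in use, such as the ones quoted above for yolov3-spp.cfg.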

WZMIAOMIAO on 30 Oct 2020

👍2

@WZMIAOMIAO Thank you for showing how to do it in code; it was very helpful. For yolov3 I used layers 36, 61 and 74, since I figured out that the layer count starts from 0 (the i in the for loop). My input image size is 192x320, so I get the following output:

layer: 36, torch.Size([1, 256, 24, 40])
layer: 61, torch.Size([1, 512, 12, 20])
layer: 74, torch.Size([1, 1024, 6, 10])
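
These shapes are consistent with strides of 8, 16 and 32 relative to the input. As a quick sanity check, the expected spatial sizes can be computed from the input size alone. This is a small sketch: feature_map_sizes is a hypothetical helper, and the strides are inferred from the shapes above rather than read from the model.

```python
def feature_map_sizes(height, width, strides=(8, 16, 32)):
    """Return the (height, width) of the feature map at each stride."""
    for s in strides:
        if height % s or width % s:
            raise ValueError(f"input {height}x{width} is not divisible by stride {s}")
    return [(height // s, width // s) for s in strides]


print(feature_map_sizes(192, 320))  # [(24, 40), (12, 20), (6, 10)]
```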

My plan is to locate each detected object in these feature maps (bounding box coordinates * 1 / feature map stride) and take the average across all channels and scales.
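
One way this plan could be sketched is shown below. It is an illustration, not code from the thread: box_descriptor is a hypothetical helper, boxes are assumed to be (x1, y1, x2, y2) in input-image pixels, and the sketch averages only over the spatial cells inside the box so that each channel keeps its own value. Averaging over channels as well would collapse each scale to a single number, so adjust to taste.

```python
import math

import torch


def box_descriptor(feature_map, box_xyxy, stride):
    """Average the features inside a box given in input-image pixels.

    feature_map: tensor of shape [C, H, W]; returns a tensor of shape [C].
    """
    _, h, w = feature_map.shape
    x1, y1, x2, y2 = box_xyxy
    # Scale the box by 1 / stride and expand it outward to whole cells,
    # clamping so the crop always covers at least one cell inside the map.
    fx1 = max(0, min(w - 1, math.floor(x1 / stride)))
    fy1 = max(0, min(h - 1, math.floor(y1 / stride)))
    fx2 = max(fx1 + 1, min(w, math.ceil(x2 / stride)))
    fy2 = max(fy1 + 1, min(h, math.ceil(y2 / stride)))
    return feature_map[:, fy1:fy2, fx1:fx2].mean(dim=(1, 2))


# Random stand-ins with the shapes reported above for a 192x320 input.
feature_maps = [torch.rand(256, 24, 40), torch.rand(512, 12, 20), torch.rand(1024, 6, 10)]
strides = [8, 16, 32]
box = (40, 16, 80, 32)  # example box in input pixels

# Concatenate the per-scale descriptors into one vector per detection.
descriptor = torch.cat([box_descriptor(f, box, s) for f, s in zip(feature_maps, strides)])
print(descriptor.shape)  # torch.Size([1792])
```

Concatenating keeps the scales separate in the final vector. Small boxes cover very few cells on the stride-32 map, which is why the crop is clamped to at least one cell.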

khalidw on 30 Oct 2020
