Yolov5: What's the Focus layer?

Created on 26 Jun 2020  ·  3Comments  ·  Source: ultralytics/yolov5

I see a focus layer after the input

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1):
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, 1)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2],
                                    x[..., 1::2, ::2],
                                    x[..., ::2, 1::2],
                                    x[..., 1::2, 1::2]], 1))

Which transforms:

[[[[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [12, 13, 14, 15]]]]

to 

[[[[ 0,  2],
   [ 8, 10]],

  [[ 4,  6],
  [12, 14]],

  [[ 1,  3],
   [9, 11]],

  [[5,  7],
  [13, 15]]]]

Which sort of seems like a downsample, but why not use DownSample directly? What does this accomplish and is there any literature I can read about this technique?

Most helpful comment

check TResNet paper. p2. They call it SpaceToDepth

All 3 comments

Hello @maykulkarni, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook Open In Colab, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

  • Cloud-based AI systems operating on hundreds of HD video streams in realtime.
  • Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
  • Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

check TResNet paper. p2. They call it SpaceToDepth

@maykulkarni I think It is the inverse operation of pixelshuffle

Was this page helpful?
0 / 5 - 0 ratings