I see a Focus layer right after the input:
import torch
import torch.nn as nn

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1):
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, 1)  # Conv is YOLOv5's standard conv block

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2],
                                    x[..., 1::2, ::2],
                                    x[..., ::2, 1::2],
                                    x[..., 1::2, 1::2]], 1))
Which transforms:

[[[[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [12, 13, 14, 15]]]]

to

[[[[ 0,  2],
   [ 8, 10]],
  [[ 4,  6],
   [12, 14]],
  [[ 1,  3],
   [ 9, 11]],
  [[ 5,  7],
   [13, 15]]]]
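(For anyone who wants to verify this transform themselves, here is a minimal sketch of just the slicing step, without YOLOv5's Conv wrapper, using plain PyTorch:)

```python
import torch

# Reproduce the Focus slicing (space-to-depth) without the trailing Conv:
# (b, c, h, w) -> (b, 4c, h/2, w/2)
x = torch.arange(16).reshape(1, 1, 4, 4)
y = torch.cat([x[..., ::2, ::2],     # even rows, even cols
               x[..., 1::2, ::2],    # odd rows,  even cols
               x[..., ::2, 1::2],    # even rows, odd cols
               x[..., 1::2, 1::2]],  # odd rows,  odd cols
              1)
print(y.shape)           # torch.Size([1, 4, 2, 2])
print(y[0, 0].tolist())  # [[0, 2], [8, 10]]
```

No information is lost: every input pixel ends up in exactly one of the four channel groups.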
This sort of looks like a downsample, but why not just downsample directly? What does this accomplish, and is there any literature I can read about this technique?
Check the TResNet paper, p. 2. They call it SpaceToDepth.
@maykulkarni I think it is the inverse operation of PixelShuffle.
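(This claim can be checked directly: `torch.nn.functional.pixel_unshuffle`, the inverse of `pixel_shuffle`, extracts the same four sub-grids; the Focus slicing matches it up to a channel permutation. A quick sketch:)

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.).reshape(1, 1, 4, 4)

# Focus-style slicing: channels ordered (even,even), (odd,even), (even,odd), (odd,odd)
focus = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)

# pixel_unshuffle orders the sub-grids row-major:
# (even,even), (even,odd), (odd,even), (odd,odd)
unshuffled = F.pixel_unshuffle(x, downscale_factor=2)

# identical up to swapping the two middle channels
print(torch.equal(unshuffled, focus[:, [0, 2, 1, 3]]))  # True
```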