Darknet: Changing input size from 416x416 to 104x104

Created on 16 May 2018 · 13 comments · Source: AlexeyAB/darknet

I want to change the tiny YOLOv3 input size from 416x416 to 104x104, because I have small images and I want training to be faster. What parts of this file do I need to change?
With the YOLOv2 model, I would usually just remove some max-pooling layers and it would be fine. This cfg file is more complicated, though, and I don't know where to start.
So if possible, please help me comment out some maxpool layers, or otherwise make the file correct for the input size I described.

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=414
activation=linear



[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=133
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=414
activation=linear

[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=133
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

Most helpful comment

@off99555 it's not so simple :) The file https://pjreddie.com/media/files/yolov3-tiny.weights was trained with mask = 1,2,3, so it will work incorrectly with mask = 0,1,2. We are waiting for the new yolov3-tiny.weights

All 13 comments

If you remove downsampling layers such as max-pool layers with stride=2, it will increase the resolution of the subsequent layers, and that will decrease performance (every later layer has to process a larger feature map).
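
To make the size arithmetic concrete, here is a small sketch (a hypothetical helper, not part of darknet): each stride=2 [maxpool] halves the feature map, and the cfg above has five of them, so the first [yolo] layer sees the input downsampled by a factor of 2^5 = 32.

def yolo_grid_size(input_size, num_stride2_pools=5):
    # each [maxpool] with stride=2 halves the spatial resolution;
    # the last [maxpool] in this cfg has stride=1 and does not downsample
    size = input_size
    for _ in range(num_stride2_pools):
        size //= 2
    return size

print(yolo_grid_size(416))  # 13 -> 13x13 grid at the first [yolo] layer
print(yolo_grid_size(128))  # 4  -> 4x4 grid
print(yolo_grid_size(104))  # 3, but 104 is not a multiple of 32, so the
                            # halving does not divide it evenly

Removing one stride=2 maxpool halves that factor to 16, so every subsequent layer runs at twice the resolution and costs roughly four times the computation, which is the slowdown described above.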

What does the mask do? When do I need to change it?
Do I need to change it if I didn't change width and height to 128?

What does the mask do? When do I need to change it?

There is a bug, it should be: mask = 0,1,2 - these are the indices of the anchors that will be used for this [yolo] layer

[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319

So the first three anchor pairs (10,14, 23,27, 37,58) will be used in this yolo-layer.
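
As a small illustrative sketch (plain Python, not darknet code) of how mask indexes into the anchors list:

anchors = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]
mask = [0, 1, 2]
# a [yolo] layer predicts boxes only for the anchors its mask selects
print([anchors[i] for i in mask])  # [(10, 14), (23, 27), (37, 58)]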

Yeah, so if it's a bug, then it should always be 0,1,2, regardless of input size, right?
And if it's a bug, why don't we change it for everyone in the repo by committing the fix?

@off99555 it's not so simple :) The file https://pjreddie.com/media/files/yolov3-tiny.weights was trained with mask = 1,2,3, so it will work incorrectly with mask = 0,1,2. We are waiting for the new yolov3-tiny.weights

Ah, I see. So if I am training from scratch without any pre-trained weights, I should use mask=0,1,2. But if I use the pre-trained weights provided by darknet as of now, I need to use mask=1,2,3.
All of this holds regardless of input size. Am I right?

If you train your own network (whether or not you start from pre-trained weights), then use mask=0,1,2.
If you use exactly yolov3-tiny.weights, and only for detection, then use mask=1,2,3.

Thank you. I have another error. It occurs when I train the 128x128 model to detect 133 classes.
Command:
./darknet detector train data/ch.data ./cfg/yolov3-tiny-ch.cfg -dont_show
Head of the output: (screenshot)
Tail of the output: (screenshot)
Cfg file:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=2
width=128
height=128
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=414
activation=linear



[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=133
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=414
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=133
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
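
One sanity check on this cfg: the filters=414 in each [convolutional] layer directly before a [yolo] layer follows the usual YOLOv3 formula filters = (classes + 5) * (anchors per [yolo] layer):

classes = 133
anchors_per_layer = 3                        # len(mask) in each [yolo] layer
filters = (classes + 5) * anchors_per_layer  # 5 = x, y, w, h, objectness
print(filters)                               # 414, matching the cfg above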

What does resizing mean? I saw the number change randomly, e.g. for the 416x416 model it sometimes said it was resizing to something like 500x500.

I fixed it. Update your code from this GitHub repository.

When you set random=1, then every 10 iterations the network will be resized randomly by ±160 pixels. This allows it to train so that it can detect objects of different sizes.
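
Roughly speaking, the behavior looks like this sketch (a simplified assumption; darknet's actual resize logic differs in detail, but network sizes stay multiples of 32):

import random

def pick_network_size(base=416, delta=160, step=32):
    # random=1: periodically pick a new square input size near the
    # configured width/height, rounded down to a multiple of 32
    low = max(step, (base - delta) // step * step)
    high = (base + delta) // step * step
    return random.randrange(low, high + step, step)

print(pick_network_size())  # e.g. 256, 320, 416, 512, 576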

Thanks. My images are already difficult enough to recognize, so if I don't want it to shrink my image size, I should set random=0 in both [yolo] layers, right?

Yes, you can set random=0 in each [yolo] layer.

OK. Resolved.
