Ssd_keras: Role of num_priors

Created on 29 Nov 2016 · 6 comments · Source: rykov8/ssd_keras

Hi rykov8,

First off, great job, and thanks a lot for sharing this repo with us!

I'm trying to shorten the SSD network a bit to see if I can gain some speed during training.
I see you set the num_priors variable to either 3 or 6 and then use it to determine the number of filters (nb_filter) in the Conv2D layers responsible for the location and confidence multiboxes.
Now, in trying to make a shorter network, I end up with a shape mismatch coming from the last merge() layer (the prediction layer):
Exception: "concat" mode can only merge layers with matching output shapes except for the concat axis. Layer shapes: [(None, 247500, 4), (None, 247500, 2), (None, 337500, 8)]

That is, the shapes of these three layers:

net['mbox_loc'] = Reshape((num_boxes, 4),
                              name='mbox_loc_final')(net['mbox_loc'])
net['mbox_conf'] = Reshape((num_boxes, num_classes),
                               name='mbox_conf_logits')(net['mbox_conf'])
net['mbox_priorbox'] = merge([net['conv1_2_mbox_priorbox'],
                                  net['conv2_2_mbox_priorbox']],
                                  mode='concat',
                                  concat_axis=1,
                                  name='mbox_priorbox')

I tried looking in the literature with no luck. Could you maybe explain how to set this parameter? What does it depend on?
My input image shape is (300, 300, 3), just like in your example, and I'm using the same priors you pickled.
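To make the mismatch concrete, here is the box count I would expect along the concat axis; the feature-map sizes and per-cell prior counts below are my own guesses (conv1_2 giving a 300x300 map, conv2_2 a 150x150 map after pooling, 3 priors per cell in each PriorBox layer), not something confirmed in the code above:

feature_maps = [(300, 300), (150, 150)]   # conv1_2, conv2_2 (assumed sizes)
priors_per_cell = [3, 3]                  # what the PriorBox layers seem to produce (assumed)
total = sum(h * w * n for (h, w), n in zip(feature_maps, priors_per_cell))
print(total)  # 337500 -- the mbox_priorbox count in the error above

If that guess is right, the location/confidence branches (247500 boxes) disagree with the PriorBox branch about the number of priors per cell, which is exactly the concat failure I'm seeing.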

Thanks in advance!

All 6 comments

@natlachaman Hi! Thanks for the question. The answer is quite simple, if I got your question right. Each grid cell predicts the class and the modification of each prior box. For each grid cell, the number of prior boxes is calculated from the provided aspect ratios and the presence of max_size. For example, if aspect_ratios=[2, 3] and max_size is not None, this prediction will have 6 priors: a small square of size min_size x min_size, a big square of size sqrt(min_size * max_size) x sqrt(min_size * max_size), and 4 rectangles of size min_size * sqrt(ar) x min_size / sqrt(ar), where ar ranges over [2, 3, 1/2, 1/3]. Other possible situations are hopefully clear. These priors are always the same for every picture and are computed during the forward pass by the PriorBox layer; you may refer to its implementation in ssd_layers.py. As for training, I saved the prior boxes to a separate file because they are needed to assign ground truth boxes to priors, and this is the most convenient way to deal with it.
So, if you reduce num_priors for the Conv2D layers responsible for the location and confidence multiboxes, you also need to reduce the number of priors' aspect ratios and/or remove max_size in order to remove the big square prior (it may lead to poor performance, though). It would probably have been better to set num_priors automatically, but I'm not sure whether it is worth spending time on it now.
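In other words, the per-cell count follows directly from aspect_ratios and max_size. A tiny illustrative helper (not from the repo, just restating the rule above):

def count_priors(aspect_ratios, max_size=None):
    n = 1                                                   # small square: min_size x min_size
    if max_size is not None:
        n += 1                                              # big square: sqrt(min_size * max_size)
    n += 2 * len([ar for ar in aspect_ratios if ar != 1])   # ar and 1/ar rectangles
    return n

count_priors([2])                    # -> 3
count_priors([2, 3], max_size=60.0)  # -> 6 (any non-None max_size; 60.0 is just an example value)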

Oh! I see it now! That makes a lot of sense.
I have one last doubt. You mentioned that for training you have a separate file with the ground truth boxes assigned to prior boxes, is that right? And the number of priors depends on the number of items given to aspect_ratios and on the image size, if I'm correct.
How should I create the prior boxes a priori to match the number of priors drawn by my model? I imagine the image size and the aspect ratios should be the same in both cases, but in your model the aspect ratios vary, and so do the sizes of the feature maps (you take a different lower layer every time).
How can I go about this?

So far, I have created the "prior boxes" file by forward-passing the dataset through a PriorBox layer like this:

from keras.layers import Input
from keras.models import Model
from ssd_layers import PriorBox

img_size = image.shape  # (300, 300, 3)
input_tensor = Input(shape=img_size)
priorbox = PriorBox(img_size[:-1], 30.0, aspect_ratios=[1],
                    variances=[0.1, 0.1, 0.2, 0.2])
output_tensor = priorbox(input_tensor)     # apply the layer to get the output tensor
model = Model(input_tensor, output_tensor)
priors = model.predict(image[None])[0]     # add a batch dimension, then drop it again
priors.shape  # (90000, 8)

Thanks for taking the time to answer my questions!

You mentioned that for training you have a separate file with the ground truth boxes assigned to prior boxes, is that right?

No, it is not. If you want to reduce the number of priors produced by the net, you should change aspect_ratios in the architecture and set num_priors accordingly in every prediction block.
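For example, a single prediction block where the two sides agree might look roughly like this (Keras 1 style, following the pattern of the prediction blocks in ssd.py as I understand it; feature_map, num_classes, img_size and the layer names are placeholders of mine):

from keras.layers import Convolution2D
from ssd_layers import PriorBox

num_priors = 3   # 1 square + 2 rectangles for aspect_ratios=[2] and no max_size
loc = Convolution2D(num_priors * 4, 3, 3, border_mode='same',
                    name='myblock_mbox_loc')(feature_map)
conf = Convolution2D(num_priors * num_classes, 3, 3, border_mode='same',
                     name='myblock_mbox_conf')(feature_map)
priorbox = PriorBox(img_size, 30.0, aspect_ratios=[2],
                    variances=[0.1, 0.1, 0.2, 0.2],
                    name='myblock_mbox_priorbox')(feature_map)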

How should I create the prior boxes a priori to match the number of priors drawn by my model?

You don't need to do that. Once you have changed the SSD architecture, even with untrained weights you can just call:

preds = model.predict(inputs, batch_size=1, verbose=1)

exactly as in SSD.ipynb. Here preds[0, :, -8:] are actually all the priors; each prior looks like:

preds[0, i, -8] = [xmin, ymin, xmax, ymax, var1, var2, var3, var4]

xmin etc. are in relative coordinates. Now you need to save preds[0, :, -8:]; for the default settings it is exactly what is in prior_boxes_ssd300.pkl (you can check this).
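A minimal sketch of that saving step (the file name is made up; preds comes from the model.predict call above):

import pickle

priors = preds[0, :, -8:]   # shape (num_boxes, 8): 4 box coordinates + 4 variances
with open('my_prior_boxes.pkl', 'wb') as f:
    pickle.dump(priors, f)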
A ground truth example is gt_pascal.pkl (it is not the PASCAL ground truth, it is ground truth in a PASCAL-like form). In this file you can see that, for each image, the ground truth is a list (possibly an empty one), and each element of this list looks like [xmin, ymin, xmax, ymax, prob1, prob2, prob3], where xmin etc. are in relative coordinates. Here I assume that the first 4 numbers are coordinates and the others are one-hot encoded classes (excluding background for ground truth).
Actually, that's it. Now, for each ground truth list of boxes, call the assign_boxes method of the BBoxUtility class (refer to the generate method of the Generator class in SSD_training.ipynb).
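Roughly, that assignment step looks like this (a sketch following the description above; num_classes, the gt dictionary layout, and the image key are assumptions on my side):

from ssd_utils import BBoxUtility

bbox_util = BBoxUtility(num_classes, priors)   # priors = the saved preds[0, :, -8:]
# gt maps an image key to an array of [xmin, ymin, xmax, ymax, one-hot classes] rows
y_true = bbox_util.assign_boxes(gt['some_image.jpg'])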

Everything makes sense now. I followed your instructions and it's all running fine :)

Thanks a lot!!

@natlachaman you're welcome!

Thank you for your work on this. May I ask, for clarification, is there a typo in your response?

preds[0, i, -8] = [xmin, ymin, xmax, ymax, var1, var2, var3, var4]

Should it be, preds[0, i, -8:] instead? (i.e., colon after 8)
