Darknet: Yolo-v2 (VOC+COCO)

Created on 30 Jan 2018  ·  15 comments  ·  Source: AlexeyAB/darknet

Hello,

When I checked the VOC evaluation results, I saw an entry for Yolo-v2 (VOC+COCO) that reaches above 81 mAP. Where can we find some information about it?

[screenshot of the VOC leaderboard entry: 2018-01-30_23-45-02]


All 15 comments

Hi,

All the information there is can be found here: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4#KEY_YOLOv2 (VOC + COCO)

Entry | Method | Affiliation | Contributors | Description | Submitted
-- | -- | -- | -- | -- | --
YOLOv2 (VOC + COCO) | YOLOv2 (VOC + COCO) | University of Washington | Joseph Redmon, Ali Farhadi | We use a variety of tricks to increase the performance of YOLO including dimension cluster priors and multi-scale training. Details at https://pjreddie.com/yolo/ | 2017-10-21 18:07:57

There is no additional paper explaining how the accuracy was improved compared with the previous Yolo v2.0 result of 78.6 mAP: https://pjreddie.com/darknet/yolo/


I think there are several improvements, such as:


Joseph's latest paper, "IQA: Visual Question Answering in Interactive Environments", is not about Yolo: https://pjreddie.com/publications/

Is there any method to increase the number of training images without annotating them by hand?

I know that Darknet adds noise and color distortions and generates a vast number of training images in memory, but I want to know whether there is code that creates extra physical training images (by adding rotations, scale changes, brightness changes, ...) and updates the annotations automatically (in the case of rotation or scale changes).

With such a method we could create many hundreds of images from, for example, just 100 originals.

@VanitarNordic my friend, you can try shifting the image left/right/up/down by a few pixels and simply adjusting the bounding-box coordinates accordingly.
Other techniques such as rotation and scaling massively change the appearance of the object inside the image, and there is no way to reliably derive the new box from the original one by interpolation. I had this problem before and had to redo the annotations manually.
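As an illustration of the shift idea, here is a minimal C sketch (not from the thread) of how a YOLO-style normalized box can be translated when the image content is shifted by a few pixels; the yolo_box struct and shift_box() helper are hypothetical names, and real code would also have to decide what to do with boxes pushed partly outside the frame:

#include <stdio.h>

/* Hypothetical YOLO-style label: center (x, y) and size (w, h),
   all normalized to [0, 1] relative to the image width/height. */
typedef struct { int id; float x, y, w, h; } yolo_box;

/* Shift a box by dx/dy pixels in an img_w x img_h image and
   clamp the center so it stays inside the frame. */
static yolo_box shift_box(yolo_box b, int dx, int dy, int img_w, int img_h)
{
    b.x += (float)dx / img_w;
    b.y += (float)dy / img_h;
    if (b.x < 0) b.x = 0;
    if (b.x > 1) b.x = 1;
    if (b.y < 0) b.y = 0;
    if (b.y > 1) b.y = 1;
    return b;
}

int main(void)
{
    yolo_box b = { 0, 0.50f, 0.50f, 0.20f, 0.30f };
    yolo_box s = shift_box(b, 10, -5, 640, 480);   /* 10 px right, 5 px up */
    printf("%d %.4f %.4f %.4f %.4f\n", s.id, s.x, s.y, s.w, s.h);
    return 0;
}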

@VanitarNordic I've tried to increase my dataset 3x by rotating the images (90, 180 and 270 degrees) together with the annotations. If you are familiar with Python you can use the function below to do it. The inputs are the image, a list of all bounding boxes for the image, and the method to apply (I use this last argument so I can easily spawn three parallel jobs with the multiprocessing library and process everything faster).
I don't know how much better training gets when you feed in these rotated images; if you figure it out, feel free to share!

import numpy as np

def rotate_image(im, bboxes, method=None):
    # bboxes are assumed to be YOLO-format labels [class, x, y, w, h],
    # with x, y, w, h normalized to [0, 1].
    if not method:
        return im, bboxes
    # Rotate 180 degrees: x -> 1-x, y -> 1-y, box size unchanged
    if method == 1:
        for element in bboxes:
            element[1] = 1. - element[1]
            element[2] = 1. - element[2]
        im = np.array(im[::-1, ::-1, :])
        return im, bboxes
    # Rotate 90 degrees clockwise: new_x = 1-y, new_y = x, swap w and h
    elif method == 2:
        for element in bboxes:
            element[1], element[2] = 1. - element[2], element[1]
            element[3], element[4] = element[4], element[3]
        im = np.array(np.rot90(im, k=-1))
        return im, bboxes
    # Rotate 270 degrees clockwise (90 CCW): new_x = y, new_y = 1-x, swap w and h
    elif method == 3:
        for element in bboxes:
            element[1], element[2] = element[2], 1. - element[1]
            element[3], element[4] = element[4], element[3]
        im = np.array(np.rot90(im, k=1))
        return im, bboxes

@TheMikeyR

Thank you very much. Actually, I was thinking of writing a Python script that crops the bounding box(es), rotates/scales them inside the image, and saves the result as a new image.

@VanitarNordic Seems like a great idea. There are many libraries for image augmentation, e.g. imgaug, but as you mentioned these do not include the detection annotations, since they are aimed at networks like Faster R-CNN where you train with the cropped object, whereas YOLO trains on the full image together with the box locations. A YOLO-oriented alternative would be nice; I don't have time to look into it now, but maybe in the future. If you make something, feel free to share :+1:

@VanitarNordic @phongnhhn92 @TheMikeyR

Someone offered a solution to this problem. I did not try it, but you can test it: https://groups.google.com/forum/#!searchin/darknet/rotation%7Csort:date/darknet/DPxhZcC0x2k/NBvD06urAwAJ
He said:

I trained this on COCO with and without a pre-trained model yesterday, but the final weights are still pretty bad for detection, although the weights trained without the pre-trained model are better.


He hard-coded a rotation by 90 degrees.
I changed this code to use a random rotation in the range (-angle, +angle), where angle is set in the cfg-file:

  1. Set angle to a value from 0 to 180 in the [net] section of the cfg-file (for example angle=30): https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/cfg/yolo-voc.2.0.cfg#L9

  2. You should change this line: https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/src/data.c#L744
    to this: *a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.num_boxes, a.classes, a.jitter, a.hue, a.saturation, a.exposure, a.small_object, a.angle);

  3. change this line: https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/src/data.h#L86
    to this: data load_data_detection(int n, char **paths, int m, int w, int h, int boxes, int classes, float jitter, float hue, float saturation, float exposure, int small_object, float angle);

  4. Replace these 2 functions, fill_truth_detection() and load_data_detection() in src/data.c, with the 3 functions below:

/***** rotate truth boxes by rad radians *****/
void fill_truth_detection(char *path, int num_boxes, float *truth, int classes, int flip, float dx, float dy, float sx, float sy, image im, float rad)
{   
    float tx, ty;
    char labelpath[4096];
    find_replace(path, "images", "labels", labelpath);
    find_replace(labelpath, "JPEGImages", "labels", labelpath);

    find_replace(labelpath, "raw", "labels", labelpath);
    find_replace(labelpath, ".jpg", ".txt", labelpath);
    find_replace(labelpath, ".png", ".txt", labelpath);
    find_replace(labelpath, ".JPG", ".txt", labelpath);
    find_replace(labelpath, ".JPEG", ".txt", labelpath);
    int count = 0;
    float x,y,w,h;
    int id;
    int i;
    //float rad = TWO_PI/4;
    box_label *boxes = read_boxes(labelpath, &count);
    randomize_boxes(boxes, count);

    if(count > num_boxes) count = num_boxes;
    for (i = 0; i < count; ++i) {
        // box center in pixels of the original (un-rotated) image; note that im
        // is the already-rotated image, so im.h/im.w are swapped relative to the
        // normalized label coordinates
        tx = boxes[i].x * im.h;
        ty = boxes[i].y * im.w;
        // rotate the box center around the image center and re-normalize
        x = (cos(rad)*(tx-im.h/2) - sin(rad)*(ty-im.w/2) + im.w/2)/im.w;
        y = (sin(rad)*(tx-im.h/2) + cos(rad)*(ty-im.w/2) + im.h/2)/im.h;
        boxes[i].x = x;
        boxes[i].y = y;

        // swap box width and height (exact only for a 90-degree rotation)
        w =  boxes[i].h;
        h =  boxes[i].w;
        boxes[i].w = w;
        boxes[i].h = h;

        boxes[i].left   = x - w/2;
        boxes[i].right  = x + w/2;
        boxes[i].top    = y - h/2;
        boxes[i].bottom = y + h/2; 

    }
    correct_boxes(boxes, count, dx, dy, sx, sy, flip);

    for (i = 0; i < count; ++i) {
        x =  boxes[i].x;
        y =  boxes[i].y;
        w =  boxes[i].w;
        h =  boxes[i].h;
        id = boxes[i].id;

        if ((w < .001 || h < .001)) continue;

        truth[i*5+0] = x;
        truth[i*5+1] = y;
        truth[i*5+2] = w;
        truth[i*5+3] = h;
        truth[i*5+4] = id;
        // debug visualization: draw the transformed box on the image and save it to disk
        draw_box_width(im,  boxes[i].x* im.w- boxes[i].w*im.w/2,  boxes[i].y*im.h- boxes[i].h*im.h/2,  boxes[i].x*im.w+ boxes[i].w*im.w/2,  boxes[i].y*im.h+ boxes[i].h*im.h/2, 4, 0.1, 0.4, 0.6);
        save_image(im,"draw");

    }
    free(boxes);
}


/***** load original images rotated by a random angle *****/
data load_data_detection(int n, char **paths, int m, int w, int h, int boxes, int classes, float jitter, float hue, float saturation, float exposure, float angle)
{
    char **random_paths = get_random_paths(paths, n, m);
    int i;
    data d = {0};
    d.shallow = 0;

    d.X.rows = n;
    d.X.vals = calloc(d.X.rows, sizeof(float*));
    d.X.cols = h*w*3;

    d.y = make_matrix(n, 5*boxes);
    for(i = 0; i < n; ++i){
        image orig0 = load_image_color(random_paths[i], 0, 0);
        float random_angle = rand_uniform(-angle, angle);   // random angle in the range (-angle, +angle) degrees
        float random_angle_rad = TWO_PI*random_angle/360.0; //  degree to radian
        image orig = rotate_image_r(orig0, random_angle_rad);
        image sized = make_image(w, h, orig.c);
        fill_image(sized, .5);

        float dw = jitter * orig.w;
        float dh = jitter * orig.h;

        float new_ar = (orig.w + rand_uniform(-dw, dw)) / (orig.h + rand_uniform(-dh, dh));
        float scale = rand_uniform(.25, 2);

        float nw, nh;

        if(new_ar < 1){
            nh = scale * h;
            nw = nh * new_ar;
        } else {
            nw = scale * w;
            nh = nw / new_ar;
        }

        float dx = rand_uniform(0, w - nw);
        float dy = rand_uniform(0, h - nh);

        place_image(orig, nw, nh, dx, dy, sized);

        random_distort_image(sized, hue, saturation, exposure);
        int flip = rand()%2;
        if(flip) flip_image(sized);
        d.X.vals[i] = sized.data;

        //fill_truth_detection(random_paths[i], boxes, d.y.vals[i], classes, flip, -dx/w, -dy/h, nw/w, nh/h,sized);
        fill_truth_detection(random_paths[i], boxes, d.y.vals[i], classes, flip, -dx/w, -dy/h, nw/w, nh/h,orig, random_angle_rad);

        free_image(orig);
        free_image(orig0);
    }
    free(random_paths);
    return d;
}

// rotate image by rad radians around its center (output canvas has swapped width and height)
image rotate_image_r(image im, float rad)
{
    int x, y, c;
    float cx = im.w/2.;
    float cy = im.h/2.;
    image rot = make_image(im.h, im.w, im.c);
    for(c = 0; c < im.c; ++c){
        for(y = 0; y < im.h; ++y){
            for(x = 0; x < im.w; ++x){
                // forward mapping: write each source pixel (x, y) to its
                // rotated position (rx, ry) in the output image
                float rx = cos(rad)*(x-cx) - sin(rad)*(y-cy) + cy;
                float ry = sin(rad)*(x-cx) + cos(rad)*(y-cy) + cx;
                float val = bilinear_interpolate(im, x, y, c);
                set_pixel(rot, rx, ry, c, val);
            }
        }
    }
    return rot;
}
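rotate_image_r() above uses a forward mapping: it writes each source pixel to its rotated destination, which for angles that are not multiples of 90 degrees can leave unfilled (black) pixels in the output. A common alternative is an inverse mapping that samples the source image for every destination pixel, so no holes appear. Below is a minimal sketch of that variant (not code from this thread); it assumes Darknet's make_image(), bilinear_interpolate() and set_pixel() helpers with their usual signatures, assumes bilinear_interpolate() returns 0 for locations outside the image, and keeps the original canvas size, so corners are clipped for non-trivial angles:

// Sketch: rotate an image by rad radians around its center using inverse mapping.
// For every destination pixel we compute where it came from in the source and
// sample that location bilinearly, so the output has no unfilled pixels.
image rotate_image_inverse(image im, float rad)
{
    int x, y, c;
    float cx = im.w/2.;
    float cy = im.h/2.;
    image rot = make_image(im.w, im.h, im.c);   // same canvas size as the input
    for(c = 0; c < im.c; ++c){
        for(y = 0; y < im.h; ++y){
            for(x = 0; x < im.w; ++x){
                // rotate the destination pixel back by -rad to find its source
                float rx =  cos(rad)*(x - cx) + sin(rad)*(y - cy) + cx;
                float ry = -sin(rad)*(x - cx) + cos(rad)*(y - cy) + cy;
                float val = bilinear_interpolate(im, rx, ry, c);
                set_pixel(rot, x, y, c, val);
            }
        }
    }
    return rot;
}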

@AlexeyAB

Hi Alex, thank you for sharing the code.

As I see it, the code modifies the Darknet training pipeline and creates the augmented images in memory. I'll test it, but a question pops up here: is it better to create this variety of images on the hard disk, or to generate them on the fly in memory?

@VanitarNordic Hi, I think the result should be the same.

@AlexeyAB

Another question remains.

If we rotate or scale just the bounding boxes, the empty space created by the rotation or scaling is, I think, filled with black by default. Won't the model then "think" that the remaining background pieces inside the bounding boxes are part of the object?

I think this could happen, and that we should rotate or scale the whole image, not just the bounding boxes. What do you think?

That might be the reason why he gets bad results with his code.

@VanitarNordic In this code the whole image is rotated around its center (not only the bounding boxes), and then the bounding boxes are rotated in the same way.
He said that he got bad results before this fix: https://groups.google.com/forum/#!msg/darknet/DPxhZcC0x2k/ZUHiKDL_AwAJ

And yes, even if we rotate the whole image, the corners and edges of the image will be black. But the neural network encounters black edges even without rotation, for example when padding is used - black (value 0) at the edges: https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/src/im2col.c#L6-L10
And as we can see, it works successfully with padding.
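For reference, the zero padding referenced above works roughly like the sketch below, modeled on Darknet's im2col_get_pixel() (the exact code at that commit may differ slightly): any coordinate that falls outside the image after removing the pad offset simply reads as 0, i.e. black.

// Sketch of Darknet-style im2col pixel access with zero padding: coordinates
// outside the image (after subtracting the pad offset) are treated as 0 (black),
// which is how the padded black border is produced.
float im2col_get_pixel(float *im, int height, int width, int channels,
                       int row, int col, int channel, int pad)
{
    row -= pad;
    col -= pad;
    if (row < 0 || col < 0 || row >= height || col >= width) return 0;
    return im[col + width*(row + height*channel)];
}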

@AlexeyAB

Yes, when we pad, rotate or scale the whole image, there is no problem. My comment was about doing these operations on the bounding boxes only.

Thanks.

Thanks for the share, detailed instructions and your time @AlexeyAB :+1:

@AlexeyAB

I faced two errors when I was trying to compile with the new code:

image orig = rotate_image_r(orig0, random_angle_rad);
image rotate_image_r(image im, float rad)

[screenshot of the compiler errors: 2018-02-12_22-13-02]

@AlexeyAB

I moved image rotate_image_r(image im, float rad) above the data load_data_detection(...) function and commented out the place_image(orig, nw, nh, dx, dy, sized); call to get rid of the errors.

The code compiled and I started training. However, the function does not read the angle value from the cfg file, so it is always zero (you can verify this with a printf test). Therefore I defined the range from -180 to 180 inside the function itself.

The results got worse. Most likely there is a bug in the code. I'll instead try to write a Python script that rotates the real images on the hard disk, and keep the Darknet code intact.
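One possible reason the angle always reads as zero (an assumption, not verified against that commit): Darknet passes the data-loading parameters to the loader threads through a load_args struct, and the detector training code may not copy net.angle into it the way the classifier code does. If that is the case, a one-line addition in train_detector() in src/detector.c, next to where the other augmentation fields (jitter, hue, saturation, exposure) are filled, should make the cfg value reach load_data_detection(); the field names below should be checked against your sources:

/* Hypothetical patch inside train_detector() (src/detector.c), where the other
   load_args fields such as args.jitter / args.hue / args.saturation are set:
   copy the cfg [net] angle into the argument struct so that the loader
   actually receives a non-zero angle. */
args.angle = net.angle;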

