Hi,
I would like to apply YOLOv3 to two images at once for detection.
I found a code snippet in issue #483, but the API has changed a lot since then.
Could you show how to do that?
Thanks.
@AlexeyAB @saihv
+1.
I reckon YOLO could benefit from an implementation like the one exemplified here. Following #483 has led me to a brick wall: I'm stuck figuring out the right 'stepping', and none of these work:
- net.h*net.w*3
- l.h*l.w*l.n
- l.h*l.w*l.n*(l.classes+l.coords+1)
- l.output+(l.w*l.h*l.c)
The last bit is from saihv's version; l.c is always '0' in my case anyway...
Here's what I have so far:
net.batch = imgs.size();
float *X = (float *)calloc(net.batch * net.h * net.w * 3, sizeof(float));
for (int i = 0; i < net.batch; ++i) {
    image im;
    im.c = imgs[i].c;
    im.data = imgs[i].data;
    im.h = imgs[i].h;
    im.w = imgs[i].w;
    image sized;
    if (net.w == im.w && net.h == im.h) {
        sized = make_image(im.w, im.h, im.c);
        memcpy(sized.data, im.data, im.w * im.h * im.c * sizeof(float));
    }
    else
        sized = resize_image(im, net.w, net.h);
    memcpy(X + i * net.h * net.w * 3, sized.data, net.h * net.w * 3 * sizeof(float));
}
float *prediction = network_predict(net, X);
layer l = net.layers[net.n - 1];
box *boxes = (box *)calloc(l.w * l.h * l.n, sizeof(box));
float **probs = (float **)calloc(l.w * l.h * l.n, sizeof(float *));
for (int j = 0; j < l.w * l.h * l.n; ++j) probs[j] = (float *)calloc(l.classes, sizeof(float)); // sizeof(float), not sizeof(float *)
std::vector<std::vector<bbox_t>> bbox_vec_batch;
for (int j = 0; j < net.batch; ++j) {
    std::vector<bbox_t> bbox_vec;
    get_region_boxes(l, 1, 1, thresh, probs, boxes, 0, 0);
    if (nms) do_nms(boxes, probs, l.w * l.h * l.n, l.classes, nms);
    for (int k = 0; k < l.w * l.h * l.n; ++k) {
        int const obj_id = max_index(probs[k], l.classes);
        float const prob = probs[k][obj_id];
        if (prob > thresh) {
            bbox_t bbox;
            if (boxes[k].w > 1) {
                bbox.x = 0;
                bbox.w = imgs[j].w;
            }
            else {
                float w = boxes[k].w * imgs[j].w;
                bbox.x = round(boxes[k].x * imgs[j].w - w / 2);
                bbox.w = w;
            }
            if (boxes[k].h > 1) {
                bbox.y = 0;
                bbox.h = imgs[j].h;
            }
            else {
                float h = boxes[k].h * imgs[j].h;
                bbox.y = round(boxes[k].y * imgs[j].h - h / 2);
                bbox.h = h;
            }
            bbox.obj_id = obj_id;
            bbox.prob = prob;
            bbox.track_id = 0;
            bbox_vec.push_back(bbox);
        }
    }
    bbox_vec_batch.push_back(bbox_vec);
    l.output += 0; // unsolved stepping mystery
}
free(boxes);
free_ptrs((void **)probs, l.w * l.h * l.n);
free(X);
With all four candidate steppings, the output is only valid for the first element of the batch, so the 'stepping' is very likely the missing ingredient. For the record, as far as batch scalability is concerned, my own benchmarking on SSD shows a considerable performance gain moving from a Quadro K4200 up to a GeForce GTX 1080.
It seems the current version of YOLOv3 cannot detect two images simultaneously.
Could a multi-threading technique speed up detection for two images? Any related references? Thanks. @AlexeyAB
How about a hack using OpenCV or similar: concatenate both images into a single image, run detection, then split the results?
@panda9095
It seems the current version of YOLOv3 cannot detect two images simultaneously.
Could a multi-threading technique speed up detection for two images? Any related references? Thanks. @AlexeyAB
I think not. Only a large batch size can significantly accelerate detection.
@kmsravindra
How about a hack using OpenCV or similar: concatenate both images into a single image, run detection, then split the results?
I think this will reduce accuracy: the receptive field of each final activation would see context (parts) of the other images.
@jstumpin
The last bit is from saihv's version; l.c is always '0' in my case anyway...
l.output += 0; // unsolved stepping mystery
Use l.outputs
Try to use l.output = l.output + l.outputs; instead of l.output = l.output + (l.w*l.h*l.c);
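For context, a minimal sketch of where that stepping would sit in the batch loop from the snippet above (output_backup is my own addition, not part of the original snippet; the pointer must be restored because the cleanup code expects the original address):

layer l = net.layers[net.n - 1];
float *output_backup = l.output; // remember the original pointer
for (int j = 0; j < net.batch; ++j) {
    // get_region_boxes reads l.output, so it must point at image j's slice
    get_region_boxes(l, 1, 1, thresh, probs, boxes, 0, 0);
    // ... collect bbox_vec for image j as before ...
    l.output = l.output + l.outputs; // advance by one image's worth of outputs
}
l.output = output_backup; // restore before anything frees the layer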
My project is to detect two ROIs of an image simultaneously and in real time. The desired fps is about 10, but the model takes about 50~60 ms per ROI. Would multi-threaded prediction work in this case? Thanks. @AlexeyAB
@panda9095 I think yes.
Just try to run two instances of Darknet yolo in 2 separate terminals on the same PC.
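If both images must be handled in the same process, the same idea can be sketched with two Detector instances from yolo_v2_class.hpp, one per thread (the file names below are placeholders; whether this actually helps depends on how much of the GPU a single instance already occupies):

#include <thread>
#include <vector>
#include "yolo_v2_class.hpp" // Detector, bbox_t

int main() {
    // Each thread owns its own Detector instance.
    Detector det1("yolov3.cfg", "yolov3.weights");
    Detector det2("yolov3.cfg", "yolov3.weights");
    std::vector<bbox_t> res1, res2;
    std::thread t1([&] { res1 = det1.detect("roi1.jpg", 0.2f); });
    std::thread t2([&] { res2 = det2.detect("roi2.jpg", 0.2f); });
    t1.join();
    t2.join();
    return 0;
}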
@AlexeyAB
In addition to shifting these:
box *boxes = (box *)calloc(l.w * l.h * l.n, sizeof(box));
float **probs = (float **)calloc(l.w * l.h * l.n, sizeof(float *));
for (int j = 0; j < l.w * l.h * l.n; ++j) probs[j] = (float *)calloc(l.classes, sizeof(float));
...
free(boxes);
free_ptrs((void **)probs, l.w*l.h*l.n);
free(X);
inside/outside of the loop, using l.outputs does not fare well either. On the contrary, it yields an unstable outcome (sometimes zero detections, sometimes a bunch of noisy detections) on every execution of the program. With l.w*l.h*l.c, on the other hand, I get the first element of the batch correct (the rest are just clones of the first, hence wrong detections), but at least the accuracy is consistent every time.
Finally got it working.
From the YOLODLL_API Detector::Detector constructor in yolo_v2_class.cpp of the yolo_cpp_dll project, specify the batch size (I tried net.batch = batch_size elsewhere; it didn't work):
net = parse_network_cfg_custom(cfgfile, batch_size);
Add/edit the following lines (marked with // fixed comments) relative to the previous post:
float *X = (float *)calloc(net.batch * net.h * net.w * 3, sizeof(float));
for (int i = 0; i < net.batch; ++i) {
    image im;
    im.c = imgs[i].c;
    im.data = imgs[i].data;
    im.h = imgs[i].h;
    im.w = imgs[i].w;
    image sized;
    if (net.w == im.w && net.h == im.h) {
        sized = make_image(im.w, im.h, im.c);
        memcpy(sized.data, im.data, im.w * im.h * im.c * sizeof(float));
    }
    else
        sized = resize_image(im, net.w, net.h);
    memcpy(X + i * net.h * net.w * 3, sized.data, net.h * net.w * 3 * sizeof(float));
    free(sized.data); // fixed memory leak
}
float *prediction = network_predict(net, X);
layer l = net.layers[net.n - 1];
box *boxes = (box *)calloc(l.w * l.h * l.n, sizeof(box));
float **probs = (float **)calloc(l.w * l.h * l.n, sizeof(float *));
for (int j = 0; j < l.w * l.h * l.n; ++j) probs[j] = (float *)calloc(l.classes, sizeof(float)); // sizeof(float), not sizeof(float *)
std::vector<std::vector<bbox_t>> bbox_vec_batch;
for (int j = 0; j < net.batch; ++j) {
    std::vector<bbox_t> bbox_vec;
    get_region_boxes(l, 1, 1, thresh, probs, boxes, 0, 0);
    if (nms) do_nms(boxes, probs, l.w * l.h * l.n, l.classes, nms);
    for (int k = 0; k < l.w * l.h * l.n; ++k) {
        int const obj_id = max_index(probs[k], l.classes);
        float const prob = probs[k][obj_id];
        if (prob > thresh) {
            bbox_t bbox;
            if (boxes[k].w > 1) {
                bbox.x = 0;
                bbox.w = imgs[j].w;
            }
            else {
                float w = boxes[k].w * imgs[j].w;
                bbox.x = round(boxes[k].x * imgs[j].w - w / 2);
                bbox.w = w;
            }
            if (boxes[k].h > 1) {
                bbox.y = 0;
                bbox.h = imgs[j].h;
            }
            else {
                float h = boxes[k].h * imgs[j].h;
                bbox.y = round(boxes[k].y * imgs[j].h - h / 2);
                bbox.h = h;
            }
            bbox.obj_id = obj_id;
            bbox.prob = prob;
            bbox.track_id = 0;
            bbox_vec.push_back(bbox);
        }
    }
    bbox_vec_batch.push_back(bbox_vec);
    l.output += l.h * l.w * l.n * (l.classes + l.coords + 1); // fixed stepping issue
}
free(boxes);
free_ptrs((void **)probs, l.w * l.h * l.n);
free(X);
Benchmarked on 200 samples (the second-to-last and last columns are the average and total run time in seconds, respectively):
NVIDIA Quadro K4200:
batch size = 2
cpu 8.25% (1.25)
mem 3110.154MB
predict 100 0.136870 13.687000
loadimg 200 0.004135 0.827000
main 1 14.531000 14.531000
batch size = 1
cpu 8.44% (2.23)
mem 2485.166MB
predict 200 0.080695 16.139000
loadimg 200 0.004330 0.866000
main 1 17.021000 17.021000
NVIDIA Geforce GTX 1080:
batch size = 2
cpu 9.07% (5.01)
mem 3818.476MB
predict 100 0.034790 3.479000
loadimg 200 0.003320 0.664000
main 1 4.158000 4.158000
batch size = 1
cpu 8.36% (5.65)
mem 3186.131MB
predict 200 0.022865 4.573000
loadimg 200 0.004425 0.885000
main 1 5.471000 5.471000
Thanks @AlexeyAB @panda9095 @saihv
@jstumpin Don't you need to normalize the image data like the pj version does?
im.data[k*w*h + i*w + j] = data[i*step + j*c + k] / 255.;
@wait1988
We do bother: it's being normalized via imdecode (if one uses OpenCV) or load_image_stb (otherwise).
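For anyone who wants to replicate that normalization by hand, here is a minimal sketch (mat_to_planar_float is a hypothetical helper, not a repo function) that converts an interleaved 8-bit OpenCV Mat into darknet's planar float layout in [0,1], mirroring the loop quoted above:

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<float> mat_to_planar_float(const cv::Mat &m) {
    const int w = m.cols, h = m.rows, c = m.channels();
    std::vector<float> out((size_t)w * h * c);
    for (int k = 0; k < c; ++k)          // channel-major (planar) output
        for (int i = 0; i < h; ++i)      // rows
            for (int j = 0; j < w; ++j)  // columns, interleaved input
                out[(size_t)k * w * h + i * w + j] =
                    m.data[i * m.step + (size_t)j * c + k] / 255.0f;
    return out;
}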
@jstumpin Ok, I see, I'll try it now.
@jstumpin I use the pj version and added the code you provided above. It doesn't work:
CUDA Error: invalid argument.
@wait1988
Not sure how it behaves on the original pjreddie repo, but it's quite hard to get a CUDA error with the above code. The worst you can get is either wrong detections or no detections at all (you can reproduce such mishaps by setting parse_network_cfg_custom(cfgfile, 1) or enabling set_batch_network(&net, 1) in the constructor of yolo_v2_class.cpp despite loading more than one image). The code will break, though (the program will stall, but with no CUDA error), if the preassigned network batch size != the current image batch size (e.g. the final image batch < the network batch).
The code will break, though (the program will stall, but with no CUDA error), if the preassigned network batch size != the current image batch size (e.g. the final image batch < the network batch).
The cause for the above glitch:
https://github.com/pjreddie/darknet/issues/915#issue-336229064
The potential solution:
set_batch_network(&net, batch_size);
prior to:
float *X = (float *)calloc(net.batch*net.h*net.w * 3, sizeof(float));
where batch_size is the size of the current image batch.
Patch set_batch_network accordingly:

@@ -362,7 +362,9 @@ void set_batch_network(network *net, int b)
         net->layers[i].batch = b;
 #ifdef CUDNN
         if(net->layers[i].type == CONVOLUTIONAL){
-            cudnn_convolutional_setup(net->layers + i, cudnn_fastest);
+            layer *l = net->layers + i;
+            cudnn_convolutional_setup(l, cudnn_fastest);
+            l->workspace_size = get_workspace_size(*l);
Thanks @fsaxen @AlexeyAB
Finally got it working.
From the YOLODLL_API Detector::Detector constructor in yolo_v2_class.cpp of the yolo_cpp_dll project, specify the batch size (I tried net.batch = batch_size elsewhere; it didn't work):
The solution by @jstumpin works for yolov2, but for yolov3 the strategy for computing bboxes from the network output is quite different: it depends not only on net.layers[net.n-1] but also on the other layers of type YOLO (this layer type only exists in yolov3).
My solution for batch detection on yolov3 is as follows. All input images have been resized (to the network size) and normalized.
// assume channel 3
// img_ptrs is of type std::vector< std::shared_ptr<image_t> > to
// properly transfer image data.
float *X = (float *)calloc(net.batch * net.w * net.h * 3, sizeof(float));
for (int i = 0; i < net.batch; i++)
{
    image im;
    im.c = img_ptrs[i]->c;
    im.w = img_ptrs[i]->w;
    im.h = img_ptrs[i]->h;
    im.data = img_ptrs[i]->data;
    image sized;
    if (net.w == im.w && net.h == im.h)
    {
        sized = make_image(im.w, im.h, im.c);
        memcpy(sized.data, im.data, im.w * im.h * im.c * sizeof(float));
    }
    else sized = resize_image(im, net.w, net.h);
    memcpy(X + i * net.h * net.w * 3, sized.data, net.h * net.w * 3 * sizeof(float));
    free(sized.data);
}
// predict
network_predict(net, X);
layer l = net.layers[net.n - 1]; // needed below for l.classes
// get bbox
std::vector< std::vector<bbox_t> > bbox_vec_batch;
for (int j = 0; j < net.batch; j++)
{
    int nboxes = 0;
    int letterbox = 0;
    float hier_thresh = 0.5;
    float nms = 0.4f; // was 'int nms = 0.4;', which truncates to 0
    detection *dets = get_network_boxes(&net, img_ptrs[j]->w, img_ptrs[j]->h,
                                        thresh, hier_thresh,
                                        0, 1, &nboxes, letterbox);
    do_nms_sort(dets, nboxes, l.classes, nms);
    std::vector<bbox_t> bbox_vec;
    for (int i = 0; i < nboxes; ++i)
    {
        box b = dets[i].bbox;
        const int obj_id = max_index(dets[i].prob, l.classes);
        const float prob = dets[i].prob[obj_id];
        if (prob > thresh) // thresh is given
        {
            bbox_t bbox;
            bbox.x = std::max((double)0, (b.x - b.w / 2.) * img_ptrs[j]->w);
            bbox.y = std::max((double)0, (b.y - b.h / 2.) * img_ptrs[j]->h);
            bbox.w = b.w * img_ptrs[j]->w;
            bbox.h = b.h * img_ptrs[j]->h;
            bbox.obj_id = obj_id;
            bbox.prob = prob;
            bbox.track_id = 0;
            bbox_vec.push_back(bbox);
        }
    }
    bbox_vec_batch.push_back(bbox_vec);
    free_detections(dets, nboxes);
    // stepping: advance each detection-type layer to the next image's output
    for (int k = 0; k < net.n; k++) // renamed from j to avoid shadowing the batch index
    {
        layer &temp_l = net.layers[k];
        if (temp_l.type == YOLO || temp_l.type == REGION || temp_l.type == DETECTION)
        {
            // temp_l.output += temp_l.h*temp_l.w*temp_l.n*(temp_l.classes + temp_l.coords + 1);
            temp_l.output = temp_l.output + temp_l.outputs;
        }
    }
}
for (int j = 0; j < net.n; j++) // reset layer output pointers for free.
{
    layer &temp_l = net.layers[j];
    if (temp_l.type == YOLO || temp_l.type == REGION || temp_l.type == DETECTION)
    {
        for (int i = 0; i < net.batch; i++)
            temp_l.output = temp_l.output - temp_l.outputs;
    }
}
if (X)
    free(X);
The main body of my solution is similar to @jstumpin's. Thanks @jstumpin @AlexeyAB for sharing your code.
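For orientation, a hypothetical way to feed the snippet above, assuming Detector::load_image from yolo_v2_class.hpp (which returns an image_t with normalized float data); the file names are placeholders:

std::vector<std::shared_ptr<image_t>> img_ptrs;
img_ptrs.push_back(std::make_shared<image_t>(detector.load_image("frame0.jpg")));
img_ptrs.push_back(std::make_shared<image_t>(detector.load_image("frame1.jpg")));
float thresh = 0.2f;
// ... run the batch snippet above; bbox_vec_batch[j] then holds image j's boxes.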
My solution (above) works, but when I test it on multiple images, only the first image gives perfect detections. The same thing happens with @jstumpin's solution. Could someone explain that?
Sorry I haven't been following this all along, but:
In the snippet by @jstumpin, l.output += l.h*l.w*l.n*(l.classes + l.coords + 1);
In YOLOv2, l.n*(l.classes + l.coords + 1) should be the same as l.c (see here), which is what I originally used in my version. And note that in v2, l.outputs is essentially the same as l.h*l.w*l.c.
In YOLOv3, the definition of l.outputs changes between YOLO, region, and detection layers; maybe that's a cause for concern? If only the first image is giving you right detections, it means the step size is wrong somehow.
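A quick way to check those identities on a given build is to print each detection-type layer's candidate step sizes next to l.outputs (a throwaway diagnostic using the layer fields discussed in this thread):

for (int i = 0; i < net.n; ++i) {
    layer l = net.layers[i];
    if (l.type == YOLO || l.type == REGION || l.type == DETECTION) {
        printf("layer %d: outputs=%d  h*w*c=%d  h*w*n*(classes+coords+1)=%d\n",
               i, l.outputs, l.h * l.w * l.c,
               l.h * l.w * l.n * (l.classes + l.coords + 1));
    }
}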
I finally got it working. First of all, all the snippets above are correct.
The truth is, in my image batch, all the images had different sizes and totally different backgrounds; they were randomly downloaded from Google.
When I use images from the same video sequence (and thus of the same size), it works well.
I could not figure out the exact reason, but it works now. Thanks @saihv for your kind advice.
For whatever reason, the glaring context of this thread had eluded me; unsurprisingly, the 'solution' only fits YOLOv2.
Unfortunately, I cannot reproduce the snippets contributed by @anl13: the process stalls momentarily at get_network_boxes and then exits. It works just dandy for YOLOv2.
Adapting my YOLOv2 code to @anl13's, based on @saihv's neat explanations, leads nowhere:
for (int k = 0; k < net.n; ++k) {
    layer temp_l = net.layers[k];
    if (temp_l.type == YOLO || temp_l.type == REGION || temp_l.type == DETECTION)
        l.output += l.outputs; // note: this steps the last layer's 'l', not temp_l
}
However, at least the process churns out inexplicable figures until the final batch, a classic sign of a wrong step size. Actually, I don't even have to add that new loop to 'achieve' the gibberish outcome; the existing code already does.
Any pointers?
That's a bit confusing, @jstumpin. I get correct results on both yolov3 and yolov2 with my snippets; I have tested my code on thousands of frames with batch sizes of 2 and 4, so I think l.outputs is the right stepping.
I haven't checked all the details of these layers thoroughly, so I can't tell what's wrong with your results.
By the way, I added some lines of code to my snippets yesterday:
for (int j = 0; j < net.n; j++) // reset layer output pointer for free.
{
    layer &temp_l = net.layers[j];
    if (temp_l.type == YOLO || temp_l.type == REGION || temp_l.type == DETECTION)
    {
        for (int i = 0; i < net.batch; i++)
            temp_l.output = temp_l.output - temp_l.outputs;
    }
}
These lines reset each l.output to its original position; otherwise the later free sees a shifted pointer, which caused a weird segmentation fault.
@anl13 So, just to confirm, the problem was just that you had images of multiple sizes? And once you corrected that, using a step size of l.outputs on the yolo, region, and detection layers made batch detection work?
(Intuitively, yes, the default approach wouldn't be amenable to images of different sizes.)
The default yolo_v2_class.cpp already takes care of standardizing image sizes within the Detector::detect function body, as shown in the various snippets above. In fact, resizing already takes place in functions defined in yolo_v2_class.hpp before the function in question. I assume @anl13 is also using the same constructor as yolo_v2_class.cpp for parsing/loading the config/weights? If so, it adds to the mystery as to why I am the only one facing the 'stalling' curse on YOLOv3 with his snippets. Omitting set_batch_network(&net, batch_size);, which I suggested earlier to handle dynamic batch sizes, changes nothing.
I'd like to do batch inference using the Python wrapper.
My desired interface is:
yolo = YOLOv3(batch_size=32)
images = []
for i in range(32):
    success, image = video_capture.read()
    images.append(image)
images = np.array(images)
results = yolo.batch_inference(images)
Where could I put @anl13 's C/C++ code to enable this?
I think yes @saihv
My constructor is similar to yolo_v2_class.cpp with the batch size modified, @jstumpin, just as the earlier discussion suggested (it is essential to call set_batch_network(&net, batch_size)). Here is my complete code (forked from this repository): https://github.com/anl13/darknet. Maybe it is helpful somehow.
@pawarren I once tried to change the Python wrapper but found it a bit complicated. I think you can integrate the snippets into a C function, export it from the darknet.so library, and define the corresponding data types and functions in darknet.py; then you can use it. I haven't implemented that (I don't have time to work on it these days), but I think it would work.
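To sketch what that could look like: batch_detect below and its calling convention are invented for illustration (not an existing darknet API); only network_predict, get_network_boxes, and the stepping logic come from the snippets in this thread, and the darknet.h header is assumed:

#include <stdlib.h>
#include "darknet.h"

extern "C" detection **batch_detect(network *net, float *X, int batch,
                                    float thresh, float hier_thresh,
                                    int *nboxes /* array of length batch */)
{
    detection **dets = (detection **)calloc(batch, sizeof(detection *));
    network_predict(*net, X); // X: batch * net->w * net->h * 3 normalized floats
    for (int b = 0; b < batch; ++b) {
        dets[b] = get_network_boxes(net, net->w, net->h, thresh, hier_thresh,
                                    0, 1, &nboxes[b], 0);
        for (int i = 0; i < net->n; ++i) { // step to the next image's output
            layer *l = &net->layers[i];
            if (l->type == YOLO || l->type == REGION || l->type == DETECTION)
                l->output += l->outputs;
        }
    }
    for (int i = 0; i < net->n; ++i) { // restore the stepped pointers
        layer *l = &net->layers[i];
        if (l->type == YOLO || l->type == REGION || l->type == DETECTION)
            l->output -= (size_t)batch * l->outputs;
    }
    return dets; // caller: free_detections(dets[b], nboxes[b]) for each b, then free(dets)
}

On the Python side, this would then be declared in darknet.py with ctypes, analogous to the existing bindings.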
Culprit identified as if (l.batch == 2) avg_flipped_yolo(l);, FYI @AlexeyAB. Crisis averted; much kudos to @anl13 for pointing this out in his repo.
At least in my case, set_batch_network(&net, batch_size) is only needed for its original purpose, i.e. dynamic batch sizes.
I also ditched the old get_region_boxes approach and replaced it with get_network_boxes, again per @anl13's snippets; the former proved nearly impossible to step correctly, specifically for YOLOv3.
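For anyone hitting the same wall: the call sits in forward_yolo_layer in yolo_layer.c, where flip-averaging is keyed purely off l.batch == 2, so a genuine two-image batch gets its predictions corrupted. The simplest workaround for batch inference (a sketch; @anl13's fork may solve it differently) is to disable that call:

// yolo_layer.c, forward_yolo_layer:
// avg_flipped_yolo implements test-time flip averaging, but the l.batch == 2
// trigger cannot distinguish a flipped image pair from a real 2-image batch.
// Workaround for batch inference: comment the call out.
// if (l.batch == 2) avg_flipped_yolo(l);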
@anl13 I was going through your darknet repo; I cannot find the prediction output, nor how to input multiple files like we can in @AlexeyAB's repo. Please help.
Stepping does not work as expected with any of the snippets you posted here.
Has anyone tried network_predict_data_multi or network_predict_data? Those functions are also accessible through the Python wrapper, but I couldn't figure out how to specify the argument type DATA and the result type.
predict_multi_image = lib.network_predict_data_multi
predict_multi_image.argtypes =
predict_multi_image.restype =
Hi all,
I tried to add batch inference to the existing codebase in #4099. It worked for me. Please take a look when you have time.