YOLOv5: how to convert the outputs of yolov5.onnx to boxes, labels and scores

Created on 11 Aug 2020  ·  14 Comments  ·  Source: ultralytics/yolov5

❔ Question

Hi, can you help me interpret the outputs of the ONNX model? I don't know how to convert the outputs to boxes, labels and scores.
I used Netron to display this ONNX model.
outputs:
name: classes
type: float32[1,3,80,80,85]

name: boxes
type: float32[1,3,40,40,85]

name: 444
type: float32[1,3,20,20,85]

Why do the outputs have five dimensions, and how do I convert them into detection results?
Thanks.

question

All 14 comments

I think you should look at the output of non_max_suppression, which is called 'pred' in detect.py. It has the form (x1, y1, x2, y2, conf, cls). You can arrange its elements however you like and write them to txt or json.
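
For example, a minimal Python sketch of that idea: the dummy tensor below stands in for one image's entry in pred, with rows of (x1, y1, x2, y2, conf, cls) as described above.

    import json
    import torch

    # 'pred' as returned by non_max_suppression: a list with one (n, 6) tensor
    # per image, each row being (x1, y1, x2, y2, conf, cls)
    pred = [torch.tensor([[100.0, 120.0, 300.0, 340.0, 0.91, 0.0]])]  # dummy detection

    results = []
    for det in pred:
        for *xyxy, conf, cls in det:              # same unpacking detect.py uses
            results.append({
                'label': int(cls),                # class index; map to a name via your class list
                'score': float(conf),
                'box': [float(v) for v in xyxy],  # x1, y1, x2, y2 in pixels
            })
    print(json.dumps(results))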

Thanks. I found that method too; maybe I'll recode it in C++, since I'm using the ONNX Runtime C++ version.

Hello @JiaoPaner, @NosremeC, I want to do the same thing with the ONNX Runtime C++ version, but I ran into some problems with Detect in yolo.py. After I set self.training == False, I don't know why I still get an output of x rather than (torch.cat(z, 1), x).

This is the relevant code in yolo.py:

    # inside Detect.forward, looping over the detection layers i
    if not self.training:  # inference
        if self.grid[i].shape[2:4] != x[i].shape[2:4]:
            self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

        y = x[i].sigmoid()
        y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i].to(x[i].device)) * self.stride[i]  # xy
        y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
        z.append(y.view(bs, -1, self.no))

    # after the loop
    return x if self.training else (torch.cat(z, 1), x)
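
For reference, those sigmoid/grid/anchor lines are also exactly what you need if you keep the raw float32[1,3,ny,nx,85] heads and want to decode them by hand. A NumPy sketch of the same math; the stride and anchor values below are assumptions for the 80x80 head of a 640-input yolov5s:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def decode_head(p, stride, anchors):
        """Decode one raw head (1, na, ny, nx, 85) into (1, na*ny*nx, 85) xywh rows."""
        bs, na, ny, nx, no = p.shape
        y = sigmoid(p)
        xv, yv = np.meshgrid(np.arange(nx), np.arange(ny))          # cell grid, as in _make_grid
        grid = np.stack((xv, yv), 2).reshape(1, 1, ny, nx, 2).astype(np.float32)
        y[..., 0:2] = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride    # xy in pixels
        y[..., 2:4] = (y[..., 2:4] * 2.0) ** 2 * anchors.reshape(1, na, 1, 1, 2)  # wh in pixels
        return y.reshape(bs, -1, no)

    anchors = np.array([[10, 13], [16, 30], [33, 23]], np.float32)  # assumed P3/stride-8 anchors
    p = np.random.rand(1, 3, 80, 80, 85).astype(np.float32)         # stand-in for the ONNX output
    out = decode_head(p, stride=8, anchors=anchors)                 # (1, 19200, 85)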

@yongjingli you can go see #343; that issue solved my problem. I recoded non_max_suppression from yolov5/utils/general.py into a C++ version for yolov5s.onnx (in export.py I set model.model[-1].export = False). The main output-parsing code is as follows:

    float* output = output_tensor[0].GetTensorMutableData<float>(); // ONNX Runtime output: 1 x 25200 x 85
    size_t size = output_tensor[0].GetTensorTypeAndShapeInfo().GetElementCount(); // 1 x 25200 x 85 = 2142000
    int dimensions = 85; // 0..3 -> box (cx, cy, w, h), 4 -> objectness, 5..84 -> 80 COCO class confidences
    int rows = size / dimensions; // 25200
    int confidenceIndex = 4;
    int labelStartIndex = 5;
    float modelWidth = 640.0f;
    float modelHeight = 640.0f;
    float xGain = modelWidth / image.width;
    float yGain = modelHeight / image.height;
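    // note: these gains assume the image was resized straight to 640x640; with
    // letterboxed input you would also need to subtract the padding offsets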

    std::vector<cv::Vec4f> locations;
    std::vector<int> labels;
    std::vector<float> confidences;

    std::vector<cv::Rect> src_rects;
    std::vector<cv::Rect> res_rects;
    std::vector<int> res_indexs;

    cv::Rect rect;
    cv::Vec4f location;
    for (int i = 0; i < rows; ++i) {
        int index = i * dimensions;
        if (output[index + confidenceIndex] <= 0.4f) continue; // objectness threshold

        // weight each class confidence by the objectness score
        for (int j = labelStartIndex; j < dimensions; ++j) {
            output[index + j] = output[index + j] * output[index + confidenceIndex];
        }

        // keep classes whose weighted score clears the threshold, and convert the
        // (cx, cy, w, h) box to corner coordinates in original-image space
        for (int k = labelStartIndex; k < dimensions; ++k) {
            if (output[index + k] <= 0.5f) continue; // class score threshold

            location[0] = (output[index] - output[index+2] / 2) / xGain;//top left x
            location[1] = (output[index + 1] - output[index+3] / 2) / yGain;//top left y
            location[2] = (output[index] + output[index+2] / 2) / xGain;//bottom right x
            location[3] = (output[index + 1] + output[index+3] / 2) / yGain;//bottom right y

            locations.emplace_back(location);

            rect = cv::Rect(location[0], location[1],
                            location[2] - location[0], location[3] - location[1]);
            src_rects.push_back(rect);
            labels.emplace_back(k - labelStartIndex); // class id relative to the first class column
            confidences.emplace_back(output[index + k]);
        }

    }
    utils::nms(src_rects, res_rects, res_indexs, 0.5f); // class-agnostic NMS; pass an explicit IoU threshold

    cJSON  *result = cJSON_CreateObject(), *items = cJSON_CreateArray();
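    // serialise kept detections as {"code":0,"msg":"success","data":[{label,score,location},...]}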
    for (size_t i = 0; i < res_indexs.size(); ++i) {
        cJSON  *item = cJSON_CreateObject();
        int index = res_indexs[i];
        cJSON_AddStringToObject(item, "label", classes[labels[index]].c_str());
        cJSON_AddNumberToObject(item,"score",confidences[index]);
        cJSON  *location = cJSON_CreateObject();
        cJSON_AddNumberToObject(location,"x",locations[index][0]);
        cJSON_AddNumberToObject(location,"y",locations[index][1]);
        cJSON_AddNumberToObject(location,"width",locations[index][2] - locations[index][0]);
        cJSON_AddNumberToObject(location,"height",locations[index][3] - locations[index][1]);
        cJSON_AddItemToObject(item,"location",location);
        cJSON_AddItemToArray(items,item);
    }
    cJSON_AddNumberToObject(result, "code", 0);
    cJSON_AddStringToObject(result, "msg", "success");
    cJSON_AddItemToObject(result, "data", items);
    char *resultJson = cJSON_PrintUnformatted(result);
    cJSON_Delete(result); // free the cJSON tree; the caller owns resultJson and must free() it
    return resultJson;
void utils::nms(const std::vector<cv::Rect> &srcRects, std::vector<cv::Rect> &resRects, std::vector<int> &resIndexs, float thresh) {
    resRects.clear();
    const size_t size = srcRects.size();
    if (!size) return;
    // sort the bounding boxes by the bottom-right y-coordinate
    std::multimap<int, size_t> idxs;
    for (size_t i = 0; i < size; ++i){
        idxs.insert(std::pair<int, size_t>(srcRects[i].br().y, i));
    }
    // keep looping while some indexes still remain in the indexes list
    while (idxs.size() > 0){
        // grab the last rectangle
        auto lastElem = --std::end(idxs);
        const cv::Rect& last = srcRects[lastElem->second];
        resIndexs.push_back(lastElem->second);
        resRects.push_back(last);
        idxs.erase(lastElem);
        for (auto pos = std::begin(idxs); pos != std::end(idxs); ){
            // grab the current rectangle
            const cv::Rect& current = srcRects[pos->second];
            float intArea = (last & current).area();
            float unionArea = last.area() + current.area() - intArea;
            float overlap = intArea / unionArea;
            // if there is sufficient overlap, suppress the current bounding box
            if (overlap > thresh) pos = idxs.erase(pos);
            else ++pos;
        }
    }
}

@JiaoPaner If I need a C# version, is there one available? Thanks a lot.

@ricklina90 You can just port the above C++ code to C#.

@JiaoPaner After porting the C++ code to C#, it works fine. Thank you.

Is there a way to reshape this to [255, 20, 20], etc.?
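
If "this" means the question's float32[1,3,20,20,85] head, one way (assuming you want to undo the view/permute that the Detect layer applies, back to the channels-first conv layout) is:

    import numpy as np

    p = np.random.rand(1, 3, 20, 20, 85).astype(np.float32)  # stand-in for the 20x20 head
    # Detect does view(bs, 3, 85, ny, nx).permute(0, 1, 3, 4, 2); reverse it:
    q = p.transpose(0, 1, 4, 2, 3).reshape(255, 20, 20)
    print(q.shape)                                            # (255, 20, 20)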

My ONNX session outputs (1, 25200, 11), but non_max_suppression outputs torch.Size([300, 6]) (6 classes). Why does it have shape 300? How do I convert this to x, y coordinates?

@JonathanLehner 300 is the number of boxes kept by non_max_suppression (it caps detections at 300); the 6 values per box are x1, y1, x2, y2, score, cls_id, i.e. corner coordinates as noted above.
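
A minimal unpacking sketch, using a random stand-in for that (300, 6) tensor:

    import torch

    det = torch.rand(300, 6)               # stand-in for one image's NMS output
    boxes = det[:, :4]                     # x1, y1, x2, y2
    scores, cls_ids = det[:, 4], det[:, 5]
    x, y = boxes[:, 0], boxes[:, 1]        # top-left corner
    w = boxes[:, 2] - boxes[:, 0]          # width  = x2 - x1
    h = boxes[:, 3] - boxes[:, 1]          # height = y2 - y1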

@JiaoPaner I have a couple of queries regarding your C++ implementation.

a) You consider only 1 output layer when there are 3 in total. Is considering only the bounding boxes from the output layer with the smallest stride sufficient?

b) In your box calculation you haven't used any sigmoid function, anchors or strides. How are you getting the box dimensions correctly?

@kafan1986
In export.py I set model.model[-1].export = False.
Using Netron to display this ONNX model:

name: output
type: float32[1,25200,85]

name: 404
type: float32[1,3,80,80,85]

name: 687
type: float32[1,3,40,40,85]

name: 970
type: float32[1,3,20,20,85]

There are 4 outputs here, but we only need the first one, named "output". It is the already-decoded, concatenated prediction (the torch.cat(z, 1) above), so you needn't apply any sigmoid function yourself.
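
In Python that just means fetching that one output from the session. A minimal sketch, assuming the model was exported with input_names=['image'] and output_names=['output'] as in the export call later in this thread:

    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession('yolov5s.onnx')
    img = np.zeros((1, 3, 640, 640), dtype=np.float32)  # your preprocessed image here
    out = sess.run(['output'], {'image': img})[0]       # fetch only the 'output' head
    print(out.shape)                                    # (1, 25200, 85)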

Hi, have you managed to export the ONNX model? I tried torch.onnx.export(model, img, "yolos.onnx") but got the error "Exporting the operator hardswish to ONNX opset version 9 is not supported. Please open a bug to request ONNX export support for the missing operator." I have been stuck on this problem for a while.

@Jiang15 set opset_version=12:
torch.onnx.export(model, img, "yolos.onnx", verbose=False, opset_version=12, input_names=['image'], output_names=['output'])
