I have saved the mask_rcnn model as a .pb file. But there are two inputs in the Keras code: image and meta. I couldn't find out how to feed both of them in TensorFlow C++.
This is my c++ code:
Status run_status = session->Run(
{{"input_image", image_tensor},{"input_image_meta",meta_tensor}},
{"output_node0"}, {}, &outputs
);
And I got the error "Running model failed: Not found: FeedInputs: unable to find feed output input_image_meta". Are there any tricks to solve this problem?
Thanks!
Hi,
Did you solve your problem? I'm facing the same issue.
Try to visualize the graph in your pb file with
https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/python/tools/import_pb_to_tensorboard.py .
There is a good post about it at https://medium.com/@daj/how-to-inspect-a-pre-trained-tensorflow-model-5fd2ee79ced0. Probably the name "input_image_meta" has changed somehow, or the pb is not complete?
I'm also very keen on getting this to run...
@hxw111 @ivshli @jmtatsch I'm curious how you built your image_tensor and meta_tensor.
I can get my session to run, but I'm having trouble getting the outputs to work (all zeros on output_detections:0).
UPDATE: now working; posted full code in follow up comment: https://github.com/matterport/Mask_RCNN/issues/222#issuecomment-373130661
Here's how I'm running it
std::vector<tensorflow::Tensor> outputs;
tensorflow::Status run_status = session->Run({{"input_image:0", inputTensor}, {"input_image_meta:0", inputMetadataTensor}},
{"output_detections:0", "output_mrcnn_class:0", "output_mrcnn_bbox:0", "output_mrcnn_mask:0", "output_rois:0", "output_rpn_class:0", "output_rpn_bbox:0"},
{},
&outputs);
@moorage I'm working on it, trying to recode the Python code in C++; you can check the utils.resize_image, molded_images, and compose_image_meta functions.
If anyone knows how or has already done it, you're very welcome to share your experience :)
@ivshli I was indeed able to implement all of these, including unmold. Kind of painful. But in the spirit of sharing and good karma I'll post it below :) . It's not particularly efficient; happy to take feedback in that department.
// given inputMat of type RGB (not BGR) / CV_8UC3 (possibly from an imread + cvtColor)
// also given dest of type cv::Mat(inputMat.size(), CV_8UC1)
// we trained on 256x256 , so TF_MASKRCNN_IMG_WIDTHHEIGHT = 256
// we copied MEAN_PIXEL configs, so cv::Scalar TF_MASKRCNN_MEAN_PIXEL(123.7, 116.8, 103.9);
// we statically defined float TF_MASKRCNN_IMAGE_METADATA[10] = { 0 ,TF_MASKRCNN_IMG_WIDTHHEIGHT ,TF_MASKRCNN_IMG_WIDTHHEIGHT , 3 , 0 , 0 ,TF_MASKRCNN_IMG_WIDTHHEIGHT ,TF_MASKRCNN_IMG_WIDTHHEIGHT , 0 , 0 };
// Pad to a square with the max dim, so we can resize it to TF_MASKRCNN_IMG_WIDTHHEIGHT x TF_MASKRCNN_IMG_WIDTHHEIGHT
int largestDim = inputMat.size().height > inputMat.size().width ? inputMat.size().height : inputMat.size().width;
cv::Mat squareInputMat(cv::Size(largestDim, largestDim), CV_8UC3);
int leftBorder = (largestDim - inputMat.size().width) / 2;
int topBorder = (largestDim - inputMat.size().height) / 2;
cv::copyMakeBorder(inputMat, squareInputMat, topBorder, largestDim - (inputMat.size().height + topBorder), leftBorder, largestDim - (inputMat.size().width + leftBorder), cv::BORDER_CONSTANT, cv::Scalar(0));
cv::Mat resizedInputMat(cv::Size(TF_MASKRCNN_IMG_WIDTHHEIGHT, TF_MASKRCNN_IMG_WIDTHHEIGHT), CV_8UC3);
cv::resize(squareInputMat, resizedInputMat, resizedInputMat.size(), 0, 0);
// Need to "mold_image" like in mask rcnn
cv::Mat moldedInput(resizedInputMat.size(), CV_32FC3);
resizedInputMat.convertTo(moldedInput, CV_32FC3);
cv::subtract(moldedInput, TF_MASKRCNN_MEAN_PIXEL, moldedInput);
// Move the data into the input tensor
// remove memory copies by using code at https://github.com/tensorflow/tensorflow/issues/8033#issuecomment-332029092
// allocate a Tensor and get pointer to memory for that Tensor, allocate a "fake" cv::Mat from it to use as a basis to convert
tensorflow::Tensor inputTensor(tensorflow::DT_FLOAT, {1, moldedInput.size().height, moldedInput.size().width, 3}); // single image instance with 3 channels
float_t *p = inputTensor.flat<float_t>().data();
cv::Mat inputTensorMat(moldedInput.size(), CV_32FC3, p);
moldedInput.convertTo(inputTensorMat, CV_32FC3);
// Copy the TF_MASKRCNN_IMAGE_METADATA data into a tensor
tensorflow::Tensor inputMetadataTensor(tensorflow::DT_FLOAT, {1, TF_MASKRCNN_IMAGE_METADATA_LENGTH});
auto inputMetadataTensorMap = inputMetadataTensor.tensor<float, 2>();
for (int i = 0; i < TF_MASKRCNN_IMAGE_METADATA_LENGTH; ++i) {
inputMetadataTensorMap(0, i) = TF_MASKRCNN_IMAGE_METADATA[i];
}
// Run tensorflow
cv::TickMeter tm;
tm.start();
std::vector<tensorflow::Tensor> outputs;
tensorflow::Status run_status = tfSession->Run({{"input_image", inputTensor}, {"input_image_meta", inputMetadataTensor}},
{"output_detections", "output_mrcnn_class", "output_mrcnn_bbox", "output_mrcnn_mask",
"output_rois", "output_rpn_class", "output_rpn_bbox"},
{},
&outputs);
if (!run_status.ok()) {
std::cerr << "tfSession->Run failed: " << run_status << std::endl;
}
tm.stop();
std::cout << "Inference time, ms: " << tm.getTimeMilli() << std::endl;
if (outputs[3].shape().dims() != 5 || outputs[3].shape().dim_size(4) != 2) {
throw std::runtime_error("Expected mask dimensions to be [1,100,28,28,2] but got: " + outputs[3].shape().DebugString());
}
auto detectionsMap = outputs[0].tensor<float, 3>();
for (int i = 0; i < outputs[3].shape().dim_size(1); ++i) {
auto scoreAtI = detectionsMap(0, i, 5);
auto detectedClass = detectionsMap(0, i, 4);
auto y1 = detectionsMap(0, i, 0), x1 = detectionsMap(0, i, 1), y2 = detectionsMap(0, i, 2), x2 = detectionsMap(0, i, 3);
auto maskHeight = y2 - y1, maskWidth = x2 - x1;
if (maskHeight != 0 && maskWidth != 0) {
// Pointer arithmetic
const int i0 = 0, /* size0 = (int)outputs[3].shape().dim_size(1), */ i1 = i, size1 = (int)outputs[3].shape().dim_size(1), size2 = (int)outputs[3].shape().dim_size(2), size3 = (int)outputs[3].shape().dim_size(3), i4 = (int)detectedClass /*, size4 = 2 */;
int pointerLocationOfI = (i0*size1 + i1)*size2;
float_t *maskPointer = outputs[3].flat<float_t>().data();
// The shape of the detection is [28,28,2], where the last index is the class of interest.
// We'll extract index 1 because it's the toilet seat.
cv::Mat initialMask(cv::Size(size2, size3), CV_32FC2, &maskPointer[pointerLocationOfI]); // CV_32FC2 because I know size4 is 2
cv::Mat detectedMask(initialMask.size(), CV_32FC1);
cv::extractChannel(initialMask, detectedMask, i4);
// Convert to B&W
cv::Mat binaryMask(detectedMask.size(), CV_8UC1);
cv::threshold(detectedMask, binaryMask, 0.5, 255, cv::THRESH_BINARY);
// First scale and offset in relation to TF_MASKRCNN_IMG_WIDTHHEIGHT
cv::Mat scaledDetectionMat(maskHeight, maskWidth, CV_8UC1);
cv::resize(binaryMask, scaledDetectionMat, scaledDetectionMat.size(), 0, 0);
cv::Mat scaledOffsetMat(moldedInput.size(), CV_8UC1, cv::Scalar(0));
scaledDetectionMat.copyTo(scaledOffsetMat(cv::Rect(x1, y1, maskWidth, maskHeight)));
// Second, scale and offset in relation to our original inputMat
cv::Mat detectionScaledToSquare(squareInputMat.size(), CV_8UC1);
cv::resize(scaledOffsetMat, detectionScaledToSquare, detectionScaledToSquare.size(), 0, 0);
detectionScaledToSquare(cv::Rect(leftBorder, topBorder, inputMat.size().width, inputMat.size().height)).copyTo(dest);
}
}
this is really useful, thanks a lot.
@moorage hello, thanks for your code
but I think
int pointerLocationOfI = (i0*size1 + i1)*size2;
should be
int pointerLocationOfI = (i0*size1 + i1)*size2*size3*size4;
What do you think?
I don't know much about outputs[3].flat.
My version worked for me @luoshanwei :)
@moorage Hi, do you run your code on CPU or GPU? I tried to run the code on a single CPU (to test the time cost) by setting the device:
GraphDef graph_def;
SessionOptions opts;
TF_CHECK_OK(ReadBinaryProto(Env::Default(), graph_definition, &graph_def));
graph::SetDefaultDevice("/cpu:0", &graph_def);
However, it doesn't work: the program still occupies the other CPUs as well.
Did you run into this problem?
@ypflll never tried that, sorry. I ran on a single i7 laptop, but didn't check CPU usage.
@luoshanwei did you get the pointer math sorted?
I am looking at this and going mildly cross-eyed: https://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays/ but short of addressing each pixel by hand in a five-deep for loop via the tensor math, I cannot be sure how else to do it.
OpenCV leaves a lot to be desired when it comes to multichannel images: this should make short work of the problem: https://github.com/OpenImageIO/oiio/blob/master/src/libOpenImageIO/imagebuf_test.cpp
Hi, I have a question about the C++ implementation.
I implemented it with reference to the comment in #222.
But the inputs and outputs are different from that comment now; I think this is because of the latest update.
The error message is like this.
tfSession->Run failed: Invalid argument: You must feed a value for placeholder tensor 'input_anchors' with dtype float and shape [?,?,4]
But I am not sure how to build 'input_anchors'. Does anybody know how to build it in C++?
Thank you
@Masahiro1002 did you get to the bottom of this?
seems like we need to convert some python from here:
https://github.com/parai/Mask_RCNN/commit/6289c1bd08fc90a1c3e296be8155674651f82a4b
@moorage thank you! Could you tell how you converted the Keras model to .pb in the first place?
@marcown you can find a multitude of guides here: https://github.com/matterport/Mask_RCNN/issues/218
thanks!
Hey man, could you share your pb model file with me? Your code no longer works on the latest .pb model of Mask R-CNN. Here is my e-mail: [email protected]
I really need to run Mask R-CNN inference in a C++ environment, thanks!
@Masahiro1002 I have the same issue with the TensorFlow C++ API; have you solved this problem?
Why doesn't @moorage's code have 'input_anchors'?
@moorage Can you provide the complete code for calling the Mask R-CNN model in C++?
The Mask R-CNN model should have three inputs; why is the input_anchors parameter missing from your code?
I am working on it; did you finish it?
This mrcnn .pb model should have 3 inputs (input_image:0, input_image_meta:0, input_anchors:0) and 7 outputs, but why do you only have two inputs? Where's the last one?
Version 2.1 has 3 inputs, v2.0 has 2 inputs; you can find them in the source code mrcnn/model.py.
Hi! Do you know how to generate the mask on the image from the model output? With the code moorage shared above, I find maskHeight < 1 and maskWidth < 1, which makes the resize fail. How did you solve it? Could you help me?
cv::resize(binaryMask, scaledDetectionMat, scaledDetectionMat.size(), 0, 0);
My colleague modified the original code, but I don't know how to upload the file; give me your email and I will send it to you.
My email: [email protected], thank you very much!
Hello, my dear friend, can you send your tensorflow.dll, tensorflow.lib, and include files to me? I can't compile the DLL from the TensorFlow source files. I really need it. My email: [email protected]. Thanks a lot.
Hey, why are my x and y 0? No mask or box was shown. Please help me, thanks.

Why is my detectionsMap all zeros?
This is my write-up: https://blog.csdn.net/qq_33671888/article/details/89254537
@caishengzao Could you push it to a repository? My Chinese is not the best ;) , which makes it difficult to read
@MennoK OK, I will try to translate it into English and then post all the resources to GitHub.
@MennoK this is my code implementation: https://github.com/CasonTsai/MaskRcnn_tensorflow_cpp_inference
Sorry it's late.
@moorage @CasonTsai I implemented a TensorFlow C++11 inference engine and ran Mask R-CNN inference successfully, except that the output tensor "detection" is full of zeros (I can see non-zero values in the "mrcnn mask" tensor). I see that you encountered similar questions.
The model was exported using modern TF 2.0 methods with a tf 1.x compatible interface (you can easily see this from the code).

I wonder how you solved the problem (zeros in the "detection" tensor) when you got zeros from the output tensors before. This is preventing me from moving forward; any help is appreciated. @moorage
Here is some snippet of codes for inference:
// In C++ it is also possible to have TensorFlow create a reader operator to automatically read images from an image
// path, where the image tensor is built automatically and graph_def is finally converted from a variable of type tf::Scope.
// In TensorFlow, see the code defined in "tensorflow/core/framework/tensor_types.h" and "tensorflow/core/framework/tensor.h":
// users are able to use Eigen::TensorMap to extract values from the container for reading and assignment. (Lei ([email protected]) 2020.7)
tfe::Tensor _molded_images(tf::DT_FLOAT, tf::TensorShape({1, molded_shape(0), molded_shape(1), 3}));
auto _molded_images_mapped = _molded_images.tensor<float, 4>();
// @todo TODO using Eigen::TensorMap to optimize the copy operation, e.g.: float* data_mapped = _molded_images.flat<float>().data(); copy to the buf using memcpy
// ref: 1. discussion Tensorflow Github repo issue#8033
// 2. opencv2 :
// 2.1. grab buf: Type* buf = mat.ptr<Type>();
// 2.2 memcpy to the buf
// 3. Eigen::Tensor buffer :
// 3.1 grab buf in RowMajor/ColMajor layout: tensor.data();
// 3.2 convert using Eigen::TensorMap : Eigen::TensorMap<Eigen::Tensor<Type, NUM_DIMS>>(buf)
// _molded_images_mapped = Eigen::TensorMap<Eigen::Tensor<float, 4, Eigen::RowMajor>>(&data[0], 1, molded_shape_H, molded_shape_W, 3);
for (int h=0; h < molded_shape(0); h++) {
for (int w=0; w < molded_shape(1); w++) {
_molded_images_mapped(0, h, w, 0) = molded_images(0, h, w, 0);
_molded_images_mapped(0, h, w, 1) = molded_images(0, h, w, 1);
_molded_images_mapped(0, h, w, 2) = molded_images(0, h, w, 2);
}
}
inputs->emplace_back("input_image:0", _molded_images);
tfe::Tensor _images_metas(tf::DT_FLOAT, tf::TensorShape({1, images_metas.cols() } ) );
auto _images_metas_mapped = _images_metas.tensor<float, 2>();
for (int i=0; i < images_metas.cols(); i++)
{
_images_metas_mapped(0, i) = images_metas(0, i);
}
inputs->emplace_back("input_image_meta:0", _images_metas);
tfe::Tensor _anchors(tf::DT_FLOAT, tf::TensorShape({1, anchors.rows(), anchors.cols()}));
auto _anchors_mapped = _anchors.tensor<float, 3>();
for (int i=0; i < anchors.rows(); i++)
{
for (int j=0; j < anchors.cols(); j++)
{
_anchors_mapped(0,i,j) = anchors(i,j);
}
}
inputs->emplace_back("input_anchors:0", _anchors);
// @todo : TODO
// run base_engine_ detection
// see examples from main.cpp, usage of TensorFlowEngine
// load saved model
// tfe::FutureType fut = base_engine_->Run(*inputs, *outputs,
// {"mrcnn_detection/Reshape_1:0", "mrcnn_class/Reshape_1:0", "mrcnn_bbox/Reshape:0", "mrcnn_mask/Reshape_1:0", "ROI/packed_2:0", "rpn_class/concat:0", "rpn_bbox/concat:0"}, {});
// load saved graph
tfe::FutureType fut = base_engine_->Run(*inputs, *outputs,
{"output_detections:0", "output_mrcnn_class:0", "output_mrcnn_bbox:0", "output_mrcnn_mask:0", "output_rois:0", "output_rpn_class:0", "output_rpn_bbox:0"}, {});
// pass fut object to anther thread by value to avoid undefined behaviors
std::shared_future<tfe::ReturnType> fut_ref( std::move(fut) );
// wrap fut with a new future object and pass local variables in
std::future<ReturnType> wrapped_fut = std::async(std::launch::async, [=, &rets]() -> ReturnType {
LOG(INFO) << "enter into sfe TF handler ...";
// fetch result
fut_ref.wait();
tf::Status status = fut_ref.get();
std::string graph_def = base_model_dir_;
if (status.ok()) {
if (outputs->size() == 0) {
LOG(INFO) << format("[Main] Found no output from <%s>: %s!", graph_def.c_str(), status.ToString().c_str());
return status;
}
LOG(INFO) << format("[Main] Success: infer through <%s>!", graph_def.c_str());
// @todo : TODO fill out the detectron result
tfe::Tensor detections = (*outputs)[0];
tfe::Tensor mrcnn_mask = (*outputs)[3];
// @todo : TODO convert tf::Tensor to eigen matrix/tensor
auto detections_mapped = detections.tensor<float, 3>();
auto mrcnn_mask_mapped = mrcnn_mask.tensor<float, 5>();
#ifndef NDEBUG
LOG(INFO) << format("detections(shape:(%d,%d,%d)):",
detections_mapped.dimension(0),
detections_mapped.dimension(1),
detections_mapped.dimension(2))
<< std::endl << detections_mapped;
// LOG(INFO) << "mask:" << std::endl << mrcnn_mask_mapped;
#endif
for (int i=0; i < images.size(); i++) {
// Eigen::Tensor is default ColMajor layout, which is different from c/c++ matrix layout.
// Note only column layout is fully supported for the moment (v3.3.9)
// Eigen::Tensor<float, 2> detection = Eigen::TensorLayoutSwapOp<Eigen::Tensor<float, 2, Eigen::RowMajor>>
// (detections_mapped.chip(i, 0));
Eigen::Tensor<float, 2, Eigen::RowMajor> detection = detections_mapped.chip(i, 0);
// Generate mask using a threshold
// Eigen::Tensor<float, 4> mask = Eigen::TensorLayoutSwapOp<Eigen::Tensor<float, 4, Eigen::RowMajor>>
// (mrcnn_mask_mapped.chip(i, 0));
Eigen::Tensor<float, 4, Eigen::RowMajor> mask = mrcnn_mask_mapped.chip(i, 0);
DetectronResult ret;
Eigen::MatrixXi window = windows.row(i);
unmold_detections(detection, mask, image_shape, molded_shape, window, ret);
rets.push_back( std::move(ret) );
}
@yiakwy you may need to check these steps:
1. Keep the same config (such as input size and batch_size) when you save the Keras model, convert the Keras model to a TF model, and run inference with C++.
2. Check the input names, and guarantee the image data flows into the tensors correctly.
3. Check the preprocessing, such as generating the anchors.
4. You can reference these links:
https://github.com/CasonTsai/MaskRcnn_tensorflow_cpp_inference ,https://blog.csdn.net/qq_33671888/article/details/89254537
@CasonTsai Thanks for the suggestions. Could you help check the following code? You can also check out the code here:
https://github.com/yiakwy/SEMANTIC_VISUAL_SUPPORTED_ODEMETRY/blob/master/modules/models/sfe.h
https://github.com/yiakwy/SEMANTIC_VISUAL_SUPPORTED_ODEMETRY/blob/master/modules/models/simple_mrcnn_infer.cpp
Code for exporting models from Keras to TensorFlow, either as a pure protobuf file with constant variables or in the SavedModel format (the Google Cloud team introduced this method in 2017 for TensorFlow Serving), can be found in https://github.com/yiakwy/SEMANTIC_VISUAL_SUPPORTED_ODEMETRY/blob/master/python/pysvso/models/sfe.py
Both input tests and output tests are included in the C++ source to compare with results from the Python backend.
Some steps have already been taken to ensure correctness.
Have you ever encountered the same question before ?
@hxw111 @CasonTsai @121649982 @moorage
Ultimate solution: the bug has been fixed, with TensorFlow inference tests both in Python and C++ (fixed a typo, a wrong index in image_meta). Here is an example of the output:

Recently I gave a talk at a Google Developer Group (GDG) about inference on end devices. You're welcome to have a look at it!
Close the issue.
@waleedka @MennoK I want to add a pull request with a solution to this problem. I also used Mask R-CNN in my POC project svso on real-time depth estimation (a semantic SLAM project) and introduced it to the public at GDG.