Mask_RCNN: model.detect interface performance issue

Created on 18 Dec 2017 · 13 comments · Source: matterport/Mask_RCNN

Hi matterport,

When I use this Mask_RCNN source code to do image segmentation on 1000*600 images, I find the performance is not good: about 1.5 s per frame. It seems the Python parts of Mask R-CNN are not GPU-accelerated.

I wonder if it could be optimized. Does Matterport plan to optimize it?

Thanks.

Most helpful comment

I checked my code and found that most of the time was spent in load_gt_mask(), where I had mistakenly used a nested loop.
After fixing this, I get a time cost of 82.1 ms to detect a 250*250 image on a TitanX (12 GB).
It decreases slightly when setting RPN_ANCHOR_STRIDE to 2 (75.2 ms) or POST_NMS_ROIS_INFERENCE to 500 (76.9 ms).

All 13 comments

It costs 350 ms when I run inference on a 250*250 image on a TitanX (12 GB), and almost the same when using a ResNet-50 architecture.
Seems that it's far behind Kaiming He's paper, which reports 195 ms.

The final detection layer is written in Python. It can be converted to TensorFlow code, which will improve the speed significantly. I don't know if I'll have time to do that change any time soon, but if you want to try doing that I can help guide you in the right direction.

More details here https://github.com/matterport/Mask_RCNN/issues/34
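As a rough illustration of what that conversion could look like (this is not the repository's actual implementation, and the function and argument names here are hypothetical), the kind of NumPy-based non-max suppression the Python layer performs could instead be expressed with in-graph TensorFlow ops such as tf.image.non_max_suppression, so the step stays on the GPU:

import tensorflow as tf

def refine_detections_graph_sketch(boxes, scores, max_instances=100,
                                   nms_threshold=0.3, score_threshold=0.7):
    """Hypothetical sketch of an in-graph detection refinement step.

    boxes:  [N, 4] float32 tensor of (y1, x1, y2, x2) box coordinates.
    scores: [N] float32 tensor of class confidences.
    Returns indices of the kept boxes, keeping the whole step inside the
    TensorFlow graph instead of dropping back to NumPy on the CPU.
    """
    # Drop low-confidence detections first.
    keep = tf.where(scores >= score_threshold)[:, 0]
    boxes = tf.gather(boxes, keep)
    scores = tf.gather(scores, keep)

    # Non-max suppression as a graph op.
    nms_idx = tf.image.non_max_suppression(
        boxes, scores, max_output_size=max_instances,
        iou_threshold=nms_threshold)
    return tf.gather(keep, nms_idx)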

I checked my code and found that most of the time was spent in load_gt_mask(), where I had mistakenly used a nested loop.
After fixing this, I get a time cost of 82.1 ms to detect a 250*250 image on a TitanX (12 GB).
It decreases slightly when setting RPN_ANCHOR_STRIDE to 2 (75.2 ms) or POST_NMS_ROIS_INFERENCE to 500 (76.9 ms).
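For reference, a minimal sketch of an inference config reflecting the two settings mentioned above (the class name is made up; coco.CocoConfig is the config used elsewhere in this thread, and the quoted defaults are assumptions about the repo's config.py):

import coco  # the COCO sample shipped with the repository

class FastInferenceConfig(coco.CocoConfig):
    # Run one image at a time: batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    # Generate anchors for every other feature-map cell instead of every cell
    RPN_ANCHOR_STRIDE = 2
    # Keep fewer proposals after NMS at inference time (default is 1000)
    POST_NMS_ROIS_INFERENCE = 500

config = FastInferenceConfig()
config.display()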

Hi, waleedka and ypflll

Thanks for your help and suggestions.
Mask R-CNN is the state of the art for image segmentation, so I will keep following this performance issue. I hope to improve it and apply it to a specific scenario. @waleedka, if I can get your guidance on it someday, I would greatly appreciate it.

Hi everyone,
I also found the prediction speed seems to be slow. In my case the input image is 480 by 640, and the model.detect function costs more than 1 s on a GTX 1080 Ti. I tried resizing the image to 250 by 250, but that call still costs nearly 1 s. Any idea on this? @ypflll
Thanks very much.

I don't know the details of how you tested it.
You can print the time cost of the model.detect function. My result is 82 ms.

Hi, I just directly use the provided COCO model and the model.detect function. The testing code is as follows. The time cost is 0.81 s, which is about 10 times slower than your case. If I do not resize the original image (which is 480 by 640), the time cost is 1.01 s. I wonder whether I missed setting some parameters properly. @ypflll

import os
import time
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

import coco
import utils
import model as modellib
import visualize

class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    IMAGE_MIN_DIM = 256
    IMAGE_MAX_DIM = 256

# Root directory of the project
ROOT_DIR = os.getcwd()

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on

config = InferenceConfig()
config.display()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# read an RGBA image and keep only the RGB channels
image = skimage.io.imread('Image00004.png')[:,:,0:3]
from skimage.transform import resize
image_resized = resize(image, (256, 256))
print(image_resized.shape)

# Run detection
t1 = time.time()
results = model.detect([image_resized], verbose=1)
print('Detection time is {}'.format(time.time() - t1))
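One thing worth checking in a measurement like this (an assumption about the timing, not a confirmed cause): the very first model.detect call also pays one-time setup cost for building and initializing the prediction graph, so timing only a single call can overstate the steady-state cost. Continuing the script above, a sketch that warms the model up and then averages several runs:

# Warm-up call: exclude any one-time graph/initialization overhead
# from the measurement (assumption about why a single call looks slow).
_ = model.detect([image_resized], verbose=0)

# Average several timed runs for a steadier estimate.
n_runs = 10
t1 = time.time()
for _ in range(n_runs):
    results = model.detect([image_resized], verbose=0)
print('Average detection time: {:.3f} s'.format((time.time() - t1) / n_runs))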

@flyers Hi, it's hard to say. Nothing seems wrong.
Maybe you can look inside model.detect to get more information.

I currently have the model hooked up to OpenCV to run live and I'm getting around 3 FPS.

I reduced the molded_images to 512 by 512. 256 by 256 doesn't offer much improvement in FPS.

Processing 1 images
image shape: (480, 640, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 512, 512, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 89) min: 0.00000 max: 640.00000
total time taken this loop: 0.2837817668914795 s
Processing 1 images
image shape: (480, 640, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 512, 512, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 89) min: 0.00000 max: 640.00000
total time taken this loop: 0.2817041873931885 s
Processing 1 images
image shape: (480, 640, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 512, 512, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 89) min: 0.00000 max: 640.00000
total time taken this loop: 0.2877650260925293 s
Processing 1 images
image shape: (480, 640, 3) min: 0.00000 max: 255.00000
molded_images shape: (1, 512, 512, 3) min: -123.70000 max: 151.10000
image_metas shape: (1, 89) min: 0.00000 max: 640.00000
total time taken this loop: 0.30383849143981934 s

I was wondering what I could do to improve the performance, because neither my CPU nor my GPU is running anywhere near 100% (the CPU is around 30%, while the GPU reaches a maximum of 60% but most of the time spikes around 30-40%).

I don't think loading the camera feed is the bottleneck either because I get similar performance if I run the model on a video.

I don't know if this is the correct place to ask, but is TensorFlow set to use 100% of the GPU by default, or can that be manually adjusted?

Thanks in advance for any help or advice.

Edit: I just ran the script a couple more times and actually the GPU is barely hitting 10% utilization. CPU is running at around 60% but I am running a couple of other tasks in the background.
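On the question of whether TensorFlow's share of the GPU can be adjusted: by default TF 1.x reserves nearly all of the GPU's memory up front, but that is separate from compute utilization, so changing it is unlikely to raise FPS by itself. If you do want to cap the allocation, a sketch for the Keras/TF 1.x setup this repo uses (run before the model is created; the 0.5 fraction is just an example value):

import tensorflow as tf
import keras.backend as K

# TF 1.x session options controlling GPU memory allocation.
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True                    # allocate as needed
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or set a hard cap
K.set_session(tf.Session(config=tf_config))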

Hi,

I used this model to evaluate the speed on my PC with a 1080 Ti, and it costs about 120 ms per 480*640 image. I hope somebody can provide a modified model that uses ResNet-50 as the backbone, because this repository has too many forks to display, which makes it difficult to see other people's modifications.

To use ResNet-50, just change the line _, C2, C3, C4, C5 = resnet_graph(input_image, "resnet101", stage5=True) to _, C2, C3, C4, C5 = resnet_graph(input_image, "resnet50", stage5=True) in model.py.

@John1231983
Thank you, John! I have already started training my model using ResNet-50.

The two issues discussed here are handled already, so I'm closing this.

  • The Python inference layer was converted to TensorFlow operations a while ago.
  • You can now change the backbone using the config.BACKBONE setting; a minimal sketch follows below.
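A minimal sketch of switching the backbone through the config (assuming the later package layout where the config class lives in mrcnn/config.py; the class name and the NUM_CLASSES value are illustrative):

from mrcnn.config import Config

class Resnet50InferenceConfig(Config):
    NAME = "resnet50_inference"   # hypothetical name
    BACKBONE = "resnet50"         # instead of the default "resnet101"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 80          # e.g. COCO: background + 80 classes

config = Resnet50InferenceConfig()
config.display()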