Darknet: CenterNet : Objects as Points — 2x Better Than Yolo V3 in Speed and +4.4% Coco AP

Created on 19 Oct 2019 · 25 comments · Source: AlexeyAB/darknet

CenterNet

Objects as Points seems to achieve a good speed-accuracy tradeoff, better than Yolo v3, and probably better than CornerNet (#3229).

GitHub repo here.

Like CornerNet, this one works without anchor boxes (and without NMS) and can regress many other properties, such as 3D location and pose estimation.

It may be interesting to test this new detection head with Darknet backbones + PAN instead of Hourglass/DLA.

Speed-accuracy Trade Off (Titan Xp)


Coco Challenge State of the Art Networks


Different Backbones for Speed-Accuracy Tradeoff


Most helpful comment

There is already MatrixNet in the Roadmap, that is faster and more accurate than CornerNet: https://github.com/AlexeyAB/darknet/issues/3772

Roadmap: https://github.com/AlexeyAB/darknet/projects/1


All 25 comments


I added something like CenterNet: https://github.com/AlexeyAB/darknet/issues/3229#issuecomment-569412122

Thank you for this implementation!

Two different papers exist for CenterNet: Key-point Triplets (the one you implemented) and Objects as Points.

I tried the second one with the author's repo, but results were not as good as expected on my personal dataset; Yolo v3 Tiny Pan3 is still more accurate and faster.

I'll post complete results in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673 as usual in a week or two.

@laclouis5

Did you try CenterNet dla-34 512x512 ?
Can you add results to your table? https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673

Also try to train Yolo v3 Tiny Pan3 with pre-trained weights: https://drive.google.com/file/d/18v36esoXCh-PsOKwyP2GWrpYDptDY8Zf/view?usp=sharing

@AlexeyAB

I trained Yolo v3 Tiny Pan3 but with yolov3-tiny.conv.15, as stated in the "How to train Tiny Yolo" section. Is there a major difference?

I also trained CenterNet dla-34 512x512 as well as CenterNet resnet-18 512x512.

I'm currently on vacation and can't access the training server; I'll post everything in a few weeks.

@laclouis5

I trained Yolo v3 Tiny Pan3 but with yolov3-tiny.conv.15, as stated in the "How to train Tiny Yolo" section. Is there a major difference?

No.
You can use any of these files.

I also trained CenterNet dla-34 512x512 as well as CenterNet resnet-18 512x512.

I'm currently on vacation and can't access the training server; I'll post everything in a few weeks.

Thanks.
It will also be interesting to compare the speed (FPS) of models:
CenterNet dla-34 512x512 vs CenterNet resnet-18 512x512 vs CSPResNeXt50-PANet-SPP.

Since it seems CenterNet dla-34 512x512 is slower than stated: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/1#issuecomment-569684933

@AlexeyAB

Sure, I'll add FPS for all networks including CenterNet. Which command should I use to measure FPS precisely with the Darknet framework? Should I run demo with -dont_show and then average the FPS?

Should I run demo with -dont_show and then average the FPS?

Yes, just run ./darknet detector demo ... test.mp4 -dont_show using a video file (not a live camera).
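If you want to average FPS yourself outside of darknet's own demo counter (e.g. in a custom inference loop), the idea can be sketched in a few lines of Python. The run_inference callable here is a placeholder for whatever forward pass is being benchmarked, not part of darknet:

```python
import time

def measure_fps(run_inference, n_frames=100, warmup=10):
    """Average FPS over n_frames calls, skipping warmup iterations
    so one-time initialization does not skew the result."""
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(n_frames):
        run_inference()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

# Dummy 10 ms "inference" step just to exercise the helper
fps = measure_fps(lambda: time.sleep(0.01), n_frames=20, warmup=2)
```

Averaging over many frames (and discarding warmup) matters because the first iterations often include CUDA/cuDNN initialization that is not representative of steady-state speed.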

@AlexeyAB
I just added new results and FPS in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673.

I will train Yolo v3 Spp Pan Scale with pre-trained weights when I have GPU time.

@laclouis5 Thanks!

So on GeForce GTX 1060 you get:

  • CenterNet dla-34 512x512 - 25 FPS
  • Yolo V3 CSR Spp Panet 544x544 - 18 FPS (so it should be ~20 FPS at 512x512)

  1. What OS, CUDA and cuDNN versions did you use for Yolov3 and CenterNet?
  2. What FPS do you get with the default yolov3.cfg/weights?
  3. Did you use a pre-trained weights file for training CenterNet dla-34 512x512?

@AlexeyAB,

Ubuntu 18.04.3 LTS
Intel i7-7700 @ 3.6GHz x 8
GeForce GTX 1060 6GB
CUDA 10.0
cuDNN 7.6

I got 19 FPS with the original Yolo v3 network at 544x544, and 25 FPS at 512x512.
I got 22 FPS with Yolo V3 CSR Spp Panet at 512x512.

I used pre-trained weights from ctdet-coco-dla-2x.pth for CenterNet.

| Model | Network Resolution | GTX 1060 FPS | GTX 1080Ti FPS | AP@0.5 | AP@0.75 | AP |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet dla-34 | 512x512 | 25 | ~25 | 55.1% | 40.8% | 37.4% |
| CenterNet ResNet101 | 512x512 | - | 45 | 53.0% | 36.9% | 34.6% |
| csresnext50-panet-spp original-optimal.cfg | 512x512 | 22 | 44 | 64.4% | 45.9% | 42.4% |
| yolov3.cfg | 512x512 | 25 | 30 | ~56.0% | ~33.0% | ~32.0% |


I think a benchmark between yolov3 and centernet-darknet53 would be interesting.

| Model | Network Resolution | GTX 1060 FPS | GTX 1080Ti FPS | AP@0.5 | AP@0.75 | AP |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet dla-34 | 512x512 | 25 | ~25 | 55.1% | 40.8% | 37.4% |
| CenterNet ResNet101 | 512x512 | - | 45 | 53.0% | 36.9% | 34.6% |
| csresnext50-panet-spp original-optimal.cfg | 512x512 | 22 | 44 | 64.4% | 45.9% | 42.4% |
| yolov3.cfg | 512x512 | 25 | 30 | ~56.0% | ~33.0% | ~32.0% |

What is interesting is that Yolo V3 is ~1% better in AP@0.5 than CenterNet dla-34, but far worse in AP@0.75 and Coco AP (~8% and ~6%).

I noticed the same behaviour in my tests https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673: CenterNet dla-34 has 75.7% mAP@0.5 and 41.6% Coco AP. While this mAP@0.5 is the smallest of my trained networks, the Coco AP is one of the best (between Yolo V3 Tiny Pan Mixup (40%) and Yolo V3 Tiny Pan3 (42%)).

My interpretation is that CenterNet has better precision than Yolo but worse recall. CenterNet misses lots of detections compared to Yolo, but when it does detect something, the box location and size are better and the label is correct.

For example, in your results Yolo V3 is nearly on par with CenterNet dla-34 at AP@0.5, but going from AP@0.5 to AP@0.75 Yolo V3 loses 23% while CenterNet loses only 14%; thus CenterNet localizes more precisely than Yolo in this example.
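The AP@0.5 vs AP@0.75 gap is purely a localization effect: a detection with the correct label but a slightly shifted box can still count as a true positive at IoU 0.5 while being rejected at 0.75. A quick sketch (the (x1, y1, x2, y2) box format is just an assumption for illustration):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction shifted by 1 px on a 10x10 object:
# counted as correct at IoU 0.5, rejected at IoU 0.75
score = iou((0, 0, 10, 10), (1, 1, 11, 11))  # 81/119 ~ 0.68
```

So a detector with slightly sloppier boxes (here Yolo) pays a much bigger penalty at the stricter thresholds that dominate the averaged Coco AP.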

Of course, newer networks such as CSR50-Panet are better than CenterNet in every category, including FPS and all mAPs.

Of course, newer networks such as CSR50-Panet are better than CenterNet in every category, including FPS and all mAPs.

Would you mind sharing a reference to CSR50-Panet?

@reactivetype

csresnext50-panet-spp-original-optimal.cfg
https://github.com/AlexeyAB/darknet#pre-trained-models

My interpretation is that CenterNet has better precision than Yolo but worse recall. CenterNet misses lots of detections compared to Yolo, but when it does detect something, the box location and size are better and the label is correct.

For example, in your results Yolo V3 is nearly on par with CenterNet dla-34 at AP@0.5, but going from AP@0.5 to AP@0.75 Yolo V3 loses 23% while CenterNet loses only 14%; thus CenterNet localizes more precisely than Yolo in this example.

@laclouis5 When comparing the precision/recall of two detection architectures, a fair comparison would use the same backbone for CenterNet and Yolo. I suspect the dla-34 backbone may not be efficient or optimal.

In fact, it would be possible to use csresnext50 with CenterNet. The good thing about CenterNet is that it's anchor-free and NMS is optional, which makes post-processing much lighter. It maintains precision by enriching the supervision labels. An interesting variant of CenterNet is TTFNet, which makes training even faster with better labels: https://arxiv.org/abs/1909.00700

@reactivetype

  • Why anchor-free is good?
  • What mAP can CenterNet achieve without NMS?

MatrixNet is better than CenterNet, and

  • MatrixNet uses limitations of size and aspect ratio of object for each detection-layer - like anchors
  • MatrixNet uses soft-NMS

https://arxiv.org/pdf/1908.04646v2.pdf

https://github.com/AlexeyAB/darknet/issues/3772

KP-xNet solves problem (1) of CornerNets because all the matrix layers represent different scales and aspect ratios rather than having them all in a single layer. This also allows us to get rid of the corner pooling operation.


  • Why anchor-free is good?
  • What mAP can CenterNet achieve without NMS?

MatrixNet is better than CenterNet, and

  • MatrixNet uses limitations of size and aspect ratio of object for each detection-layer - like anchors
  • MatrixNet uses soft-NMS

Anchor-free is good for faster inference. MatrixNet seems to be a variant of CenterNet and CornerNet. Thanks for sharing it.

The figure 1 you shared compares the models based on the number of params. However, param count does not always correlate with actual latency.

I see that the authors' report also does not compare latencies against existing models. (Table 2 in https://arxiv.org/pdf/2001.03194.pdf)

@reactivetype

Anchor-free is good for faster inference. MatrixNet seems to be a variant of CenterNet and CornerNet. Thanks for sharing it.

Execution time = 17.9 ms.


The figure 1 you shared compares the models based on the number of params. However, param count does not always correlate with actual latency.

I see that the authors' report also does not compare latencies against existing models. (Table 2 in https://arxiv.org/pdf/2001.03194.pdf)

Yes, there is no fair comparison of accuracy / speed.

MatrixNet + ResNext101-X is better than CenterNet + very heavy HourGlass-104:



CenterNet: https://github.com/xingyizhou/CenterNet#object-detection-on-coco-validation


Hi @AlexeyAB, how are the filters calculated in the CenterNet cfg?

@keko950 Hi,
As usual for [Gaussian_yolo] layer

filters = (classes + coords + 1)*<number of masks> = (classes + 8 + 1)*4 = (classes + 9)*4, since for [Gaussian_yolo] coords = 8 (4 box values + 4 uncertainty estimates).

@AlexeyAB Hmmm.. the cfg is wrong then?

[convolutional]
size=1
stride=1
pad=1
filters=40
activation=linear

[Gaussian_yolo]
yolo_point=right_bottom
mask = 8,9,10,11
anchors = 8,8, 10,13, 16,30, 33,23, 32,32, 30,61, 62,45, 59,119, 80,80, 116,90, 156,198, 373,326
classes=1
num=12
jitter=.3
ignore_thresh = .7
truth_thresh = 1
iou_thresh=0.213
iou_normalizer=0.5
uc_normalizer=0.5
cls_normalizer=1.0
iou_loss=mse
scale_x_y = 1.1
random=0

cfg-file is correct.

Oh yes, as usual for [Gaussian_yolo]; I fixed my previous answer.

https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

when using [Gaussian_yolo] layers, change [filters=57] filters=(classes + 9)x3 in the 3 [convolutional] before each [Gaussian_yolo] layer
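As a quick sanity check of that formula (a sketch, not darknet code): 4 box coords + 4 Gaussian uncertainties + 1 objectness score per anchor mask give the classes + 9 term, which matches the cfg above:

```python
def gaussian_yolo_filters(classes, n_masks):
    # 4 box coords + 4 Gaussian uncertainties + 1 objectness per mask
    return (classes + 4 + 4 + 1) * n_masks

# cfg above: classes=1, mask=8,9,10,11 (4 masks) -> filters=40
# README default: 80 classes with 3 masks -> (80 + 9) * 3 = 267
```

With classes=1 and the 4-entry mask in the posted cfg, this yields exactly the filters=40 set on the preceding [convolutional] layer.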

Nice, thank you for your time!
