Darknet: CenterNet : Objects as Points — 2x Better Than Yolo V3 in Speed and +4.4% Coco AP

Created on 19 Oct 2019 · 25 comments · Source: AlexeyAB/darknet

CenterNet

Objects as Points seems to achieve a good speed-accuracy tradeoff, better than Yolo v3, and probably better than CornerNet (#3229).

GitHub repo here.

Like CornerNet, this one works without anchor boxes (and without NMS) and can regress many other properties, such as 3D location and pose estimation.

It may be interesting to test this new detection head with Darknet backbones + PAN instead of Hourglass/DLA.

Speed-accuracy Trade Off (Titan Xp)


Coco Challenge State of the Art Networks


Different Backbones for Speed-Accuracy Tradeoff


Most helpful comment

There is already MatrixNet in the Roadmap, that is faster and more accurate than CornerNet: https://github.com/AlexeyAB/darknet/issues/3772

Roadmap: https://github.com/AlexeyAB/darknet/projects/1


All 25 comments


I added something like CenterNet: https://github.com/AlexeyAB/darknet/issues/3229#issuecomment-569412122

Thank you for this implementation!

Two different papers exist for CenterNet: Key-point Triplets (the one you implemented) and Objects as Points.

I tried the second one with the author's repo, but results were not as good as expected on my personal dataset; Yolo v3 Tiny Pan3 is still more accurate and faster.

I'll post complete results in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673 as usual in a week or two.

@laclouis5

Did you try CenterNet dla-34 512x512 ?
Can you add results to your table? https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673

Also try to train Yolo v3 Tiny Pan3 with pre-trained weights: https://drive.google.com/file/d/18v36esoXCh-PsOKwyP2GWrpYDptDY8Zf/view?usp=sharing

@AlexeyAB

I trained Yolo v3 Tiny Pan3 but with yolov3-tiny.conv.15, as stated in the "How to train Tiny Yolo" section. Is there a major difference?

I also trained CenterNet dla-34 512x512 as well as CenterNet resnet-18 512x512.

I'm currently on vacation and can't access the training server; I'll post everything in a few weeks.

@laclouis5

I trained Yolo v3 Tiny Pan3 but with yolov3-tiny.conv.15, as stated in the "How to train Tiny Yolo" section. Is there a major difference?

No.
You can use any of these files.

I also trained CenterNet dla-34 512x512 as well as CenterNet resnet-18 512x512.

I'm currently on vacation and can't access the training server; I'll post everything in a few weeks.

Thanks.
It will also be interesting to compare the speed (FPS) of models:
CenterNet dla-34 512x512 vs CenterNet resnet-18 512x512 vs CSPResNeXt50-PANet-SPP.

Since it seems CenterNet dla-34 512x512 is slower than stated: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/1#issuecomment-569684933

@AlexeyAB

Sure, I'll add FPS for all networks including CenterNet. Which command should I use to measure FPS precisely with the Darknet framework? Should I run demo with -dont_show and then average the FPS?

Should I run demo with -dont_show and then average the FPS?

Yes, just run ./darknet detector demo ... test.mp4 -dont_show using a video file (not a live camera).
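If you want to average FPS yourself outside of darknet's own demo counter (e.g. in a custom inference loop), the idea can be sketched in a few lines of Python. The run_inference callable here is a placeholder for whatever forward pass is being benchmarked, not part of darknet:

```python
import time

def measure_fps(run_inference, n_frames=100, warmup=10):
    """Average FPS over n_frames calls, skipping warmup iterations
    so one-time initialization does not skew the result."""
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(n_frames):
        run_inference()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

# Dummy 10 ms "inference" step just to exercise the helper
fps = measure_fps(lambda: time.sleep(0.01), n_frames=20, warmup=2)
```

Averaging over many frames (and discarding warmup) matters because the first iterations often include CUDA/cuDNN initialization that is not representative of steady-state speed.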

@AlexeyAB
I just added new results and FPS in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673.

I will train Yolo v3 Spp Pan Scale with pre-trained weights when I have GPU time.

@laclouis5 Thanks!

So on GeForce GTX 1060 you get:

  • CenterNet dla-34 512x512 - 25 FPS
  • Yolo V3 CSR Spp Panet 544x544 - 18 FPS (so it should be ~20 FPS at 512x512)

  1. What OS, CUDA and cuDNN versions did you use for Yolov3 and CenterNet?
  2. What FPS do you get with the default yolov3.cfg/weights?
  3. Did you use a pre-trained weights file for training CenterNet dla-34 512x512?

@AlexeyAB,

Ubuntu 18.04.3 LTS
Intel i7-7700 @ 3.6GHz x 8
GeForce GTX 1060 6GB
CUDA 10.0
cuDNN 7.6

I got 19 FPS with the original Yolo v3 network at 544x544, and 25 FPS at 512x512.
I got 22 FPS with Yolo V3 CSR Spp Panet at 512x512.

I used pre-trained weights from ctdet-coco-dla-2x.pth for CenterNet.

| Model | Network Resolution | GTX 1060 FPS | GTX 1080Ti FPS | AP@0.5 | AP@0.75 | AP |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet dla-34 | 512x512 | 25 | ~25 | 55.1% | 40.8% | 37.4% |
| CenterNet ResNet101 | 512x512 | - | 45 | 53.0% | 36.9% | 34.6% |
| csresnext50-panet-spp original-optimal.cfg | 512x512 | 22 | 44 | 64.4% | 45.9% | 42.4% |
| yolov3.cfg | 512x512 | 25 | 30 | ~56.0% | ~33.0% | ~32.0% |


I think a benchmark between yolov3 and centernet-darknet53 would be interesting.

| Model | Network Resolution | GTX 1060 FPS | GTX 1080Ti FPS | AP@0.5 | AP@0.75 | AP |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet dla-34 | 512x512 | 25 | ~25 | 55.1% | 40.8% | 37.4% |
| CenterNet ResNet101 | 512x512 | - | 45 | 53.0% | 36.9% | 34.6% |
| csresnext50-panet-spp original-optimal.cfg | 512x512 | 22 | 44 | 64.4% | 45.9% | 42.4% |
| yolov3.cfg | 512x512 | 25 | 30 | ~56.0% | ~33.0% | ~32.0% |

What is interesting is that Yolo V3 is ~1% better in AP@0.5 than CenterNet dla-34, but far worse in AP@0.75 and Coco AP (~8% and ~6%).

I noticed the same behaviour in my tests https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-549470673: CenterNet dla-34 has 75.7% mAP@0.5 and 41.6% Coco AP. While this mAP@0.5 is the smallest of my trained networks, the Coco AP is one of the best (between Yolo V3 Tiny Pan Mixup (40%) and Yolo V3 Tiny Pan3 (42%)).

My interpretation is that CenterNet has better precision than Yolo but worse recall. CenterNet misses lots of detections compared to Yolo, but when it does detect something, the box location and size are better and the label is correct.

For example, in your results Yolo V3 is nearly on par with CenterNet dla-34 at AP@0.5, but going from AP@0.5 to AP@0.75 Yolo V3 loses 23% while CenterNet loses only 14%; thus CenterNet localizes more precisely than Yolo in this example.
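The AP@0.5 vs AP@0.75 gap is purely a localization effect: a detection with the correct label but a slightly shifted box can still count as a true positive at IoU 0.5 while being rejected at 0.75. A quick sketch (the (x1, y1, x2, y2) box format is just an assumption for illustration):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction shifted by 1 px on a 10x10 object:
# counted as correct at IoU 0.5, rejected at IoU 0.75
score = iou((0, 0, 10, 10), (1, 1, 11, 11))  # 81/119 ~ 0.68
```

So a detector with slightly sloppier boxes (here Yolo) pays a much bigger penalty at the stricter thresholds that dominate the averaged Coco AP.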

Of course, newer networks such as CSR50-Panet are better than CenterNet in every category, including FPS and all mAPs.

Of course, newer networks such as CSR50-Panet are better than CenterNet in every category, including FPS and all mAPs.

Would you mind sharing a reference to CSR50-Panet?

@reactivetype

csresnext50-panet-spp-original-optimal.cfg
https://github.com/AlexeyAB/darknet#pre-trained-models

My interpretation is that CenterNet has better precision than Yolo but worse recall. CenterNet misses lots of detections compared to Yolo, but when it does detect something, the box location and size are better and the label is correct.

For example, in your results Yolo V3 is nearly on par with CenterNet dla-34 at AP@0.5, but going from AP@0.5 to AP@0.75 Yolo V3 loses 23% while CenterNet loses only 14%; thus CenterNet localizes more precisely than Yolo in this example.

@laclouis5 When comparing the precision/recall of two detection architectures, a fair comparison would use the same backbone for CenterNet and Yolo. I suspect the dla-34 backbone may not be efficient or optimal.

In fact, it would be possible to use csresnext50 with CenterNet. The good thing about CenterNet is that it's anchor-free and NMS is optional, which makes post-processing much lighter. It maintains precision by enriching the supervision labels. An interesting variant of CenterNet is TTFNet, which makes training even faster with better labels: https://arxiv.org/abs/1909.00700

@reactivetype

  • Why anchor-free is good?
  • What mAP can CenterNet achieve without NMS?

MatrixNet is better than CenterNet, and

  • MatrixNet uses limitations of size and aspect ratio of object for each detection-layer - like anchors
  • MatrixNet uses soft-NMS

https://arxiv.org/pdf/1908.04646v2.pdf

https://github.com/AlexeyAB/darknet/issues/3772

KP-xNet solves problem (1) of CornerNets because all the matrix layers represent different scales and aspect ratios rather than having them all in a single layer. This also allows us to get rid of the corner pooling operation.


  • Why anchor-free is good?
  • What mAP can CenterNet achieve without NMS?

MatrixNet is better than CenterNet, and

  • MatrixNet uses limitations of size and aspect ratio of object for each detection-layer - like anchors
  • MatrixNet uses soft-NMS

Anchor-free is good for faster inference. MatrixNet seems to be a variant of CenterNet and CornerNet. Thanks for sharing it.

The figure 1 you shared compares the models based on the number of params. However, param count does not always correlate with actual latency.

I see that the authors' report also does not compare latencies against existing models. (Table 2 in https://arxiv.org/pdf/2001.03194.pdf)

@reactivetype

Anchor-free is good for faster inference. MatrixNet seems to be a variant of CenterNet and CornerNet. Thanks for sharing it.

Execution time = 17.9 ms.


The figure 1 you shared compares the models based on the number of params. However, param count does not always correlate with actual latency.

I see that the authors' report also does not compare latencies against existing models. (Table 2 in https://arxiv.org/pdf/2001.03194.pdf)

Yes, there is no fair comparison of accuracy / speed.

MatrixNet + ResNext101-X is better than CenterNet + very heavy HourGlass-104:



CenterNet: https://github.com/xingyizhou/CenterNet#object-detection-on-coco-validation


Hi @AlexeyAB, how are the filters calculated in the CenterNet cfg?

@keko950 Hi,
As usual for [Gaussian_yolo] layer

filters = (classes + coords + 1)*<number of masks> = (classes + 8 + 1)*4 = (classes + 9)*4, since for [Gaussian_yolo] coords = 8 (4 box values + 4 uncertainty estimates).

@AlexeyAB Hmmm.. the cfg is wrong then?

[convolutional]
size=1
stride=1
pad=1
filters=40
activation=linear

[Gaussian_yolo]
yolo_point=right_bottom
mask = 8,9,10,11
anchors = 8,8, 10,13, 16,30, 33,23, 32,32, 30,61, 62,45, 59,119, 80,80, 116,90, 156,198, 373,326
classes=1
num=12
jitter=.3
ignore_thresh = .7
truth_thresh = 1
iou_thresh=0.213
iou_normalizer=0.5
uc_normalizer=0.5
cls_normalizer=1.0
iou_loss=mse
scale_x_y = 1.1
random=0

cfg-file is correct.

Oh yes, as usual for [Gaussian_yolo]; I fixed my previous answer.

https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

when using [Gaussian_yolo] layers, change [filters=57] filters=(classes + 9)x3 in the 3 [convolutional] before each [Gaussian_yolo] layer
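As a quick sanity check of that formula (a sketch, not darknet code): 4 box coords + 4 Gaussian uncertainties + 1 objectness score per anchor mask give the classes + 9 term, which matches the cfg above:

```python
def gaussian_yolo_filters(classes, n_masks):
    # 4 box coords + 4 Gaussian uncertainties + 1 objectness per mask
    return (classes + 4 + 4 + 1) * n_masks

# cfg above: classes=1, mask=8,9,10,11 (4 masks) -> filters=40
# README default: 80 classes with 3 masks -> (80 + 9) * 3 = 267
```

With classes=1 and the 4-entry mask in the posted cfg, this yields exactly the filters=40 set on the preceding [convolutional] layer.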

Nice, thank you for your time!
