Darknet: Repo Claims To Be YOLOv5

Created on 10 Jun 2020 · 81 Comments · Source: AlexeyAB/darknet

Hey there,

This repo is claiming to be YOLOv5: https://github.com/ultralytics/yolov5

They released a blog here: https://blog.roboflow.ai/yolov5-is-here/

It's being discussed on HN here: https://news.ycombinator.com/item?id=23478151

In all honesty this looks like some bullshit company stole the name, but it would be good to get some proper word on this @AlexeyAB


All 81 comments

Comparison YOLOv3 vs YOLOv4 vs YOLOv5: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640

CSPDarknet53s-YOSPP gets 19.5% faster model inference speed and 1.3% higher AP than YOLOv5l.

YOLOv4 achieves 133 - 384 FPS with batch=4 using OpenCV and at least 2x more with batch=32:
OpenCV_Vs_TensorRT



@josephofiowa I've updated my comment to reflect you're not the author - sorry. I am just trying to get to the bottom of these dubious claims.

I'm still confused because I thought YOLOv3 was the final one due to ethical concerns.

I'm still confused because I thought YOLOv3 was the final one due to ethical concerns.

It's the last project by pjreddie, but not the last word on YOLO or Darknet.

I'm still confused because I thought YOLOv3 was the final one due to ethical concerns.

image


Tables 8-10: https://arxiv.org/pdf/2004.10934.pdf

(Real-time detectors with FPS 30 or higher are highlighted here. We compare the results with batch=1 without using tensorRT.)

comparison_gpus


https://medium.com/@alexeyab84/yolov4-the-most-accurate-real-time-neural-network-on-ms-coco-dataset-73adfd3602fe?source=friends_link&sk=6039748846bbcf1d960c3061542591d7

Therefore, we only show results with batch = 1 and without using TensorRT on comparison graphs.

@glenn-jocher did a lot for the development and improvements of Yolo and showed a lot of ideas; he created at least 2 very good repositories on Pytorch. Thus, he gave Yolo a long life outside of Darknet. All this hype around YOLOv5 was not raised by him.

Some notes on comparison: https://github.com/ultralytics/yolov5

  • The latency shouldn't be measured with batch=32. Latency must be measured with batch=1, because the higher the batch, the higher the latency. Latency is the time of a complete data-processing cycle; it cannot be less than the time to process a whole batch, which can take up to 1 second depending on batch size (a minimal measurement sketch follows after this list)

  • If batch=32 is used for both Yolov5 and EfficientDet (I don't know), then this is OK, but only for Yolov5 vs EfficientDet, and only for FPS (not for latency); it can't be compared with any other results where batch=1 is used

  • Size of weights: yolov5x.pt - 366 MB, yolov5s.pt - 27 MB
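
(To make the batch-vs-latency point above concrete, here is a minimal PyTorch timing sketch; it assumes a CUDA device and a generic `model`, and is not code from either repository. Batch-1 wall time is latency; batch-N time divided by N is per-image throughput, which is a different number.)

    import time
    import torch

    def timed_forward(model, batch_size, img_size=608, n_iters=50, device="cuda"):
        """Sketch: average wall time for one forward pass of a single batch."""
        x = torch.zeros(batch_size, 3, img_size, img_size, device=device)
        model = model.to(device).eval()
        with torch.no_grad():
            model(x)                        # warm-up
            torch.cuda.synchronize()
            t0 = time.time()
            for _ in range(n_iters):
                model(x)
            torch.cuda.synchronize()
        return (time.time() - t0) / n_iters

    # latency_ms   = timed_forward(model, 1) * 1000        # true batch-1 latency
    # per_image_ms = timed_forward(model, 32) * 1000 / 32  # throughput figure, NOT latency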

Invalid comparison results in the roboflow.ai blog: https://blog.roboflow.ai/yolov5-is-here/

Actually, if both networks YOLOv4s and ultralytics-YOLOv5l are trained and tested on the same framework with the same batch on the common Microsoft COCO dataset: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640

  • weights size: YOLOv4s 245 MB vs YOLOv5l 192 MB vs YOLOv5x 366 MB
  • test-dev accuracy on MSCOCO: YOLOv4s-608 45% AP vs YOLOv5l-736 44.2% AP (YOLOv4 is more accurate)

  • speed with batch=16: YOLOv4s-608 10.3ms vs YOLOv5l-736 13.5ms (YOLOv4 is faster)

  • roboflow.ai shared the Latency-Accuracy chart with ultralytics-YOLOv5, where the times were measured with batch=32 and then divided by 32, while latency must be measured with batch=1: the higher the batch, the higher the latency, and the latency of 1 sample can't be less than the latency of the whole batch, so the real latency of YOLOv5 can be up to ~1 second with a high batch size of 32-64

  • they stated 140 FPS for YOLOv5 (s/m/l/x ???) (what batch-size ???) while YOLOv4 achieves ~400 FPS just with batch=4 by using OpenCV-dnn or TensorRT on GPU RTX 2080ti (table above)


Second, YOLOv5 is fast – blazingly fast. In a YOLOv5 Colab notebook, running a Tesla P100, we saw inference times up to 0.007 seconds per image, meaning 140 frames per second (FPS)! By contrast, YOLOv4 achieved 50 FPS after having been converted to the same Ultralytics PyTorch library.

  1. Actually YOLOv4 is faster and more accurate than YOLOv5l if it is tested with equal settings (batch=16) on the same framework https://github.com/ultralytics/yolov5 on the common Microsoft COCO dataset (while YOLOv5x is much slower than YOLOv4, and YOLOv5s is much less accurate than YOLOv4)

CSPDarknet53s-PASPP-Mish at 608x608 (~YOLOv4) is +0.8 AP more accurate and 1.3x (+30%) faster than YOLOv5l at 736x736

Full true comparison: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/32#issuecomment-638064640

CSPDarknet53s-PASPP-Mish: (~YOLOv4)

cd53s-paspp-mish 45.0% AP @ 608x608
Model Summary: 212 layers, 6.43092e+07 parameters, 6.43092e+07 gradients
Speed: 8.7/1.6/10.3 ms inference/NMS/total per 608x608 image at batch-size 16

YOLOv5l:

yolov5l 44.2% AP @ 736x736
Model Summary: 231 layers, 6.17556e+07 parameters, 6.17556e+07 gradients
Speed: 11.3/2.2/13.5 ms inference/NMS/total per 736x736 image at batch-size 16

  2. They compared the model size of the small ultralytics-YOLOv5 version YOLOv5s (27 MB), which has very low accuracy (26-36% AP on Microsoft COCO), with the big YOLOv4 (245 MB), which has very high accuracy (41-43% AP on Microsoft COCO)

Fourth, YOLOv5 is small. Specifically, a weights file for YOLOv5 is 27 megabytes. Our weights file for YOLOv4 (with Darknet architecture) is 244 megabytes. YOLOv5 is nearly 90 percent smaller than YOLOv4. This means YOLOv5 can be deployed to embedded devices much more easily.


  3. They compared the speed of a very small and much less accurate version of ultralytics-YOLOv5 with the very accurate and big YOLOv4. They did not provide the most critical details for the comparison: exactly which YOLOv5 version was used (s, l, x, ...), what training and testing resolutions were used, and what test batch size was used for both YOLOv4 and ultralytics-YOLOv5. They did not test on the generally accepted Microsoft COCO dataset with exactly the same settings, and they did not test on the Microsoft COCO CodaLab evaluation server, which would reduce the likelihood of manipulation.

Third, YOLOv5 is accurate. In our tests on the blood cell count and detection (BCCD) dataset, we achieved roughly 0.895 mean average precision (mAP) after training for just 100 epochs. Admittedly, we saw comparable performance from EfficientDet and YOLOv4, but it is rare to see such across-the-board performance improvements without any loss in accuracy.

@AlexeyAB Thank you for breaking that down. I think my suspicion of the comparisons was warranted.

I just noticed that their iOS app page calls their network YOLOv4: https://apps.apple.com/app/id1452689527

YOLOv4 is an updated version of YOLOv3-SPP, trained on the COCO dataset in PyTorch and transferred to an Apple CoreML model via ONNX.

Someone said that they were apparently very surprised when you released YOLOv4 as they were planning to also release YOLOv4. I think this really puts emphasis on the need for people to communicate their intentions.

I just noticed that their iOS app page calls their network YOLOv4: https://apps.apple.com/app/id1452689527

YOLOv4 is an updated version of YOLOv3-SPP, trained on the COCO dataset in PyTorch and transferred to an Apple CoreML model via ONNX.

Someone said that they were apparently very surprised when you released YOLOv4 as they were planning to also release YOLOv4. I think this really puts emphasis on the need for people to communicate their intentions.

Yeah. I see it's from Ultralytics LLC, who now becomes the creator of YOLOv5. I agree with your opinion. IMO Ultralytics intended to succeed YOLO by implementing a PyTorch version with several contributions. Anyway, it is encouraging news for the PyTorch community, even though it doesn't have a significant advantage over @AlexeyAB's YOLOv4.

I think there is a strong case for either project to adjust their name to reflect the works are not built upon one another and are not a fair comparison.

As YOLO started in the Darknet framework, this repository was somewhat endorsed by pjreddie, @AlexeyAB was first to the punch with YOLOv4, Ultralytics already had their own "flavour" of YOLOv3 for TF - it would make sense to rename YOLOv5. Even something small like "uYOLOv5", or "YOuLOv5" could be significant in distinguishing the works.

Otherwise who publishes YOLOv6, and is YOLOv6 the improvement from YOLOv4 or YOLOv5? I think this is incredibly confusing and serves nobody.

It's Joseph, author of that Roboflow blog post announcing Glenn Jocher's YOLOv5 implementation.

Our goal is to make models more accessible for anyone to use on their own datasets. Our evaluation on a sample task (BCCD) is meant to highlight tradeoffs and expose differences if one were to clone each repo and use them with little customization. Our post is not intended to be a replacement nor representative of a formal benchmark on COCO.

Sincere thanks to the community on your feedback and continued evaluation. We have published a comprehensive updated post on Glenn Jocher's decision to name the model YOLOv5 as well as exactly how to reproduce the results we reported.

@AlexeyAB called out very important notes above that we included in this followup post and updated in the original post. Cloning the YOLOv5 repository defaults to YOLOv5s, and the Darknet implementation defaults to "big YOLOv4." In our sample task, both these models appear to max out their mAP at 0.91 mAP. YOLOv5s is 27 MB; big YOLOv5l is 192 MB; big YOLOv4 is 245 MB. For inference speed, Glenn's YOLOv5 implementation defaults to batch inference and divides the batch time by the number of images in the batch, resulting in the reported 140 FPS figure. YOLOv4 defaults to a batch size of 1. This is an unfair comparison. In the detailed update, we set both batch sizes to 1, where we see YOLOv4 achieves 30 FPS and YOLOv5 achieves 10 FPS.

Ultimately, we encourage trying each on one's own problem, and consider the tradeoffs based on your domain considerations (like ease of setup, complexity of task, model size, inference speed reqs). We published guides in the post to make that deliberately easy. And we will continue to listen on where the community lands on what exact name is best for Glenn Jocher's YOLOv5 implementation.

YOLOv3-spp vs YOLOv4(leaky) vs YOLOv5 - with the same batch=32, each point - another test-network-resolution: https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/35#issuecomment-643257711

image

@josephofiowa Thank you for your blog post Responding to the Controversy about YOLOv5: YOLOv4 Versus YOLOv5. But I am a little confused about what you wrote. First, in the last sentence of the section "Comparing YOLOv4 and YOLOv5s Model Storage Size" you wrote this:

The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.

Then what about YOLOv5x?

Second, in the fourth sentence of the section "Comparing YOLOv4 and YOLOv5s Inference Time" you wrote this:

On single images (batch size of 1), YOLOv4 inferences in 33 ms (30 FPS) and YOLOv5s inferences in 20ms (10 FPS).

It should be 100ms or 50 FPS for YOLOv5s, I might say.
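
(For reference: FPS = 1000 / latency in ms, so 33 ms ≈ 30 FPS, 20 ms = 50 FPS, and 10 FPS would correspond to 100 ms.)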

Thank you for your post.

@rcg12387 Why do you think they should know arithmetic? )

@rcg12387
Thanks for the model sizes question. We've updated the post to show all sizes:

Updated to include the model size of all YOLOv5 models. v5x: 367 MB, v5l: 192 MB, v5m: 84 MB, v5s: 27 MB. YOLOv5s is the model compared in this article. YOLOv4-custom refers to the model we have been testing throughout this post.
model-sizes

Thanks for your callout of the arithmetic error. It's corrected, as is the accompanying graph:

On single images (batch size of 1), YOLOv4 inferences in 33 ms (30 FPS) and YOLOv5s inferences in 20ms (50 FPS). (Update June 14 12:46 PM CDT - In response to rcg12387's GitHub comment, we have corrected an error where we previously calculated YOLOv5 inference to be 10 FPS. We regret this error.)

Note: Glenn Jocher provided inference time updates and pushed an update to his repo so that times are reported as end-to-end latencies. We have included his comments in the post and pasted them below:

The times ... are not for batched inference, they are for batch-size = 1 inference. This is the reason they are printed to the screen one at a time, because they are run in a for loop, with each image passed to the model by itself (tensor size 1x3x416x416). I know this because like many other things, we simply have not had time to modify detect.py properly for batched inference of images from a folder.
One disclaimer is that the above times are for inference only, not NMS. NMS will typically add 1-2ms per image to the times. So I would say 8-9ms is the proper batch-size 1 end-to-end latency in your experiment, while 7 ms is the proper batch-size 1 inference-only latency.
In response to this I've pushed a commit to improve detect.py time reporting. Times are now reported as full end-to-end latencies: FP32 pytorch inference + postprocessing + NMS. I tested out the new times on a 416x416 test image, and I see 8 ms now at batch-size 1 for full end-to-end latency of YOLOv5s.
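
(For reference, the end-to-end batch-1 timing described above can be measured roughly as in this sketch; it assumes a CUDA device and generic `model` and `non_max_suppression` callables, and is not code from either repository.)

    import time
    import torch

    @torch.no_grad()
    def end_to_end_ms(model, non_max_suppression, img):
        """Sketch: time inference and NMS separately, plus their total, for one
        1x3xHxW image, mirroring the 'inference/NMS/total' numbers in this thread."""
        torch.cuda.synchronize()
        t0 = time.time()
        pred = model(img)                    # forward pass
        torch.cuda.synchronize()
        t1 = time.time()
        det = non_max_suppression(pred)      # post-processing; det holds the final detections
        torch.cuda.synchronize()
        t2 = time.time()
        return (t1 - t0) * 1e3, (t2 - t1) * 1e3, (t2 - t0) * 1e3  # inference, NMS, total (ms)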

@AlexeyAB
We have included your COCO benchmark performance in the post as well. Thank you for providing this.

@josephofiowa But you still don't know the difference between inference time and FPS )

@josephofiowa Thank you for your reply. I have read your updated post.
However, the sentence still remains in the updated post:

The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.

In order to avoid any confusion you should correct this sentence like this: The largest YOLOv5 is YOLOv5x, and its weights are 367 MB.
Thanks.

@josephofiowa Thank you for your reply. I have read your updated post.
However, the sentence still remains in the updated post:

The largest YOLOv5 is YOLOv5l, and its weights are 192 MB.

In order to avoid any confusion you should correct this sentence like this: The largest YOLOv5 is YOLOv5x, and its weights are 367 MB.
Thanks.

Yes, done. Thanks.

@AlexeyAB Thanks. Following performance updates on ultralytics/yolov5#6.

As it is clear Glenn is going to continue making performance updates (even between the time the post went live and now) and eventually publish a paper, we will reference that thread in the post as the place to find the most up-to-date performance discussion on the COCO benchmark.

Just to throw a spanner in the works: https://github.com/joe-siyuan-qiao/DetectoRS and https://arxiv.org/pdf/2006.02334.pdf. They claim 73.5 AP50. (I know it has nothing to do with yolo and naming continuity)

@pfeatherstone
DetectoRS is 15x - 60x times slower than Yolo: https://arxiv.org/pdf/2006.02334.pdf

  • 54.7 AP - slower than 1 FPS (test-time augmentation)
  • 51.3 AP - 3.9 FPS

So this is offtopic.

@pfeatherstone Please don't draw a hasty conclusion. A merit of the YOLO versions is their lightness and speed. Practitioners don't welcome unrealistic latency even if a model has high precision. It's useless.

@AlexeyAB I agree it's off topic. But this thread was comparing latency, FPS and accuracy. I thought i might include other non-yolo based models. Maybe that is more suited to a forum.

@josephofiowa Hello,

Why did you change the input size of YOLOv5s from the default 736x736 to 416x416 in your testing, and compare it with YOLOv4, which uses a 608x608 input size?

@WongKinYiu All images were resized to 416x416 in preprocessing in training and testing for both tests. The version of the BCCD dataset used is consistent.

@AlexeyAB @WongKinYiu @josephofiowa thank you all for your updates. I'm trying to address a few shortcomings simultaneously here. I've started training a few panet-based modifications, so hopefully I'll have those results back in about a week, though I can't guarantee they'll be improved much since this is the first time I've tried this. In the meantime the simplest update I can do is to match test settings to the original efficientdet metrics shown in my readme plot, which are --batch-size 8 and FP16 inference.

As part of this process I've upgraded the entire v5 system from FP32 to FP16 for model storage and inference (test.py and detect.py) when the conditions permit (essentially when a CUDA device is available for inference). This should help produce a better apples-to-apples comparison, and luckily pytorch makes this easy by using the .half() operator.
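
(A rough sketch of that FP16 switch, assuming a CUDA device; the repo's actual code may differ.)

    import torch

    def to_fp16_if_possible(model, device):
        """Sketch: keep and run the model in half precision when a CUDA device
        is available; otherwise stay in FP32, since CPU half support is limited."""
        model = model.to(device)
        half = device.type != "cpu"
        if half:
            model = model.half()
        return model, half

    # usage sketch:
    # device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # model, half = to_fp16_if_possible(model, device)
    # img = img.half() if half else img.float()   # input dtype must match the model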

Since all models are stored in FP16 now, one benefit is that all model sizes have shrunk by half in terms of filesizes. I've also added a second independent hosting system for the weights, so the auto-download functionality should be doubly redundant now, and availability should be enhanced hopefully in China, which seems to not have access to the default Google Drive folder.

The model sizes now span 14 MB for s to 183 MB for x, and GPUs with tensor cores, like the T4 and V100, should see inference times (and memory requirements) roughly halved from before. Other GPUs will not see any speed improvement, but will enjoy the same reduced memory requirements. This is the new default, so no special settings are required to see these benefits.

@WongKinYiu and @AlexeyAB can you guys please generate the same curve at batch-size 8 with FP16 inference in order to overlay everything on one graph? Thank you!

@AlexeyAB I agree it's off topic. But this thread was comparing latency, FPS and accuracy. I thought i might include other non-yolo based models. Maybe that is more suited to a forum.

There are others in the same speed-accuracy neighborhood, like FCOS perhaps. Many people zoom in on the one mAP number at the exclusion of all else unfortunately. From a business perspective, if you offer me one model that is 10% better than another, but costs 10x more (in time or money), the choice is going to be obvious I believe.

@josephofiowa

It is interesting that 416x416 and 608x608 get the same FPS for YOLOv4 in your testing.
In your Colab it is exactly the same image and exactly the same ms, with 608x608 input resolution.
image
...
image

update: It should be ~50 fps for 416x416 input resolution.
image

@WongKinYiu I think sometimes end-to-end speeds may be dominated by other factors than convolution times, especially for smaller batch sizes.

@glenn-jocher

I posted this result https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644483769 because @josephofiowa says the result posted on his blog is from his Colab. However, there is no 416x416 testing in the Colab, only 608x608 testing, while he says all images were resized to 416x416 in all testing https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644465808.
image
image

Yeah, I just read through and can concur that I couldn't find a 416x416 setup; it seems to be 608x608 only.

@josephofiowa

YOLOv5:

!python train.py --img 416 --batch 16 --epochs 200 --data '../data.yaml' --cfg ./models/custom_yolov5s.yaml --weights '' --name yolov5s_results --nosave --cache

YOLOv4:

0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF

So YOLOv5 was trained on a 416x416 input size and YOLOv4 was trained on a 608x608 input size?

@danielbarry

Yes, from the COLab we can get following information.

  • default training size of YOLOv5s is 640x640 > change to 416x416
  • default testing size of YOLOv5s is 640x640 > change to 416x416
  • default training size of YOLOv4 is 512x512 > change to 608x608
  • default testing size of YOLOv4 is 608x608 > no change (608x608)

@WongKinYiu looks correct, except v5 default --img-size is the same 640 for everything (train, test, detect).

@WongKinYiu @danielbarry You are correct that the config was not modified from 608x608, yet the inference time was comparable to @WongKinYiu's finding. Perhaps Glenn's comment per small batch size is correct. The config has been updated and Colab is now re-running. (EDIT: This is completed and the post is updated.)

It is also worth noting regarding inference speeds and Glenn's FP16 update: Colab currently does not provide GPU resources that leverage Tensor Cores. It provides a P100, not V100. The mentioned inference speed increase will not be present in Colab.

Please note the Colabs do not intend to be an official benchmark, but rather an "off-the-shelf" performance that one might find cloning these repos. These should not influence the COCO benchmark metrics.

@glenn-jocher thanks, updated https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644493225.

@josephofiowa yes, you are correct, colab P100's will not benefit from the fp16 change, but it also doesn't hurt them. Every once in a while a T4 will show up in colab that does benefit though :)

Ok I've finished the corrected benchmarks. Models are all exactly the same, but inference is fp16 now, and testing has been tuned a bit to improve speed at the slight expense of some mAP lost, which I thought was a worthwhile compromise.

Most importantly, I believe this is a true apples to apples comparison now, with all models run at --batch 8 and fp16.

I'll probably want to adjust the plot bounds in the future, but plotting with the same exact bounds as my existing plot I get this:

study_mAP_latency
EDIT: removed 'latency' from x axis and modified units from ms to ms/img per feedback.
EDIT2: perhaps a more proper label would be 'GPU Time (ms/img)'?

@glenn-jocher

You have to modify the label of the x-axis to 1/FPS or GPU_Latency/Batch_Size (ms).

update: Oh, I see your update, (ms/imgs) is also OK.

update: hmm... I think Time is better than Speed, but I am not sure which one is exactly right;
maybe just follow efficientdet and use 1/batch8_throughput?
image

What was the batch size?

@glenn-jocher

Do you use fast NMS mode? I get higher AP but lower FPS than what is reported in your figure.
image

Looking at that graph, it looks like yolov3-spp is still a serious contender for the belt

Also, were they all trained using the same optimisers, schedulers and hyperparameters? @glenn-jocher achieved higher AP with yolov3-spp by retraining with his repo. So it goes well beyond the model architecture.

@pfeatherstone

Bag of Freebies (BoF) (Mosaic, CIoU, CBN, ...) can be applied to any model regardless of repository - and improve accuracy: https://arxiv.org/pdf/2004.10934.pdf

@josephofiowa

You're a persistent master of unfair comparisons and forgery of data )

Just another thought: was yolov3-spp trained using the augmentation tools? That should maybe be something else to consider when making fair comparisons. It's maybe a bit unfair comparing the performance of different architectures when some have been trained with 'better data'. Maybe COCO is sufficiently large and diverse that augmentation doesn't really help, but it's just another thought. Maybe the fairest thing would be to use a single repo like mmdetection, with all models trained using exactly the same data preparation settings and hyperparameters.

Oh, and another observation: yolov5 results are a bit worse if you don't use letterbox resizing. I haven't done a full evaluation on the COCO dataset, just an observation based on a few images. So that's an additional thing to take into account as part of 'data preparation' when comparing models. Now maybe 'data preparation', training hyperparameters, and all the bag of freebies, as @AlexeyAB puts it, are 'part of' the model, so you don't care how it was trained or how it prepares the data to make a fair comparison; all you need is the same input size and the same software/hardware environment.

BUT how do you know whether a model has reached its full potential when comparing it against other models? How do you know whether it has been optimally trained? yolov3-spp is a good example: do I use the model trained by darknet or by ultralytics? The latter has better AP. So do I treat them as different models, or take the latter as the official stats? You might argue that to make a fair comparison all models need to be trained in exactly the same way with exactly the same hyperparameters, but one optimizer with one set of hyperparameters might suit one model very well and another poorly. I find this whole model-comparison debate very tricky to digest because there are too many variables that can affect a model's performance. All of them could be re-evaluated in a slightly different environment, and it is likely that you would get very different graphs.

@pfeatherstone

If your paper proposes a new plugin module or architecture based on a baseline method, it is better to use exactly the same other settings for comparison. There are two commonly used strategies: 1) follow the same settings as your baseline, e.g. CSPNet; 2) create a new setting and run both the baseline and your method on it, e.g. ASFF.
image
If your paper proposes architectures, a loss function, data augmentation, a training method, ... you have to design complete ablation studies.

@pfeatherstone very good question! It sounds a lot like the nature vs nurture debate in humans, i.e. what proportion of your actions are determined by your genetics and what proportion are dictated by your upbringing, education, and experiences.

@glenn-jocher

Hello, the controversy around ultralytics/yolov5 is not about this: https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-646734609.

@glenn-jocher did a lot for the development and improvements of Yolo and showed a lot of ideas...

This shows that ultralytics brings a huge contribution to the YOLO community https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-642268465, but some other things undermine it in recent updates.

  1. in ultralytics/yolov5, it uses a time-out in NMS. link
        if (time.time() - t) > time_limit:
            break  # time limit exceeded

It is used for solving the training issue of https://github.com/ultralytics/yolov3/issues/1251. I think it has to use an is_training flag to drive it, or, if the time limit is reached during inference, users will get unexpected results (a gating sketch follows after this list).

  2. wrong comparison.
    This figure is not showing GPU latency; it shows the average GPU inference time of batch-32 yolov5 models and batch-8 efficientdet models. The GPU latency of the yolov5 models is 0.1s~0.6s when the batch size is 32.
    The issue with this comparison was raised more than 1 week ago, but it still has not been fixed.
    image

  3. confusing table
    We can get the information about AP and speed testing from the description of the table, but we cannot recognize the information about FLOPs.
    image

  • I hope the above-mentioned parts can be fixed soon.
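
(Regarding the time-limit point in item 1 above, here is a minimal sketch of the suggested gating; `is_training` is a hypothetical flag and this is not the repo's actual NMS code.)

    import time
    import torchvision

    def batched_nms_with_limit(preds, iou_thres=0.45, is_training=False, time_limit=10.0):
        """Sketch: per-image NMS over a batch; the time limit is enforced only
        while training, so inference results are never silently truncated."""
        t0 = time.time()
        out = []
        for p in preds:                      # p: (n, 6) tensor [x1, y1, x2, y2, conf, cls]
            keep = torchvision.ops.nms(p[:, :4], p[:, 4], iou_thres)
            out.append(p[keep])
            if is_training and (time.time() - t0) > time_limit:
                break                        # acceptable shortcut during training only
        return out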

And almost all of the other controversy is raised by @josephofiowa's blog. Here I only list two of those issues.

  1. inconsistent predicted results
    It gives different predicted results even if you use the same model, the same weights, the same input image, and the same testing command in each inference.

In @josephofiowa Colab and Blog :
image
image

Can you imagine a self-driving car that sometimes can see a pedestrian in front of it, but sometimes not? I think no one would buy this kind of car.

2.

Second, YOLOv5 is fast – blazingly fast. In a YOLOv5 Colab notebook, running a Tesla P100, we saw inference times up to 0.007 seconds per image, meaning 140 frames per second (FPS)! By contrast, YOLOv4 achieved 50 FPS after having been converted to the same Ultralytics PyTorch library.

Even at this time, ultralytics/yolov5 does not yet support running YOLOv4, so how did @josephofiowa test the speed of yolov5s and YOLOv4 using the same Ultralytics PyTorch library 10 days ago?

It can reduce the inference time of each input image by about 0.3 ms to 0.8 ms, so it can make your chart look better. However, it gives different predicted results even if you use the same model, the same weights, the same input image, and the same testing command in each inference.

Wow, good spot. This is quite troubling. Would be nice to see this re-tested with a massive timeout value.

We can get the information of AP and speed testing from the description of the table, but we can not recognize the information of FLOPs. Also I guess it is GFLOPs since FLOPs must be an integer.

I think the B after the number is "billion", so the unit is BFLOPS.

Even at this time, ultralytics/yolov5 does not yet support running YOLOv4, so how did @josephofiowa test the speed of yolov5s and YOLOv4 using the same Ultralytics PyTorch library 10 days ago?

They didn't, it was tested in two different frameworks. I think this was really just meant as a "fair as can be" comparison without the shared framework, but of course this concerns me.

Even the weight file size comparison doesn't really make sense - it could literally just be a case of representation between the two frameworks.

What I find particularly confusing is the bar YOLOv4 is held up to when it comes to @josephofiowa 's comparisons with YOLOv5 in their update blog.

  • For training time, the comparison is YOLOv4 vs YOLOv5s
  • For max mAP, the comparison is YOLOv4 (custom?) vs YOLOv5s
  • For object detection accuracy, the comparison is YOLOv4 vs YOLOv5l
  • For model storage size, the comparison is YOLOv4 (darknet) vs YOLOv5 all version (torch)
  • For inference time, the comparison is YOLOv4 vs YOLOv5s

So, it's mostly a comparison of YOLOv4 vs YOLOv5s, unless it's object detection accuracy, where magically the comparison is against YOLOv5l? It seems like the version of YOLOv5 which best suits each test is picked. Why not just test all YOLOv5 models - why pick and choose which to compare with?

Potential way forward: As YOLOv3 is the common model in both frameworks, to me it makes more sense to compare YOLOv4 and YOLOv5 against their respective YOLOv3 versions until a proper framework network port is complete. That way you can mostly shake out framework-specific differences.

@danielbarry

I think the B after the number is "billion", so the unit is BFLOPS.

Thanks, yes it is billion; I corrected the description in my comment.

What I find particularly confusing is the bar YOLOv4 is held up to when it comes to @josephofiowa 's comparisons with YOLOv5 in their update blog...

I think it is better to focus on how to make YOLO better, no matter which yolovX it is.
There are too many mysteries in @josephofiowa's blogs... I don't have time to find all of them.

@WongKinYiu It would be good if you corrected your comment.
You wrote:

  1. wrong comparison.
    This figure is not showing GPU latency, it shows average GPU inference time of batch-32 yolov5 models and batch-8 efficientdet models. The GPU latency of yolov5 models are 0.1s~0.6s when batch size is 32, this is also the reason why @josephofiowa ever got 10 fps results of yolov5s #5920 (comment).

10 fps was an error by @josephofiowa. He updated it to 50 fps.

@rcg12387 Thanks,

edit: I think I cannot say "may" about others' thinking; I will delete this sentence.

@WongKinYiu I just today updated the v5 readme table with FP16 speeds for all current models. New models are being trained with panet heads, I'm waiting for the last of these to finish before updating the table again and the chart this weekend (yolov5x takes a bit of time to train).

In any case, the values shown in the chart right now are _slower_ than the actual batch-8 FP16 speeds that I will update to, so the chart should only look better in the future.

The timeout you cite is perfectly normal. Its purpose is to prevent testing times from becoming burdensome during training, for example as in this issue: https://github.com/ultralytics/yolov3/issues/1251

I instituted this code in yolov3 to address this:
https://github.com/ultralytics/yolov3/blob/master/utils/utils.py#L489

And it's carried over in v5:
https://github.com/ultralytics/yolov5/blob/master/utils/utils.py#L545

The time limit is designed to interrupt execution of NMS operations if they exceed 10.0 full seconds per batch, saving users from extremely long testing times during training, as in the issue above. It does not affect any of the results we are discussing, because for all of the models I have run, NMS takes about 0.001-0.002 seconds per image, and the batch sizes used during testing are 32. So at about 0.030-0.06 seconds of elapsed time per batch, the 10.0 second limit will never be approached here.

Just another thought: it might be worth doing comparisons using the same inference engine, like onnxruntime. For GPU inference that might not make a difference, because most repos use cudnn or tensorrt, but for CPU inference it makes a huge difference. For example, the CPU gemm implementation in darknet isn't the fastest. In any case, using the same inference engine regardless of target device makes it a little bit more of a fair game. You might have to do NMS as a post-processing CPU step, but that seems fine to me.

At the end of the day, all models output a tensor of shape [B,D,F] where B is batch size, D is the total number of candidate detections and F is the number of features equal to 85 for COCO. The features are exactly the same for all models and the post-processing NMS step is the same for everyone. So you can use the exact same ONNXRUNTIME code to infer every model. That seems like a fair play.

I'm sure there are already quite a few pytorch ports of yolov4 on github so the ONNX port wouldn't be a lot of work.
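
(A minimal onnxruntime sketch of that idea; the model path and input name below are placeholders and depend on how each model was exported.)

    import numpy as np
    import onnxruntime as ort

    def infer(onnx_path, batch=1, img_size=640):
        """Sketch: run any exported detector with the same engine, so the runtime,
        not the training framework, is held constant across models."""
        sess = ort.InferenceSession(onnx_path)        # add providers=[...] for GPU
        input_name = sess.get_inputs()[0].name        # e.g. "images" (model-dependent)
        x = np.zeros((batch, 3, img_size, img_size), dtype=np.float32)
        return sess.run(None, {input_name: x})        # typically a [B, D, F] output as noted above

    # A common NMS step would then be applied to the [B, D, F] output for every model.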

@glenn-jocher Hello,

I just today updated the v5 readme table with FP16 speeds for all current models. New models are being trained with panet heads, I'm waiting for the last of these to finish before updating the table again and the chart this weekend (yolov5x takes a bit of time to train).

Thanks for the information, waiting for your new results.

In any case, the values shown in the chart right now are _slower_ than the actual batch-8 FP16 speeds that I will update to, so the chart should only look better in the future.

Yes, I know. I also drew the new figure in https://github.com/AlexeyAB/darknet/issues/5920#issuecomment-644627655.

The time limit is designed to interrupt execution of NMS operations if they exceed 10.0 full seconds per batch, saving users from extremely long testing times during training, as in the issue above. It does not affect any of the results we are discussing, because for all of the models I have run, NMS takes about 0.001-0.002 seconds per image, and the batch sizes used during testing are 32. So at about 0.030-0.06 seconds of elapsed time per batch, the 10.0 second limit will never be approached here.

Thanks for the reply. If it is used for solving a training problem, I think it has to use an is_training flag to drive it. I will update my comment to split the time_limit and the inconsistent predicted results into two problems. Do you have any idea why the same image and the same testing command generate different predicted results? The most serious difference is about 60% (15 RBCs, 1 WBC -> 9 RBCs, 1 WBC) in josephofiowa's testing.

Thanks for the reply. If it is used for solving a training problem, I think it has to use an is_training flag to drive it. I will update my comment to split the time_limit and the inconsistent predicted results into two problems. Do you have any idea why the same image and the same testing command generate different predicted results? The most serious difference is about 60% (15 RBCs, 1 WBC -> 9 RBCs, 1 WBC) in josephofiowa's testing.

Yes that's an interesting question. Inference is deterministic, I'm not aware of any randomness in the process that should cause different results for an image in a --source directory than calling it directly as --source file. If I run a quick test in colab I see the same results either way, and same speeds too since these are all batch-size 1 operations. It's likely different models may have been used to obtain different results.

Here you can see almost 50 FPS with a K80, Colab's slowest GPU.

Screen Shot 2020-06-20 at 8 52 30 AM

@glenn-jocher

Thanks, I will move it to the "controversy raised by josephofiowa" part temporarily.
But one thing is for sure: inference with different batch sizes gets different AP on COCO.
It would be better to check why this happens.

And for small-model training, for example yolov5s, I suggest using a lower resolution.
ThunderNet shows that small models cannot afford high-resolution training. Also, efficientdet scales depth, width, and input size, while ultralytics/yolov5 only scales depth and width.
Here is an example of cspnet (trained with ultralytics/yolov3): it gets 26.5 AP at 238 FPS on a 1080 Ti with batch size 1, trained/tested at 416x416 resolution, which is much faster and more accurate than yolov5s trained at 640x640 and tested at 288x288 resolution.
image

@WongKinYiu yes there may be very slight variations in ultralytics mAP when using different batch sizes. This is normal though, and is caused by variations in padding used when constructing letterboxed batches. For example with batch 32 the first 16 images in the batch are padded like this, and the results are as shown here:

!python test.py --weights yolov5s.pt --data ./data/coco.yaml --img 640 --batch 32

Speed: 5.4/3.0/8.5 ms inference/NMS/total per 640x640 image at batch-size 32
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.352
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.378
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.459
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.296
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.557
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.358
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.618
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.700

test_batch0_pred

But if I use batch 1 then the image is all by itself, so its padding is not guided by the rest of the images in the batch. In this case it will be padded more minimally. In my test most metrics are almost exactly the same, though it's possible a few may vary minimally between the two scenarios.

!python test.py --weights yolov5s.pt --data ./data/coco.yaml --img 640 --batch 1

Speed: 8.6/2.5/11.1 ms inference/NMS/total per 640x640 image at batch-size 1
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.352
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.378
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.397
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.459
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.296
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.557
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.359
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.619
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699

test_batch0_pred (1)
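
(To make the padding dependence concrete, here is a simplified letterbox-shape sketch; it is an assumed approximation of the repo's resizing logic, not its exact code.)

    import math

    def padded_shape(h, w, img_size=640, stride=32, batch_shape=None):
        """Sketch: scale the longer side to img_size, then pad either to the shape
        shared by the whole batch (rectangular batches) or just up to a stride
        multiple (single image) -- so batch-1 and batch-32 inputs can differ."""
        r = img_size / max(h, w)                      # resize ratio
        h2, w2 = round(h * r), round(w * r)
        if batch_shape is not None:                   # pad to the common batch shape
            return batch_shape
        return (math.ceil(h2 / stride) * stride,      # pad minimally to a stride multiple
                math.ceil(w2 / stride) * stride)

    # e.g. a 480x640 image alone stays 480x640, but inside a batch whose tallest
    # image needs 640 rows it is padded to 640x640.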

@WongKinYiu also yes, efficientdet does scale image size, while I do all training at 640. Their D0 is trained at 512, with the rest of the sizes at 512 + D * 128. By D7 they are training at 1536 pixels (!).

This is fantastic for rich people with free access to millions of dollars in hardware to train with, but in the real world for the rest of us we get this:
https://github.com/google/automl/issues/85#issuecomment-623709815

@glenn-jocher Thanks,

If the results change when using different batch sizes, it means the order of the input data affects the results. I think there are two possible solutions: 1) always use batch=1 to get results, 2) always pad to the full square, e.g. 512x512, 640x640...

This is fantastic for rich people with free access to millions of dollars in hardware to train with, but in the real world for the rest of us we get this...

Yes, that is also the reason why I suggest using a lower resolution only for small-model training.
Most people cannot afford large-resolution training.
And another reason is that a big model can learn low resolution well, but a small model cannot learn high resolution well.

Is that YOLOv5 real? AFAIK people have been talking about v4 on YouTube and most forums.

Hello there,

Well, yes I've seen the -fake- news.

As someone who has made hundreds of tests with YOLOv4, I can confirm that YOLOv5 is in no way related to Alexey's beautiful work. (my tests: https://www.youtube.com/user/Canonest/videos )

Some people are just trying to ride the hype created by the hard work of the original publishers.

Ignore YOLOv5 (unless it is published by Alexey in the future) and focus on YOLOv4!

Well, I wouldn't say the yolov5 work should be ignored. Particularly yolov5s. That's where the focus should be, in my opinion, as it is a good candidate for replacing yolov3-tiny due to inference speed and improved accuracy. If your interests lie in CPU-friendly models, then yolov5s is one of the best ones out there.

Maybe @glenn-jocher should have branded yolov5 differently to avoid controversy. At the end of the day, pick the one that suits your needs best, i.e. performance requirements and your custom dataset.

@pfeatherstone
There is YOLOv4-tiny released: 40.2% AP50, 371 FPS (GTX 1080 Ti): https://github.com/AlexeyAB/darknet/issues/6067

Thanks for the update. It feels like there is competition in the YOLO market...

I just found out about the controversy; I had believed that YOLOv5 was an upgraded version of YOLOv4.

Python/PyTorch is popular. It's a trend to use PyTorch to train Darknet models. Different training systems always confuse me, for example EfficientDet in TensorFlow vs PyTorch, and the Darknet backward gradient in the YOLO layer.

YOLOv4 training and inference on different frameworks / libraries:

Pytorch-implementations:

TensorFlow: https://github.com/hunglc007/tensorflow-yolov4-tflite

OpenCV (YOLOv4 built-in OpenCV): https://github.com/opencv/opencv

TensorRT: https://github.com/ceccocats/tkDNN

Tencent/NCNN: https://github.com/Tencent/ncnn

TVM https://tvm.ai/about

OpenDataCam: https://github.com/opendatacam/opendatacam#-hardware-pre-requisite

BMW-InnovationLab - Training with YOLOv4 has never been so easy (monitor it in many different ways like TensorBoard or a custom REST API and GUI):

@AlexeyAB why use darknet to train models rather than pytorch? Your time must be split between research and maintaining/updating darknet. Not trying to be funny or make a point, just trying to understand the reasoning. Wouldn't you be more productive if you could just focus on models rather than fixing bugs or creating new layers in darknet?

By the way, using darknet is also a great solution as a minimal inference framework on CPU as it can have very minimal dependencies. So I can see reasons from a personal point of view.

This has arrived https://arxiv.org/pdf/2007.12099v2.pdf. Another flavour of yolo...

@AlexeyAB thanks. Soz for the duplication
