This is not a bug report but a question about a new feature. After some experiments with Caffe and opencv_dnn, I found that at present Caffe with CUDA performs forward propagation about 25 times faster (on average, across different networks) than opencv_dnn with LAPACK or OpenCL. CUDA evidently gives a large speed advantage in this task. Could anybody add a CUDA backend to opencv_dnn?
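For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. It is not the benchmark used above: the helper names, the number of runs, and the warm-up count are all assumptions, and `forward` stands for any zero-argument callable (for example the `forward` method of a network whose input is already set).

```python
import time


def average_forward_ms(forward, runs=20, warmup=3):
    """Return the mean wall-clock time of forward() in milliseconds."""
    for _ in range(warmup):
        forward()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(runs):
        forward()
    return (time.perf_counter() - start) * 1000.0 / runs


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms
```

Timing each framework with the same input and the same number of runs, then dividing the two averages, gives a ratio of the kind quoted above.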
@pi-null-mezon, we've added a Halide backend since the issue was opened. It lets us choose the OpenCL computational target and run networks on a GPU (even NVIDIA). We'll experiment with the CUDA target and compare efficiency later.
On the other hand, default CPU efficiency has improved dramatically recently. You can see an efficiency comparison in the table.
Hello @dkurt! Thanks for the good news! Am I right that, according to the performance table you've provided above, the fastest backend now is DNN C++ and not DNN Halide?
@pi-null-mezon, you are right. In most cases the default backend is more efficient on CPU. But the Halide backend is currently the one and only way to run models on a GPU. So if you have a powerful GPU on board, you can use OpenCV to run networks on it.
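As a rough sketch of what selecting the Halide backend with the OpenCL target looks like from the Python bindings: `use_halide_opencl` is a hypothetical helper, and `dnn` is expected to be the `cv2.dnn` module from an OpenCV build compiled with Halide (passing it in as an argument is only an illustrative choice so the snippet does not import OpenCV itself).

```python
def use_halide_opencl(net, dnn):
    """Ask a dnn network to run through the Halide backend on an OpenCL device.

    net: a network object exposing setPreferableBackend/setPreferableTarget.
    dnn: the cv2.dnn module (assumed to come from a Halide-enabled build).
    """
    net.setPreferableBackend(dnn.DNN_BACKEND_HALIDE)
    net.setPreferableTarget(dnn.DNN_TARGET_OPENCL)
    return net


# Possible usage (model file names are hypothetical):
# import cv2
# net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "weights.caffemodel")
# use_halide_opencl(net, cv2.dnn)
```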
@dkurt, how can I switch the opencv_dnn backend from C++ to Halide if I am working on Windows? Am I right that I need to download the Halide binaries and rebuild OpenCV with some kind of USE_HALIDE flag turned on?
@pi-null-mezon, unfortunately, the hardest part is LLVM, and there are no pre-compiled LLVM binaries. But you may try a truncated version of it (I downloaded it with svn co on Ubuntu). We have some instructions for Windows in the tutorial How to enable Halide backend for improve efficiency.
As far as I remember, we have no Halide in our testing system for Windows, only for Linux with OpenCL. So we could have missed some bugs there. Anyway, you may create an issue if something won't work out.
@dkurt hello! I have finally built OpenCV with Halide on Windows. At least it works, but one thing I cannot find in the tutorials is how to select between the different GPUs on a machine to perform the calculations. For instance, I have two GPUs: Intel HD Graphics and AMD Radeon. How can I force OpenCV to use a particular one?
@pi-null-mezon, according to the Halide documentation, you may select the device id with an environment variable: export HL_GPU_DEVICE=1 on Linux or set HL_GPU_DEVICE=1 on Windows. I tested locally that it switches between CPU and GPU (in short, between the devices listed in the clinfo output on Linux).
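If you would rather not change the variable for a whole shell session, one option is to set HL_GPU_DEVICE only for the process that runs the network. The sketch below does that with the standard library; the helper names are hypothetical, and only the variable name itself comes from the comment above.

```python
import os
import subprocess


def env_with_halide_device(device_id, base_env=None):
    """Return a copy of the environment with HL_GPU_DEVICE set to device_id."""
    env = dict(os.environ if base_env is None else base_env)
    env["HL_GPU_DEVICE"] = str(device_id)  # environment values must be strings
    return env


def run_with_halide_device(command, device_id):
    """Run command (a list of arguments) with HL_GPU_DEVICE set for that process only."""
    return subprocess.run(command, env=env_with_halide_device(device_id), check=True)


# Possible usage (the script name is hypothetical):
# run_with_halide_device(["python", "classify.py"], 1)
```

The device id is an index into the device list, so it is worth checking with clinfo which index corresponds to which GPU before relying on it.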
@dkurt thanks! GPU computations work! But the results after dnn::Net::forward() do not match the CPU version. I need to run more tests and may open a new issue. Thanks!
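One simple way to quantify such a mismatch is to look at the largest absolute difference between the two outputs. This is only a sketch: the function names and the default tolerance of 1e-4 are assumptions, and the inputs are expected to be flat sequences of numbers (a NumPy output blob can be passed as `blob.ravel()`).

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("outputs have different sizes: %d vs %d" % (len(a), len(b)))
    return max((abs(float(x) - float(y)) for x, y in zip(a, b)), default=0.0)


def outputs_match(a, b, tol=1e-4):
    """True if no element differs by more than tol (tol is an assumed threshold)."""
    return max_abs_diff(a, b) <= tol
```

Small differences are expected between CPU and GPU floating-point results; a large value here points to a real discrepancy worth reporting.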
@pi-null-mezon how did your tests work out? I'm wondering if I should bother putting in the effort to build the halide back end on Windows.
@TechnikEmpire you definitely should try it, but watch out for #9530
@pi-null-mezon Cool thanks, but if it fails completely with GPU backend then that sort of defeats the purpose for me. I get a decent framerate using default backend and CPU with yahoo nsfw model, but I'm looking for a portable way to try and speed that up on the GPU when available. Last time I checked, the halide backend on CPU didn't perform as well.
@pi-null-mezon, you wrote that GPU computations work. Did you call cv::dnn::Net::setHalideScheduler? I skipped the call to setHalideScheduler and it crashes.
Does it work out of the box? How do we configure the CUDA backend for this?
Everyone here - stop messing about with CUDA and Halide and just use the inference engine, which is now open source.
This is the best possible performance you can squeeze out of DNN and it does not disappoint.
@TechnikEmpire, IE cannot run deep learning models on NVIDIA GPUs. And OpenCV has no CUDA backend for now either. One possible way is to test the Halide backend with the CUDA target.
@dkurt Yeah I know, I was just throwing it out there that the IE is a very good, well optimized back end targeting CPU. Was letting people know because I was blown away by the performance. I realize a GPU accelerated back end can still out-perform a CPU backend.
We plan on leading a Google Summer of Code project to add a GPU backend for DNN. If you can help, see the idea page for OpenCV GSoC.
This issue can now be closed. CUDA support was merged two days ago into master.
Related PR: https://github.com/opencv/opencv/pull/14827
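For those asking how to configure it: with bindings that include this change, the CUDA backend is selected through the same two calls as the other backends. The sketch below is hedged accordingly. `prefer_cuda` is a hypothetical helper, `dnn` is expected to be the `cv2.dnn` module, and the check only confirms that the constants exist in the bindings, not that the build was actually compiled with CUDA support.

```python
def prefer_cuda(net, dnn):
    """Request the CUDA backend and target if the bindings expose them.

    Returns True if CUDA was requested, False if the constants are missing
    (for example with bindings that predate the CUDA backend).
    """
    backend = getattr(dnn, "DNN_BACKEND_CUDA", None)
    target = getattr(dnn, "DNN_TARGET_CUDA", None)
    if backend is None or target is None:
        return False  # leave the network on its current backend
    net.setPreferableBackend(backend)
    net.setPreferableTarget(target)
    return True


# Possible usage (assumes an OpenCV build with the CUDA backend enabled):
# import cv2
# prefer_cuda(net, cv2.dnn)
```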
@r0l1, sure. Thank you!