Javacpp-presets: UMat (GPU) functions in a multithreaded environment: how to synchronize correctly?

Created on 10 Dec 2018 · 17 comments · Source: bytedeco/javacpp-presets

I have noticed strange behavior from many JavaCPP OpenCV functions that work with UMat in our multithreaded system. I have a number of functions (a "network" representing an algorithm) that pass data (UMat or Mat) from one to another, and I run them all in several threads using the standard Java techniques parallelStream / forEach. Of course, all the functions are thoroughly synchronized: if a function needs the results of another, it waits until that one finishes, using standard Java synchronization ("synchronized" blocks). But sometimes several functions can work in parallel, which increases performance.

Everything worked fine while I used Mat objects and ordinary Java memory (arrays / byte buffers). But when I use functions with UMat arguments (i.e. the GPU), the system sometimes fails: the results are incorrect! In my latest example, I used the call "opencv_core.max(matA, matB, result)", where all matrices are UMat, and matA/matB are produced by converting Mat -> UMat (for example, just loaded from a file). Sometimes (rarely, but often enough) the result of the max call differs significantly from the correct result; in particular, sometimes it is just a zero matrix. It seems that one thread doesn't "see" the data prepared on another CPU core when it needs to transfer them to the GPU (copying Mat -> UMat).

Could you tell me whether parallel use of UMat requires some additional synchronization that Java cannot provide automatically? Is it possible to freely use UMat from multiple threads if they are synchronized in the usual way, like any other Java objects? Or do I need to make additional efforts to synchronize GPU objects? If so, which functions should I use for this in JavaCPP?
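The synchronization described in the question might look roughly like the sketch below. `ResultSlot` and `SlotDemo` are hypothetical names, and an `int[]` stands in for the Mat/UMat passed between functions. A Java monitor orders memory operations between Java threads; it does not, by itself, wait for commands that OpenCV may still have pending in an OpenCL queue, which is one plausible reason the same scheme behaves differently once UMat is involved.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical slot that one function of the network fills and dependent functions read. */
final class ResultSlot<T> {
    private T value; // guarded by "this"

    synchronized void set(T v) {
        value = v;
        notifyAll(); // wake every thread blocked in get()
    }

    synchronized T get() throws InterruptedException {
        while (value == null) { // loop handles spurious wakeups
            wait();
        }
        return value;
    }
}

final class SlotDemo {
    /** One producer fills the slot; `consumers` parallel tasks wait for it and sum the data. */
    static List<Integer> run(int consumers) {
        ResultSlot<int[]> slot = new ResultSlot<>();
        new Thread(() -> slot.set(new int[] {1, 2, 3})).start();
        return IntStream.range(0, consumers).parallel()
                .mapToObj(i -> {
                    try {
                        int sum = 0;
                        for (int x : slot.get()) {
                            sum += x;
                        }
                        return sum;
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(4)); // [6, 6, 6, 6]
    }
}
```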

Daniel-Alievsky

It's entirely possible that the OpenCL backend isn't thread-safe. The OpenMP one isn't, so that's why I disabled it and linked with pthreads directly instead.

saudet on 10 Dec 2018

Unfortunately, I didn't use pthreads, so I didn't quite understand your last words ("linked with pthreads").

Do I understand correctly that multithreading is _absolutely forbidden_ when using JavaCPP UMat functions? Is there any way to synchronize GPU data across parallel threads using functions available from JavaCPP? Please advise.

Java is a multithreading-oriented platform: it creates threads frequently and silently, for the user interface, for GC, for servlets and other server APIs, etc. It is almost impossible to guarantee that all functions are executed in a _single_ thread. Of course, I can reorder all calculations so that UMat objects are not shared between different threads: some thread takes a Mat, converts it into a UMat, processes it with different functions, then converts the result back into a Mat and returns it for use in the multithreaded environment. But will that be enough?
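The thread-confined variant proposed above can be sketched as follows: the device object is created, used and dropped inside a single call, so only host data ever crosses a thread boundary. `ConfinedPipeline` is a hypothetical name, and its three functions are placeholders; in JavaCPP code they would be the Mat -> UMat conversion, the UMat functions, and the UMat -> Mat copy, while here they use plain Java types so the sketch runs without OpenCV. Whether confinement alone is sufficient depends on OpenCV's OpenCL backend, not on Java.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: H is the host type (e.g. Mat), D the device type (e.g. UMat).
 * A D value never escapes apply(), so only the thread running that call touches it.
 */
final class ConfinedPipeline<H, D> {
    private final Function<H, D> upload;    // host -> device, e.g. Mat -> UMat
    private final UnaryOperator<D> process; // device-only work, e.g. UMat functions
    private final Function<D, H> download;  // device -> new host copy, e.g. UMat -> Mat

    ConfinedPipeline(Function<H, D> upload, UnaryOperator<D> process, Function<D, H> download) {
        this.upload = upload;
        this.process = process;
        this.download = download;
    }

    H apply(H input) {
        D device = upload.apply(input);
        D result = process.apply(device);
        return download.apply(result); // only this host copy is handed to other threads
    }

    /** Each element is processed start to finish on whichever thread the stream picks. */
    List<H> applyAll(List<H> inputs) {
        return inputs.parallelStream().map(this::apply).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ConfinedPipeline<Integer, Long> p =
                new ConfinedPipeline<>(Integer::longValue, x -> x * 2, Long::intValue);
        System.out.println(p.applyAll(Arrays.asList(1, 2, 3))); // [2, 4, 6]
    }
}
```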

Daniel-Alievsky on 11 Dec 2018

This has nothing to do with Java or JavaCPP. Those are issues solely with OpenCV itself. Please report this upstream, and hopefully they will fix this!

saudet on 11 Dec 2018

I understand. So, can you just tell me yes or no? Can I use the current OpenCV, built into the JavaCPP JARs, in this manner: every thread takes a Mat, converts it into a UMat, processes it, then converts the result back into a Mat and returns it for use in the multithreaded environment? Or must I guarantee that all UMat functions are called from only one thread for the whole JVM (probably some single-thread pool shared by all classes)?
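The single-thread alternative from the question can be sketched as a JVM-wide executor that owns one thread, with every UMat call submitted to it. `GpuThread` and the thread name "opencv-umat" are hypothetical; the sketch only enforces the threading discipline and says nothing about whether OpenCV's OpenCL state is otherwise safe. It also serializes all GPU work, which gives up some of the parallelism the question is after.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical JVM-wide gate: all UMat work runs on one daemon thread. */
final class GpuThread {
    private static volatile Thread worker; // the single thread, recorded by the factory below

    private static final ExecutorService EXECUTOR = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "opencv-umat");
        t.setDaemon(true); // don't keep the JVM alive
        worker = t;
        return t;
    });

    private GpuThread() {}

    /** Runs task on the GPU thread and blocks the caller until it completes. */
    static <T> T call(Callable<T> task) {
        try {
            if (Thread.currentThread() == worker) {
                return task.call(); // nested call from a task: run inline, avoiding self-deadlock
            }
            return EXECUTOR.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(() -> Thread.currentThread().getName())); // opencv-umat
    }
}
```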

Daniel-Alievsky on 11 Dec 2018

From what you're saying above, it doesn't look possible at the moment, no. But that's something you need to ask the developers of OpenCV, not me.

saudet on 11 Dec 2018

Thank you very much.

Daniel-Alievsky on 11 Dec 2018

Could you look at alalek's comment on the OpenCV issue? He writes that we usually _can_ use UMat in a multithreaded environment, but I must call clFinish() before passing a UMat to another thread. I searched JavaCPP for this function but didn't find it.

Daniel-Alievsky on 11 Dec 2018

There's a cv::ocl::finish() function here that probably does the same thing, @alalek?
http://bytedeco.org/javacpp-presets/opencv/apidocs/org/bytedeco/javacpp/opencv_core.html#finish--
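The advice relayed above (finish the OpenCL queue before handing a UMat to another thread) can be sketched as a small one-shot handoff object. `FinishingHandoff` is a hypothetical name, and `deviceFinish` is a plain `Runnable` so the sketch runs without OpenCV; in JavaCPP code it is assumed one would pass the static `opencv_core.finish()` from the apidocs link above, e.g. as `opencv_core::finish`. Later in this thread, this step alone turned out not to be enough in a larger multithreaded network.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical one-shot handoff: finish the device queue, then publish the result. */
final class FinishingHandoff<T> {
    private final CompletableFuture<T> future = new CompletableFuture<>();
    private final Runnable deviceFinish; // assumed: opencv_core::finish in real code

    FinishingHandoff(Runnable deviceFinish) {
        this.deviceFinish = deviceFinish;
    }

    /** Producer side: wait for queued device work, then make the result visible. */
    void publish(T result) {
        deviceFinish.run();
        future.complete(result); // safe publication to whichever thread calls await()
    }

    /** Consumer side: blocks until publish() has completed. */
    T await() {
        return future.join();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        FinishingHandoff<String> handoff = new FinishingHandoff<>(() -> log.add("finish"));
        Thread producer = new Thread(() -> {
            log.add("compute");
            handoff.publish("result");
        });
        producer.start();
        log.add("consume " + handoff.await());
        producer.join();
        System.out.println(log); // [compute, finish, consume result]
    }
}
```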

saudet on 12 Dec 2018

I'm assuming your application works fine with cv::ocl::finish(), but let me know if this isn't the case. Thanks!

saudet on 15 Dec 2018

Unfortunately not: opencv_core.finish() doesn't solve this problem completely. In a simple test, yes, it helps: the results become correct. But in a more complex project, where I have hundreds of functions in several chains (networks), some of them multithreaded, I unfortunately often see a "black" matrix instead of the correct one.

For now, I solve this problem by forcibly copying every UMat into an ordinary Mat at the end of the calculation in every thread, before passing data to other threads. This method does resolve the problem, at the cost of somewhat reduced performance. But opencv_core.finish(), called in the same place, does not.

Daniel-Alievsky on 16 Dec 2018

I'm testing this with JavaCPP 1.4 (OpenCV 3.4.0), but I assume this function's behavior did not change in later versions?

Daniel-Alievsky on 16 Dec 2018

This version is quite old. Chances are good that this has been fixed in more recent versions, yes.

saudet on 17 Dec 2018

Unfortunately not. I recompiled our whole system against JavaCPP 1.4.4-SNAPSHOT with OpenCV 4.0.0-1.4.4-SNAPSHOT, and the results are the same. A simple chain of functions works correctly if I call opencv_core.finish() (at the end of every function), but a complex network of functions with multithreading does not. At the same time, copying UMat to Mat (at the end of every function) resolves the problem.

It seems that all this complex testing was unnecessary: in the C++ source code, I see the same implementation of the finish() function in ocl.cpp in 3.3.0 and 4.0.0:

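```cpp
// cv::ocl::finish(), from modules/core/src/ocl.cpp (excerpt)
void finish()
{
    // waits on the default queue; Queue::finish() calls clFinish()
    Queue::getDefault().finish();
}
```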

And the Queue::finish() method looks similar in both versions: it calls clFinish().

Daniel-Alievsky on 17 Dec 2018

Ok, so this is still an issue. You'll need to find a way to reproduce this outside your system to have the developers of OpenCV look at it. They don't typically test with multiple threads, so they need help!

saudet on 18 Dec 2018

I can say more. I tried calling opencv_core.finish() in other places, for example, after each data transfer or at the end of every function (regardless of whether it uses the GPU). It then began throwing exceptions like this:

OpenCV Error: Unknown error code -220 (OpenCL error CL_INVALID_COMMAND_QUEUE (-36) during call: clEnqueueWriteBuffer(q, (cl_mem)u->handle, CL_TRUE, dstrawofs, total, alignedPtr.getAlignedPtr(), 0, 0, 0)) in cv::ocl::OpenCLAllocator::upload, file C:\projects\bytedeco\javacpp-presets\opencv\cppbuildwindows-x86_64\opencv-3.4.0\modules\core\src\ocl.cpp, line 5352

It seems it is not a safe function that I can freely call at any time (like the "flush" methods of file streams, etc.). Do you think this is a problem I should report to OpenCV? Or is it just the behavior of the whole OpenCL subsystem?

Daniel-Alievsky on 18 Dec 2018

Sounds like an issue with your OpenCL implementation more than anything else. Try a different one before reporting to OpenCV.

saudet on 19 Dec 2018

JavaCPP/OpenCV is a universal library for developing software, in particular commercial software. We cannot be sure what software and drivers will be installed on the customer's computer besides our own. Of course, every application can require some OS version with all the latest updates (installed by the OS automatically), some CPU performance and memory capacity, but, I believe, nothing more. If some function is dangerous in this respect (it can crash the application), it just means that we must not use it, or must understand in which situations we can use it safely.

Daniel-Alievsky on 19 Dec 2018
