Javacpp-presets: UMat (GPU) functions in a multithreaded environment: how to synchronize correctly?

Created on 10 Dec 2018 · 17 Comments · Source: bytedeco/javacpp-presets

I have observed strange behavior in many JavaCPP OpenCV functions that work with UMat in our multithreaded system. I have a number of functions (a "network" representing an algorithm) that pass data (UMat or Mat) from one to another, and I run them all in several threads via the standard Java technique parallelStream / forEach. Of course, all functions are thoroughly synchronized: if a function needs the results of another, it waits until that one finishes, using standard Java synchronization ("synchronized" blocks). But sometimes several functions can run in parallel, which improves performance.

Everything worked fine while I used Mat objects and ordinary Java memory (arrays / byte buffers). But when I use functions with UMat arguments (i.e. GPU), the system sometimes fails: the results are incorrect! In my latest example, I call "opencv_core.max(matA, matB, result)", where all matrices are UMat, and matA/matB are produced by a Mat -> UMat conversion (for example, just loaded from a file). Sometimes (rarely, but often enough) the result of the max call differs substantially from the correct result; in particular, sometimes it is just a zero matrix. It seems that one thread doesn't "see" data prepared on another CPU core when it needs that data for transfer to the GPU (copying Mat -> UMat).

Can you tell me whether parallel use of UMat requires some additional synchronization that Java cannot provide automatically? Is it possible to freely use UMat from multiple threads if they are synchronized in the usual way, like any other Java objects? Or do I need to make an additional effort to synchronize GPU objects? If so, what functions should I use for this in JavaCPP?

question

All 17 comments

It's entirely possible that the OpenCL backend isn't thread safe. The
OpenMP one isn't, so that's why I disabled it and linked with pthreads
directly instead.

Unfortunately I have never used pthreads, so I didn't fully understand your last words ("linked with pthreads").

Do I understand correctly that multithreading is _absolutely forbidden_ while using JavaCPP UMat functions? Is there any way to synchronize GPU data across parallel threads using functions available from JavaCPP? Please advise.

Java is a multithreading-oriented platform; it creates threads frequently and silently: for the user interface, for GC, for servlets and other server APIs, etc. It is almost impossible to guarantee that all functions are executed in a _single_ thread. Of course, I can reorder all calculations so that UMat objects are not shared between different threads (a thread takes a Mat, converts it into a UMat, processes it with various functions, then converts the result into a Mat and returns it for use in the multithreaded environment), but will that be enough?

This has nothing to do with Java or JavaCPP. Those are issues solely with
OpenCV itself. Please report this upstream, and hopefully they will fix
this!

I understand. So, can you just tell me yes or no? Can I use the current OpenCV, built into the JavaCPP JARs, in this manner: every thread takes a Mat, converts it into a UMat, processes it, then converts the result back into a Mat and returns it for use in the multithreaded environment? Or must I guarantee that all UMat functions are called from one single thread for the whole JVM (probably some single-thread pool shared by all classes)?
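For illustration, the second option (a single-thread pool shared by all classes) could be sketched roughly as below. This is only a minimal sketch using the standard java.util.concurrent API. The class name SingleThreadGpuRunner, the method name call and the thread name "umat-worker" are hypothetical and are not part of JavaCPP or OpenCV. Nothing in this thread confirms that confining UMat calls to one thread is sufficient; the sketch only shows how such confinement might be arranged.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical helper (not part of JavaCPP or OpenCV): every task passed
 * to call() runs on one dedicated worker thread, whichever thread submits it.
 */
final class SingleThreadGpuRunner implements AutoCloseable {
    // One worker thread for the whole helper; daemon so it never blocks JVM exit.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "umat-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task on the worker thread and blocks the caller until it finishes. */
    <T> T call(Callable<T> task) {
        try {
            return executor.submit(task).get();
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the failure.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the worker thread", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed on the worker thread", e.getCause());
        }
    }

    @Override
    public void close() {
        // Lets already submitted tasks finish, then stops the worker thread.
        executor.shutdown();
    }
}
```

In the application, the task passed to call() would contain the whole Mat -> UMat conversion, the OpenCV calls, and the copy of the result back into a Mat, so that only Mat objects cross thread boundaries. A task must not invoke call() on the same runner from inside the worker thread, because it would wait for itself.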

From what you're saying above, it doesn't look possible at the moment, no. But that's something you need to ask the developers of OpenCV, not me.

Thank you very much.

Could you look at alalek's comment on the OpenCV issue? He writes that we usually _can_ use UMat in a multithreaded environment, but I must call clFinish() before passing a UMat to another thread. I searched JavaCPP for this function, but didn't find it.

There's a cv::ocl::finish() function here that probably does the same thing, @alalek?
http://bytedeco.org/javacpp-presets/opencv/apidocs/org/bytedeco/javacpp/opencv_core.html#finish--

I'm assuming your application works fine with cv::ocl::finish(), but let me know if this isn't the case. Thanks!

Unfortunately not: opencv_core.finish() doesn't solve this problem completely. In a simple test, yes, it helps: the results become correct. But in a more complex project, where I have hundreds of functions in several chains (networks), some of them multithreaded, I unfortunately often see a "black" matrix instead of the correct one.

For now I work around this problem by forcibly copying every UMat into an ordinary Mat at the end of the calculation in every thread, before passing data to other threads. This method does resolve the problem, at the cost of some performance. But opencv_core.finish(), called in the same place, does not.

I'm testing this with JavaCPP 1.4 (OpenCV 3.4.0), but I assume the behavior of this method did not change in later versions?

This version is quite old. Chances are good that this has been fixed in more recent versions, yes.

Unfortunately not. I recompiled our whole system to use JavaCPP 1.4.4-SNAPSHOT with OpenCV 4.0.0-1.4.4-SNAPSHOT, and the results are the same. A simple chain of functions works correctly if I call opencv_core.finish() (at the end of every function), but a complex network of functions with multithreading does not work. At the same time, copying UMat to Mat (at the end of every function) resolves the problem.

It seems that all this complex testing was unnecessary: in the C++ source code I see the same implementation of the finish() function in ocl.cpp in 3.3.0 and 4.0.0:

void finish()
{
    Queue::getDefault().finish();
}

And the Queue.finish() method looks similar in both versions: it calls clFinish().

Ok, so this is still an issue. You'll need to find a way to reproduce this
outside your system to have the developers of OpenCV look at it. They don't
typically test with multiple threads, so they need help!

I can say more. I tried calling opencv_core.finish() in other places, for example after every data handoff or after every function finishes (regardless of whether it uses the GPU). And it begins to throw exceptions like this:

OpenCV Error: Unknown error code -220 (OpenCL error CL_INVALID_COMMAND_QUEUE (-36) during call: clEnqueueWriteBuffer(q, (cl_mem)u->handle, CL_TRUE, dstrawofs, total, alignedPtr.getAlignedPtr(), 0, 0, 0)) in cv::ocl::OpenCLAllocator::upload, file C:\projects\bytedeco\javacpp-presets\opencv\cppbuildwindows-x86_64\opencv-3.4.0\modules\core\src\ocl.cpp, line 5352

It seems it is not a safe function that I can freely call at any time (like the "flush" methods of file streams, etc.). Do you think this is a problem that I should report to OpenCV? Or is it just the behavior of the whole OpenCL subsystem?

Sounds like an issue with your OpenCL implementation more than anything else. Try a different one before reporting to OpenCV.

JavaCPP/OpenCV is a universal library for developing software, in particular commercial software. We cannot be sure what software and drivers, besides our own, will be installed on the customer's computer. Of course, every application can require a certain OS version with all the latest updates (installed by the OS automatically), some CPU performance and memory size, but, I believe, nothing more... If some function is dangerous in this respect (can crash the application), it just means that we must not use it or must understand in which situations we can use it safely.
