Hi,
This is literary the same issue as the previously resolved #629, but on macOS.
Namely, on my Linux box everything works well, according to the solution from #629. However, on my macOS laptop, the duplicate libiomp issue arises because javacpp's dnnl jar bundles libiomp.dylib and tries to load it, but my system's MKL have already loaded it.
Interestingly, the same setup on Linux has no issues.
I have set up jvm options, and they work on Linux, but seem to be ignored on macOS?
"-Dorg.bytedeco.javacpp.mklml.load=mkl_rt"
"-Dorg.bytedeco.javacpp.pathsfirst=true"
Another hint: when I go into my local maven ropository's javacpp dnnl jar, and butcher it by deleting libiomp.dylib and repackaging it again, this issue disappears.
Any ideas how to approach this?
libiomp.dylib isn't getting bundled. Where did you see that?
A typo: it's libomp.dylib in .m2/repository/org/bytedeco/dnnl/1.5-1.5.3
But everything else stands: when I remove it, the problem disappears.
It's a dependency of libdnnl.1.dylib. It won't be able to load without it.
Update: the linux binary bundles libgomp.so, while Windows doesn't bundle anything.
Regarding it being dependency of libdnnl, do you have idea why it isn't a problem on linux, while it is a problem on mac?
Also, is there a way to tell javacpp to not load it, similarly to the org.bytedeco.javacpp.mklml.load=mkl_rt property?
Not loading it will result in DNNL not loading. It won't work.
No, I do not know why it seems to work fine on Linux.
Interestingly, 1.3-1.5.3 does not bundle it.
The proper solution is probably to build DNNL without OpenMP, using TBB instead.
If you have some time to give it a try, please do!
Now it's 3AM here, so not now, but otherwise yes. Can you point me to the proper place to change that?
Also, did you see my message about openmp not being bundled in 1.3 and earlier versions? Was it introduced in DNNL upstream, or it's a javacpp-specific change?
In the cppbuild.sh script: https://github.com/bytedeco/javacpp-presets/blob/master/dnnl/cppbuild.sh
It's also possible to build without OpenMP or TBB, that's what it does on Mac by default.
Now I'm confused. It's building it without OMP and TBB, yet, the lib is bundled?
You can check old versions of the cppbuild.sh in the repository as well. It's not something that changes only upstream, no.
No, it's building with OpenMP. That's why it's bundling the library, otherwise it wouldn't be able to load.
You might be OK without OpenMP or TBB, but some users actually use DNNL for performance reasons, so I don't think it would be a good idea to just disable OpenMP. I think using TBB would be satisfactory, so let's please do that.
Now I see that this change was made recently:
https://github.com/bytedeco/javacpp-presets/commit/a65e4f65e167906e5791b2ca8cd9e636ab5e0b92#diff-a614b90320b01007ddc3bc0ec9ef679f
If I understood correctly, previously it did not bundle OpenMP, and there were performance issues (on mac or on all systems)?
I definitely care about performance, too. In my experience, previous builds on Linux were equally performant as the latest (might be different on mac).
At least we identified when this issue appeared, and you gave me valuable pointer about TBB. Will try to make it work in a few days, and will report here (or cry for help :)
Ok, I've made the switch. Please give it a try with 1.5.1-1.5.4-SNAPSHOT. Thanks!
Hi @saudet
It works better now. I still have some issue on macOS, but that is probably DNNL, instead of javacpp. I've looked at the recent changes, and -liomp switch is still there, and the only change seems to be DNNL 1.5 -> 1.5.1 Can you be more specific about what switch you are referring to, so I can investigate further?
Thank you!
Just the switch from OpenMP to TBB. There's no more OpenMP or -lomp for that matter.
I was expecting that, but looking at this:
https://github.com/bytedeco/javacpp-presets/tree/master/dnnl
The latest change seems to be 4 days ago, and the relevant macOS line from the cppbuild.sh is still:
"$CMAKE" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$INSTALL_PATH -DCMAKE_INSTALL_LIBDIR="lib" -DCMAKE_C_FLAGS="-I/usr/local/include -L/usr/local/lib -lomp" -DCMAKE_CXX_FLAGS="-I/usr/local/include -L/usr/local/lib -lomp" -DARCH_OPT_FLAGS='' -DMKLDNN_BUILD_EXAMPLES=OFF -DMKLDNN_BUILD_TESTS=OFF .
make -j $MAKEJ
I must be looking at the wrong place?
Hm, something is off here. On Linux, 1.5.1-1.5.4-SNAPSHOT became almost twice as slow in my examples as 1.5-1.5.4-SNAPSHOT.
Did you make the switch to TBB on macOS only, or on all systems (Linux and Windows were not affected by the OpenMP loading issue previously).
I must be looking at the wrong place?
Yes, you are, see commit https://github.com/bytedeco/javacpp-presets/commit/083ec084da5e3d063af63118e0c2e96b20d00281.
Hm, something is off here. On Linux, 1.5.1-1.5.4-SNAPSHOT became almost twice as slow in my examples as 1.5-1.5.4-SNAPSHOT.
It's possible that TBB is slower. Please ask upstream about this, and figure out what you'd like to do.
I know what I'd like. I was happy with OpenMP on Linux, I am happy with TBB on macOS :)
Is it possible to keep OpenMP on Linux and Windows, and only fix the macOS conflict by using TBB on macOS?
Why? You're saying it's not working for you on Mac anyway, so if you're happy with OpenMP everywhere, we'll switch back to that! Please make up your mind or I will make it up for you.
I guess I didn't explain this well. The TBB switch fixed the OpenMP loading issue on macOS.
Of course, it doesn't use OpenMP anymore. In any case, I'll leave it at TBB for now, and when you figure out how you want it exactly over the next few days/weeks, we'll repatch that accordingly. Sounds good?
Sure! Thanks!
I propose to make the change as minimal as possible, and just fix the macOS issue with minimal changes, without changing what worked well before (Linux and Windows).
The build that introduced the macOS issue by explicitly linking lomp is this one: https://github.com/bytedeco/javacpp-presets/commit/a65e4f65e167906e5791b2ca8cd9e636ab5e0b92#diff-b3cd65e7169d82012ff0d6582f35df4d
My suggestion is to:
This way, DNNL will work well on Linux and Windows, and the issue will be resolved on macOS.
Why it is (in my opinion) better than the total switch to TBB on all systems: OpenMP is considerably faster for moderate work sizes and TBB seems to not catch up completely for even larger workloads. This is something I confirmed on my machine, but I've also thrawled discussions on Intel forums that confirm that. Even if TBB had slight advantage somewhere, OpenMP is the default for DNN, so I guess it should not be changed unless really necessary (as the case on macOS).
Why I argue against the explicit linking to OpenMP (or TBB) on Linux and Windows: The macOS issue was introduced by explicit linking. Linux used multiple threads through OpenMP with the old settings that did not mention OpenMP explicitly. I am not sure why and how, but I am 100% positive that on my machines DNNL used all cores and the speed was as expected with this minimal setting (and considerably faster than TBB). I also tested and confirmed that by simply switching back and forth between 1.5-1.5.4-SNAPSHOT and 1.5.1-1.5.4-SNAPSHOT.
You're saying the build doesn't work for you on Mac anyway, so please get that working first, and we'll see after that.
It turns out that libomp.dylib is the same library as libiomp5.dylib. I've reverted to OpenMP, hacked the build on Mac to use libiomp5.dylib and it seems to work fine with MKL for me. Please let me know if you still encounter any issues though. Thanks!
@saudet I can confirm that it works on Linux and on macOS (albeit strangely it clashes on mac when I run my JVM from emacs, but I can live with that constraint; when I start JVM from terminal, it loads the libs without complaints).
Thank you so much. As far as I'm concerned, this issue can be closed.
Workaround included in version 1.5.4. Thanks for reporting and for testing this!
Most helpful comment
Now I see that this change was made recently:
https://github.com/bytedeco/javacpp-presets/commit/a65e4f65e167906e5791b2ca8cd9e636ab5e0b92#diff-a614b90320b01007ddc3bc0ec9ef679f
If I understood correctly, previously it did not bundle OpenMP, and there were performance issues (on mac or on all systems)?
I definitely care about performance, too. In my experience, previous builds on Linux were equally performant as the latest (might be different on mac).
At least we identified when this issue appeared, and you gave me valuable pointer about TBB. Will try to make it work in a few days, and will report here (or cry for help :)