Hi,
I just tried to build LightGBM with the Java wrapper on OS X. The build finishes OK, but the resulting jar does not include the needed dylib libraries:
java.lang.UnsatisfiedLinkError: Could not load the native libraries because we encountered the following problems: no _lightgbm in java.library.path and Could not find resource /com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm.dylib in jar
And
java.lang.UnsatisfiedLinkError: Could not load the native libraries because we encountered the following problems: no _lightgbm_swig in java.library.path and Could not find resource /com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm_swig.dylib in jar
Are there any plans to support this for OS X by editing CMakeLists.txt? I am not a very experienced Mac developer myself, and I don't "fully" understand dynamic libraries on OS X...
Thank you!
@imatiach-msft any ideas about SWIG on OS X?
@guolinke @julioasotodv I've replied here:
https://github.com/Azure/mmlspark/issues/290
"sorry, we only test running spark on linux right now (windows and OS X are currently not tested). I added the SWIG Java wrappers to LightGBM, the logic to construct the jar file, and built and uploaded the jar to maven central. Prior to uploading to maven I should probably build lightGBM on OS X and add the related osx/x86_64/lib_lightgbm.dylib files to the jar."
@guolinke can the LightGBM repo be built on OS X? Are dylib files created as a result? The issue seems to be that when the native files are packaged in the jar, besides the linux/x86_64/*.so files we also need a separate osx/x86_64/*.dylib folder. I'm not familiar with OS X dylib files.
It seems that OS X can also create/load .so files, as on Linux (?). If so, it might just be an issue in the code that tries to load the native files.
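The packaging question comes down to which path inside the jar the loader probes on each platform. Below is a minimal sketch of a hypothetical helper (the class name, method names, and the `os.name` matching rules are assumptions, not mmlspark's `NativeLoader`). It reproduces the paths from the error messages above: `.../osx/x86_64/lib_lightgbm.dylib` on OS X and `.../linux/x86_64/lib_lightgbm.so` on Linux.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not the mmlspark NativeLoader API): builds the jar
 * resource path for a native library using the layout from the errors above,
 * /com/microsoft/ml/lightgbm/<os>/x86_64/lib<name>.<ext>.
 */
final class NativeResourcePaths {
    private static final String BASE = "/com/microsoft/ml/lightgbm/";

    private NativeResourcePaths() {}

    /** Maps a Java "os.name" value to the folder name used inside the jar. */
    static String osFolder(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("mac") || os.contains("darwin")) {
            return "osx";
        }
        if (os.contains("linux")) {
            return "linux";
        }
        // Only the two platforms discussed in this thread are covered.
        throw new IllegalArgumentException("Unsupported OS: " + osName);
    }

    /** Shared-library extension for the given jar folder. */
    static String extension(String folder) {
        return folder.equals("osx") ? "dylib" : "so";
    }

    /** Full resource path, e.g. for ("Mac OS X", "_lightgbm"). */
    static String resourcePath(String osName, String libName) {
        String folder = osFolder(osName);
        return BASE + folder + "/x86_64/lib" + libName + "." + extension(folder);
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(resourcePath(os, "_lightgbm"));
        System.out.println(resourcePath(os, "_lightgbm_swig"));
    }
}
```

With this layout, supporting OS X means that both `osx/x86_64/*.dylib` files must actually be present in the jar. The JDK's `System.mapLibraryName` does a similar name-to-file mapping for the current platform.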
@imatiach-msft
we use this to convert the dylib to so:
https://github.com/Microsoft/LightGBM/blob/master/CMakeLists.txt#L135-L141 (default is off : https://github.com/Microsoft/LightGBM/blob/master/CMakeLists.txt#L29-L31)
maybe similar conversion is needed in https://github.com/Microsoft/LightGBM/blob/master/CMakeLists.txt#L160-L170
I am not familiar with OS X either.
@julioasotodv can you try to load the .so file (ml/lightgbm/linux/x86_64/lib_lightgbm.so) ?
@guolinke That worked, but I couldn’t do the same with lib_lightgbm_swig.dylib
@imatiach-msft any idea about lib_lightgbm_swig.dylib ?
@guolinke I think I will need to change the jar packaging logic to add dylib files too
@julioasotodv what was the error when you tried to load lib_lightgbm_swig.dylib?
@julioasotodv can you answer @imatiach-msft 's question, thanks very much.
Same issue here. Have you found a way to actually run LightGBM on Spark on macOS? I really hope somebody can answer this! By the way, love what you are doing, thanks a lot!
See the Microsoft mmlspark project which includes lightgbm
https://github.com/Azure/mmlspark
@geoHeil this is the exception I got when running LightGBM in Scala. Please help, @imatiach-msft.
Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.UnsatisfiedLinkError: Could not load the native libraries because we encountered the following problems: no _lightgbm in java.library.path and Could not find resource /com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm.dylib in jar.
at com.microsoft.ml.spark.NativeLoader.loadLibraryByName(NativeLoader.java:62)
at com.microsoft.ml.spark.LightGBMUtils$.initializeNativeLibrary(LightGBMUtils.scala:24)
at com.microsoft.ml.spark.TrainUtils$.trainLightGBM(TrainUtils.scala:136)
at com.microsoft.ml.spark.LightGBMClassifier$$anonfun$1.apply(LightGBMClassifier.scala:47)
at com.microsoft.ml.spark.LightGBMClassifier$$anonfun$1.apply(LightGBMClassifier.scala:47)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:196)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:193)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

I compiled some libraries using tips from the open issues, but I still don't know how to tell my local Spark to find them when running LightGBM.
Did you check whether the compiled native code files are actually contained in the fat jar?
[image from @YangChaoKiKa: screenshot, 2018-05-06 8:47:20 PM] https://user-images.githubusercontent.com/29375406/39673438-c982ce68-516e-11e8-9aed-1f3183794b16.png
Also, did you try running it as a plain Scala program (without Spark)? Do you still see the same exception? I once had similar problems with native code because of some weirdness with Spark's class loader.
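To separate a packaging problem from a Spark class-loader problem, a plain-JVM check like the following could be run with the jar on the classpath. This is a minimal sketch with hypothetical names. It assumes the `osx/x86_64` resource paths from the error above and assumes `lib_lightgbm` should be loaded before the SWIG wrapper, mirroring the order of the reported errors. It copies each library out of the jar into one temp directory and calls `System.load` on the absolute path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical plain-JVM check (no Spark): copy the native libraries out of
 * the classpath into a temp directory and load them with System.load.
 */
final class StandaloneLoadCheck {
    private StandaloneLoadCheck() {}

    /** Copies a classpath resource stream into dir/fileName and returns that path. */
    static Path extractTo(InputStream in, Path dir, String fileName) throws IOException {
        if (in == null) {
            // getResourceAsStream returns null when the entry is missing from the jar.
            throw new IOException("Resource not found on classpath: " + fileName);
        }
        Path target = dir.resolve(fileName);
        try (InputStream src = in) {
            Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
        target.toFile().deleteOnExit();
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumed order: lib_lightgbm first, then the SWIG wrapper that uses it.
        String[] resources = {
            "/com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm.dylib",
            "/com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm_swig.dylib"
        };
        Path dir = Files.createTempDirectory("lightgbm-native");
        // Registered first so it is deleted last (deleteOnExit runs in reverse order).
        dir.toFile().deleteOnExit();
        for (String resource : resources) {
            String fileName = resource.substring(resource.lastIndexOf('/') + 1);
            InputStream in = StandaloneLoadCheck.class.getResourceAsStream(resource);
            Path lib = extractTo(in, dir, fileName);
            System.load(lib.toAbsolutePath().toString());
            System.out.println("Loaded " + resource);
        }
    }
}
```

If this succeeds outside Spark but the Spark job still fails, the class-loader explanation becomes more likely. If it fails with "Resource not found", the jar itself is missing the files.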
@YangChaoKiKa hi, sorry for any confusion. We currently do not support running LightGBM on Mac OS X and Windows. This is something we plan to add in the future. Basically, we just need to compile the binaries (dylib for Mac and dll for Windows) and add them to the jar published to Maven, in the corresponding /osx/x86_64/*.dylib folder (or the equivalent folder for Windows).
@YangChaoKiKa I'm curious as to why you had a jnilib file for the swig wrappers instead of dylib
@imatiach-msft looking forward to the new release for macOS and Windows. Also, the Linux version you provide has trouble with the glibc on my CentOS machine: it needs version 2.27, which is not compatible with our current system. I had to resort to Docker and have already had some fun with it. And to be honest, I really need to replace the native GBTClassifier that Spark provides.
@imatiach-msft Let me try to reproduce the jnilib and figure out what exactly happens, then get back to you
@YangChaoKiKa Any news?
I renamed the .jnilib file to .dylib. Everything works well on my Mac.
Actually, I checked LightGBM in mmlspark, and its implementation is not a good way to launch LightGBM on Spark: a lot of features are not supported, such as predict_leaf.
In fact, we do not need to launch lightgbm on spark. We need to launch it on Yarn.
I am going to work on lightgbm-yarn and release a runnable version ASAP.
@Esail could you please elaborate, for example what is predict_leaf? Which functionality in mmlspark is not supported? How would you pass data to the yarn job from the spark dataframe? Why is the implementation not a good way to launch lightgbm? Thanks!
@imatiach-msft predict_leaf predicts leaf indices. It is used to extract features from the GBDT and then feed the new features to logistic regression (LR).
See http://quinonero.net/Publications/predicting-clicks-facebook.pdf
same issue #845
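The leaf-index trick from the linked Facebook paper treats each tree as a categorical feature. Its value is the leaf a sample lands in, and it is one-hot encoded before training LR. Below is a minimal Java sketch. It assumes the per-tree leaf indices and per-tree leaf counts are already available (for example, from LightGBM with `predict_leaf_index` enabled), and the class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the GBDT-leaf feature transform: tree t contributes
 * numLeavesPerTree[t] binary features, exactly one of which is 1 for a sample.
 */
final class LeafFeatures {
    private LeafFeatures() {}

    /** Positions of the 1s in the concatenated one-hot vector, one per tree. */
    static int[] oneHotIndices(int[] leafPerTree, int[] numLeavesPerTree) {
        if (leafPerTree.length != numLeavesPerTree.length) {
            throw new IllegalArgumentException("Need one leaf index per tree");
        }
        int[] active = new int[leafPerTree.length];
        int offset = 0; // start of tree t's block in the concatenated vector
        for (int t = 0; t < leafPerTree.length; t++) {
            int leaf = leafPerTree[t];
            if (leaf < 0 || leaf >= numLeavesPerTree[t]) {
                throw new IllegalArgumentException("Leaf " + leaf + " out of range for tree " + t);
            }
            active[t] = offset + leaf;
            offset += numLeavesPerTree[t];
        }
        return active;
    }

    /** Total width of the one-hot vector: the sum of leaf counts over all trees. */
    static int width(int[] numLeavesPerTree) {
        return Arrays.stream(numLeavesPerTree).sum();
    }

    public static void main(String[] args) {
        int[] numLeaves = {3, 4, 2};
        int[] leaves = {1, 0, 1};
        System.out.println(Arrays.toString(oneHotIndices(leaves, numLeaves)) + " of width " + width(numLeaves));
    }
}
```

Consider three trees with 3, 4 and 2 leaves. A sample landing in leaves 1, 0 and 1 activates positions 1, 3 and 8 of a 9-wide sparse vector, which is the input the LR model would see.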
How would you pass data to the yarn job from the spark dataframe?
I won't pass data from Spark. Instead, I am going to launch LightGBM on YARN directly by implementing the ApplicationMaster and YarnClient.
Why is the implementation not a good way to launch lightgbm?
The socket port can be an issue if you are running hundreds of LightGBM training jobs on a cluster. A better way is to communicate over an RPC protocol, just as Spark and Hadoop MapReduce do on YARN.
@Esail it should be very easy to support predict_leaf, we just need to pass this param:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#predict_leaf_index
I can create a PR for this in mmlspark.
I wont pass data from spark
How would you load and pass the data to lightgbm on yarn? The nice thing about the mmlspark implementation is we can pass the data in memory, it never has to go through disk storage, and the data never has to be shuffled - it can just be passed to the native layer on each node -> executor -> task -> partition.
the socketport can be a issue if you are running hundreds of lightgbm training on cluster
I don't believe lightgbm supports rpc, and if it does, we could surely switch to rpc in the mmlspark implementation. However, @guolinke recommended that we use sockets.
I think it would be great for you to add the yarn implementation, but for spark users I still think there is a need for the spark-based API. Most distributed ML folks prefer spark, and using the spark API is the more user-friendly way to go. Eventually, we will switch to use barrier execution mode in the next version of spark, which supports MPI-style execution model much better than what we are currently doing:
https://jira.apache.org/jira/browse/SPARK-24374
Would really like to hear more of your thoughts on this.
Hi @Esail, what you did makes LightGBM work in a Spark environment on Mac. You said you renamed .jnilib to .dylib. Can you please tell us a little more about that? Thanks a lot.
Hi @imatiach-msft, so currently LightGBM (including regression and classification) cannot be used on Mac, right?
Still no support for OSX build :(
Also, there are no OS X libraries in the jar file from Maven, so developers with Macs can't use it out of the box. Missing:
com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm_swig.dylib
com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm.dylib
To summarize all of the above, one way to build it yourself on a Mac is:
_Step 1_
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
_Step 2_
Manually change CMakeLists.txt:
Set APPLE and APPLE_OUTPUT_DYLIB to ON
Change /linux/x86_64 to /osx/x86_64 everywhere
Change $ENV{JAVA_HOME}/include/linux to $ENV{JAVA_HOME}/include/darwin
Change *.so to *.dylib
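The text substitutions in Step 2 can be applied mechanically. Below is a minimal, hypothetical Java sketch (not part of LightGBM) that applies them to the contents of CMakeLists.txt and prints the result instead of overwriting the file. It assumes the jar folder appears literally as `/linux/x86_64` and the JNI include path as `$ENV{JAVA_HOME}/include/linux`. It deliberately does not touch `APPLE_OUTPUT_DYLIB`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of the Step 2 substitutions applied to CMakeLists.txt text. */
final class CMakeOsxPatch {
    private CMakeOsxPatch() {}

    static String patch(String cmake) {
        String out = cmake
            // Jar folder for the native libraries.
            .replace("/linux/x86_64", "/osx/x86_64")
            // JNI platform headers live under include/darwin on macOS JDKs.
            .replace("$ENV{JAVA_HOME}/include/linux", "$ENV{JAVA_HOME}/include/darwin");
        // Replace ".so" only when it ends a word, so names like ".source" are left alone.
        return out.replaceAll("\\.so\\b", ".dylib");
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "CMakeLists.txt");
        String original = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        // Print rather than overwrite so the diff can be reviewed first.
        System.out.print(patch(original));
    }
}
```

If `APPLE_OUTPUT_DYLIB` is declared with CMake's `option()` (it defaults to OFF per the link above), it can usually be set on the command line with `-DAPPLE_OUTPUT_DYLIB=ON` instead of editing the file.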
_Step 3_
Make sure JAVA_HOME is set properly, then run:
mkdir build ; cd build
cmake -DUSE_SWIG=ON ..
make -j4
This produces lightgbmlib.jar, but it does not contain lib_lightgbm_swig.dylib, so one last step is needed.
_Step 4_
Rename lib_lightgbm_swig.jnilib to lib_lightgbm_swig.dylib and run again:
make -j4
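The rename in Step 4 can be scripted. The `.jnilib` suffix most likely comes from CMake's UseSWIG module, which appears to give Java modules that suffix on Apple. If so, setting the SWIG target's `SUFFIX` property to `.dylib` in CMakeLists.txt might make this step unnecessary, but that is an untested assumption. Below is a minimal, hypothetical Java helper that renames every `*.jnilib` in a directory (the build directory by default) to `*.dylib`. After running it, `make` still has to be re-run to repack the jar.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for Step 4: rename every *.jnilib in a directory to *.dylib. */
final class JnilibRenamer {
    private JnilibRenamer() {}

    static List<Path> renameJnilibs(Path dir) throws IOException {
        // Collect first, then move, so the directory is not modified while iterating.
        List<Path> sources = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jnilib")) {
            for (Path p : stream) {
                sources.add(p);
            }
        }
        List<Path> renamed = new ArrayList<>();
        for (Path src : sources) {
            String name = src.getFileName().toString();
            String base = name.substring(0, name.length() - ".jnilib".length());
            // Files.move fails if the .dylib already exists, rather than silently replacing it.
            renamed.add(Files.move(src, src.resolveSibling(base + ".dylib")));
        }
        return renamed;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : renameJnilibs(dir)) {
            System.out.println("Renamed to " + p);
        }
    }
}
```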
Now lightgbmlib.jar is ready to use. Check that everything is included:
build ❯ jar tf lightgbmlib.jar
META-INF/
META-INF/MANIFEST.MF
com/
com/microsoft/
com/microsoft/ml/
com/microsoft/ml/lightgbm/
com/microsoft/ml/lightgbm/SWIGTYPE_p_double.class
com/microsoft/ml/lightgbm/lightgbmlib.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_int32_t.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_float.class
com/microsoft/ml/lightgbm/osx/
com/microsoft/ml/lightgbm/osx/x86_64/
com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm_swig.dylib
com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm.dylib
com/microsoft/ml/lightgbm/SWIGTYPE_p_int.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_int64_t.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_p_double.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_p_void.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_long.class
com/microsoft/ml/lightgbm/lightgbmlibConstants.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_void.class
com/microsoft/ml/lightgbm/lightgbmlibJNI.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_p_char.class
com/microsoft/ml/lightgbm/SWIGTYPE_p_p_int.class
That worked for me. Looking forward to seeing changes in CMakeLists.txt to support OS X.
Thank you.
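The `jar tf` check above can be automated, for example to verify a locally built jar or the one downloaded from Maven before using it on a Mac. Below is a minimal sketch using the JDK's `java.util.jar.JarFile`. The class name and the default jar path are hypothetical, and the required entry names are the two `osx/x86_64` files listed earlier in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical check that a jar contains the OS X native libraries. */
final class JarNativeCheck {
    static final String[] REQUIRED = {
        "com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm.dylib",
        "com/microsoft/ml/lightgbm/osx/x86_64/lib_lightgbm_swig.dylib"
    };

    private JarNativeCheck() {}

    /** Returns the required entries that are absent from the jar (empty if all present). */
    static List<String> missingEntries(String jarPath) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (String entry : REQUIRED) {
                if (jar.getJarEntry(entry) == null) {
                    missing.add(entry);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args.length > 0 ? args[0] : "lightgbmlib.jar";
        List<String> missing = missingEntries(jarPath);
        if (missing.isEmpty()) {
            System.out.println("All OS X native libraries present in " + jarPath);
        } else {
            missing.forEach(m -> System.out.println("Missing: " + m));
            System.exit(1);
        }
    }
}
```

The non-zero exit code makes it usable as a build step, so a jar without the dylib files would fail fast instead of surfacing later as an `UnsatisfiedLinkError` at runtime.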
hi @prohor33 , thank you for bringing this up. I have created a PR here to fix CMakeLists.txt for macOS: https://github.com/Microsoft/LightGBM/pull/2002
I have tested it on a mac machine I borrowed. Hopefully this should resolve your issue.
@imatiach-msft Great news! Thank you! Hope this will be in master soon.
brew install libomp