Opening issue on behalf of @adwivedi on the discussion forum (https://discuss.mxnet.io/t/error-while-running-the-mxnet-spark-examples-and-test-cases/2720).
I am trying to run MXNet in distributed mode using Spark, as implemented here: https://github.com/apache/incubator-mxnet/tree/master/scala-package/spark
but I am not able to run the examples or the tests.
The commented-out tests in the file https://github.com/apache/incubator-mxnet/blob/master/scala-package/spark/src/test/scala/org/apache/mxnet/spark/MXNetGeneralSuite.scala keep running into the following error. I get the same error when I try to run the examples in the repo.
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
Exception in thread "Thread-21" java.lang.IllegalArgumentException: requirement failed: Failed to start ps scheduler process with exit code 1
at scala.Predef$.require(Predef.scala:224)
at org.apache.mxnet.spark.MXNet.org$apache$mxnet$spark$MXNet$$startPSSchedulerInner$1(MXNet.scala:159)
at org.apache.mxnet.spark.MXNet$$anonfun$startPSScheduler$1.apply(MXNet.scala:162)
at org.apache.mxnet.spark.MXNet$$anonfun$startPSScheduler$1.apply(MXNet.scala:162)
at org.apache.mxnet.spark.MXNet$MXNetControllingThread.run(MXNet.scala:38)
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
I have generally seen this error when there's a mismatch between the Scala versions in the API, but in this case I am using the pom file that's in the project and not including any external libraries.
I have also looked at the pom file but I've not found any library that might have a mismatch in this case; all the libraries in the pom are the 2.11 version.
I am building it from source and running it with this VM parameter:
-Djava.library.path=/path_to_mxnet_source/incubator-mxnet/scala-package/native/osx-x86_64-cpu/target
to let it find the native library.
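The ClassNotFoundException for scala.collection.Seq in the trace above suggests that the JVM launched for the scheduler process did not have the Scala runtime jar on its classpath. A minimal sketch for checking a classpath string follows. It assumes a colon-separated classpath and a jar whose file name starts with scala-library; has_scala_library is a hypothetical helper, not part of MXNet or Spark.

```shell
# Hypothetical helper: succeeds if a colon-separated classpath
# contains an entry whose file name looks like scala-library*.jar.
has_scala_library() (
  IFS=':'
  set -f  # disable globbing so entries such as lib/* are not expanded
  for entry in $1; do
    # Strip the directory part and compare only the file name.
    case "${entry##*/}" in
      scala-library*.jar) exit 0 ;;
    esac
  done
  exit 1
)
```

For example, `has_scala_library "$CLASSPATH" || echo "scala-library jar not found on classpath"` prints a warning when no such entry is present. This only inspects the string passed in; it does not tell you which classpath the scheduler process was actually started with.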
@lanking520
@mxnet-label-bot add [Scala]
Thanks to @thomelane for raising it here. Adding a few Scala gurus:
@piyushghai @zachgk @andrewfayres @CodingCat
Please take a look at this issue. I also think this is a good time to fix the flaky disabled tests and allow the Spark tests to run in CI.
@adwivedi, your issue with running the examples should be resolved by these PRs: https://github.com/apache/incubator-mxnet/pull/13849 and https://github.com/apache/incubator-mxnet/pull/13891. These fixes are now merged into master.
Recent changes to the Maven POM files temporarily broke the examples. They should be back to normal now. Let me know if you still face issues running your examples.
And yes, the Java library path that you are setting is the correct one.
@piyushghai The error still persists. You can reproduce it by un-commenting the test "run spark with MLP" and its related methods in the file org/apache/mxnet/spark/MXNetGeneralSuite.scala
Also, the latest version in master doesn't respect -Djava.library.path or the LD_LIBRARY_PATH variable and only picks up the native files from the jar (which doesn't seem to contain these files). So to test this I am running a hacked version in which I hard-code the path (incubator-mxnet/scala-package/native/osx-x86_64-cpu/target) in NativeLibraryLoader.scala so that the libraries get picked up from there.
@piyushghai The paths of the jars are still incorrect or incomplete. I've fixed this in this pull request: https://github.com/apache/incubator-mxnet/pull/14020
However, there's still one problem with the latest changes. The mxnet shared library fails to build when USE_DIST_KVSTORE = 1 because of the following error:
checking whether the C compiler works... no
configure: error: in `/path_to_mxnet/incubator-mxnet/3rdparty/ps-lite/protobuf-2.5.0':
configure: error: C compiler cannot create executables
See `config.log' for more details
make[1]: *** [/path_to_mxnet/incubator-mxnet/deps/include/google/protobuf/message.h] Error 77
make: *** [PSLITE] Error 2
@aashudwivedi Thanks for fixing the classpath problem for the Spark issue.
Can you point me to the specific build instructions you followed to build the mxnet shared library?
Also, what instance type are you building it on?
Here's what I did to build the libmxnet.so from source :
git clone --recursive https://github.com/apache/incubator-mxnet.git
cd incubator-mxnet
make clean && make -j$(nproc) USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda-9.0 USE_CUDNN=1 USE_DIST_KVSTORE=1
Here's the instance info on which I built mxnet:
('Version :', '2.7.12')
('Compiler :', 'GCC 5.4.0 20160609')
('Build :', ('default', 'Nov 12 2018 14:36:49'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '18.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version :', '1.5.0')
('Directory :', '/home/ubuntu/.local/lib/python2.7/site-packages/mxnet')
('Commit Hash :', 'da5242b732de39ad47d8ecee582f261ba5935fa9')
----------System Info----------
('Platform :', 'Linux-4.4.0-1074-aws-x86_64-with-Ubuntu-16.04-xenial')
('system :', 'Linux')
('node :', 'ip-172-31-78-46')
('release :', '4.4.0-1074-aws')
('version :', '#84-Ubuntu SMP Thu Dec 6 08:57:58 UTC 2018')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 1227.804
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.14
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63
@piyushghai
I am building it on macOS High Sierra 10.13.6.
Here's what I do to build the libmxnet.so. I have followed the instructions from http://mxnet.incubator.apache.org/versions/master/install/osx_setup.html#build-the-shared-library
git clone --recursive https://github.com/apache/incubator-mxnet ~/mxnet
cd ~/mxnet
cp make/osx.mk ./config.mk
echo "USE_BLAS = openblas" >> ./config.mk
echo "ADD_CFLAGS += -I/usr/local/opt/openblas/include" >> ./config.mk
echo "ADD_LDFLAGS += -L/usr/local/opt/openblas/lib" >> ./config.mk
echo "ADD_LDFLAGS += -L/usr/local/lib/graphviz/" >> ./config.mk
echo "USE_DIST_KVSTORE=1" >> ./config.mk
make -j$(sysctl -n hw.ncpu)
Here's the instance info :
----------Python Info----------
('Version :', '2.7.15')
('Compiler :', 'GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.2)')
('Build :', ('default', 'Oct 2 2018 11:47:18'))
('Arch :', ('64bit', ''))
------------Pip Info-----------
('Version :', '18.0')
('Directory :', '/usr/local/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
('Platform :', 'Darwin-17.7.0-x86_64-i386-64bit')
('system :', 'Darwin')
('node :', 'ashutdwi-mac')
('release :', '17.7.0')
('version :', 'Darwin Kernel Version 17.7.0: Wed Oct 10 23:06:14 PDT 2018; root:xnu-4570.71.13~1/RELEASE_X86_64')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'i386')
machdep.cpu.brand_string: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID FPU_CSDS
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT RDTSCP TSCI
Here are the details of xcode:
Xcode 10.1
Build version 10B61
I've also attached the config.log from the 3rdparty/ps-lite/protobuf-2.5.0 directory here: config.log
I cannot reproduce this issue on High Sierra. Here is the list of dependencies I installed:
xcode command-line tool 10
brew install openssl automake pkg-config nasm
Apple LLVM version 10.0.0 (clang-1000.10.44.4)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
@aashudwivedi Any luck with the steps above?
@piyushghai unfortunately I still have the same problem and I am not sure what other information I should provide you to be able to reproduce this.
From the previous message:
checking whether the C compiler works... no
configure: error: in `/path_to_mxnet/incubator-mxnet/3rdparty/ps-lite/protobuf-2.5.0':
configure: error: C compiler cannot create executables
Have you checked your C compiler with clang --version? This appears to be a compiler issue. Try make clean and then make -j again.
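The "C compiler cannot create executables" message comes from a configure check that compiles and runs a trivial program, so the same check can be tried by hand outside the MXNet build. A rough sketch, assuming mktemp is available; check_c_compiler is a hypothetical helper, and the compiler name passed to it is whatever your build actually uses:

```shell
# Hypothetical helper: compile and run a trivial C program with the
# given compiler (default: cc). Succeeds only if both steps succeed.
check_c_compiler() (
  cc="${1:-cc}"
  dir="$(mktemp -d)" || exit 1
  printf 'int main(void) { return 0; }\n' > "$dir/conftest.c"
  if "$cc" "$dir/conftest.c" -o "$dir/conftest" && "$dir/conftest"; then
    status=0
  else
    status=1
  fi
  rm -rf "$dir"  # clean up the scratch directory in both cases
  exit "$status"
)
```

For example, `check_c_compiler clang || echo "clang cannot build a trivial program"`. If this fails, the compiler's own error output is likely to be more direct than the configure summary; the config.log in the protobuf-2.5.0 directory records the exact command configure ran.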