Incubator-mxnet: Failing to build scalapkg with USE_DIST_KVSTORE=1 and USE_OPENCV=1 on Amazon Linux (RHEL)

Created on 22 Sep 2017 · 5 Comments · Source: apache/incubator-mxnet

Environment info

Operating System: Amazon Linux ami-a4c7edb2

Package used (Python/R/Scala/Julia): Scala

MXNet version: 0.11

Error Message:

[INFO] ------------------------------------------------------------------------
[INFO] Building MXNet Scala Package - Core 0.11.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ mxnet-core_2.11 ---
[INFO] 
[INFO] --- maven-resources-plugin:2.7:resources (default-resources) @ mxnet-core_2.11 ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /root/mxnet/scala-package/core/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.3:compile (default-compile) @ mxnet-core_2.11 ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-scala-plugin:2.15.2:compile (default) @ mxnet-core_2.11 ---
[INFO] Checking for multiple versions of scala
[WARNING] Invalid POM for ml.dmlc.mxnet:mxnet-macros_2.11:jar:0.11.0-SNAPSHOT, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING]  Expected all dependencies to require Scala version: 2.11.8
[WARNING]  ml.dmlc.mxnet:mxnet-init_2.11:0.11.0-SNAPSHOT requires scala version: 2.11.8
[WARNING]  ml.dmlc.mxnet:mxnet-init_2.11:0.11.0-SNAPSHOT requires scala version: 2.11.8
[WARNING]  ml.dmlc.mxnet:mxnet-core_2.11:0.11.0-SNAPSHOT requires scala version: 2.11.8
[WARNING]  org.scala-lang:scala-reflect:2.11.8 requires scala version: 2.11.8
[WARNING]  org.scalatest:scalatest_2.11:2.2.4 requires scala version: 2.11.2
[WARNING] Multiple versions of scala libraries detected!
[INFO] includes = [**/*.java,**/*.scala,]
[INFO] excludes = []
[INFO] /root/mxnet/scala-package/core/src/main/scala:-1: info: compiling
[INFO] Compiling 49 source files to /root/mxnet/scala-package/core/target/classes at 1506111412300
[INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.11.8,2.1.0)
[ERROR] terminate called after throwing an instance of 'std::length_error'
[INFO]   what():  basic_string::append
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] MXNet Scala Package - Parent ....................... SUCCESS [  3.548 s]
[INFO] MXNet Scala Package - Initializer .................. SUCCESS [  3.788 s]
[INFO] MXNet Scala Package - Initializer Native Parent .... SUCCESS [  0.025 s]
[INFO] MXNet Scala Package - Initializer Native Linux-x86_64 SUCCESS [  2.509 s]
[INFO] MXNet Scala Package - Macros ....................... SUCCESS [  5.256 s]
[INFO] MXNet Scala Package - Core ......................... FAILURE [  3.094 s]
[INFO] MXNet Scala Package - Native Parent ................ SKIPPED
[INFO] MXNet Scala Package - Native Linux-x86_64 CPU-only . SKIPPED
[INFO] MXNet Scala Package - Examples ..................... SKIPPED
[INFO] MXNet Scala Package - Spark ML ..................... SKIPPED
[INFO] MXNet Scala Package - Full Parent .................. SKIPPED
[INFO] MXNet Scala Package - Full Linux-x86_64 CPU-only ... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19.603 s
[INFO] Finished at: 2017-09-22T20:16:55Z
[INFO] Final Memory: 30M/754M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scala-tools:maven-scala-plugin:2.15.2:compile (default) on project mxnet-core_2.11: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 134(Exit value: 134) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :mxnet-core_2.11
make: *** [scalapkg] Error 1

Minimum reproducible example

I am building mxnet from source.
Here's a shell script to reproduce on Amazon Linux ami-a4c7edb2:

#!/usr/bin/env bash

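# Run this script as root (for example via sudo bash <this-script>).
# A bare "sudo su" inside a script opens an interactive root shell, and the
# commands below would only run, unprivileged, after that shell exits.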

yum update

#install OpenBLAS
yum groupinstall 'Development Tools'
yum install openblas-devel.x86_64

#install jdk8
yum install java-1.8.0-openjdk-devel.x86_64
unlink /etc/alternatives/java
ln -s /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java /etc/alternatives/java
unlink /etc/alternatives/jre_openjdk
ln -s /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/ /etc/alternatives/jre_openjdk
unlink /etc/alternatives/jre
ln -s /usr/lib/jvm/jre-1.8.0-openjdk.x86_64 /etc/alternatives/jre

#install OpenCV
yum install cmake git gtk2-devel pkgconfig numpy ffmpeg
git clone https://github.com/opencv/opencv.git
cd opencv && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DLIB_SUFFIX=64 ..
make -j $(nproc)
make install
#cp /home/ec2-user/opencv/build/unix-install/opencv.pc /usr/share/pkgconfig
#echo "/usr/lib" >>/etc/ld.so.conf.d/libopencv_dnn.conf
#ldconfig

#install mxnet
cd ~
git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet --branch 0.11.0
cd mxnet && cp make/config.mk .
echo "USE_CUDA=0" >>config.mk
echo "USE_CUDNN=0" >>config.mk
echo "USE_BLAS=openblas" >>config.mk
echo "ADD_CFLAGS += -I/usr/include/openblas" >>config.mk
echo "ADD_LDFLAGS += -lopencv_core -lopencv_imgproc -lopencv_imgcodecs" >>config.mk
echo "USE_OPENCV=1" >>config.mk
echo "USE_DIST_KVSTORE=1" >>config.mk
make -j $(nproc)

#install maven
cd ~
wget http://mirrors.koehn.com/apache/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz
tar xzvf apache-maven-3.5.0-bin.tar.gz
mv apache-maven-3.5.0 /opt/
# single quotes keep $PATH literal so it is expanded at login, not when this line is written
echo 'export PATH=/opt/apache-maven-3.5.0/bin:$PATH' >>/etc/profile.d/mvn.sh
source /etc/profile.d/mvn.sh
mvn -v

#install mxnet scala library
cd ~/mxnet/
make -j $(nproc) scalapkg

What have you tried to solve it?

The error message is too generic to point at a cause: it is a std::length_error thrown from basic_string::append, i.e. a string append whose result would exceed the maximum string length.
If I build mxnet without the USE_DIST_KVSTORE=1 flag, however, scalapkg builds without any issues.
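
For anyone reading the log above: the exit value 134 reported by maven-scala-plugin most likely follows the usual 128 + signal number convention, and signal 6 is SIGABRT on Linux. That fits the "terminate called after throwing an instance of 'std::length_error'" line, since an uncaught C++ exception ends in abort(). So this looks like native code aborting inside the compiler process rather than a Scala compile error. A minimal sketch for decoding such a status (describe_exit is a made-up helper name, not part of MXNet or Maven):

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode an exit status using the shell convention
# that values above 128 mean "terminated by signal (status - 128)".
describe_exit() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        local sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix.
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited normally with status $status"
    fi
}

describe_exit 134   # prints: killed by signal 6 (ABRT)
```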

Most helpful comment

@cjolivier01, spot on. OpenCV 3 was installing protobuf as well, which caused the error.
I downgraded OpenCV to 2.4 and everything built like a charm.

Here's the full working install script in case someone else needs to install on Amazon Linux (it may also work on RHEL and CentOS):

#!/usr/bin/env bash

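# Run this script as root (for example via sudo bash <this-script>).
# A bare "sudo su" inside a script opens an interactive root shell, and the
# commands below would only run, unprivileged, after that shell exits.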

yum update -y

#install OpenBLAS
yum groupinstall -y 'Development Tools'
yum install -y openblas-devel.x86_64

#install jdk8
yum install -y java-1.8.0-openjdk-devel.x86_64
unlink /etc/alternatives/java
ln -s /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java /etc/alternatives/java
unlink /etc/alternatives/jre_openjdk
ln -s /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/ /etc/alternatives/jre_openjdk
unlink /etc/alternatives/jre
ln -s /usr/lib/jvm/jre-1.8.0-openjdk.x86_64 /etc/alternatives/jre

#install OpenCV 2.4
#OpenCV 3 bundles a protobuf library that conflicts with the one mxnet builds. https://github.com/apache/incubator-mxnet/issues/7998
yum install -y cmake git gtk2-devel pkgconfig numpy ffmpeg
git clone https://github.com/opencv/opencv.git
cd opencv && git checkout 2.4
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS=ON ..
make -j $(nproc)
make install

#install mxnet
cd ~
git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet --branch 0.11.0
yum install -y libcurl-devel.x86_64 openssl-devel.x86_64 lapack-devel.x86_64 lapack64-static.x86_64
ln -s /usr/lib64/liblapack64.a /usr/lib64/liblapack.a
cd mxnet && cp make/config.mk .
echo "USE_CUDA=0" >>config.mk
echo "USE_CUDNN=0" >>config.mk
echo "USE_BLAS=openblas" >>config.mk
echo "ADD_CFLAGS += -I/usr/include/openblas" >>config.mk
echo "USE_OPENCV=1" >>config.mk
echo "USE_DIST_KVSTORE=1" >>config.mk
echo "USE_S3=1" >>config.mk
echo "USE_LAPACK_PATH=/usr/lib64" >>config.mk
make -j $(nproc)

#install maven
cd ~
wget http://mirrors.koehn.com/apache/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz
tar xzvf apache-maven-3.5.0-bin.tar.gz
mv apache-maven-3.5.0 /opt/
# single quotes keep $PATH literal so it is expanded at login, not when this line is written
echo 'export PATH=/opt/apache-maven-3.5.0/bin:$PATH' >>/etc/profile.d/mvn.sh
source /etc/profile.d/mvn.sh
mvn -v

#install mxnet scala library and run unit-tests
cd ~/mxnet/
make -j $(nproc) scalapkg
make scalatest

#install library in local maven repo
make scalainstall
#replace placeholders in the pom files (seems to be a bug that keeps ${project.version} in the pom files)
find /root/.m2/repository/ml/dmlc/mxnet/ -type f -name '*.pom' -exec sed -i 's/${project\.version}/0.11.0-SNAPSHOT/g' {} +

#install mxnet libraries on the java lib path
#mvn -f /root/.m2/repository/ml/dmlc/mxnet/mxnet-full_2.11-linux-x86_64-cpu/0.11.0-SNAPSHOT/mxnet-full_2.11-linux-x86_64-cpu-0.11.0-SNAPSHOT.pom \
#dependency:copy-dependencies -DoutputDirectory=/usr/lib64/mxnet
mkdir -p /usr/lib64/mxnet
find /root/.m2/repository/ml/dmlc/mxnet/ -type f \
    ! -name "*pom" \
    ! -name "*xml" \
    ! -name "*javadoc*" \
    ! -name "_remote.repositories" \
    ! -name "*-sources.jar" \
    -exec cp '{}' /usr/lib64/mxnet/ \;
ln -s /usr/lib64/mxnet/libmxnet-scala-linux-x86_64-cpu-0.11.0-SNAPSHOT.so /usr/lib64/libmxnet-scala.so
ln -s /usr/lib64/mxnet/libmxnet-init-scala-linux-x86_64-0.11.0-SNAPSHOT.so /usr/lib64/libmxnet-init-scala.so

#Run MNIST
./scala-package/core/scripts/get_mnist_data.sh
java -Xmx4G -cp \
  /usr/lib64/mxnet/mxnet-core_2.11-0.11.0-SNAPSHOT.jar:/usr/lib64/mxnet/mxnet-examples_2.11-0.11.0-SNAPSHOT.jar:scala-package/examples/target/classes/lib/* \
  ml.dmlc.mxnetexamples.imclassification.TrainMnist \
  --data-dir=./data/ \
  --num-epochs=10 \
  --network=mlp \
  --cpus=0,1,2,3

#RUN GAN MNIST
java -Xmx4G -cp \
    /usr/lib64/mxnet/mxnet-core_2.11-0.11.0-SNAPSHOT.jar:/usr/lib64/mxnet/mxnet-examples_2.11-0.11.0-SNAPSHOT.jar:scala-package/examples/target/classes/lib/*  \
    ml.dmlc.mxnetexamples.gan.GanMnist \
    --mnist-data-path=./data/ \
    --output-path=./data/gan/ \
    --gpu=-1
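
Not part of the original script: a small sketch for checking whether the sed step above left any unresolved placeholders behind in the local Maven repository. list_unresolved_poms is a hypothetical helper name, and the sketch assumes the placeholders only matter in *.pom files.

```shell
#!/usr/bin/env bash
# Hypothetical check: print every .pom file under a directory that still
# contains the literal text ${project.version}.
list_unresolved_poms() {
    local repo_dir=$1
    # grep -F treats the pattern as a fixed string, so $ { } . are not
    # interpreted as regex metacharacters; -l prints only file names.
    find "$repo_dir" -type f -name '*.pom' \
        -exec grep -l -F '${project.version}' {} + 2>/dev/null | sort
}

# Usage (path taken from the script above):
#   list_unresolved_poms /root/.m2/repository/ml/dmlc/mxnet/
```

If the replacement worked, the function prints nothing; any path it does print is a pom that still needs the version filled in.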

All 5 comments

I tried it on OS X and Ubuntu with USE_DIST_KVSTORE=1, and it built successfully. Could you try another Linux distro to see whether the issue still exists?

@javelinjs, the issue I reported is on Amazon Linux (a RHEL-like distribution).

If I disable OpenCV in the build, it works:

echo "USE_OPENCV=0" >>config.mk

I get the kvstore, but I lose the image I/O capabilities.
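
A side note on toggling flags this way: appending works because make uses the last assignment it reads, but it leaves two assignments for the same flag in config.mk if the file already sets it. A minimal sketch of a helper that keeps a single assignment per flag (set_mk_flag is a hypothetical name, and it only handles plain NAME = VALUE lines, not += or :=):

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure FILE contains exactly one plain
# assignment "NAME = VALUE", replacing any earlier one.
set_mk_flag() {
    local file=$1 name=$2 value=$3 tmp
    tmp=$(mktemp)
    # Keep every line except earlier plain assignments to NAME.
    # grep exits non-zero when nothing is left to print, hence "|| true".
    grep -v -E "^[[:space:]]*${name}[[:space:]]*=" "$file" >"$tmp" || true
    echo "${name} = ${value}" >>"$tmp"
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" >"$file"
    rm -f "$tmp"
}

# Demo on a scratch file rather than the real config.mk:
cfg=$(mktemp)
printf 'USE_OPENCV = 1\nUSE_BLAS = atlas\n' >"$cfg"
set_mk_flag "$cfg" USE_OPENCV 0
cat "$cfg"
rm -f "$cfg"
```

Against the real file the call would be set_mk_flag config.mk USE_OPENCV 0, run from the mxnet checkout.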

You probably have conflicting versions of either protobuf or OpenCV installed. The build process with kvstore will try to download and build protobuf, which might be out of sync with headers or libraries that it finds elsewhere.
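
One way to look for such a conflict, as a rough sketch rather than a definitive diagnosis: list every libprotobuf file on the usual library paths, and count protobuf symbols exported by a suspect shared library. Both function names are made up for illustration, the directories in the usage comments are assumptions about where things were installed, and count_protobuf_symbols relies on nm from binutils being available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every libprotobuf* file or symlink found
# under the given directories, skipping directories that do not exist.
find_protobuf_libs() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -name 'libprotobuf*' \( -type f -o -type l \) 2>/dev/null
    done | sort
}

# Hypothetical helper: count dynamic symbols mentioning protobuf that a
# shared library defines. Prints 0 if there are none or the file is unreadable.
count_protobuf_symbols() {
    nm -D --defined-only "$1" 2>/dev/null | grep -c -i 'protobuf' || true
}

# Usage (paths are assumptions, adjust to the actual install):
#   find_protobuf_libs /usr/lib /usr/lib64 /usr/local/lib ~/mxnet/deps
#   count_protobuf_symbols /usr/lib64/libopencv_dnn.so
```

More than one protobuf copy in the first listing, or a non-zero count for an OpenCV library, would be consistent with the mismatch described above.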
