Ray: Fails to install from source on the NVIDIA Jetson Xavier board

Created on 14 Jan 2019 · 9 Comments · Source: ray-project/ray

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    > - Linux Ubuntu 18.04 (Linux for Tegra)
  • Ray installed from (source or binary):
    > - source
  • Ray version:
    > - 0.6.1 (master)
  • Python version:
    > - 3.6.6
    > - Cython version: 0.29.0
  • Exact command to reproduce:
    pip3 install -e . --verbose

Describe the problem


I'm trying to install Ray on the NVIDIA Jetson Xavier board.

The Xavier board has an ARM CPU, so I'm trying to install from source.

However, the following error message occurs:

    [ 33%] Built target _csv_pyx
    make[4]: *** write jobserver: Bad file descriptor.  Stop.
    make[4]: *** Waiting for unfinished jobs....
    [ 33%] Built target _plasma_pyx
    make[4]: *** write jobserver: Bad file descriptor.  Stop.
    Makefile:83: recipe for target 'all' failed
    make[3]: *** [all] Error 2
    error: command 'cmake' failed with exit status 2
    CMakeFiles/pyarrow_ext.dir/build.make:106: recipe for target 'external/pyarrow/src/pyarrow_ext-stamp/pyarrow_ext-configure' failed
    make[2]: *** [external/pyarrow/src/pyarrow_ext-stamp/pyarrow_ext-configure] Error 1
    CMakeFiles/Makefile2:335: recipe for target 'CMakeFiles/pyarrow_ext.dir/all' failed
    make[1]: *** [CMakeFiles/pyarrow_ext.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....

Does anyone know about this issue? Any help will be much appreciated.

Source code / logs

All 9 comments

This looks strange: write jobserver: Bad file descriptor

Is there some artificially low limit on the number of file descriptors (see https://unix.stackexchange.com/questions/84227/limits-on-the-number-of-file-descriptors)?
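One way to check this from inside a program is POSIX getrlimit(RLIMIT_NOFILE); the soft value it returns is the one `ulimit -n` shows in bash. The sketch below is only a diagnostic; FdLimit, QueryOpenFileLimit and PrintOpenFileLimit are hypothetical names, not part of Ray or Arrow.

```cpp
#include <sys/resource.h>  // getrlimit, struct rlimit, RLIMIT_NOFILE
#include <cassert>
#include <cstdio>          // std::printf
#include <stdexcept>       // std::runtime_error

// Hypothetical type: soft (current) and hard (maximum) open-file limits.
struct FdLimit {
  rlim_t soft;
  rlim_t hard;  // RLIM_INFINITY means "no limit"
};

// Queries the per-process limit on open file descriptors.
// The soft value is the one `ulimit -n` shows in bash.
FdLimit QueryOpenFileLimit() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
    throw std::runtime_error("getrlimit(RLIMIT_NOFILE) failed");
  }
  assert(rl.rlim_cur <= rl.rlim_max);  // POSIX requires soft <= hard
  return FdLimit{rl.rlim_cur, rl.rlim_max};
}

// Prints both limits so they can be compared with what the build expects.
void PrintOpenFileLimit() {
  const FdLimit limit = QueryOpenFileLimit();
  std::printf("open files: soft=%llu hard=%llu\n",
              static_cast<unsigned long long>(limit.soft),
              static_cast<unsigned long long>(limit.hard));
}
```

A soft limit in the low hundreds or less would be worth raising before retrying a parallel build.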

You could also try compiling Arrow without the jobserver. This might be possible by setting export PYARROW_PARALLEL=1; see Arrow's setup.py.

Also, is it using some sort of non-standard shell or non-standard environment?

Upgrading your cmake might also help.

Thank you so much for the super quick answer!

The problem was solved after I changed

make -j${PARALLEL}

to

make
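One plausible reading of why this helped (not confirmed in the thread): with make -j, GNU make creates a jobserver pipe and advertises its two descriptors to sub-makes through MAKEFLAGS (--jobserver-fds=R,W in older releases, --jobserver-auth=R,W in newer ones). If a process between the outer make and an inner one closes those descriptors (a Python 3 subprocess call does this by default, since close_fds defaults to True), the inner make still sees the flags but cannot use the pipe and stops with "write jobserver: Bad file descriptor". A plain make creates no jobserver, so nothing can go missing. The hypothetical helpers below check a MAKEFLAGS string for that state; they handle only the R,W form, not the fifo form used by the newest make releases.

```cpp
#include <fcntl.h>   // fcntl, F_GETFD
#include <cassert>
#include <cstdio>    // std::sscanf
#include <cstdlib>   // std::getenv
#include <string>

// Hypothetical result type: the pipe descriptors named in MAKEFLAGS, if any.
struct JobserverFds {
  bool present = false;
  int read_fd = -1;
  int write_fd = -1;
};

// Finds "--jobserver-auth=R,W" (newer GNU make) or "--jobserver-fds=R,W"
// (older GNU make) in a MAKEFLAGS string. A "fifo:PATH" value is not parsed
// and yields present == false.
JobserverFds ParseJobserverFds(const std::string &makeflags) {
  static const char *const kKeys[] = {"--jobserver-auth=", "--jobserver-fds="};
  JobserverFds out;
  for (const char *key : kKeys) {
    const std::string::size_type pos = makeflags.find(key);
    if (pos == std::string::npos) continue;
    const char *value = makeflags.c_str() + pos + std::string(key).size();
    int r = -1;
    int w = -1;
    if (std::sscanf(value, "%d,%d", &r, &w) == 2) {
      out.present = true;
      out.read_fd = r;
      out.write_fd = w;
    }
    break;  // only the first jobserver option found is considered
  }
  return out;
}

// True if fd refers to an open descriptor in this process.
bool IsFdOpen(int fd) { return fd >= 0 && fcntl(fd, F_GETFD) != -1; }

// True when MAKEFLAGS advertises jobserver descriptors that are not open,
// the state in which a sub-make reports "write jobserver: Bad file descriptor".
bool JobserverFdsMissing() {
  const char *flags = std::getenv("MAKEFLAGS");
  if (flags == nullptr) return false;
  const JobserverFds fds = ParseJobserverFds(flags);
  return fds.present && !(IsFdOpen(fds.read_fd) && IsFdOpen(fds.write_fd));
}
```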

However, I got another error:

    [ 74%] Linking CXX executable raylet
    ../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `boost::mutex::lock()':
    /home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
    ../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `boost::unique_lock<boost::mutex>::unlock()':
    /home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
    /home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
    ../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `boost::unique_lock<boost::mutex>::lock()':
    /home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
    /home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
    ../../../external/arrow-install/lib/libplasma.a(client.cc.o):/home/nvidia/.local/include/boost/thread/exceptions.hpp:84: more undefined references to `boost::system::generic_category()' follow
    ../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `__static_initialization_and_destruction_0':
    /home/nvidia/.local/include/boost/system/error_code.hpp:210: undefined reference to `boost::system::system_category()'
    ../../../external/arrow-install/lib/libarrow.a(thread-pool.cc.o): In function `__static_initialization_and_destruction_0':
    /home/nvidia/.local/include/boost/system/error_code.hpp:206: undefined reference to `boost::system::generic_category()'
    /home/nvidia/.local/include/boost/system/error_code.hpp:208: undefined reference to `boost::system::generic_category()'
    /home/nvidia/.local/include/boost/system/error_code.hpp:210: undefined reference to `boost::system::system_category()'
    collect2: error: ld returned 1 exit status
    src/ray/raylet/CMakeFiles/raylet.dir/build.make:92: recipe for target 'src/ray/raylet/raylet' failed
    make[2]: *** [src/ray/raylet/raylet] Error 1
    CMakeFiles/Makefile2:1600: recipe for target 'src/ray/raylet/CMakeFiles/raylet.dir/all' failed
    make[1]: *** [src/ray/raylet/CMakeFiles/raylet.dir/all] Error 2
    Makefile:140: recipe for target 'all' failed
    make: *** [all] Error 2
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/nvidia/jychoi/ray/python/setup.py", line 180, in <module>
        license="Apache 2.0")
      File "/usr/local/lib/python3.6/dist-packages/setuptools/__init__.py", line 143, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.6/dist-packages/setuptools/command/develop.py", line 38, in run
        self.install_for_development()
      File "/usr/local/lib/python3.6/dist-packages/setuptools/command/develop.py", line 138, in install_for_development
        self.run_command('build_ext')
      File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
        cmd_obj.run()
      File "/home/nvidia/jychoi/ray/python/setup.py", line 79, in run
        subprocess.check_call(["../build.sh", "-p", sys.executable])
      File "/usr/lib/python3.6/subprocess.py", line 291, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['../build.sh', '-p', '/usr/bin/python3']' returned non-zero exit status 2.

It looks like the linker cannot link against libboost correctly.

I have libboost-dev version 1.65.1.0ubuntu1.

How can I link it properly?

  • It seems like I have to add '-lboost_system' somewhere, but I'm not sure.
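Some background on why -lboost_system comes up: in Boost releases before 1.69, boost::system::generic_category() and system_category() are, by default, defined in the compiled Boost.System library, so code that uses boost::mutex (as libplasma.a does in the log above) needs libboost_system at link time. Boost 1.69 made Boost.System header-only. The sketch below decodes the BOOST_VERSION macro from <boost/version.hpp> to tell which case applies; the function and type names are hypothetical, and the version is passed in as a plain int so the code builds without Boost installed.

```cpp
#include <cassert>

// Hypothetical type: a Boost release split into its parts.
struct BoostVersion {
  int major_num;
  int minor_num;
  int patch_num;
};

// BOOST_VERSION is major * 100000 + minor * 100 + patch,
// e.g. 106501 for Boost 1.65.1.
BoostVersion DecodeBoostVersion(int boost_version) {
  return BoostVersion{boost_version / 100000, boost_version / 100 % 1000,
                      boost_version % 100};
}

// Before 1.69, generic_category()/system_category() live in the compiled
// Boost.System library by default, so code using boost::mutex must also link
// -lboost_system. From 1.69 on, Boost.System is header-only.
bool NeedsBoostSystemLibrary(int boost_version) {
  return boost_version < 106900;
}
```

In real code the argument would be BOOST_VERSION itself. Note that 1.68.0, which resolves the error later in the thread, still predates the header-only change, so that fix presumably came from the Boost headers and the linked Boost libraries matching.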

Hey,

it should link with the version of boost that is downloaded by the ray build system. Can you provide the full output of the build script?

-- Philipp.

ray_build_error.txt

This is the full output of pip3 install -e . --verbose.

The raylet build starts around line 1506, and the error occurs at line 1707.

Thank you for the help!

I installed Boost 1.68.0 on the system, and the issue above is solved.

Now I get the next compile error:

    [ 78%] Building CXX object src/ray/util/CMakeFiles/signal_test.dir/signal_test.cc.o
    /tmp/cc5M1g3e.s: Assembler messages:
    /tmp/cc5M1g3e.s:7474: Error: unknown mnemonic `ud2' -- `ud2'
    src/ray/util/CMakeFiles/signal_test.dir/build.make:62: recipe for target 'src/ray/util/CMakeFiles/signal_test.dir/signal_test.cc.o' failed

It seems the compiler is trying to use the 'ud2' assembler instruction, which AArch64 CPUs do not have.

I changed 'src/ray/util/signal_test.cc' as follows:

#include <signal.h>
#include <unistd.h>  // fork, usleep
#include <cstdlib>
#include <iostream>
#include <csignal>

#include "gtest/gtest.h"
#include "ray/util/logging.h"
#include "ray/util/util.h"

// These tests just print some call stack information.
namespace ray {

void Sleep() { usleep(100000); }

void TestSendSignal(const std::string &test_name, int signal) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
    while (true) {
      int n = 1000;
      while (n--)
        ;
    }
  } else {
    Sleep();
    RAY_LOG(ERROR) << test_name << ": kill pid " << pid
                   << " with return value=" << kill(pid, signal);
    Sleep();
  }
}

TEST(SignalTest, SendTermSignalTest) { TestSendSignal("SendTermSignalTest", SIGTERM); }

TEST(SignalTest, SendBusSignalTest) { TestSendSignal("SendBusSignalTest", SIGBUS); }

TEST(SignalTest, SIGABRT_Test) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
    // std::abort() sends SIGABRT to this process.
    std::abort();
  } else {
    Sleep();
    RAY_LOG(ERROR) << "SIGABRT_Test: kill pid " << pid
                   << " with return value=" << kill(pid, SIGKILL);
    Sleep();
  }
}

TEST(SignalTest, SIGSEGV_Test) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
    int *pointer = reinterpret_cast<int *>(0x1237896);
    *pointer = 100;
  } else {
    Sleep();
    RAY_LOG(ERROR) << "SIGSEGV_Test: kill pid " << pid
                   << " with return value=" << kill(pid, SIGKILL);
    Sleep();
  }
}

TEST(SignalTest, SIGILL_Test) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
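    // Send SIGILL to this process. asm("ud2") is an x86-only instruction
    // that the AArch64 assembler rejects, so it is replaced by std::raise.
    // asm("ud2");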
    std::raise(SIGILL);
  } else {
    Sleep();
    RAY_LOG(ERROR) << "SIGILL_Test: kill pid " << pid
                   << " with return value=" << kill(pid, SIGKILL);
    Sleep();
  }
}

}  // namespace ray

int main(int argc, char **argv) {
  InitShutdownRAII ray_log_shutdown_raii(ray::RayLog::StartRayLog,
                                         ray::RayLog::ShutDownRayLog, argv[0],
                                         ray::RayLogLevel::INFO,
                                         /*log_dir=*/"");
  ray::RayLog::InstallFailureSignalHandler();
  ::testing::InitGoogleTest(&argc, argv);
  int failed = RUN_ALL_TESTS();
  return failed;
}

Ray built successfully, and it seems to work well.

I don't know much about assembly, so I'm not sure whether std::raise(SIGILL) does the same thing.
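For a test like SIGILL_Test, which only needs the failure signal handler to run, the two behave much the same: both deliver SIGILL to the calling thread. One observable difference is siginfo_t::si_code. A real illegal instruction is reported by the kernel with a positive ILL_* code (such as ILL_ILLOPC or ILL_ILLOPN), while raise() produces a user-sent code (SI_USER, or SI_TKILL with glibc on Linux), and both of those are 0 or negative. The sketch below, with hypothetical names, installs an SA_SIGINFO handler to show the raise() case.

```cpp
#include <signal.h>   // sigaction, siginfo_t, raise, SIGILL
#include <cassert>
#include <cstring>    // std::memset

// Written by the handler; sig_atomic_t keeps the stores async-signal-safe.
static volatile sig_atomic_t g_last_signo = 0;
static volatile sig_atomic_t g_last_si_code = 0;

static void RecordSignal(int signo, siginfo_t *info, void * /*ucontext*/) {
  g_last_signo = signo;
  g_last_si_code = info->si_code;
}

// Hypothetical helper: temporarily installs RecordSignal for SIGILL, sends
// SIGILL with raise(), restores the old disposition, and returns si_code.
int SiCodeOfRaisedSigill() {
  struct sigaction action;
  struct sigaction previous;
  std::memset(&action, 0, sizeof(action));
  action.sa_sigaction = RecordSignal;
  action.sa_flags = SA_SIGINFO;
  sigemptyset(&action.sa_mask);
  sigaction(SIGILL, &action, &previous);
  raise(SIGILL);  // POSIX: the handler finishes before raise() returns
  sigaction(SIGILL, &previous, nullptr);
  return g_last_si_code;
}
```

A handler that inspects si_code or si_addr could therefore tell the two apart; one that only logs a stack trace should not.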

Please consider replacing asm("ud2") for non-x86 users if the two are equivalent.
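One way to do that while keeping the real instruction on x86 is an architecture guard. In the sketch below, TriggerSigill and TerminatingSignalOf are hypothetical names, and __x86_64__ / __i386__ are the GCC and Clang predefined macros for x86 targets. The fork/waitpid harness follows the same pattern signal_test.cc already uses and checks that the child actually dies from SIGILL.

```cpp
#include <signal.h>     // SIGILL, SIG_DFL, signal, raise
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid, WIFSIGNALED, WTERMSIG
#include <unistd.h>     // fork, _exit
#include <cassert>

// Triggers SIGILL in the calling process: the genuine undefined instruction
// ud2 on x86/x86-64, and raise(SIGILL) everywhere else (e.g. AArch64).
void TriggerSigill() {
#if defined(__x86_64__) || defined(__i386__)
  asm volatile("ud2");
#else
  raise(SIGILL);
#endif
}

// Runs fn in a forked child and returns the signal that terminated the
// child, or -1 if the child exited normally or fork/waitpid failed.
int TerminatingSignalOf(void (*fn)()) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    signal(SIGILL, SIG_DFL);  // use the default action: terminate the child
    fn();
    _exit(0);  // only reached if fn() did not kill the child
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Guarding on the architecture leaves x86 builds exercising a hardware-generated SIGILL, as before, and only swaps in raise(SIGILL) where ud2 does not exist.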

I hope this helps others who want to run Ray on Jetson or other ARM64 CPUs.

@gliese581gg I'm glad you got it working! Do you want to submit a PR to change asm("ud2") to std::raise(SIGILL)?
