I'm trying to install Ray on an NVIDIA Jetson Xavier board.
The Xavier board has an ARM CPU, so I'm trying to install from source.
However, the following error message occurs:
[ 33%] Built target _csv_pyx
make[4]: *** write jobserver: Bad file descriptor. Stop.
make[4]: *** Waiting for unfinished jobs....
[ 33%] Built target _plasma_pyx
make[4]: *** write jobserver: Bad file descriptor. Stop.
Makefile:83: recipe for target 'all' failed
make[3]: *** [all] Error 2
error: command 'cmake' failed with exit status 2
CMakeFiles/pyarrow_ext.dir/build.make:106: recipe for target 'external/pyarrow/src/pyarrow_ext-stamp/pyarrow_ext-configure' failed
make[2]: *** [external/pyarrow/src/pyarrow_ext-stamp/pyarrow_ext-configure] Error 1
CMakeFiles/Makefile2:335: recipe for target 'CMakeFiles/pyarrow_ext.dir/all' failed
make[1]: *** [CMakeFiles/pyarrow_ext.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Does anyone know about this issue? Any help would be much appreciated.
This looks strange: write jobserver: Bad file descriptor
Is there some artificially low limit on the number of file descriptors (see https://unix.stackexchange.com/questions/84227/limits-on-the-number-of-file-descriptors)?
Also, you could potentially compile Arrow without the jobserver. This might be possible by running export PYARROW_PARALLEL=1; see the Arrow setup.py.
Also, is it using some sort of non-standard shell or non-standard environment?
Upgrading your cmake might also help.
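To check the file descriptor limit mentioned above from inside a process, a minimal sketch using the POSIX getrlimit call could look like the following. The helper name PrintFdLimits is hypothetical and is not part of Ray or Arrow; it only reports the limits, and whether a low limit is actually the cause of the jobserver error is an assumption that still needs to be confirmed.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE (POSIX)

#include <iostream>

// Hypothetical helper (not part of Ray or Arrow): prints the soft and hard
// limits on the number of open file descriptors for the current process.
// Returns false if the limits could not be queried.
bool PrintFdLimits() {
  struct rlimit limit;
  if (getrlimit(RLIMIT_NOFILE, &limit) != 0) {
    return false;
  }
  // rlim_cur is the soft limit that currently applies; rlim_max is the
  // ceiling up to which an unprivileged process may raise the soft limit.
  std::cout << "soft limit: " << limit.rlim_cur << "\n"
            << "hard limit: " << limit.rlim_max << std::endl;
  return true;
}
```

Calling PrintFdLimits() from a small main() in the same shell that runs the build shows the limits the build processes would inherit.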
Thank you so much for the super quick answer!
The problem was solved after I changed
make -j${PARALLEL}
to
make
However, I got another error:
[ 74%] Linking CXX executable raylet
../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `boost::mutex::lock()':
/home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `boost::unique_lock<boost::mutex>::unlock()':
/home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
/home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `boost::unique_lock<boost::mutex>::lock()':
/home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
/home/nvidia/.local/include/boost/thread/exceptions.hpp:51: undefined reference to `boost::system::generic_category()'
../../../external/arrow-install/lib/libplasma.a(client.cc.o):/home/nvidia/.local/include/boost/thread/exceptions.hpp:84: more undefined references to `boost::system::generic_category()' follow
../../../external/arrow-install/lib/libplasma.a(client.cc.o): In function `__static_initialization_and_destruction_0':
/home/nvidia/.local/include/boost/system/error_code.hpp:210: undefined reference to `boost::system::system_category()'
../../../external/arrow-install/lib/libarrow.a(thread-pool.cc.o): In function `__static_initialization_and_destruction_0':
/home/nvidia/.local/include/boost/system/error_code.hpp:206: undefined reference to `boost::system::generic_category()'
/home/nvidia/.local/include/boost/system/error_code.hpp:208: undefined reference to `boost::system::generic_category()'
/home/nvidia/.local/include/boost/system/error_code.hpp:210: undefined reference to `boost::system::system_category()'
collect2: error: ld returned 1 exit status
src/ray/raylet/CMakeFiles/raylet.dir/build.make:92: recipe for target 'src/ray/raylet/raylet' failed
make[2]: *** [src/ray/raylet/raylet] Error 1
CMakeFiles/Makefile2:1600: recipe for target 'src/ray/raylet/CMakeFiles/raylet.dir/all' failed
make[1]: *** [src/ray/raylet/CMakeFiles/raylet.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/nvidia/jychoi/ray/python/setup.py", line 180, in <module>
license="Apache 2.0")
File "/usr/local/lib/python3.6/dist-packages/setuptools/__init__.py", line 143, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.6/dist-packages/setuptools/command/develop.py", line 38, in run
self.install_for_development()
File "/usr/local/lib/python3.6/dist-packages/setuptools/command/develop.py", line 138, in install_for_development
self.run_command('build_ext')
File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/nvidia/jychoi/ray/python/setup.py", line 79, in run
subprocess.check_call(["../build.sh", "-p", sys.executable])
File "/usr/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../build.sh', '-p', '/usr/bin/python3']' returned non-zero exit status 2.
It looks like the linker cannot link Boost correctly.
I have libboost-dev version 1.65.1.0ubuntu1 installed.
How can I link it properly?
Hey,
It should link against the version of Boost that is downloaded by the Ray build system. Can you provide the full output of the build script?
-- Philipp.
This is the full output of pip3 install -e . --verbose
The raylet build starts somewhere near line 1506 and the error occurs at line 1707.
Thank you for your help!
I installed Boost version 1.68.0 on the system and the above issue is solved.
Now I get the next compile error:
[ 78%] Building CXX object src/ray/util/CMakeFiles/signal_test.dir/signal_test.cc.o
/tmp/cc5M1g3e.s: Assembler messages:
/tmp/cc5M1g3e.s:7474: Error: unknown mnemonic `ud2' -- `ud2'
src/ray/util/CMakeFiles/signal_test.dir/build.make:62: recipe for target 'src/ray/util/CMakeFiles/signal_test.dir/signal_test.cc.o' failed
It seems the compiler is trying to emit the 'ud2' assembler instruction, which is x86-only and does not exist on AArch64 CPUs.
I changed 'src/ray/util/signal_test.cc' as follows:
#include <signal.h>
#include <cstdlib>
#include <iostream>
#include <csignal>
#include "gtest/gtest.h"
#include "ray/util/logging.h"
#include "ray/util/util.h"

// This test just print some call stack information.
namespace ray {

void Sleep() { usleep(100000); }

void TestSendSignal(const std::string &test_name, int signal) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
    while (true) {
      int n = 1000;
      while (n--)
        ;
    }
  } else {
    Sleep();
    RAY_LOG(ERROR) << test_name << ": kill pid " << pid
                   << " with return value=" << kill(pid, signal);
    Sleep();
  }
}

TEST(SignalTest, SendTermSignalTest) { TestSendSignal("SendTermSignalTest", SIGTERM); }

TEST(SignalTest, SendBusSignalTest) { TestSendSignal("SendBusSignalTest", SIGBUS); }

TEST(SignalTest, SIGABRT_Test) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
    // This code will cause SIGABRT sent.
    std::abort();
  } else {
    Sleep();
    RAY_LOG(ERROR) << "SIGABRT_Test: kill pid " << pid
                   << " with return value=" << kill(pid, SIGKILL);
    Sleep();
  }
}

TEST(SignalTest, SIGSEGV_Test) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
    int *pointer = reinterpret_cast<int *>(0x1237896);
    *pointer = 100;
  } else {
    Sleep();
    RAY_LOG(ERROR) << "SIGSEGV_Test: kill pid " << pid
                   << " with return value=" << kill(pid, SIGKILL);
    Sleep();
  }
}

TEST(SignalTest, SIGILL_Test) {
  pid_t pid;
  pid = fork();
  ASSERT_TRUE(pid >= 0);
  if (pid == 0) {
    // This code will cause SIGILL sent.
    //asm("ud2");
    std::raise(SIGILL);
  } else {
    Sleep();
    RAY_LOG(ERROR) << "SIGILL_Test: kill pid " << pid
                   << " with return value=" << kill(pid, SIGKILL);
    Sleep();
  }
}

}  // namespace ray

int main(int argc, char **argv) {
  InitShutdownRAII ray_log_shutdown_raii(ray::RayLog::StartRayLog,
                                         ray::RayLog::ShutDownRayLog, argv[0],
                                         ray::RayLogLevel::INFO,
                                         /*log_dir=*/"");
  ray::RayLog::InstallFailureSignalHandler();
  ::testing::InitGoogleTest(&argc, argv);
  int failed = RUN_ALL_TESTS();
  return failed;
}
Ray now builds successfully and seems to work well.
I have no knowledge of assembly, so I'm not sure whether std::raise(SIGILL)
does the same thing.
Please consider replacing asm("ud2") for non-x86 users if the two are equivalent.
I hope this helps others who want to run Ray on Jetson or other ARM64 CPUs.
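On the equivalence question, one way to check it is a small standalone sketch like the one below. It keeps ud2 on x86, falls back to std::raise(SIGILL) elsewhere, and verifies from the parent that the child was really terminated by SIGILL. The helper names TriggerSigill and ChildDiesFromSigill are hypothetical and not part of Ray, and the asm syntax assumes GCC or Clang. One difference to keep in mind: with std::raise the signal is sent by software rather than generated by a CPU trap, which is probably fine for a test that only needs the failure signal handler to run.

```cpp
#include <sys/types.h>
#include <sys/wait.h>  // waitpid, WIFSIGNALED, WTERMSIG
#include <unistd.h>    // fork, _exit

#include <csignal>

// Hypothetical helper (not part of Ray): delivers SIGILL to the caller.
// On x86 it executes the ud2 instruction, as the original test did;
// on other architectures (e.g. AArch64) it falls back to std::raise.
void TriggerSigill() {
#if defined(__x86_64__) || defined(__i386__)
  asm volatile("ud2");  // GCC/Clang inline asm; x86-only instruction
#else
  std::raise(SIGILL);
#endif
}

// Forks a child that triggers SIGILL and reports whether the child was
// terminated by exactly that signal.
bool ChildDiesFromSigill() {
  pid_t pid = fork();
  if (pid < 0) {
    return false;  // fork failed
  }
  if (pid == 0) {
    // Make sure SIGILL has its default (terminating) action in the child.
    std::signal(SIGILL, SIG_DFL);
    TriggerSigill();
    _exit(0);  // only reached if SIGILL did not terminate the child
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return false;
  }
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGILL;
}
```

If ChildDiesFromSigill() returns true on both x86 and ARM64, the two variants behave the same as far as the terminating signal is concerned.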
@gliese581gg I'm glad you got it working! Do you want to submit a PR to change asm("ud2") to std::raise(SIGILL)?
I created a PR in https://github.com/ray-project/ray/pull/3800