I can build the regular R, C and Python packages just fine. Even the first test cases from the JVM-based build, i.e. those where the DMatrix is tested, are green. However, with rabit/JNI on Spark
(https://gist.github.com/geoHeil/bc88c2b849eca875e580b8ff170fd598) I see only JNI error messages.
Operating System: mac osx 10.12.5
Compiler: gcc7
Package used (python/R/jvm/C++): JVM
xgboost version used: current master branch
If installing from source, please provide the commit hash (git rev-parse HEAD): cd7659937b2c6a4a82988a72761a7f21d9b53743
gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
@CodingCat again a problem with rabit on OS X. Any thoughts on this? The regular JNI test cases succeed; these problems occur only when spark/rabit is involved.
Tracker started, with env={}
This is still the issue with network address binding.
I do not have the bandwidth to work on it for now.
https://github.com/dmlc/xgboost/issues/1004 suggests:
RabitTracker calls Runtime to exec a command like "python ...", which depends on the PATH environment variable. If there is an exception or an error, the return of getEnv() will be empty. Selecting the correct Python version by adding its directory to the beginning of PATH fixes this issue.
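To check which interpreter a bare "python" command would pick up, a minimal sketch along these lines may help. The helper names resolve_python and prepend_to_path are hypothetical and not part of xgboost; the sketch only inspects and rebuilds PATH, it does not start the tracker.

```python
import os
import shutil


def resolve_python(path=None):
    """Return the absolute path of the first `python` found on PATH, or None.

    `path` is an optional PATH-style string; by default the current
    process environment is used.
    """
    return shutil.which("python", path=path)


def prepend_to_path(directory, env=None):
    """Return a copy of `env` with `directory` placed first on PATH.

    Hypothetical helper: the returned dict can be handed to whatever
    launches the JVM so that a child `python ...` call resolves there first.
    """
    env = dict(os.environ if env is None else env)
    existing = env.get("PATH", "")
    env["PATH"] = directory + os.pathsep + existing if existing else directory
    return env


if __name__ == "__main__":
    print("python currently resolves to:", resolve_python())
```

If the printed interpreter is not the one you expect, the suggestion from the linked issue amounts to putting the right directory first on PATH before starting Spark.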
Maybe using the experimental Scala rabit implementation will help out here.
I am not 100% sure, but can you check that your hostname resolves to 127.0.0.1 in /etc/hosts?
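One way to run this check from Python is sketched below. It tries the fully qualified name first and then the plain hostname, which is the order implied by the tracker warning "gethostbyname(socket.getfqdn()) failed... trying on hostname()" quoted in this thread. This is only an illustration under that assumption, not the tracker's actual code; resolve_local_ip is a hypothetical name.

```python
import socket


def resolve_local_ip(resolver=socket.gethostbyname):
    """Try to resolve this machine's name to an IPv4 address.

    Returns (name, ip) for the first name that resolves, or (None, None)
    if neither the FQDN nor the plain hostname can be resolved.
    `resolver` is injectable so the logic can be exercised without DNS.
    """
    for name in (socket.getfqdn(), socket.gethostname()):
        try:
            return name, resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
    return None, None


if __name__ == "__main__":
    name, ip = resolve_local_ip()
    if ip is None:
        print("hostname does not resolve; consider an /etc/hosts entry")
    else:
        print(name, "->", ip)
```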
@superbobry
% hostname
Georgs-MacBook-Pro.local
and the hosts file
% cat /etc/hosts
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
# https://github.com/docker/compose/issues/3419
# /etc/hosts
127.0.0.1 localunixsocket.local
Is this what you mean?
However, nslookup Georgs-MacBook-Pro.local fails.
@superbobry unfortunately, adapting the hosts file by adding
127.0.0.1 Georgs-MacBook-Pro.local
does not fix the problem.
Okay, could you try compiling with clang instead of gcc7? I imagine you're overriding CC/CXX with GCC, right?
Indeed, I am compiling with:
export CC=gcc-7
export CXX=g++-7
which settings would you suggest here?
Even after running unset CC; unset CXX
I see the same errors.
I suggest just to use the OS X defaults. It should build fine, but the resulting binary wouldn't have OMP support (hence single-thread only).
Update: sorry, didn't spot your edit. Could you also remove the xgboost/build directory to make sure you build from scratch?
Why should changing the compiler fix the RABIT networking issues?
Still failing. However, I observed a different error message this time:
Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=192.168.5.160, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=8}
17/07/03 13:40:32 INFO RabitTracker$TrackerProcessLogger: 2017-07-03 13:40:32,249 WARNING gethostbyname(socket.getfqdn()) failed... trying on hostname()
Do you have any other error messages in the output?
Why should changing the compiler fix the RABIT networking issues?
It shouldn't, but I've observed the same error locally with GCC7 while clang build worked fine. Also, Travis is able to build&test xgboost4j using clang.
Interesting:
17/07/03 05:37:44 INFO RabitTracker$TrackerProcessLogger: 2017-07-03 05:37:44,489 WARNING gethostbyname(socket.getfqdn()) failed... trying on hostname()
is displayed on Travis as well, but no problem shows up afterwards: https://travis-ci.org/dmlc/xgboost/jobs/249494987#L2249
clang did not help me.
Could you confirm that you're having exactly the same issue as before?
Please see https://gist.github.com/geoHeil/c7a67b31b1f5b3eb390b35008f552855#file-errors-txt-L1110 for yourself
Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=192.168.0.18, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=8}
Check failed: base_score > 0.0f && base_score < 1.0f base_score must be in (0,1) for logistic loss
This is followed by the usual exception, with at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
as the source.
I am using Ubuntu 16.04
And I have the same error.
Check failed: base_score > 0.0f && base_score < 1.0f base_score must be in (0,1) for logistic loss.
Any way to fix it?
You can manually apply the patch in dmlc/dmlc-core#351.
Hi,
I have the same error too: https://gist.github.com/mizotm/914e146538c5720885e6e854eb97f07e
I'm on Ubuntu 16.04. The fix suggested by @superbobry allows the tests to run, but the scala code doesn't compile later with the fix.
my error is fixed with the patch suggested by @superbobry and it works in spark as well.
thank you sooooo much!
@superbobry Sorry I was mistaken, your fix does actually work. Thanks a lot!
I want to point out that fixing this issue doesn't require applying the not-yet-merged patch. The original patch issue highlights that the root cause is locale-dependent code for parameter parsing: 0.5 is parsed as 0 because the input is expected to be 0,5 under certain locales (for example, Russian). You can avoid this error by enforcing the en_US locale (especially LC_NUMERIC):
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
The build passes with these settings. In my case the locale slipped in on another host through an ssh session because of the SendEnv LANG LC_* setting in /etc/ssh/ssh_config.
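To make the failure mode concrete, here is a small sketch. parse_leading_float is a toy model of strtod-style parsing (it is not xgboost or dmlc-core code) that shows how "0.5" collapses to 0 when the decimal separator is a comma. numeric_safe_env is a hypothetical helper that builds an environment with an en_US numeric locale; it drops LC_ALL because LC_ALL, when set, overrides the individual LC_* variables.

```python
import os


def parse_leading_float(text, decimal_point):
    """Toy strtod-style parse: read digits and at most one decimal point,
    stop at the first character that does not fit, and return what was read."""
    consumed = []
    seen_point = False
    for ch in text:
        if ch in "0123456789":
            consumed.append(ch)
        elif ch == decimal_point and not seen_point:
            consumed.append(".")
            seen_point = True
        else:
            break
    digits = "".join(consumed)
    return float(digits) if digits not in ("", ".") else 0.0


def numeric_safe_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment that forces a dot-decimal locale."""
    env = dict(os.environ if base is None else base)
    env.pop("LC_ALL", None)  # LC_ALL would take precedence over LC_NUMERIC
    env["LANG"] = locale_name
    env["LC_NUMERIC"] = locale_name
    return env


if __name__ == "__main__":
    print(parse_leading_float("0.5", "."))  # dot-decimal locale
    print(parse_leading_float("0.5", ","))  # comma-decimal locale
```

The resulting dict can be passed as env= to subprocess.run for whatever build or test command you use. This assumes the en_US.UTF-8 locale is installed on the host.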