For bugs or installation issues, please provide the following information.
The more information you provide, the more easily we will be able to offer
help and advice.
Operating System: ubuntu 4.4.0-21-generic
Compiler: gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1)
Package used (python/R/jvm/C++): python
xgboost version used: cloned today (6/8/2017) fresh using command:
git clone --recursive https://github.com/dmlc/xgboost
If installing from source, please provide
here are my install steps:
pip install numpy
pip install scipy
pip install -U scikit-learn
wget https://github.com/NVlabs/cub/archive/1.6.4.zip
unzip 1.6.4.zip
wget https://cmake.org/files/v3.7/cmake-3.7.2-Linux-x86_64.sh
yes | /bin/sh cmake-3.7.2-Linux-x86_64.sh
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
mkdir build
cd build
../../cmake-3.7.2-Linux-x86_64/bin/cmake .. -DPLUGIN_UPDATER_GPU=ON -DCUB_DIRECTORY=../../cub-1.6.4 -DCUDA_NVCC_FLAGS="--expt-extended-lambda"
make
The commit hash (git rev-parse HEAD)
N/A
Logs will be helpful (If logs are large, please upload as attachment).
[16:37:43] Tree method is automatically selected to be 'approx' for faster speed. to use old behavior(exact greedy algorithm on single machine), set tree_method to 'exact'
[16:50:22] Device: [0] Tesla P100-PCIE-16GB
[16:50:22] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/gpu_hist_builder.cu:582: Check failed: fmat.SingleColBlock() grow_gpu_hist: must have single column block. Try setting 'tree_method' parameter to 'exact'
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder8InitDataERKSt6vectorINS_9bst_gpairESaIS3_EERNS_7DMatrixERKNS_7RegTreeE+0xaf) [0x7fd899fca45f]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder6UpdateERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEPNS_7RegTreeE+0x2b) [0x7fd899fcc6ab]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x282) [0x7fd899fb43c2]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (6) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (7) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]
[16:50:22] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:73: GPU plugin exception: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/gpu_hist_builder.cu:582: Check failed: fmat.SingleColBlock() grow_gpu_hist: must have single column block. Try setting 'tree_method' parameter to 'exact'
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder8InitDataERKSt6vectorINS_9bst_gpairESaIS3_EERNS_7DMatrixERKNS_7RegTreeE+0xaf) [0x7fd899fca45f]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder6UpdateERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEPNS_7RegTreeE+0x2b) [0x7fd899fcc6ab]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x282) [0x7fd899fb43c2]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (6) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (7) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x475) [0x7fd899fb45b5]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd8d88c23df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd8d88c6d82]
Traceback (most recent call last):
File "./ranker_boost.py", line 445, in <module>
bst = xgb.train(param, dtrain, num_round, watchlist)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 204, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 827, in update
dtrain.handle))
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 130, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:73: GPU plugin exception: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/gpu_hist_builder.cu:582: Check failed: fmat.SingleColBlock() grow_gpu_hist: must have single column block. Try setting 'tree_method' parameter to 'exact'
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder8InitDataERKSt6vectorINS_9bst_gpairESaIS3_EERNS_7DMatrixERKNS_7RegTreeE+0xaf) [0x7fd899fca45f]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder6UpdateERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEPNS_7RegTreeE+0x2b) [0x7fd899fcc6ab]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x282) [0x7fd899fb43c2]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (6) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (7) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x475) [0x7fd899fb45b5]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd8d88c23df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd8d88c6d82]
If you are using python package, please provide
The python version and distribution
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
The command to install xgboost
if you are not installing from source
(see above)
I can't provide the data, which is proprietary, but here is the training code with the params I used:
(note: depth = 10, num_round = 200)
import xgboost as xgb

dtrain = xgb.DMatrix(train_feature_file)
# Read one group size per line and attach the query groups for ranking.
train_grps = []
with open(train_group_file) as fi:
    for line in fi:
        grp_sz = int(line.strip())
        train_grps.append(grp_sz)
dtrain.set_group(train_grps)
param = {'silent':0, 'objective':'rank:pairwise', 'eta':0.1, 'gamma':1.0, 'min_child_weight':0.1, 'max_depth': int(depth), 'updater':'grow_gpu', 'scale_pos_weight':11.0}
watchlist = [(dtrain,'train')]
bst = xgb.train(param, dtrain, num_round, watchlist)
[17:24:50] Device: [0] Tesla P100-PCIE-16GB
Segmentation fault
When I set 'updater':'grow_gpu' (the crash above was with 'updater':'grow_gpu_hist'), I get a similar crash:
[17:52:33] Tree method is automatically selected to be 'approx' for faster speed. to use old behavior(exact greedy algorithm on single machine), set tree_method to 'exact'
[18:02:39] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [18:02:39] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: exact::GPUBuilder - must have 1 column block
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fdb98ccec4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fdb98e6dfce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fdb98d706a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fdb98d7176d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fdb98e56c6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fdb98cc0c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fdbc755ce40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fdbc755c8ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fdbc776c3df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fdbc7770d82]
Traceback (most recent call last):
File "./ranker_boost.py", line 445, in <module>
bst = xgb.train(param, dtrain, num_round, watchlist)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 204, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 827, in update
dtrain.handle))
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 130, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [18:02:39] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: exact::GPUBuilder - must have 1 column block
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fdb98ccec4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fdb98e6dfce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fdb98d706a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fdb98d7176d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fdb98e56c6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fdb98cc0c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fdbc755ce40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fdbc755c8ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fdbc776c3df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fdbc7770d82]
When I set both 'updater':'grow_gpu' and 'tree_method':'exact' (the crash above was with 'updater':'grow_gpu' but without 'tree_method':'exact'), I get a different crash:
[18:51:01] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [18:51:01] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: vector::reserve
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd9d2ac2c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fd9d2c61fce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd9d2b646a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd9d2b6576d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd9d2c4ac6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd9d2ab4c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd9e3350e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd9e33508ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd9e35603df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd9e3564d82]
Traceback (most recent call last):
File "./ranker_boost.py", line 445, in <module>
bst = xgb.train(param, dtrain, num_round, watchlist)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 204, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 827, in update
dtrain.handle))
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 130, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [18:51:01] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: vector::reserve
Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd9d2ac2c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fd9d2c61fce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd9d2b646a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd9d2b6576d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd9d2c4ac6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd9d2ab4c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd9e3350e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd9e33508ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd9e35603df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd9e3564d82]
I have tried everything I could think of. Please take a look and let me know what I can do to use the GPU! :) I have 13 million rows so far but still have RAM left, so I want to add more data. Currently, even on machines with 32 CPUs, training takes 1 hour for every 3 rounds, so I am hoping the GPU can speed it up. Thanks in advance.
[18:35:57] 13264985x937 matrix with 12429290945 entries loaded
As you saw, the "Try setting 'tree_method' parameter to 'exact'" message reflects the limited capabilities of the underlying GPU algorithm.
As for the "vector::reserve" error, it seems likely you are exceeding the memory of the GPU. Can you compute the total bytes of your data and compare that with your GPU memory? The error is not very informative, of course, and this will be improved.
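As a rough illustration of that comparison, here is a minimal sketch using the matrix size from the log line above. The bytes-per-entry figures and the 16 GiB budget are assumptions for illustration, not numbers taken from the plugin; the real footprint also includes working buffers, so treat the result as a lower bound.

```python
# Figures taken from the "matrix ... entries loaded" log line in this thread.
ROWS = 13264985
COLS = 937
ENTRIES = 12429290945

# Assumptions (not from the thread):
BYTES_PER_VALUE = 4          # one float32 per entry, a lower bound
BYTES_PER_SPARSE_ENTRY = 8   # hypothetical: 4-byte column index + float32 value
GPU_BYTES = 16 * 1024 ** 3   # nominal 16 GiB card, ignoring driver overhead


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / float(1024 ** 3)


def fits_on_gpu(n_entries, bytes_per_entry, gpu_bytes=GPU_BYTES):
    """Return True if the raw entries alone fit in the given GPU memory."""
    return n_entries * bytes_per_entry <= gpu_bytes


if __name__ == "__main__":
    for label, width in [("float32 values only", BYTES_PER_VALUE),
                         ("index + value pairs", BYTES_PER_SPARSE_ENTRY)]:
        size = ENTRIES * width
        print("%s: %.1f GiB, fits: %s" % (label, to_gib(size), fits_on_gpu(ENTRIES, width)))
```

Under these assumptions the values alone come to about 46 GiB, well above what a single 16 GB card can hold.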
Thanks for the prompt response. Can you please elaborate on which parameter combinations are supported? Are you saying that specifying both 'grow_gpu' and 'exact' is the only supported combination? I am also confused because I thought 'grow_gpu' already performs the exact algorithm while 'grow_gpu_hist' is approximate, as described at https://github.com/dmlc/xgboost/blob/master/plugin/updater_gpu/README.md ?
Regarding the vector::reserve error, are you saying the training data has to be smaller than the memory available on the GPU? If so, that sounds very limiting: I am forced to choose between 1) multiple CPUs with more memory (500GB) but slow, or 2) one GPU with far less memory (16GB) but fast. I was expecting the algorithm to move data from main memory to GPU memory as training goes on.
I am hitting the same error on a Titan X GPU. The same algorithm runs successfully on a smaller dataset. Could it be failing because my dataset is too large?
Yes, currently the GPU implementation does not swap data out to disk or main memory. You can check your data size and compare it with how much memory you have on the GPU. A PR is in to support multi-GPU training, which makes the combined memory of several GPUs available, but support for swapping to disk or main memory will only come later.
Hi @pseudotensor, thanks for confirming. Can you please also elaborate on which parameter combinations are supported? Are you saying that specifying both 'grow_gpu' and 'exact' is the only supported combination? I am also confused because I thought 'grow_gpu' already performs the exact algorithm while 'grow_gpu_hist' is approximate, as described at https://github.com/dmlc/xgboost/blob/master/plugin/updater_gpu/README.md ?
@RAMitchell probably has a better answer for this.
For the time being, specify tree method 'exact' with all GPU algorithms. This prevents XGBoost from automatically switching to distributed mode, which does not work with our algorithms.
I realise this doesn't make sense, and we will hopefully change the API soon.
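To make that advice concrete, here is a minimal sketch of forcing 'tree_method' to 'exact' on a parameter dict before training. The helper name with_exact_tree_method is hypothetical, and the parameter values are copied from the script earlier in this thread; this reflects the plugin as discussed here, and other versions of the API may differ.

```python
def with_exact_tree_method(params):
    """Return a copy of params with tree_method forced to 'exact'.

    Hypothetical helper: it only edits the dict and does not call XGBoost.
    """
    fixed = dict(params)
    fixed["tree_method"] = "exact"
    return fixed


# Parameters from the original report; 'updater' could equally be 'grow_gpu_hist'.
param = {
    "silent": 0,
    "objective": "rank:pairwise",
    "eta": 0.1,
    "gamma": 1.0,
    "min_child_weight": 0.1,
    "max_depth": 10,
    "updater": "grow_gpu",
    "scale_pos_weight": 11.0,
}
param = with_exact_tree_method(param)

# The training call itself is unchanged from the original script:
# bst = xgb.train(param, dtrain, num_round, watchlist)
```

Note that with a dataset larger than GPU memory this only moves past the "single column block" check; the memory limit discussed above still applies.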
@pseudotensor @RAMitchell I'm seeing the same issue regardless of whether the GPU is used.
Having the same issue, installed with
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; make -j4
cd python-package; sudo python setup.py install
Gcc versioning info:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x39) [0x7f2e412fd869]
[bt] (1) /usr/local/lib/python2.7/dist-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost4tree8ColMakerINS0_9GradStatsENS0_12NoConstraintEE7Builder8InitDataERKSt6vectorINS_9bst_gpairESaIS7_EERKNS_7DMatrixERKNS_7RegTreeE+0x725) [0x7f2e413ffe45]
[bt] (2) /usr/local/lib/python2.7/dist-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost4tree8ColMakerINS0_9GradStatsENS0_12NoConstraintEE7Builder6UpdateERKSt6vectorINS_9bst_gpairESaIS7_EEPNS_7DMatrixEPNS_7RegTreeE+0x27) [0x7f2e41401137]
This is the same issue as #2278, or at least closely related.
@CSNoyes @RAMitchell @pseudotensor
I am also getting this problem with the non-GPU version:
git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; make -j4
cd python-package; sudo python setup.py install
Gcc version info:
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
import numpy as np
from xgboost import XGBClassifier
x = np.array([[1,2,3],[2,4,6]])
y = np.array([1,0,1])
model = XGBClassifier()
model.fit(x, y)
Traceback (most recent call last):
File "<ipython-input-5-d3dc977168f5>", line 1, in <module>
model.fit(x, y)
File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/sklearn.py", line 507, in fit
verbose_eval=verbose, xgb_model=None)
File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/training.py", line 204, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/core.py", line 896, in update
dtrain.handle))
File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/core.py", line 130, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
XGBoostError: [10:26:39] src/objective/regression_obj.cc:41: Check failed: base_score > 0.0f && base_score < 1.0f base_score must be in (0,1) for logistic loss
Stack trace returned 10 entries:
[bt] (0) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f3fd4ee998c]
[bt] (1) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost3obj18LogisticRegression12ProbToMarginEf+0x2bf) [0x7f3fd4f74caf]
[bt] (2) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost11LearnerImpl13LazyInitModelEv+0x2f3) [0x7f3fd4ef4b13]
[bt] (3) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(XGBoosterUpdateOneIter+0x33) [0x7f3fd5054fe3]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f402f895e18]
[bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x32a) [0x7f402f89587a]
[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x2a4) [0x7f402faa8844]
[bt] (7) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x10245) [0x7f402faa8245]
[bt] (8) /usr/bin/python(PyEval_EvalFrameEx+0x54c0) [0x5566b14c3650]
[bt] (9) /usr/bin/python(PyEval_EvalCodeEx+0x35a) [0x5566b14bbb3a]
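For context on the message itself: with logistic loss, base_score is a probability that gets converted to a margin through the logit function, which is undefined at 0 and 1. The sketch below shows that kind of check in Python. It is an illustration of the constraint only, not the code in regression_obj.cc, and it does not explain why a default build would trip the check.

```python
import math


def prob_to_margin(base_score):
    """Convert a probability to a logit margin, rejecting values outside (0, 1)."""
    if not (0.0 < base_score < 1.0):
        raise ValueError("base_score must be in (0,1) for logistic loss")
    # logit(p) = log(p / (1 - p)), written here as -log(1/p - 1)
    return -math.log(1.0 / base_score - 1.0)


if __name__ == "__main__":
    print(prob_to_margin(0.5))
    try:
        prob_to_margin(0.0)
    except ValueError as err:
        print("rejected: %s" % err)
```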