Xgboost: GPU plugin crashes for python api

Created on 9 Jun 2017 · 12 Comments · Source: dmlc/xgboost

For bugs or installation issues, please provide the following information.
The more information you provide, the more easily we will be able to offer
help and advice.

Environment info

Operating System: ubuntu 4.4.0-21-generic

Compiler: gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1)

Package used (python/R/jvm/C++): python

xgboost version used: cloned today (6/8/2017) fresh using command:
git clone --recursive https://github.com/dmlc/xgboost

If installing from source, please provide

here are my install steps:

pip install numpy
pip install scipy
pip install -U scikit-learn

CUB

wget https://github.com/NVlabs/cub/archive/1.6.4.zip
unzip 1.6.4.zip

cmake

wget https://cmake.org/files/v3.7/cmake-3.7.2-Linux-x86_64.sh
yes | /bin/sh cmake-3.7.2-Linux-x86_64.sh

xgboost

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
mkdir build
cd build
../../cmake-3.7.2-Linux-x86_64/bin/cmake .. -DPLUGIN_UPDATER_GPU=ON -DCUB_DIRECTORY=../../cub-1.6.4 -DCUDA_NVCC_FLAGS="--expt-extended-lambda"
make

  1. The commit hash (git rev-parse HEAD)
    N/A

  2. Logs will be helpful (If logs are large, please upload as attachment).

[16:37:43] Tree method is automatically selected to be 'approx' for faster speed. to use old behavior(exact greedy algorithm on single machine), set tree_method to 'exact'
[16:50:22] Device: [0] Tesla P100-PCIE-16GB
[16:50:22] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/gpu_hist_builder.cu:582: Check failed: fmat.SingleColBlock() grow_gpu_hist: must have single column block. Try setting 'tree_method' parameter to 'exact'

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder8InitDataERKSt6vectorINS_9bst_gpairESaIS3_EERNS_7DMatrixERKNS_7RegTreeE+0xaf) [0x7fd899fca45f]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder6UpdateERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEPNS_7RegTreeE+0x2b) [0x7fd899fcc6ab]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x282) [0x7fd899fb43c2]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (6) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (7) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]

[16:50:22] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:73: GPU plugin exception: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/gpu_hist_builder.cu:582: Check failed: fmat.SingleColBlock() grow_gpu_hist: must have single column block. Try setting 'tree_method' parameter to 'exact'

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder8InitDataERKSt6vectorINS_9bst_gpairESaIS3_EERNS_7DMatrixERKNS_7RegTreeE+0xaf) [0x7fd899fca45f]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder6UpdateERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEPNS_7RegTreeE+0x2b) [0x7fd899fcc6ab]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x282) [0x7fd899fb43c2]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (6) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (7) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x475) [0x7fd899fb45b5]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd8d88c23df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd8d88c6d82]

Traceback (most recent call last):
File "./ranker_boost.py", line 445, in <module>
bst = xgb.train(param, dtrain, num_round, watchlist)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 204, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 827, in update
dtrain.handle))
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 130, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:73: GPU plugin exception: [16:50:22] /task_runtime/xgboost/plugin/updater_gpu/src/gpu_hist_builder.cu:582: Check failed: fmat.SingleColBlock() grow_gpu_hist: must have single column block. Try setting 'tree_method' parameter to 'exact'

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder8InitDataERKSt6vectorINS_9bst_gpairESaIS3_EERNS_7DMatrixERKNS_7RegTreeE+0xaf) [0x7fd899fca45f]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree14GPUHistBuilder6UpdateERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEPNS_7RegTreeE+0x2b) [0x7fd899fcc6ab]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x282) [0x7fd899fb43c2]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (6) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (7) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd899e24c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree12GPUHistMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x475) [0x7fd899fb45b5]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd899ec66a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd899ec776d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd899facc6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd899e16c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd8d86b2e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd8d86b28ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd8d88c23df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd8d88c6d82]

If you are using python package, please provide

  1. The python version and distribution
    Python 2.7.12 (default, Nov 19 2016, 06:48:10)

  2. The command to install xgboost if you are not installing from source
    (see above)

Steps to reproduce

I can't provide the data which is proprietary but here are the params I used for training:

(note: depth = 10, num_round = 200)

    import xgboost as xgb

    dtrain = xgb.DMatrix(train_feature_file)

    # Read one group size per line for the ranking objective.
    train_grps = []
    with open(train_group_file) as fi:
        for line in fi:
            grp_sz = int(line.strip())
            train_grps.append(grp_sz)
    dtrain.set_group(train_grps)

    # The crash logged above was produced with 'updater':'grow_gpu_hist'.
    param = {'silent':0, 'objective':'rank:pairwise', 'eta':0.1, 'gamma':1.0, 'min_child_weight':0.1, 'max_depth': int(depth), 'updater':'grow_gpu_hist', 'scale_pos_weight':11.0}

    watchlist  = [(dtrain,'train')]
    bst = xgb.train(param, dtrain, num_round, watchlist)

What have you tried?

  1. I also tried setting 'tree_method' to 'exact', as suggested by the message in the log above, but that resulted in a different crash with less information:

[17:24:50] Device: [0] Tesla P100-PCIE-16GB
Segmentation fault

All 12 comments

another thing tried

When I set 'updater':'grow_gpu' (the crash above used 'updater':'grow_gpu_hist'), I get a similar crash:

[17:52:33] Tree method is automatically selected to be 'approx' for faster speed. to use old behavior(exact greedy algorithm on single machine), set tree_method to 'exact'
[18:02:39] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [18:02:39] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: exact::GPUBuilder - must have 1 column block

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fdb98ccec4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fdb98e6dfce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fdb98d706a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fdb98d7176d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fdb98e56c6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fdb98cc0c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fdbc755ce40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fdbc755c8ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fdbc776c3df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fdbc7770d82]

Traceback (most recent call last):
File "./ranker_boost.py", line 445, in <module>
bst = xgb.train(param, dtrain, num_round, watchlist)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 204, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 827, in update
dtrain.handle))
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 130, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [18:02:39] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: exact::GPUBuilder - must have 1 column block

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fdb98ccec4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fdb98e6dfce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fdb98d706a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fdb98d7176d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fdb98e56c6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fdb98cc0c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fdbc755ce40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fdbc755c8ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fdbc776c3df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fdbc7770d82]

one more thing tried:

When I set both 'updater':'grow_gpu' and 'tree_method':'exact' (the previous crash used 'updater':'grow_gpu' without 'tree_method':'exact'), I get a different crash:

[18:51:01] /task_runtime/xgboost/dmlc-core/include/dmlc/././logging.h:300: [18:51:01] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: vector::reserve

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd9d2ac2c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fd9d2c61fce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd9d2b646a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd9d2b6576d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd9d2c4ac6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd9d2ab4c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd9e3350e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd9e33508ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd9e35603df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd9e3564d82]

Traceback (most recent call last):
File "./ranker_boost.py", line 445, in <module>
bst = xgb.train(param, dtrain, num_round, watchlist)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 204, in train
xgb_model=xgb_model, callbacks=callbacks)
File "/task_runtime/xgboost/python-package/xgboost/training.py", line 74, in _train_internal
bst.update(dtrain, i, obj)
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 827, in update
dtrain.handle))
File "/task_runtime/xgboost/python-package/xgboost/core.py", line 130, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [18:51:01] /task_runtime/xgboost/plugin/updater_gpu/src/updater_gpu.cu:40: GPU plugin exception: vector::reserve

Stack trace returned 10 entries:
[bt] (0) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7fd9d2ac2c4c]
[bt] (1) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost4tree8GPUMakerINS0_9GradStatsEE6UpdateERKSt6vectorINS_9bst_gpairESaIS5_EEPNS_7DMatrixERKS4_IPNS_7RegTreeESaISD_EE+0x3cae) [0x7fd9d2c61fce]
[bt] (2) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree13BoostNewTreesERKSt6vectorINS_9bst_gpairESaIS3_EEPNS_7DMatrixEiPS2_ISt10unique_ptrINS_7RegTreeESt14default_deleteISB_EESaISE_EE+0x8c3) [0x7fd9d2b646a3]
[bt] (3) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost3gbm6GBTree7DoBoostEPNS_7DMatrixEPSt6vectorINS_9bst_gpairESaIS5_EEPNS_11ObjFunctionE+0x86d) [0x7fd9d2b6576d]
[bt] (4) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(_ZN7xgboost11LearnerImpl13UpdateOneIterEiPNS_7DMatrixE+0x22b) [0x7fd9d2c4ac6b]
[bt] (5) /task_runtime/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterUpdateOneIter+0x27) [0x7fd9d2ab4c67]
[bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fd9e3350e40]
[bt] (7) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fd9e33508ab]
[bt] (8) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fd9e35603df]
[bt] (9) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fd9e3564d82]

I have tried everything I could think of. Please take a look and let me know what I can do to use the GPU! :) I have 13 million rows so far but still have RAM left, so I want to add more data. Currently, even on machines with 32 CPUs, training takes 1 hour for every 3 rounds, so I am hoping the GPU can speed it up. Thanks in advance.

[18:35:57] 13264985x937 matrix with 12429290945 entries loaded

As you saw, the "Try setting 'tree_method' parameter to 'exact'" message is related to the limited capabilities of the underlying GPU algorithm.

As for the "vector::reserve" error, it seems likely you are exceeding the memory of the GPU. Can you compute the total bytes of your data and compare that with your GPU memory? The error is not very informative, of course, and this will be improved.
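As a rough illustration of that comparison, the sketch below uses the matrix dimensions from the "matrix ... loaded" log line. It assumes a dense layout with 4 bytes per entry (float32) and treats the 16GB card as 16 GiB; the actual on-device footprint of the GPU plugin is not stated in this thread and may differ.

```python
# Rough size estimate; assumes a dense float32 layout (4 bytes per entry).
# The real memory footprint of the GPU plugin may be different.
rows, cols = 13264985, 937        # from the "matrix ... loaded" log line
bytes_per_entry = 4               # assumption: float32 values
gpu_memory_bytes = 16 * 1024**3   # Tesla P100-PCIE-16GB, taken as 16 GiB

entries = rows * cols
data_bytes = entries * bytes_per_entry

print("entries:", entries)
print("data size: %.1f GiB" % (data_bytes / float(1024**3)))
print("fits on GPU:", data_bytes <= gpu_memory_bytes)
```

Under these assumptions the matrix alone comes to about 46 GiB, well above what a single 16GB card can hold.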

Thanks for the prompt response. Can you please elaborate on the supported parameter combination? Are you saying specifying both ‘grow_gpu’ and ‘exact’ is the only supported combination? I am also confused because I thought ‘grow_gpu’ means it already performs the same exact algorithm while ‘grow_gpu_hist’ is approximate, as mentioned on https://github.com/dmlc/xgboost/blob/master/plugin/updater_gpu/README.md ?

Regarding the vector::reserve error, are you saying the training data has to fit in the memory available on the GPU? If so, that sounds very limiting: I am forced to choose between 1) multiple CPUs with more memory (500GB) but slow, or 2) one GPU with a lot less memory (16GB) but fast. I was expecting the algorithm to move data from main memory to GPU memory as training goes on.

I hit the same error. I use a TitanX GPU. The same algorithm runs successfully with a smaller dataset. I wonder whether it fails because my dataset is too large?

Yes, currently the GPU does not swap back to disk or main memory. You can check your data size and compare it with how much memory you have on the GPU. A PR is in to support multi-GPU, which allows more memory across multiple GPUs, but otherwise support for swapping to disk or main memory will only come later.

Hi @pseudotensor, thanks for confirming. Can you please also elaborate on the supported parameter combination? Are you saying specifying both ‘grow_gpu’ and ‘exact’ is the only supported combination? I am also confused because I thought ‘grow_gpu’ means it already performs the same exact algorithm while ‘grow_gpu_hist’ is approximate, as mentioned on https://github.com/dmlc/xgboost/blob/master/plugin/updater_gpu/README.md ?

@RAMitchell probably has a better answer for this.

For the time being, specify tree method 'exact' with all GPU algorithms. The reason is to prevent XGBoost from automatically using distributed mode, which does not work with our algorithms.

I realise this doesn't make sense and we will hopefully change the API soon.
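A minimal sketch of what that advice looks like in a parameter dict, reusing values from the original report. The helper name `gpu_params` is hypothetical and only there for illustration; the commented-out training call assumes `dtrain`, `num_round` and `watchlist` are set up as in the report.

```python
def gpu_params(updater, max_depth):
    """Build a training param dict with tree_method pinned to 'exact'.

    Hypothetical helper for illustration; not part of xgboost.
    """
    return {
        'objective': 'rank:pairwise',
        'eta': 0.1,
        'max_depth': int(max_depth),
        'updater': updater,       # 'grow_gpu' or 'grow_gpu_hist'
        'tree_method': 'exact',   # per the advice above, set for every GPU updater
    }

param = gpu_params('grow_gpu_hist', 10)
# bst = xgb.train(param, dtrain, num_round, watchlist)
```

As the earlier comments point out, this setting only addresses the column-block check; the data still has to fit in GPU memory.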

@pseudotensor @RAMitchell I'm observing the same issue even without the GPU.

Having the same issue, installed with

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; make -j4
cd python-package; sudo python setup.py install

Gcc versioning info:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x39) [0x7f2e412fd869]
[bt] (1) /usr/local/lib/python2.7/dist-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost4tree8ColMakerINS0_9GradStatsENS0_12NoConstraintEE7Builder8InitDataERKSt6vectorINS_9bst_gpairESaIS7_EERKNS_7DMatrixERKNS_7RegTreeE+0x725) [0x7f2e413ffe45]
[bt] (2) /usr/local/lib/python2.7/dist-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost4tree8ColMakerINS0_9GradStatsENS0_12NoConstraintEE7Builder6UpdateERKSt6vectorINS_9bst_gpairESaIS7_EEPNS_7DMatrixEPNS_7RegTreeE+0x27) [0x7f2e41401137]

This is the same issue as #2278, or at least closely related.

@CSNoyes @RAMitchell @pseudotensor
I am also getting this problem with the non-GPU version:

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost; make -j4
cd python-package; sudo python setup.py install

Gcc version info:
gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0

import numpy as np
from xgboost import XGBClassifier
x = np.array([[1,2,3],[2,4,6]])
y = np.array([1,0,1])
model = XGBClassifier()
model.fit(x, y)

Traceback (most recent call last):
  File "<ipython-input-5-d3dc977168f5>", line 1, in <module>
    model.fit(x, y)

  File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/sklearn.py", line 507, in fit
    verbose_eval=verbose, xgb_model=None)

  File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/training.py", line 204, in train
    xgb_model=xgb_model, callbacks=callbacks)

  File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/training.py", line 74, in _train_internal
    bst.update(dtrain, i, obj)

  File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/core.py", line 896, in update
    dtrain.handle))

  File "/home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/core.py", line 130, in _check_call
    raise XGBoostError(_LIB.XGBGetLastError())

XGBoostError: [10:26:39] src/objective/regression_obj.cc:41: Check failed: base_score > 0.0f && base_score < 1.0f base_score must be in (0,1) for logistic loss

Stack trace returned 10 entries:
[bt] (0) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f3fd4ee998c]
[bt] (1) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost3obj18LogisticRegression12ProbToMarginEf+0x2bf) [0x7f3fd4f74caf]
[bt] (2) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(_ZN7xgboost11LearnerImpl13LazyInitModelEv+0x2f3) [0x7f3fd4ef4b13]
[bt] (3) /home/stefan/.local/lib/python2.7/site-packages/xgboost-0.6-py2.7.egg/xgboost/libxgboost.so(XGBoosterUpdateOneIter+0x33) [0x7f3fd5054fe3]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f402f895e18]
[bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x32a) [0x7f402f89587a]
[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x2a4) [0x7f402faa8844]
[bt] (7) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x10245) [0x7f402faa8245]
[bt] (8) /usr/bin/python(PyEval_EvalFrameEx+0x54c0) [0x5566b14c3650]
[bt] (9) /usr/bin/python(PyEval_EvalCodeEx+0x35a) [0x5566b14bbb3a]