Lightgbm: lightgbm.basic.LightGBMError: Bug in GPU histogram! split 11937: 12, smaller_leaf: 10245, larger_leaf: 1704

Created on 21 Feb 2020 · 27 comments · Source: microsoft/LightGBM

version: 2.3.2

[LightGBM] [Fatal] Bug in GPU histogram! split 11937: 12, smaller_leaf: 10245, larger_leaf: 1704

Traceback (most recent call last):
  File "lgb_prefit_4ff5fa97-86b3-420c-aa87-5f01abcc18c3.py", line 10, in <module>
    model.fit(X, y, sample_weight=sample_weight, init_score=init_score, eval_set=eval_set, eval_names=valid_X_features, eval_sample_weight=eval_sample_weight, eval_init_score=init_score, eval_metric=eval_metric, early_stopping_rounds=early_stopping_rounds, feature_name=X_features, verbose=verbose_fit)
  File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 818, in fit
    callbacks=callbacks, init_model=init_model)
  File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 610, in fit
    callbacks=callbacks, init_model=init_model)
  File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/engine.py", line 250, in train
    booster.update(fobj=fobj)
  File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 2106, in update
    ctypes.byref(is_finished)))
  File "/home/jon/.pyenv/versions/3.6.7/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 46, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Bug in GPU histogram! split 11937: 12, smaller_leaf: 10245, larger_leaf: 1704

Script and pickle file:

lgbm_histbug.zip

@sh1ng I need help checking whether this is fixed in an even later master.

bug

Most helpful comment

@guolinke I've just built it from the latest master branch and it still fails. I'll try to extract a minimal reproducible example and open an issue.

All 27 comments

I think the latest master branch will no longer produce this error, as `cnt` has been removed from the histogram.

But this is still a potential bug in the GPU learner. ping @huanzhang12

On master

[LightGBM] [Fatal] Check failed: best_split_info.right_count > 0 at /root/repo/LightGBM/src/treelearner/serial_tree_learner.cpp, line 706 .

Traceback (most recent call last):
  File "lgbm_histbug.py", line 8, in <module>
    model.fit(X, y, sample_weight=sample_weight, init_score=init_score, eval_set=eval_set, eval_names=valid_X_features, eval_sample_weight=eval_sample_weight, eval_init_score=init_score, eval_metric=eval_metric, early_stopping_rounds=early_stopping_rounds, feature_name=X_features, verbose=verbose_fit)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 829, in fit
    callbacks=callbacks, init_model=init_model)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 614, in fit
    callbacks=callbacks, init_model=init_model)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/engine.py", line 250, in train
    booster.update(fobj=fobj)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 2145, in update
    ctypes.byref(is_finished)))
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 46, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Check failed: best_split_info.right_count > 0 at /root/repo/LightGBM/src/treelearner/serial_tree_learner.cpp, line 706 .

it is still a GPU bug.
ping @huanzhang12

@guFalcon @huanzhang12 FYI, we are tracking a major accuracy issue with the latest LightGBM compared to earlier versions. This is just a heads-up; perhaps it's related to this issue. We'll post a separate issue once we have a moment to generate an MRE.

Thanks @pseudotensor, can the accuracy issue be reproduced on CPU?

https://github.com/microsoft/LightGBM/issues/2813 yes, that's a CPU run. The same setup on GPU hits this GPU histogram bug, so it can't be run.

But I think the GPU histogram bug occurs more generally than accuracy issue #2813.

I think this may be fixed by #2811 too.

So on the latest master branch, the CPU version is okay, while the GPU version fails?

@guolinke correct

Stack trace of the error:

/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/basic.py:893: UserWarning: categorical_feature keyword has been found in `params` and will be ignored.
Please use categorical_feature argument of the Dataset constructor to pass this parameter.
  .format(key))
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 22008
[LightGBM] [Info] Number of data points in the train set: 1348045, number of used features: 150
[LightGBM] [Info] Using GPU Device: GeForce MX150, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 138 dense feature groups (179.98 MB) transferred to GPU in 0.273129 secs. 1 sparse feature groups
[LightGBM] [Info] Start training from score -11.811581
[LightGBM] [Info] Start training from score -7.921803
[LightGBM] [Info] Start training from score -0.432866
[LightGBM] [Info] Start training from score -1.142893
[LightGBM] [Info] Start training from score -3.439298
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Fatal] Check failed: best_split_info.left_count > 0 at /root/repo/LightGBM/src/treelearner/serial_tree_learner.cpp, line 702 .

Traceback (most recent call last):
  File "lgb_accuracyissue.py", line 14, in <module>
    eval_init_score=init_score, eval_metric=eval_metric, early_stopping_rounds=early_stopping_rounds, feature_name=X_features, verbose=verbose_fit)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 829, in fit
    callbacks=callbacks, init_model=init_model)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/sklearn.py", line 614, in fit
    callbacks=callbacks, init_model=init_model)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/engine.py", line 250, in train
    booster.update(fobj=fobj)
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 2145, in update
    ctypes.byref(is_finished)))
  File "/home/sh1ng/dev/.venv/lib/python3.6/site-packages/lightgbm_gpu/basic.py", line 46, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Check failed: best_split_info.left_count > 0 at /root/repo/LightGBM/src/treelearner/serial_tree_learner.cpp, line 702 .

Just letting you know that I'm unable to reproduce the issue with the dataset originally provided, but it's easily reproducible with the data from https://github.com/microsoft/LightGBM/issues/2813

@guolinke I'm trying to track down an issue where, after upgrading mmlspark to the latest master branch, I am seeing a similar error. Any recommendations on which code/commits I should investigate as the possible root cause?

[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Info] Trying to bind port 12422...
[LightGBM] [Info] Binding port 12422 succeeded
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Warning] Connecting to rank 1 failed, waiting for 200 milliseconds
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Info] Trying to bind port 12426...
[LightGBM] [Info] Binding port 12426 succeeded
[LightGBM] [Info] Listening...
[LightGBM] [Info] Listening...
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Warning] Set TCP_NODELAY failed
[LightGBM] [Info] Local rank: 0, total number of machines: 2
[LightGBM] [Info] Connected to rank 0
[LightGBM] [Info] Local rank: 1, total number of machines: 2
[LightGBM] [Warning] metric is set=, metric= will be ignored. Current value: metric=
[LightGBM] [Warning] metric is set=, metric= will be ignored. Current value: metric=
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 610, number of negative: 762
[LightGBM] [Info] Number of positive: 610, number of negative: 762
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000514 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 916
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000664 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 916
[LightGBM] [Info] Number of data points in the train set: 686, number of used features: 4
[LightGBM] [Info] Number of data points in the train set: 686, number of used features: 4
[LightGBM] [Info] Start training from score -0.222518
[LightGBM] [Info] Start training from score -0.222518
[LightGBM] [Info] Finished linking network in 0.003935 seconds
[LightGBM] [Fatal] Check failed: best_split_info.left_count > 0 at /home/ilya/LightGBM/src/treelearner/serial_tree_learner.cpp, line 709 .

20/02/29 00:35:01 WARN LightGBMClassifier: LightGBM reached early termination on one worker, stopping training on worker. This message should rarely occur

Could you run it with only one node?

@guolinke amazing insight! I tried 1 node instead of 2 and almost all of my tests passed (except one test that depends on the number of nodes, which is expected).


Here is the output from the same test as above (this time it succeeded):

[LightGBM] [Warning] metric is set=, metric= will be ignored. Current value: metric=
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000942 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 327
[LightGBM] [Info] Number of data points in the train set: 106, number of used features: 9
[LightGBM] [Info] Start training from score -1.572397
[LightGBM] [Info] Start training from score -1.618917
[LightGBM] [Info] Start training from score -2.024382
[LightGBM] [Info] Start training from score -1.955389
[LightGBM] [Info] Start training from score -1.890850
[LightGBM] [Info] Start training from score -1.773067
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
(warning repeated 40 times in total)
[LightGBM] [Warning] metric is set=, metric= will be ignored. Current value: metric=
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002017 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 327
[LightGBM] [Info] Number of data points in the train set: 106, number of used features: 9
[LightGBM] [Info] Start training from score -1.572397
[LightGBM] [Info] Start training from score -1.618917
[LightGBM] [Info] Start training from score -2.024382
[LightGBM] [Info] Start training from score -1.955389
[LightGBM] [Info] Start training from score -1.890850
[LightGBM] [Info] Start training from score -1.773067
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
(warning repeated 60 times in total)
[LightGBM] [Warning] metric is set=, metric= will be ignored. Current value: metric=
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000835 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 327
[LightGBM] [Info] Number of data points in the train set: 106, number of used features: 9
[LightGBM] [Info] Start training from score -1.572397
[LightGBM] [Info] Start training from score -1.618917
[LightGBM] [Info] Start training from score -2.024382
[LightGBM] [Info] Start training from score -1.955389
[LightGBM] [Info] Start training from score -1.890850
[LightGBM] [Info] Start training from score -1.773067
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
(warning repeated 38 times in total)
[LightGBM] [Warning] metric is set=, metric= will be ignored. Current value: metric=
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001298 seconds.
You can set force_col_wise=true to remove the overhead.
[LightGBM] [Info] Total Bins 327
[LightGBM] [Info] Number of data points in the train set: 106, number of used features: 9
[LightGBM] [Info] Using GOSS
[LightGBM] [Info] Start training from score -1.572397
[LightGBM] [Info] Start training from score -1.618917
[LightGBM] [Info] Start training from score -2.024382
[LightGBM] [Info] Start training from score -1.955389
[LightGBM] [Info] Start training from score -1.890850
[LightGBM] [Info] Start training from score -1.773067
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
(warning repeated 40 times in total)

Note this is from this commit on 2/21 (both the failing and successful runs):
"Better documentation for Contributing (#2781)"
I'm currently working back through older versions/commits of LightGBM to see which commit causes the tests to fail, but it's a slow process to build and update the jar and rerun the tests. I'm skipping small batches of commits at a time, but I might switch to a binary search to make this optimal, since the issue appears to go back before 2/21.
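The binary search over commits described above is exactly what `git bisect` automates. The idea can be sketched in plain Python; here `commits` and `is_bad` are hypothetical stand-ins for the real history and for "build this commit's jar and rerun the failing test":

```python
# Find the first bad commit in an oldest-to-newest list, assuming all
# commits before it are good and all commits from it onward are bad.
def first_bad(commits, is_bad):
    lo, hi = 0, len(commits) - 1
    if not is_bad(commits[hi]):
        return None               # no bad commit in this range
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid              # first bad commit is at mid or earlier
        else:
            lo = mid + 1          # first bad commit is after mid
    return commits[hi]

# Hypothetical history where everything from c7 onward is broken.
commits = [f"c{i}" for i in range(12)]
print(first_bad(commits, lambda c: int(c[1:]) >= 7))  # → c7
```

With N commits this needs about log2(N) build-and-test cycles instead of N, which matters when each cycle means rebuilding the jar.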

@imatiach-msft you can try the commit (509c2e50c25eded99fc0997afe25ebee1b33285d) and its parent (https://github.com/microsoft/LightGBM/commit/bc7bc4a158d47bd9a12b89de21176e1e67a6e961)

@guolinke you're right, it looks like the issue is with commit (509c2e5).
I validated that including that commit causes the error, and removing it fixes the issue.

@imatiach-msft could you share the data (and config) to me for the debugging?

@guolinke I'm running the mmlspark Scala tests; maybe I can try to create an example that you can easily run?
You can find the lightgbm classifier tests here:
https://github.com/Azure/mmlspark/blob/master/src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala

The first test that failed was below, but I tried several others and they failed as well:
https://github.com/Azure/mmlspark/blob/master/src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala#L169

The compressed file with most datasets used in mmlspark can be found here:
https://mmlspark.blob.core.windows.net/installers/datasets-2020-01-20.tgz

@shiyu1994 can you help investigate this too?
You can start from @imatiach-msft's test.

Still happens in version 3.0

lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /root/repo/LightGBM/src/treelearner/serial_tree_learner.cpp, line 630

https://github.com/h2oai/h2o4gpu/blob/master/tests/python/open_data/gbm/test_lightgbm.py#L265-L284

@shiyu1994 can you help investigate this too?
You can start from @imatiach-msft's test.

Ok.

@shiyu1994 @guolinke FYI, my issue was resolved when I upgraded after my fix https://github.com/microsoft/LightGBM/pull/3110 , but it sounds like others are still encountering issues similar to what I had.

I have this issue with the CPU learner, not GPU. I got it after upgrading from 2.3.1 to 3.0.0; it makes every test with a tiny testing dataset fail for exactly the same reason:

lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /__w/1/s/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 630 .

@diditforlulz273 could you try the latest master branch?
If the problem still exists, please create a new issue; it would be better if you can provide a reproducible example.

@guolinke I've just built it from the latest master branch and it still fails. I'll try to extract a minimal reproducible example and open an issue.

+1, this bug makes LightGBM GPU useless. It still happens for me on the latest master.
