Hi, I am performing binary classification with an unbalanced target. I would like to try the is_unbalance setting; however, when I set it to true in my config file, I get the message "cannot find more split with gain = -inf , current #leaves=1". When I set it to false, it trains without an issue.
Also, I cannot tell from the configuration page how this parameter will be used in the model. It would be great if you could advise. Config file here
With is_unbalance=true:

```
[LightGBM] [Error] Feature Column_575 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_577 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_578 only contains one value, will be ignored
[LightGBM] [Info] Finish loading data, use 21.484110 seconds
[LightGBM] [Info] Number of postive:5503, number of negative:941495
[LightGBM] [Info] Number of data:946998, Number of features:1542
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = -inf , current #leaves=1
[LightGBM] [Info] Can't training anymore, there isn't any leaf meets split requirements.
[LightGBM] [Info] Finish train
```
With is_unbalance=false:

```
[LightGBM] [Error] Feature Column_575 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_577 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_578 only contains one value, will be ignored
[LightGBM] [Info] Finish loading data, use 21.266789 seconds
[LightGBM] [Info] Number of postive:5503, number of negative:941495
[LightGBM] [Info] Number of data:946998, Number of features:1542
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] 0.547807 seconds elapsed, finished 1 iteration
[LightGBM] [Info] 0.918354 seconds elapsed, finished 2 iteration
[LightGBM] [Info] 1.394204 seconds elapsed, finished 3 iteration
[LightGBM] [Info] 1.808137 seconds elapsed, finished 4 iteration
[LightGBM] [Info] 2.188765 seconds elapsed, finished 5 iteration
[LightGBM] [Info] 2.593393 seconds elapsed, finished 6 iteration
```
is_unbalance for binary classification in LightGBM sets the weight of the negative class to (sum of positive labels) / (sum of negative labels). I think it is better to change the bias (init_score) and leave is_unbalance alone (unless you want to assign weights to labels).
Looks like the Bosch data set from Kaggle. Use init_score (bias) if you want to converge faster. is_unbalance balances the positive and negative label weights (by scaling down the negative weight), and since there is no simple, direct way to separate the positive and negative classes in the Bosch data set with balanced weights, the tree cannot find any split point.
Note that with is_unbalance = true, the total weight becomes 2× the count of positive labels (instead of the total number of samples). You will need to adjust regularization accordingly to fit this ~1.1% total weight (5503 × 2 = 11006) instead of the original 100% weight (946998, with is_unbalance = false).
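To make the weight arithmetic above concrete, here is a small sketch in plain Python, using the positive/negative counts from the log above. The log-odds formula for init_score is the standard prior for binary logistic objectives, not something stated in this thread, so treat it as an assumption:

```python
import math

n_pos = 5503      # from the LightGBM log above
n_neg = 941495

# is_unbalance=true: each negative sample gets weight n_pos / n_neg, so
# total weight = n_pos * 1 + n_neg * (n_pos / n_neg) = 2 * n_pos
neg_weight = n_pos / n_neg
total_weight = n_pos + n_neg * neg_weight
print(total_weight)          # ~11006, about 1.1% of the 946998 samples

# Alternative suggested above: leave the weights alone and set the bias.
# For a binary logistic objective, the natural init_score is the
# log-odds of the base rate:
init_score = math.log(n_pos / n_neg)
print(init_score)            # negative, since positives are rare
```

Starting from this log-odds bias, the first boosting iterations no longer need to spend splits just correcting the class prior, which is why it tends to converge faster.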
@darraghdog can you update to latest code and try again?
@guolinke I ran the below and still face the issue... I am on Ubuntu
```
cd LightGBM
git fetch origin
git pull
git status
cd build
cmake ..
make -j
```
@darraghdog I have updated the code just now. You can have a try.
BTW, you can also set "min_sum_hessian_in_leaf" to a smaller value, e.g. 1.
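For reference, a minimal config-file fragment applying that suggestion might look like this. The parameter names are real LightGBM config keys, but the surrounding values are illustrative placeholders, not taken from the original config:

```
# LightGBM CLI config fragment (illustrative values)
task = train
objective = binary
is_unbalance = true
# with is_unbalance the effective total weight is much smaller,
# so relax the hessian-based leaf regularization:
min_sum_hessian_in_leaf = 1
```

The default min_sum_hessian_in_leaf is calibrated for unweighted data; once the total weight shrinks to ~1.1% of the sample count, leaves can fail to meet the hessian threshold, which matches the "no leaf meets split requirements" message above.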
Feel free to reopen it if the issue is still there.