Hi, I am performing binary classification with an unbalanced target. I would like to try the is_unbalance setting; however, when I set it to true in my config file, I get the message "cannot find more split with gain = -inf , current #leaves=1". When I set it to false, it trains without an issue.
Also, I cannot tell from the configuration page how this parameter will be used in the model. It would be great if you could advise. Config file here
With is_unbalance=true:

```
[LightGBM] [Error] Feature Column_575 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_577 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_578 only contains one value, will be ignored
[LightGBM] [Info] Finish loading data, use 21.484110 seconds
[LightGBM] [Info] Number of postive:5503, number of negative:941495
[LightGBM] [Info] Number of data:946998, Number of features:1542
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = -inf , current #leaves=1
[LightGBM] [Info] Can't training anymore, there isn't any leaf meets split requirements.
[LightGBM] [Info] Finish train
```
With is_unbalance=false:

```
[LightGBM] [Error] Feature Column_575 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_577 only contains one value, will be ignored
[LightGBM] [Error] Feature Column_578 only contains one value, will be ignored
[LightGBM] [Info] Finish loading data, use 21.266789 seconds
[LightGBM] [Info] Number of postive:5503, number of negative:941495
[LightGBM] [Info] Number of data:946998, Number of features:1542
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] 0.547807 seconds elapsed, finished 1 iteration
[LightGBM] [Info] 0.918354 seconds elapsed, finished 2 iteration
[LightGBM] [Info] 1.394204 seconds elapsed, finished 3 iteration
[LightGBM] [Info] 1.808137 seconds elapsed, finished 4 iteration
[LightGBM] [Info] 2.188765 seconds elapsed, finished 5 iteration
[LightGBM] [Info] 2.593393 seconds elapsed, finished 6 iteration
```
is_unbalance for binary classification in LightGBM sets the weight of the negative class to (sum of positive labels) / (sum of negative labels). I think it is better to change the bias (init_score) and leave is_unbalance alone (unless you want to assign weights to labels).
Looks like the Bosch data set from Kaggle. Use init_score (bias) if you want to converge faster. is_unbalance balances the positive and negative label weights (by scaling down the negative weight), and since there is no simple, direct way to separate the positive and negative classes in the Bosch data set with balanced weights, the tree cannot find any split point.
Note that with is_unbalance = true, the total weight becomes 2× the count of positive labels (instead of the total number of samples). You will need to adjust regularization accordingly to fit this ~1.1% total weight (5503 × 2 = 11006) instead of the original 100% weight (946998, with is_unbalance = false).
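To make the weight arithmetic above concrete, here is a small sketch in plain Python, using the positive/negative counts from the log above. The log-odds formula for init_score is the standard prior for binary logistic objectives, not something stated in this thread, so treat it as an assumption:

```python
import math

n_pos = 5503      # from the LightGBM log above
n_neg = 941495

# is_unbalance=true: each negative sample gets weight n_pos / n_neg, so
# total weight = n_pos * 1 + n_neg * (n_pos / n_neg) = 2 * n_pos
neg_weight = n_pos / n_neg
total_weight = n_pos + n_neg * neg_weight
print(total_weight)          # ~11006, about 1.1% of the 946998 samples

# Alternative suggested above: leave the weights alone and set the bias.
# For a binary logistic objective, the natural init_score is the
# log-odds of the base rate:
init_score = math.log(n_pos / n_neg)
print(init_score)            # negative, since positives are rare
```

Starting from this log-odds bias, the first boosting iterations no longer need to spend splits just correcting the class prior, which is why it tends to converge faster.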
@darraghdog can you update to latest code and try again?
@guolinke I ran the below and still face the issue... I am on Ubuntu
```
cd LightGBM
git fetch origin
git pull
git status
cd build
cmake ..
make -j
```
@darraghdog I have updated the code just now. You can have a try.
BTW, you can also set "min_sum_hessian_in_leaf" to a smaller value, e.g. 1.
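For reference, a minimal config-file fragment applying that suggestion might look like this. The parameter names are real LightGBM config keys, but the surrounding values are illustrative placeholders, not taken from the original config:

```
# LightGBM CLI config fragment (illustrative values)
task = train
objective = binary
is_unbalance = true
# with is_unbalance the effective total weight is much smaller,
# so relax the hessian-based leaf regularization:
min_sum_hessian_in_leaf = 1
```

The default min_sum_hessian_in_leaf is calibrated for unweighted data; once the total weight shrinks to ~1.1% of the sample count, leaves can fail to meet the hessian threshold, which matches the "no leaf meets split requirements" message above.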
Feel free to reopen it if the issue is still there.