For a dataset with 24 columns and 1000 records, running AutoML with the following parameter set:
"LGBM_Estimators" : [100,200,300,400,500,600,700,800,900,1000],
"LGBM_Learning_Rate" : [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0],
"LGBM_ColByTree" : [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0],
"LGBM_SubSample" : [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0],
"LGBM_MinCWeight" : [0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009],
"LGBM_NumLeaves" : [8,16,24,32,36,40,46,52,58,64,68,72]
repeatedly throws the warning below:
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
May I know what mistake I am making?
Thanks
Hi @hanzigs , thanks for using LightGBM!
This doesn't mean that you've made any "mistakes", necessarily. That warning means that the boosting process has effectively ended early, because you've overfit to the training data.
The default min_data_in_leaf for LightGBM is 20 (https://lightgbm.readthedocs.io/en/latest/Parameters.html#min_data_in_leaf). That means any split that would create a leaf node with fewer than 20 records in it is ignored. Setting num_leaves to values as high as 72 will grow very deep trees, which I think is too aggressive for a dataset with only 1000 records.
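A bit of rough arithmetic shows why num_leaves=72 cannot be satisfied here. This is only an illustrative upper bound, not LightGBM's internal logic: with min_data_in_leaf=20, a single tree over 1000 records can never have more than 1000 // 20 fully populated leaves.

```python
# Rough capacity check: each leaf must hold at least min_data_in_leaf
# records, so one tree can have at most
# n_records // min_data_in_leaf leaves that satisfy the constraint.
def max_usable_leaves(n_records, min_data_in_leaf=20):
    return n_records // min_data_in_leaf

print(max_usable_leaves(1000))       # 50 - the most leaves a tree can use
print(max_usable_leaves(1000) < 72)  # True - num_leaves=72 is unreachable
```

Once the tree approaches that limit, every remaining candidate split would violate min_data_in_leaf, which is exactly when the "No further splits with positive gain" warning appears.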
If you want to avoid this in the future, you can enable early stopping by passing early_stopping_rounds and a validation set: https://lightgbm.readthedocs.io/en/latest/Python-Intro.html?highlight=early%20stopping#early-stopping. That will stop the training process after a few consecutive iterations with no gain.
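To make the stopping rule concrete, here is a sketch of the logic that early_stopping_rounds implements, written in plain Python rather than calling LightGBM itself: training halts once the validation metric has not improved for a fixed number of consecutive rounds. The function name and the example scores are illustrative, not part of the LightGBM API.

```python
# Illustrative version of the early_stopping_rounds rule: stop boosting
# once the validation loss has not improved for `patience` consecutive
# rounds. LightGBM performs this check internally during training.
def early_stop_round(valid_losses, patience=5):
    """Return the 1-based round at which training would stop, or None."""
    best, rounds_since_best = float("inf"), 0
    for i, loss in enumerate(valid_losses, start=1):
        if loss < best:                # lower loss is better
            best, rounds_since_best = loss, 0
        else:
            rounds_since_best += 1
            if rounds_since_best >= patience:
                return i
    return None

# The loss plateaus after round 4, so with patience=3 the
# sketch stops training at round 7.
print(early_stop_round([1.0, 0.8, 0.7, 0.65, 0.65, 0.66, 0.65], patience=3))
```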
I also recommend not using num_leaves values larger than 32 with such a small dataset.
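As a hypothetical example of applying that advice to the grid from the question, a trimmed search space for a ~1000-record dataset might look like the following. The key names match the original grid; the particular value subsets chosen here are only a suggestion, not a recommendation from LightGBM.

```python
# Hypothetical trimmed search space for a small (~1000-record) dataset,
# keeping the key names from the original AutoML grid but capping
# num_leaves at 32 as advised above.
small_data_params = {
    "LGBM_Estimators":    [100, 200, 300, 500],
    "LGBM_Learning_Rate": [0.1, 0.2, 0.3],
    "LGBM_ColByTree":     [0.6, 0.8, 1.0],
    "LGBM_SubSample":     [0.6, 0.8, 1.0],
    "LGBM_MinCWeight":    [0.001, 0.005, 0.009],
    "LGBM_NumLeaves":     [8, 16, 24, 32],  # nothing above 32
}
print(max(small_data_params["LGBM_NumLeaves"]))  # 32
```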
Thanks @jameslamb
It's good now. For the AutoML I had set the above parameter set as the default, so for testing I passed a small dataset and got the warnings.