I obtained 82% binary classification accuracy on the test set with a fixed set of parameters. I then ran a grid search to increase the accuracy, but the resulting accuracy was lower.
I would like to ask what a sensible default hyperparameter search space would be for regular (balanced) data, one that a newcomer could use for an initial pass on a binary classification problem.
I would be glad if you could suggest a comprehensive parameter set, ideally one whose grid search finishes in less than a day on a MacBook Pro.
Check here: https://github.com/Microsoft/LightGBM/issues/695
It also applies to balanced datasets if you remove `scale_pos_weight`.
@Laurae2 Thanks a lot! I will try that. I was just about to reply in that issue to ask exactly this.
@Laurae2 Can you give smaller ranges or step sizes for `num_leaves` and `max_depth`? The ones you've given are too large; tuning them without a large step size takes too long on a regular computer. For the parameters you haven't recommended tuning, can you suggest values if they differ from LightGBM's defaults? Also, what is your suggestion for fixed choices such as the boosting type: should I just try all the options, or can you say that e.g. `dart` would be preferred over the others?
@akaniklaus Never use grid search; use random search or Bayesian optimization.
Use GBDT only, unless you have a special need for DART or GOSS.
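For readers unfamiliar with the difference: random search samples configurations independently instead of enumerating a grid, so it covers wide ranges with far fewer trials. A minimal, self-contained sketch of the idea (the parameter ranges and the stand-in scoring function are illustrative only, not recommendations from this thread; in practice the scorer would be cross-validated LightGBM training):

```python
import random

# Hypothetical search space; ranges are illustrative, not recommendations.
param_space = {
    "num_leaves": lambda: random.randint(15, 255),
    "learning_rate": lambda: 10 ** random.uniform(-3, -1),
    "feature_fraction": lambda: random.uniform(0.5, 1.0),
    "min_data_in_leaf": lambda: random.randint(5, 50),
}

def sample_params():
    """Draw one random configuration from the search space."""
    return {name: draw() for name, draw in param_space.items()}

def random_search(score_fn, n_trials=50, seed=0):
    """Evaluate n_trials random configurations; return (best_score, best_params)."""
    random.seed(seed)
    best = None
    for _ in range(n_trials):
        params = sample_params()
        score = score_fn(params)  # e.g. mean CV accuracy of a LightGBM model
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Stand-in scorer so the sketch runs on its own; replace with real
# cross-validation of a LightGBM model trained with `params`.
def dummy_score(params):
    return -abs(params["learning_rate"] - 0.05) - abs(params["num_leaves"] - 63) / 1000

best_score, best_params = random_search(dummy_score, n_trials=200)
```

Because each trial is independent, this also parallelizes trivially and can be stopped at any time, which is why it is usually preferred over grid search for a first pass.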
@Laurae2 Thanks! I switched to random search, since I don't know how to use Bayesian optimization. Do you have any recommendations for `feature_fraction`? (I have 400 features, and some of them may be noisy or similar to each other.) Also, what about `lambda_l1`, `lambda_l2`, and `min_gain_to_split`? Lastly, should I set `min_data_in_leaf` to one and `min_sum_hessian_in_leaf` to zero, given that my dataset is quite small (2000 samples)? I also haven't figured out how to enable early stopping: should I use `early_stopping_rounds`, and what value should it take? P.S. I actually get better validation accuracy with `feature_fraction` and `bagging_fraction` than with `subsample` and `colsample_bytree`, probably because I am using a bag of features.
@akaniklaus you can try: