Which parameters and which ranges of values would you consider most useful for hyperparameter optimization of LightGBM during a Bayesian optimization process for a highly imbalanced classification problem?
parameters denotes the search grid, and static_parameters denotes parameters which are applied statically during the search but not optimized.
parameters = [
dict(name="max_bin", type="int", bounds=dict(min=20, max=20000)),
dict(name="learning_rate", type="double", bounds=dict(min=0.001, max=0.3)),
dict(name="num_leaves", type="int", bounds=dict(min=100, max=4095)),
# dict(name="num_leaves", type="int", bounds=dict(min=100, max=45000)),
dict(name="scale_pos_weight", type="double", bounds=dict(min=0.01, max=2000.0)),
dict(name="n_estimators", type="int", bounds=dict(min=10, max=10000)),
dict(name="min_child_weight", type="int", bounds=dict(min=1, max=2000)),
dict(name="subsample", type="double", bounds=dict(min=0.4, max=1)),
dict(name="bagging_fraction", type="double", bounds=dict(min=0.3, max=1)),
dict(name="max_depth", type="int", bounds=dict(min=2, max=50)),
]
static_parameters = {'boosting_type': 'dart', 'reg_alpha': 0, 'reg_lambda': 2, 'is_unbalance': True,
'min_split_gain': 0, 'min_child_samples': 10, 'colsample_bytree': 0.8, 'subsample_freq': 3,
'subsample_for_bin': 50000,
'histogram_pool_size': detect_available_memory_for_histogram_cache()}
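For context, a search-space list like the one above is consumed by an optimizer's "suggest" step. The sketch below is hypothetical and stdlib-only: plain random sampling stands in for the Bayesian suggest step, and the grid is an abbreviated copy of the one above, just to show how each entry maps to one candidate configuration that gets merged with the static parameters.

```python
import random

# Abbreviated copy of the search grid above (illustration only).
parameters = [
    dict(name="learning_rate", type="double", bounds=dict(min=0.001, max=0.3)),
    dict(name="num_leaves", type="int", bounds=dict(min=100, max=4095)),
    dict(name="scale_pos_weight", type="double", bounds=dict(min=0.01, max=2000.0)),
]

def sample_config(parameters, rng=random):
    # Draw one candidate configuration from the search grid. A real
    # Bayesian optimizer would replace the random draws with a
    # model-based suggestion.
    config = {}
    for p in parameters:
        lo, hi = p["bounds"]["min"], p["bounds"]["max"]
        if p["type"] == "int":
            config[p["name"]] = rng.randint(lo, hi)
        else:
            config[p["name"]] = rng.uniform(lo, hi)
    return config

# Merge with the statically applied parameters before training.
static_parameters = {"boosting_type": "dart", "is_unbalance": True}
config = sample_config(parameters)
train_params = {**static_parameters, **config}
```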
For heavily unbalanced datasets such as 1:10000:
Never tune these parameters unless you have an explicit requirement to tune them:
@Laurae2 thanks. When searching the latest Python documentation for the sklearn API, some of the parameters no longer seem to be present. This applies to is_unbalance and scale_pos_weight. Are these still accessible, i.e. via kwargs?
Also, I did not find anything in the documentation saying that only one of the two should be applied (XOR); i.e., in the past I had the best results with:
is_unbalance: True, scale_pos_weight: 0.1. Note, in Parameters.md they are still present - just not in the sklearn API.
USE A CUSTOM METRIC (to reflect reality without weighting, otherwise you have weights inside your metric with premade metrics like xgboost)
Could you please explain this one a bit more? I thought that calculating something like the F-beta score based on the classification results should be sufficient here.
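On the custom-metric point, an unweighted F-beta can itself be wired up as the custom metric. A minimal sketch, assuming the `(preds, eval_data) -> (name, value, higher_is_better)` return convention that LightGBM uses for custom eval functions; the 0.5 decision threshold and the beta value are my assumptions, not from the thread.

```python
def fbeta_eval(beta=1.0, threshold=0.5):
    # Build a LightGBM-style custom eval function computing an
    # unweighted F-beta score from raw predicted probabilities.
    def _eval(preds, eval_data):
        y_true = eval_data.get_label()
        tp = fp = fn = 0
        for p, y in zip(preds, y_true):
            pred = 1 if p >= threshold else 0
            if pred == 1 and y == 1:
                tp += 1
            elif pred == 1 and y == 0:
                fp += 1
            elif pred == 0 and y == 1:
                fn += 1
        b2 = beta * beta
        denom = (1 + b2) * tp + b2 * fn + fp
        fbeta = (1 + b2) * tp / denom if denom else 0.0
        # (metric name, value, higher_is_better)
        return f"fbeta@{beta}", fbeta, True
    return _eval
```

Such a function would typically be passed via the `feval` argument at train time; since it ignores sample weights entirely, it "reflects reality without weighting" in the sense discussed above.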
Also, the hyperparameter tuning guide suggests looking at min_data_in_leaf. What about the other, similar parameters: min_child_weight, min_sum_hessian_in_leaf?
@geoHeil never mind the last part (about the metric); it seems it was fixed in LightGBM (the bug is still present in xgboost). The weights are not applied for the metric computation, which is great to see.
If I remember correctly, scale_pos_weight and is_unbalance interact with each other. They should still be accessible.
See here: https://github.com/Microsoft/LightGBM/blob/master/src/objective/binary_objective.hpp#L73-L82
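A rough Python paraphrase of the label-weight logic in the linked binary_objective.hpp may make the interaction concrete. This is illustrative only, based on my reading of that revision: is_unbalance derives a positive-class weight from the class counts, and scale_pos_weight then multiplies on top of it (which is why setting both mixes their effects).

```python
def binary_label_weights(cnt_positive, cnt_negative,
                         is_unbalance, scale_pos_weight):
    # Paraphrase of LightGBM's binary objective weighting (illustrative,
    # not authoritative): is_unbalance up-weights the minority class by
    # the count ratio, then scale_pos_weight scales the positive class.
    weight_pos, weight_neg = 1.0, 1.0
    if is_unbalance and cnt_positive > 0 and cnt_negative > 0:
        if cnt_positive > cnt_negative:
            weight_neg = cnt_positive / cnt_negative
        else:
            weight_pos = cnt_negative / cnt_positive
    weight_pos *= scale_pos_weight
    return weight_pos, weight_neg
```

Under this reading, is_unbalance: True with scale_pos_weight: 0.1 (as mentioned above) would dampen the automatic count-ratio weight rather than replace it.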
@wxchan question below:
@geoHeil: When searching the latest Python documentation for the sklearn API, some of the parameters no longer seem to be present. This applies to is_unbalance and scale_pos_weight. Are these still accessible, i.e. via kwargs?
@Laurae2 indeed, in the C++ code they are accessible - but https://github.com/Microsoft/LightGBM/blob/master/python-package/lightgbm/sklearn.py does not contain any references to these parameters.
@geoHeil You can check here for the whole list of parameters: https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.md
@Laurae2 indeed - as written above, the parameters are documented there and are used in the C++ code. I just find it strange that the Python code, in particular the scikit-learn API, does not use any of these parameters.
May I clarify one more parameter: the hyperparameter tuning guide suggests looking at min_data_in_leaf. What about the other, similar parameters: min_child_weight, min_sum_hessian_in_leaf?
@geoHeil I would just use min_child_weight unless it is really needed to expand the optimization dimensions (or if it is required to squeeze out a few extra digits on the performance metric).
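On the "think about the hessian possible value range" point from the list above: for binary logloss, the per-sample hessian with respect to the raw score is p(1-p), which is bounded by 0.25. A small sketch of that arithmetic (my illustration, not from the thread):

```python
import math

def logloss_hessian(p):
    # Hessian of binary logloss w.r.t. the raw score for one sample with
    # predicted probability p; maximized at p = 0.5 (value 0.25).
    return p * (1.0 - p)

def min_samples_for_hessian_threshold(min_child_weight):
    # Each sample contributes at most 0.25 to the leaf's hessian sum, so
    # a leaf needs at least this many samples to clear the threshold
    # even in the best case.
    return math.ceil(min_child_weight / 0.25)
```

This is why a min_child_weight bound like "sample size / 1000" is dataset- and loss-dependent: confident predictions (p near 0 or 1) contribute almost nothing to the hessian sum.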
They are still accessible via kwargs. There is a test case here https://github.com/Microsoft/LightGBM/blob/master/tests/python_package_test/test_sklearn.py#L101. drop_rate is set via kwargs.
I remove some parameters only shown in some of boosting_types or some of tasks.
If neg/pos is about 45-65, should we set scale_pos_weight to 1 or to neg/pos?
When I use BayesianOptimization (Bayesian tuning) to tune xgboost, if I set scale_pos_weight to neg/pos the KS on the validation dataset is about 18, but if I set scale_pos_weight to 1 the KS on the validation dataset goes up to 38... I don't know why. Can somebody help explain?
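The "KS" here presumably refers to the Kolmogorov-Smirnov statistic between the score distributions of the two classes, as commonly reported in credit scoring (often multiplied by 100, matching the 18 vs 38 figures). A minimal stdlib sketch of that statistic, as an assumption about what is being measured:

```python
def ks_statistic(scores_pos, scores_neg):
    # Maximum gap between the empirical CDFs of positive-class and
    # negative-class scores, evaluated at every observed score.
    thresholds = sorted(set(scores_pos) | set(scores_neg))
    best = 0.0
    for t in thresholds:
        cdf_pos = sum(s <= t for s in scores_pos) / len(scores_pos)
        cdf_neg = sum(s <= t for s in scores_neg) / len(scores_neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best
```

Note that KS depends only on the ranking of scores, so a difference this large between the two settings means scale_pos_weight changed the trained model's ranking, not merely its calibration.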
Thank you @Laurae2 for providing the guidelines for training models on imbalanced datasets.
I am dealing with a multi-class problem of this kind. As a starting point, I am following the values you provided.
Do these also extend to the multi-class problem?
For heavily unbalanced datasets such as 1:10000:
* max_bin: keep it only for memory pressure, not to tune (otherwise overfitting)
* learning rate: keep it only for training speed, not to tune (otherwise overfitting)
* n_estimators: must be infinite (like 9999999) and use early stopping to auto-tune (otherwise overfitting)
* num_leaves: [7, 4095]
* max_depth: [2, 63] and infinite (I personally saw metric performance increases with such 63 depth with small number of leaves on sparse unbalanced datasets)
* scale_pos_weight: [1, 10000] (if over 10000, something might be wrong because I never saw it that good after 5000)
* min_child_weight: [0.01, (sample size / 1000)] if you are using logloss (think about the hessian possible value range before putting "sample size / 1000", it is dataset-dependent and loss-dependent)
* subsample: [0.4, 1]
* bagging_fraction: only 1, keep as is (otherwise overfitting)
* colsample_bytree: [0.4, 1]
* is_unbalance: false (make your own weighting with scale_pos_weight)
* **USE A CUSTOM METRIC** (to reflect reality without weighting, otherwise you have weights inside your metric with premade metrics like xgboost)

Never tune these parameters unless you have an explicit requirement to tune them:

* Learning rate (lower means longer to train but more accurate, higher means faster to train but less accurate)
* Number of boosting iterations (automatically tuned with early stopping and learning rate)
* Maximum number of bins (RAM dependent)
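The "n_estimators must be infinite + early stopping" advice above boils down to a patience rule: keep boosting while the validation metric improves, stop after it stagnates for a fixed number of rounds. A hypothetical stdlib sketch of just the stopping logic (actual training would use LightGBM's built-in early stopping support, e.g. early_stopping_rounds):

```python
def early_stopping_round(val_scores, patience=100, higher_is_better=True):
    # Given per-round validation scores, return the 1-based round at
    # which training would stop: either when `patience` rounds pass
    # without improvement, or at the end of the sequence.
    best, best_round = None, 0
    for i, s in enumerate(val_scores, start=1):
        improved = best is None or (s > best if higher_is_better else s < best)
        if improved:
            best, best_round = s, i
        elif i - best_round >= patience:
            return i
    return len(val_scores)
```

With n_estimators set effectively to infinity, this rule is what actually decides the number of boosting iterations, which is why the list says not to tune n_estimators directly.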
@salilmishra23 yes, but avoid the following hyperparameters for multiclass because they are not supposed to be relevant for multiclass:
And use the trick found here (boost_from_average = False):
Thanks, @Laurae2. These are the best guidelines I have found on this topic. Could you clarify whether this is a typo:
* bagging_fraction: only 1, keep as is (otherwise overfitting)
It contradicts what was said earlier:
* subsample: [0.4, 1]
@panshi-wang it should be bagging_freq actually. Fixed the typo.