Optuna: LightGBM - recommendations on hyperparameter tuning

Created on 29 Mar 2020 · 4 Comments · Source: optuna/optuna

Hi guys, I followed all of your examples on tuning LightGBM. However, I was hoping that some of you could share or reference some best practices and answer my questions below:

How many trials should the experiment run? Are 100 typically sufficient to find a set of parameters that is 'good enough'?
Are the default learning rate and a hundred boosting rounds 'good' for finding the best hyperparameters (assuming time is not a huge constraint)? I'm wondering whether I should run with a slightly larger learning rate to speed things up, and perhaps with more boosting rounds.
Should I limit the search space for some of these parameters to help Optuna focus on what matters most?
Is the MedianPruner the most appropriate in this case? How many n_warmup_steps should I choose?
https://github.com/optuna/optuna/blob/master/examples/lightgbm_tuner_simple.py - instead of running a study, I also came across this example of tuning a LightGBM model. Is there some sort of hyperparameter optimization going on "on the fly"? I'm not quite sure how best_params gets updated.

I would really appreciate advice from some more seasoned Optuna users!

My current implementation looks like this. Ignore the task-specific parameters, such as 'objective':

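from math import sqrt

import lightgbm as lgb
import optuna
import sklearn.metrics

# train_x, train_y, test_x, test_y and feat_cat are assumed to be defined elsewhere
def objective(trial):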

    dtrain = lgb.Dataset(train_x, label = train_y, categorical_feature = feat_cat, free_raw_data = False)
    dtest  = lgb.Dataset(test_x, label = test_y, categorical_feature = feat_cat, free_raw_data = False)

    param = {
        'objective': 'poisson',
        'metric': 'rmse',
        'verbosity': -1,
        'boosting_type': 'gbdt',
        'force_row_wise': True,
        'max_depth': -1,

        'max_bin': trial.suggest_int('max_bin', 2, 512),  # LightGBM requires max_bin > 1
        'num_leaves': trial.suggest_int('num_leaves', 2, 512),

        'lambda_l1': trial.suggest_loguniform('lambda_l1', 1e-8, 10.0),
        'lambda_l2': trial.suggest_loguniform('lambda_l2', 1e-8, 10.0),

        'feature_fraction': trial.suggest_uniform('feature_fraction', 0.4, 1.0),
        'bagging_fraction': trial.suggest_uniform('bagging_fraction', 0.4, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 7),

        # min_child_samples is an alias of min_data_in_leaf, so only one of them is tuned
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 1, 100),

        # sub_feature and sub_row are aliases of feature_fraction and bagging_fraction,
        # which are already tuned above, so they are not suggested a second time
    }

    # Add a callback for pruning
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, 'rmse')

    gbm = lgb.train(
        param, 
        dtrain, 
        verbose_eval = 20,
        valid_sets = [dtest], 
        callbacks = [pruning_callback], 
        categorical_feature = feat_cat
        )

    preds = gbm.predict(test_x)
    # The value returned to the study is the RMSE on the held-out set, not an accuracy
    rmse = sqrt(sklearn.metrics.mean_squared_error(test_y, preds))

    return rmse

if __name__ == "__main__":
    study = optuna.create_study(direction = 'minimize', pruner = optuna.pruners.MedianPruner(n_warmup_steps = 10))
    study.optimize(objective, n_trials = 100)

    print("Number of finished trials: {}".format(len(study.trials)))

    print("Best trial:")
    trial = study.best_trial

    print("  Value: {}".format(trial.value))

    print("  Params: ")
    for key, value in trial.params.items():
        print("    {}: {}".format(key, value))
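
To make the MedianPruner / n_warmup_steps question concrete, here is how I understand the median rule for a 'minimize' study, written as a minimal pure-Python sketch. It is not Optuna's implementation. The function name should_prune and the history format are hypothetical, and it compares the current intermediate value rather than the trial's best value so far, which is a simplification on my part.

```python
from statistics import median


def should_prune(step, value, history, n_warmup_steps=10, n_startup_trials=5):
    """Simplified median rule for a study that minimizes its metric.

    history is a hypothetical structure: one dict per earlier trial,
    mapping a step number to the intermediate value reported at that step.
    """
    # Never prune until enough earlier trials exist to form a meaningful median.
    if len(history) < n_startup_trials:
        return False
    # Never prune during the warm-up steps of the current trial.
    if step < n_warmup_steps:
        return False
    # Collect what earlier trials reported at this same step.
    peers = [h[step] for h in history if step in h]
    if not peers:
        return False
    # Prune when the current value is worse (larger) than the median.
    return value > median(peers)


# Five earlier trials, each reporting an RMSE at step 10.
history = [{10: 1.0}, {10: 2.0}, {10: 3.0}, {10: 4.0}, {10: 5.0}]
print(should_prune(10, 3.5, history))  # worse than the median of 3.0
print(should_prune(10, 2.5, history))  # better than the median of 3.0
```

If this reading is right, a larger n_warmup_steps only delays the first step at which a trial can be stopped. That would matter with a small learning rate, where early RMSE values say little about where a trial ends up.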
