Operating System: Win 7 64-bit
CPU: Intel Core i7
C++/Python/R version: Python 3.5
sklearn GridSearchCV for hyperparameter tuning gets worse performance on the Binary Classification Example
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': {'binary_logloss', 'auc'},
    'metric_freq': 1,
    'is_training_metric': True,
    'max_bin': 255,
    'learning_rate': 0.1,
    'num_leaves': 63,
    'tree_learner': 'serial',
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'min_data_in_leaf': 50,
    'min_sum_hessian_in_leaf': 5,
    'is_enable_sparse': True,
    'use_two_round_loading': False,
    'is_save_binary_file': False,
    'output_model': 'LightGBM_model.txt',
    'num_machines': 1,
    'local_listen_port': 12400,
    'machine_list_file': 'mlist.txt',
    'verbose': 0,
    # parameters to keep exactly the same
    'subsample_for_bin': 200000,
    'min_child_samples': 20,
    'min_child_weight': 0.001,
    'min_split_gain': 0.0,
    'colsample_bytree': 1.0,
    'reg_alpha': 0.0,
    'reg_lambda': 0.0
}
df_train = pd.read_csv("binary.train", header=None, sep='\t')
df_test = pd.read_csv("binary.test", header=None, sep='\t')
y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
AUC value 0.835:
gbm = lgb.train(params, lgb_train, valid_sets=lgb_eval)
AUC 0.77:
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [31],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
mdl = lgb.LGBMClassifier(
    task=params['task'],
    metric=params['metric'],
    ... ...
)
scoring = {'AUC': 'roc_auc'}
grid = GridSearchCV(mdl, gridParams, verbose=2, cv=5, scoring=scoring, n_jobs=-1, refit='AUC')
AUC 0.706.
Then I tried to traverse the parameter grid and train the LightGBM model directly; it behaves as expected:
for param in ParameterGrid(gridParams):
    gbm = lgb.train(param, lgb_train, valid_sets=lgb_eval)
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    print("score is %s, with params: %s" % (roc_auc_score(y_test, y_pred), param))
Why does LGBMClassifier get poor performance with sklearn GridSearchCV and Hyperopt? What's the correct way to do hyperparameter tuning?
@StrikerRUS any ideas about this?
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [31],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
mdl = lgb.LGBMClassifier(
    task=params['task'],
    metric=params['metric'],
    ... ...
)
scoring = {'AUC': 'roc_auc'}
grid = GridSearchCV(mdl, gridParams, verbose=2, cv=5, scoring=scoring, n_jobs=-1, refit='AUC')
If this is the exact code you're using, the only parameter that is being changed during the grid search is 'num_leaves'.
Are you sure that the grid search model has EVERY parameter that was used in the first model, with the same value? I can see right away that you're not calling 'objective': 'binary' in the param search, and it looks like if you don't specify that parameter for sklearn it defaults to regression, so I would definitely check that.
Also, if you're trying to take the same parameter names from the core Python API and apply them to the sklearn version, there may be parameter name differences. I'm not sure how LightGBM handles this, but I remember running into this in XGBoost. I'm guessing there are some variables that you think you are setting but really aren't. I think some parameters MUST follow the scikit 'alias', though I could be wrong.
For example, in your params you have the following variables:
'feature_fraction': 0.8,
'bagging_fraction': 0.8,
'bagging_freq': 5,
but the sklearn documentation shows these parameters as:
subsample=1.0, subsample_freq=1, colsample_bytree=1.0
So I would recommend verifying that you're actually using the parameters that you think you're using. I'm guessing this is where your problem is. It might help to post all of your code too, because it looks like there are missing pieces that could be causing the problem. One quick check is sketched below.
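A minimal sketch of such a check (not from the original thread), using the standard scikit-learn get_params() API; depending on the LightGBM version, extra keyword arguments passed under core-API aliases may also show up here:

import lightgbm as lgb

# Construct the wrapper with a mix of sklearn-style and core-style names
mdl = lgb.LGBMClassifier(colsample_bytree=1.0, feature_fraction=0.8)

# Print everything the estimator reports, to spot duplicated settings such
# as colsample_bytree vs. its core-API alias feature_fraction
for name, value in sorted(mdl.get_params().items()):
    print(name, '=', value)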
@guolinke Sorry, very busy till June with my thesis.
@bbennett36
I'm sure I specified exactly the same parameters; you see a different num_leaves because I made a slight change to see whether it makes any difference.
I did use 'objective': 'binary', see:
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [31],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
I debugged LightGBM-sklearn (\Python35\Lib\site-packages\lightgbm\sklearn.py); the fit function just sets default values for some of the parameters, and I'm not sure whether this is the problem.
def fit(self, X, y,
        sample_weight=None, init_score=None,
        eval_set=None, eval_names=None, eval_sample_weight=None,
        eval_class_weight=None, eval_init_score=None, eval_metric="logloss",
        early_stopping_rounds=None, verbose=True,
        feature_name='auto', categorical_feature='auto', callbacks=None):
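For what it's worth, the default eval_metric="logloss" here only affects what the wrapper reports during fitting, not the training objective. If you want it to report AUC like the lgb.train run above, a minimal sketch (assuming the X_train/X_test data from the snippets in this thread) is:

# Pass the evaluation metric explicitly, mirroring 'metric': 'auc' in params
mdl = lgb.LGBMClassifier(objective='binary', num_leaves=63, learning_rate=0.1)
mdl.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric='auc')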
@zxsimple Provide a fully reproducible example. It is difficult to troubleshoot your issue without one, because we are comparing apples with oranges (parameter lists, training data size (grid search), folds (grid search)).
@Laurae2 @bbennett36
Here is complete code snippet, you can get the data from https://github.com/Microsoft/LightGBM/tree/master/examples/binary_classification
import pandas as pd
from sklearn.metrics import roc_auc_score
import lightgbm as lgb
import matplotlib.pyplot as plt
# sklearn tools for model training and assessment
from sklearn.model_selection import train_test_split
from sklearn.model_selection import PredefinedSplit
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.metrics import (roc_curve, auc, accuracy_score)
# specify your configurations as a dict
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': {'binary_logloss', 'auc'},
    'metric_freq': 1,
    'is_training_metric': True,
    'max_bin': 255,
    'learning_rate': 0.1,
    'num_leaves': 63,
    'tree_learner': 'serial',
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'min_data_in_leaf': 50,
    'min_sum_hessian_in_leaf': 5,
    'is_enable_sparse': True,
    'use_two_round_loading': False,
    'is_save_binary_file': False,
    'output_model': 'LightGBM_model.txt',
    'num_machines': 1,
    'local_listen_port': 12400,
    'machine_list_file': 'mlist.txt',
    'verbose': 0,
    'subsample_for_bin': 200000,
    'min_child_samples': 20,
    'min_child_weight': 0.001,
    'min_split_gain': 0.0,
    'colsample_bytree': 1.0,
    'reg_alpha': 0.0,
    'reg_lambda': 0.0
}
df_train = pd.read_csv("binary.train", header=None, sep='\t')
df_test = pd.read_csv("binary.test", header=None, sep='\t')
y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# train
gbm = lgb.train(params,
                lgb_train,
                valid_sets=lgb_eval)
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [63],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
mdl = lgb.LGBMClassifier(
    task=params['task'],
    metric=params['metric'],
    metric_freq=params['metric_freq'],
    is_training_metric=params['is_training_metric'],
    max_bin=params['max_bin'],
    tree_learner=params['tree_learner'],
    feature_fraction=params['feature_fraction'],
    bagging_fraction=params['bagging_fraction'],
    bagging_freq=params['bagging_freq'],
    min_data_in_leaf=params['min_data_in_leaf'],
    min_sum_hessian_in_leaf=params['min_sum_hessian_in_leaf'],
    is_enable_sparse=params['is_enable_sparse'],
    use_two_round_loading=params['use_two_round_loading'],
    is_save_binary_file=params['is_save_binary_file'],
    n_jobs=-1
)
scoring = {'AUC': 'roc_auc'}
# Create the grid
grid = GridSearchCV(mdl, gridParams, verbose=2, cv=5, scoring=scoring, n_jobs=-1, refit='AUC')
# Run the grid
grid.fit(X_train, y_train)
print('Best parameters found by grid search are:', grid.best_params_)
print('Best score found by grid search is:', grid.best_score_)
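To make grid.best_score_ directly comparable with the lgb.train result, one option (a minimal sketch, not part of the original snippet) is to hand GridSearchCV the same single train/test split via the PredefinedSplit that is already imported above:

import numpy as np

# -1 marks rows that are never used for validation; 0 marks the rows that
# form the single validation fold (the same rows lgb.train evaluated on)
test_fold = np.concatenate([np.full(len(X_train), -1), np.zeros(len(X_test))])
ps = PredefinedSplit(test_fold)

X_all = np.vstack([X_train, X_test])
y_all = np.concatenate([y_train, y_test])

grid_ps = GridSearchCV(mdl, gridParams, scoring='roc_auc', cv=ps, refit=True)
grid_ps.fit(X_all, y_all)

# best_score_ is now the AUC on the held-out test rows, so it can be
# compared with the Booster's test AUC apples to apples
print(grid_ps.best_score_)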
@zxsimple
how exactly did you observe a difference in the roc_auc score?
With your code I get:
print(roc_auc_score(y_test, gbm.predict(X_test)))
> 0.8348329463364292
print(roc_auc_score(y_test, grid.predict_proba(X_test)[:,1]))
> 0.8348329463364292
Which look identical... Note that Booster.predict() outputs probabilities right away, while with GridSearchCV you need to call the predict_proba() method to get probabilities for roc_auc_score.
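To make the distinction concrete, a minimal sketch using the gbm and grid objects from the snippet above:

# Booster.predict() returns probabilities for the 'binary' objective,
# while the sklearn wrapper's predict() returns hard 0/1 class labels
print(gbm.predict(X_test)[:5])              # probabilities from the Booster
print(grid.predict(X_test)[:5])             # class labels from the wrapper
print(grid.predict_proba(X_test)[:5, 1])    # positive-class probabilities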
@mlisovyi thank you very much.
As you mentioned, Booster.predict returns probabilities rather than class labels.
Also, best_score_ of the GridSearchCV model means:
Mean cross-validated score of the best_estimator
For multi-metric evaluation, this is present only if refit is specified.
You just pointed out the key.
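In other words, grid.best_score_ is a mean cross-validated score on the training data, not a score on the held-out test file, so the two numbers were never comparable. A sketch of an apples-to-apples check on the test set (using the grid object from the snippet above):

# Score the refit best estimator on the same held-out test set that the
# Booster was evaluated on, instead of comparing against a CV mean
best = grid.best_estimator_
test_auc = roc_auc_score(y_test, best.predict_proba(X_test)[:, 1])
print('CV mean AUC:', grid.best_score_)
print('Test AUC   :', test_auc)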
@zxsimple feel free to close this issue if you think the problem is solved.