Operating System: Win 7 64-bit
CPU: Intel Core i7
C++/Python/R version: Python 3.5
sklearn GridSearchCV for hyperparameter tuning gets worse performance on the Binary Classification Example
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': {'binary_logloss', 'auc'},
    'metric_freq': 1,
    'is_training_metric': True,
    'max_bin': 255,
    'learning_rate': 0.1,
    'num_leaves': 63,
    'tree_learner': 'serial',
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'min_data_in_leaf': 50,
    'min_sum_hessian_in_leaf': 5,
    'is_enable_sparse': True,
    'use_two_round_loading': False,
    'is_save_binary_file': False,
    'output_model': 'LightGBM_model.txt',
    'num_machines': 1,
    'local_listen_port': 12400,
    'machine_list_file': 'mlist.txt',
    'verbose': 0,
    # parameters to keep exactly the same
    'subsample_for_bin': 200000,
    'min_child_samples': 20,
    'min_child_weight': 0.001,
    'min_split_gain': 0.0,
    'colsample_bytree': 1.0,
    'reg_alpha': 0.0,
    'reg_lambda': 0.0
}
df_train = pd.read_csv("binary.train", header=None, sep='\t')
df_test = pd.read_csv("binary.test", header=None, sep='\t')
y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
AUC value 0.835:
gbm = lgb.train(params, lgb_train, valid_sets=lgb_eval)
AUC 0.77:
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [31],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
mdl = lgb.LGBMClassifier(
    task=params['task'],
    metric=params['metric'],
    ... ...
)
scoring = {'AUC': 'roc_auc'}
grid = GridSearchCV(mdl, gridParams, verbose=2, cv=5, scoring=scoring, n_jobs=-1, refit='AUC')
AUC 0.706.
Then I tried to traverse the parameter grid and train the LightGBM model directly; it behaves as expected:
for param in ParameterGrid(gridParams):
    gbm = lgb.train(param, lgb_train, valid_sets=lgb_eval)
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
    print("score is %s, with params: %s" % (roc_auc_score(y_test, y_pred), param))
Why does LGBMClassifier get poor performance with sklearn GridSearchCV and Hyperopt? What's the correct way to do hyperparameter tuning?
@StrikerRUS any ideas about this?
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [31],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
mdl = lgb.LGBMClassifier(
    task=params['task'],
    metric=params['metric'],
    ... ...
)
scoring = {'AUC': 'roc_auc'}
grid = GridSearchCV(mdl, gridParams, verbose=2, cv=5, scoring=scoring, n_jobs=-1, refit='AUC')
If this is the exact code you're using, the only parameter that is being changed during the grid search is 'num_leaves'.
Are you sure that the grid search model has EVERY parameter that was used in the first model, with the same value? I can see right away that you're not calling 'objective': 'binary' in the param search, and it looks like if you don't specify that parameter for sklearn it defaults to regression, so I would definitely check that.
Also, if you're trying to take the same parameter names from the core Python API and apply them to the sklearn version, there may be parameter name differences. I'm not sure how LightGBM handles this, but I remember running into this in XGBoost. I'm guessing there are some variables that you think you are setting but really aren't. I think some parameters MUST follow the scikit 'alias', though I could be wrong.
For example, in your params you have the following variables:
'feature_fraction': 0.8,
'bagging_fraction': 0.8,
'bagging_freq': 5,
but the sklearn documentation shows these parameters as:
subsample=1.0, subsample_freq=1, colsample_bytree=1.0
So I would recommend verifying that you're actually using the parameters that you think you're using. I'm guessing this is where your problem is. It might help to post all of your code too, because it looks like there are missing pieces that could be causing the problem. One quick check is sketched below.
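A minimal sketch of such a check (not from the original thread), using the standard scikit-learn get_params() API; depending on the LightGBM version, extra keyword arguments passed under core-API aliases may also show up here:

import lightgbm as lgb

# Construct the wrapper with a mix of sklearn-style and core-style names
mdl = lgb.LGBMClassifier(colsample_bytree=1.0, feature_fraction=0.8)

# Print everything the estimator reports, to spot duplicated settings such
# as colsample_bytree vs. its core-API alias feature_fraction
for name, value in sorted(mdl.get_params().items()):
    print(name, '=', value)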
@guolinke Sorry, very busy till June with my thesis.
@bbennett36
I'm sure I specified exactly the same parameters; you see a different num_leaves because I made a slight change to see whether it makes any difference.
I did use 'objective': 'binary', see:
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [31],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
I debugged LightGBM-sklearn (\Python35\Lib\site-packages\lightgbm\sklearn.py); the fit function just sets default values for some of the parameters, and I'm not sure whether this is the problem.
def fit(self, X, y,
        sample_weight=None, init_score=None,
        eval_set=None, eval_names=None, eval_sample_weight=None,
        eval_class_weight=None, eval_init_score=None, eval_metric="logloss",
        early_stopping_rounds=None, verbose=True,
        feature_name='auto', categorical_feature='auto', callbacks=None):
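For what it's worth, the default eval_metric="logloss" here only affects what the wrapper reports during fitting, not the training objective. If you want it to report AUC like the lgb.train run above, a minimal sketch (assuming the X_train/X_test data from the snippets in this thread) is:

# Pass the evaluation metric explicitly, mirroring 'metric': 'auc' in params
mdl = lgb.LGBMClassifier(objective='binary', num_leaves=63, learning_rate=0.1)
mdl.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric='auc')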
@zxsimple Provide a fully reproducible example. It is difficult to troubleshoot your issue without one, because we are comparing apples with oranges (parameter lists, training data size (grid search), folds (grid search)).
@Laurae2 @bbennett36
Here is complete code snippet, you can get the data from https://github.com/Microsoft/LightGBM/tree/master/examples/binary_classification
import pandas as pd
from sklearn.metrics import roc_auc_score
import lightgbm as lgb
import matplotlib.pyplot as plt
# sklearn tools for model training and assessment
from sklearn.model_selection import train_test_split
from sklearn.model_selection import PredefinedSplit
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.metrics import (roc_curve, auc, accuracy_score)
# specify your configurations as a dict
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': {'binary_logloss', 'auc'},
    'metric_freq': 1,
    'is_training_metric': True,
    'max_bin': 255,
    'learning_rate': 0.1,
    'num_leaves': 63,
    'tree_learner': 'serial',
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'min_data_in_leaf': 50,
    'min_sum_hessian_in_leaf': 5,
    'is_enable_sparse': True,
    'use_two_round_loading': False,
    'is_save_binary_file': False,
    'output_model': 'LightGBM_model.txt',
    'num_machines': 1,
    'local_listen_port': 12400,
    'machine_list_file': 'mlist.txt',
    'verbose': 0,
    'subsample_for_bin': 200000,
    'min_child_samples': 20,
    'min_child_weight': 0.001,
    'min_split_gain': 0.0,
    'colsample_bytree': 1.0,
    'reg_alpha': 0.0,
    'reg_lambda': 0.0
}
df_train = pd.read_csv("binary.train", header=None, sep='\t')
df_test = pd.read_csv("binary.test", header=None, sep='\t')
y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# train
gbm = lgb.train(params,
                lgb_train,
                valid_sets=lgb_eval)
gridParams = {
    'learning_rate': [0.1],
    'num_leaves': [63],
    'boosting_type': ['gbdt'],
    'objective': ['binary']
}
mdl = lgb.LGBMClassifier(
    task=params['task'],
    metric=params['metric'],
    metric_freq=params['metric_freq'],
    is_training_metric=params['is_training_metric'],
    max_bin=params['max_bin'],
    tree_learner=params['tree_learner'],
    feature_fraction=params['feature_fraction'],
    bagging_fraction=params['bagging_fraction'],
    bagging_freq=params['bagging_freq'],
    min_data_in_leaf=params['min_data_in_leaf'],
    min_sum_hessian_in_leaf=params['min_sum_hessian_in_leaf'],
    is_enable_sparse=params['is_enable_sparse'],
    use_two_round_loading=params['use_two_round_loading'],
    is_save_binary_file=params['is_save_binary_file'],
    n_jobs=-1
)
scoring = {'AUC': 'roc_auc'}
# Create the grid
grid = GridSearchCV(mdl, gridParams, verbose=2, cv=5, scoring=scoring, n_jobs=-1, refit='AUC')
# Run the grid
grid.fit(X_train, y_train)
print('Best parameters found by grid search are:', grid.best_params_)
print('Best score found by grid search is:', grid.best_score_)
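To make grid.best_score_ directly comparable with the lgb.train result, one option (a minimal sketch, not part of the original snippet) is to hand GridSearchCV the same single train/test split via the PredefinedSplit that is already imported above:

import numpy as np

# -1 marks rows that are never used for validation; 0 marks the rows that
# form the single validation fold (the same rows lgb.train evaluated on)
test_fold = np.concatenate([np.full(len(X_train), -1), np.zeros(len(X_test))])
ps = PredefinedSplit(test_fold)

X_all = np.vstack([X_train, X_test])
y_all = np.concatenate([y_train, y_test])

grid_ps = GridSearchCV(mdl, gridParams, scoring='roc_auc', cv=ps, refit=True)
grid_ps.fit(X_all, y_all)

# best_score_ is now the AUC on the held-out test rows, so it can be
# compared with the Booster's test AUC apples to apples
print(grid_ps.best_score_)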
@zxsimple
how exactly did you observe a difference in the roc_auc score?
With your code I get:
print(roc_auc_score(y_test, gbm.predict(X_test)))
> 0.8348329463364292
print(roc_auc_score(y_test, grid.predict_proba(X_test)[:,1]))
> 0.8348329463364292
Which look identical... Note that Booster.predict() outputs probabilities right away, while with GridSearchCV you need to call the predict_proba() method to get probabilities for roc_auc_score.
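To make the distinction concrete, a minimal sketch using the gbm and grid objects from the snippet above:

# Booster.predict() returns probabilities for the 'binary' objective,
# while the sklearn wrapper's predict() returns hard 0/1 class labels
print(gbm.predict(X_test)[:5])              # probabilities from the Booster
print(grid.predict(X_test)[:5])             # class labels from the wrapper
print(grid.predict_proba(X_test)[:5, 1])    # positive-class probabilities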
@mlisovyi thank you very much.
As you mentioned, Booster.predict returns probabilities rather than class labels.
Also, best_score_ of the GridSearchCV model means:
Mean cross-validated score of the best_estimator
For multi-metric evaluation, this is present only if refit is specified.
You just pointed out the key.
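In other words, grid.best_score_ is a mean cross-validated score on the training data, not a score on the held-out test file, so the two numbers were never comparable. A sketch of an apples-to-apples check on the test set (using the grid object from the snippet above):

# Score the refit best estimator on the same held-out test set that the
# Booster was evaluated on, instead of comparing against a CV mean
best = grid.best_estimator_
test_auc = roc_auc_score(y_test, best.predict_proba(X_test)[:, 1])
print('CV mean AUC:', grid.best_score_)
print('Test AUC   :', test_auc)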
@zxsimple feel free to close this issue if you think the problem is solved.