LightGBM: feval metric score ignored for early stopping with Python API

Created on 16 May 2018 · 3 comments · Source: microsoft/LightGBM

When using early_stopping_rounds, the feval function is ignored, and early stopping is determined by the default metric (binary logloss in this case). I included a reproducible example below.

I don't see any way of turning off the default metric (I tried setting 'metric': None in the params dict, but the behavior was unchanged).

I created a trivial "custom metric" that just decrements a counter on every call, so it always appears to improve. I would expect the model to train indefinitely, but instead it trains only as long as binary logloss is improving.

[210]   training's binary_logloss: 0.00115081   training's feval_func: -210
[211]   training's binary_logloss: 0.00113343   training's feval_func: -211
[212]   training's binary_logloss: 0.00112853   training's feval_func: -212
[213]   training's binary_logloss: 0.00104986   training's feval_func: -213
[214]   training's binary_logloss: 0.000965577  training's feval_func: -214
[215]   training's binary_logloss: 0.000960396  training's feval_func: -215
[216]   training's binary_logloss: 0.000971713  training's feval_func: -216
[217]   training's binary_logloss: 0.000941262  training's feval_func: -217
[218]   training's binary_logloss: 0.000849178  training's feval_func: -218
[219]   training's binary_logloss: 0.000841552  training's feval_func: -219
[220]   training's binary_logloss: 0.000803302  training's feval_func: -220
[221]   training's binary_logloss: 0.000685291  training's feval_func: -221
[222]   training's binary_logloss: 0.000689522  training's feval_func: -222
[223]   training's binary_logloss: 0.000736344  training's feval_func: -223
[224]   training's binary_logloss: 0.000816562  training's feval_func: -224
[225]   training's binary_logloss: 0.000829289  training's feval_func: -225
[226]   training's binary_logloss: 0.000799258  training's feval_func: -226
[227]   training's binary_logloss: 0.000837895  training's feval_func: -227
[228]   training's binary_logloss: 0.000844479  training's feval_func: -228
[229]   training's binary_logloss: 0.000862614  training's feval_func: -229
[230]   training's binary_logloss: 0.000855565  training's feval_func: -230
[231]   training's binary_logloss: 0.000778359  training's feval_func: -231
Early stopping, best iteration is:
[221]   training's binary_logloss: 0.000685291  training's feval_func: -221

Is feval supposed to be used only for logging, and not for early_stopping_rounds? If so, should I look into custom callbacks to implement early stopping? A rough sketch of what I have in mind follows.
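This is untested, the helper name stop_on_feval is made up, and it assumes the CallbackEnv fields and lgb.callback.EarlyStopException behave as in current LightGBM versions:

def stop_on_feval(stopping_rounds, metric_name='feval_func'):
    # Hypothetical callback factory: early stopping driven by a single
    # named metric, ignoring every other entry in evaluation_result_list.
    best_score = None
    best_iter = 0

    def _callback(env):
        nonlocal best_score, best_iter
        # each entry is (dataset_name, eval_name, result, is_higher_better)
        for _, eval_name, result, is_higher_better in env.evaluation_result_list:
            if eval_name != metric_name:
                continue
            improved = (best_score is None or
                        (result > best_score if is_higher_better
                         else result < best_score))
            if improved:
                best_score, best_iter = result, env.iteration
            elif env.iteration - best_iter >= stopping_rounds:
                raise lgb.callback.EarlyStopException(
                    best_iter, env.evaluation_result_list)

    return _callback

It would be passed to lgb.train as callbacks=[stop_on_feval(10)], with early_stopping_rounds left unset.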

Environment info

Operating System: macOS High Sierra
CPU:
C++/Python/R version: Python 3.5.3

Reproducible example

# coding: utf-8
# pylint: disable = invalid-name, C0111
import lightgbm as lgb
import pandas as pd

# load or create your dataset
print('Load data...')
df_train = pd.read_csv('../binary_classification/binary.train', header=None, sep='\t')
df_test = pd.read_csv('../binary_classification/binary.test', header=None, sep='\t')
W_train = pd.read_csv('../binary_classification/binary.train.weight', header=None)[0]
W_test = pd.read_csv('../binary_classification/binary.test.weight', header=None)[0]

y_train = df_train[0].values
y_test = df_test[0].values
X_train = df_train.drop(0, axis=1).values
X_test = df_test.drop(0, axis=1).values

num_train, num_feature = X_train.shape

# create dataset for lightgbm
# if you want to re-use data, remember to set free_raw_data=False
lgb_train = lgb.Dataset(X_train, y_train,
                        weight=W_train, free_raw_data=False)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train,
                       weight=W_test, free_raw_data=False)

# specify your configurations as a dict
params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': None,
    'num_leaves': 31,
    'learning_rate': 0.9,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

# generate a feature name
feature_name = ['feature_' + str(col) for col in range(num_feature)]


counter = 0

def feval_func(preds, train_data):
    # Trivial custom metric: decrements on every call, and with
    # is_higher_better=False it therefore always "improves", so early
    # stopping should never trigger on it.
    global counter
    counter -= 1
    return ('feval_func', counter, False)

print('Start training...')
# feature_name and categorical_feature
gbm = lgb.train(params,
                lgb_train,
                feval=feval_func,
                early_stopping_rounds=10,
                num_boost_round=1000,
                valid_sets=lgb_train,  # eval training data
                feature_name=feature_name,
                categorical_feature=[21])

All 3 comments

@ClimbsRocks When you use two metrics, training stops as soon as either of them meets the early-stopping criterion. In your case, binary_logloss triggered early stopping.

To avoid this, use only one metric; see https://github.com/Microsoft/LightGBM/issues/1318.
You can set metric to the string "None" (not the Python value None) to disable the default metric.
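
For concreteness, a minimal sketch of the suggested fix, reusing the params dict from the reproducible example above (only the metric line changes):

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'None',  # the string 'None' disables the default metric;
                       # the Python value None does not
    'num_leaves': 31,
    'learning_rate': 0.9,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

With this change feval_func is the only metric in play, so early_stopping_rounds is evaluated against it alone, and the counter example above should train for the full num_boost_round.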

Thanks for sharing how to ignore the default metric. I just submitted a PR to add that to the docs.

As always, I'm impressed by how responsive you are.
