LightGBM: early_stopping_rounds is ignored in the Python interface when specified via the parameters dict

Created on 6 Aug 2017 · 21 Comments · Source: microsoft/LightGBM

The parameter early_stopping_rounds is ignored when it is set via the parameters dictionary, but it works fine when it is passed explicitly in the call to lgb.train. I think this should be fixed, or the documentation needs a note that this parameter is ignored in the Python interface (like the note for num_iterations in http://lightgbm.readthedocs.io/en/latest/Parameters.html).

Here is a code example:

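import lightgbm as lgb
import numpy as np

# Uses the lgb.train() signature of the time; LightGBM 4.0 removed the
# verbose_eval and early_stopping_rounds keyword arguments in favor of the
# lgb.log_evaluation() and lgb.early_stopping() callbacks.

n = 1000
X = np.random.randint(1000, size=(n, 2))
y = X[:, 0] + X[:, 1] + np.random.normal(size=n)

# First half for training, second half for validation
Xtrain, ytrain = X[:int(.5 * n), :], y[:int(.5 * n)]
Xval, yval = X[int(.5 * n):, :], y[int(.5 * n):]

ds_train = lgb.Dataset(Xtrain, ytrain)
ds_val = lgb.Dataset(Xval, yval)

params = {'objective': 'regression_l2', 'metric': 'rmse', 'learning_rate': .01, 'early_stopping_rounds': 10}

# Run 1: early_stopping_rounds only in the params dict -> ignored
gbm = lgb.train(params, ds_train, valid_sets=[ds_val, ds_train], categorical_feature=[1],
                verbose_eval=20, num_boost_round=1000)
print("=============================================")
# Run 2: early_stopping_rounds also passed to lgb.train -> early stopping works
gbm = lgb.train(params, ds_train, valid_sets=[ds_val, ds_train], categorical_feature=[1],
                verbose_eval=20, num_boost_round=1000, early_stopping_rounds=10)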
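For reference, early_stopping_rounds=10 is meant to stop training once the validation metric has gone 10 consecutive rounds without improving. The pure-Python sketch below illustrates that patience rule on a list of per-round scores. It is not LightGBM's callback; the function name patience_stop is hypothetical, and it assumes a single lower-is-better metric such as rmse.

```python
def patience_stop(val_scores, stopping_rounds):
    """Simulate patience-based early stopping on a lower-is-better metric.

    Returns (stopped_at, best_round), both 1-based; stopped_at is None when
    training would run through every round in val_scores.
    """
    best_score = float("inf")
    best_round = 0
    for rnd, score in enumerate(val_scores, start=1):
        if score < best_score:
            # New best validation score: reset the patience counter
            best_score, best_round = score, rnd
        elif rnd - best_round >= stopping_rounds:
            # No improvement for `stopping_rounds` consecutive rounds
            return rnd, best_round
    return None, best_round


# rmse improves for 3 rounds, then gets worse
print(patience_stop([5.0, 4.0, 3.0, 3.5, 3.6, 3.7], stopping_rounds=3))  # (6, 3)
```

In the report above, run 1 behaves as if stopping_rounds were infinite, i.e. stopped_at is always None.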

All 21 comments

OK, we will add a note to the docs.

@wxchan
Besides updating the docs, I think we can also add warnings in the Python/R packages if users pass these parameters in the dict.
Another solution is to read these parameters from the dict.

BTW, some other parameters also have this problem, e.g. categorical_feature.
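A minimal sketch of the "read these parameters from the dict" idea, combined with a warning. Everything here is hypothetical (the helper pull_kwargs_from_params and the KWARG_ONLY_KEYS tuple are not LightGBM API). It assumes an explicitly passed keyword argument wins and that None means "not set", which holds for early_stopping_rounds but not for arguments with a non-None default.

```python
import warnings

# Hypothetical: keys that train() only honours as keyword arguments,
# not as entries of the params dict (per this thread).
KWARG_ONLY_KEYS = ("early_stopping_rounds",)


def pull_kwargs_from_params(params, **kwargs):
    """Return (params_without_keys, kwargs) with dict-only settings moved over.

    A value found in the dict is used only when the matching keyword
    argument was left as None; either way the user gets a warning.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    for key in KWARG_ONLY_KEYS:
        if key in params:
            value = params.pop(key)
            warnings.warn("'%s' found in params dict; passing it to train() instead" % key)
            if kwargs.get(key) is None:
                kwargs[key] = value
    return params, kwargs


params = {"objective": "regression_l2", "metric": "rmse", "early_stopping_rounds": 10}
clean_params, train_kwargs = pull_kwargs_from_params(params, early_stopping_rounds=None)
print(clean_params)  # {'objective': 'regression_l2', 'metric': 'rmse'}
print(train_kwargs)  # {'early_stopping_rounds': 10}
```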

@wxchan can you help with this?

@guolinke I'm already working on this.

@guolinke there is a test case called 'test_categorical_handle' where you currently specify 'categorical_column' in the dict, which doesn't work right now. Should I keep it in the dict?

@wxchan why can't it work?
refer to: https://github.com/Microsoft/LightGBM/pull/806/files#diff-c90239f8147ba65eec9e89412fe27c01R130

@guolinke I meant it wasn't actually being used in that test case before.

@wxchan BTW, here is another issue (from Gitter):


Hi, I am trying to do a manual train/test split in LightGBM.
Example code:
dataset = lgb.Dataset(data, label=labels, silent=True, free_raw_data=False)
lgb.train(
    params=LGB_PARAMS,
    num_boost_round=10,
    train_set=dataset.subset(train_idx),
    valid_sets=[dataset.subset(test_idx)],
    verbose_eval=True,
)
But I got errors like this:
  File "virtualenv_run/local/lib/python2.7/site-packages/lightgbm/engine.py", line 126, in train
    valid_data.set_reference(train_set)
  File "virtualenv_run/local/lib/python2.7/site-packages/lightgbm/basic.py", line 1011, in set_reference
    raise LightGBMError("Cannot set reference after freed raw data, set free_raw_data=False when construct Dataset to avoid this.")
lightgbm.basic.LightGBMError: Cannot set reference after freed raw data, set free_raw_data=False when construct Dataset to avoid this.
It works OK if I skip the valid_sets argument. Is this a bug, or did I do something wrong?

@wxchan sorry, I didn't get your point. Do you mean categorical_column cannot be in the dict now?

@guolinke yes, before #806.

@wxchan
Do you mean that we can only use categorical_column in the dict after #806?
But the test passes before #806.

@guolinke it doesn't throw an error; it's ignored. Actually, in the current version categorical_column only supports a list or set, and using an int would throw an error. So it must be ignored.

@wxchan
Before #806, the categorical_column: 0 entry in the dict is converted to categorical_column=0, which the LightGBM DLL accepts. Otherwise the test could not pass, since the predictions need to equal the labels.
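To illustrate why an int used to slip through: the Python package flattens the params dict into a "key=value" parameter string for the DLL, so categorical_column: 0 simply becomes categorical_column=0. The sketch below is a hypothetical re-implementation of that flattening (params_to_str is not the package's actual function), assuming list values are joined with commas.

```python
def params_to_str(params):
    """Flatten a params dict into a space-separated 'key=value' string.

    List/tuple/set values are joined with commas; scalars go through str(),
    so an int such as 0 passes through unchanged as '0'.
    """
    pairs = []
    for key, val in params.items():
        if isinstance(val, (list, tuple, set)):
            val = ",".join(str(v) for v in val)
        pairs.append("%s=%s" % (key, val))
    return " ".join(pairs)


print(params_to_str({"objective": "binary", "categorical_column": 0}))
# objective=binary categorical_column=0
```

Because the DLL only sees the string, it cannot tell whether the Python side held 0 or [0].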

@guolinke OK, I got it. So what should the logic be if categorical_feature is set in both the dict and the argument?

I think we should use the one in the dict and give a warning (since the argument has a default value, it is hard to tell whether it was set by the user).
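A sketch of that precedence rule, assuming the keyword argument defaults to the string 'auto' and that only the 'categorical_feature' key is checked (aliases such as categorical_column are ignored here). resolve_categorical_feature is a hypothetical helper. It also shows the limitation just mentioned: an explicit 'auto' cannot be told apart from the default.

```python
import warnings

DEFAULT = "auto"  # assumed default of the categorical_feature argument


def resolve_categorical_feature(params, categorical_feature=DEFAULT):
    """Pick the categorical setting, preferring the params dict.

    We can only warn when the argument differs from its default;
    an explicitly passed 'auto' looks exactly like "not set".
    """
    if "categorical_feature" in params:
        if categorical_feature != DEFAULT:
            warnings.warn("categorical_feature set in both params and argument; "
                          "using the value from params")
        return params["categorical_feature"]
    return categorical_feature


print(resolve_categorical_feature({"categorical_feature": [1]}, [0]))  # [1], with a warning
print(resolve_categorical_feature({}, [0]))                            # [0]
```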

BTW, do you have any idea about the issue in this comment: https://github.com/Microsoft/LightGBM/issues/793#issuecomment-321169954?

@guolinke I am trying to reproduce it but haven't been able to yet. I think it should be fixed after the reference chain change. Is he using the latest version?

@fcchou can you try the latest code?

@guolinke so we actually don't need to check for categorical_column in the params dict and pull it out. It works perfectly fine.

@guolinke The latest code works. Thanks! Looking forward to seeing the next Python release come out with all the bug fixes.

@wxchan maybe we can handle all of these: int, str, multiple strs joined with commas (','.join(...)), list of str, and list of int.
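One way such normalization could look, as a hypothetical helper (normalize_categorical is not LightGBM API). It assumes digit-only strings such as "0,3" mean column indices and any other string means a feature name; a real implementation would have to settle that ambiguity explicitly.

```python
def normalize_categorical(value):
    """Normalize a categorical spec to a list of ints (indices) or strs (names).

    Accepts: int, str, comma-joined str ("a,b"), or a list/tuple/set of int or str.
    """
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("bool is not a valid categorical spec")
    if isinstance(value, int):
        return [value]
    if isinstance(value, str):
        # Split comma-joined names/indices and drop empty pieces
        items = [s.strip() for s in value.split(",") if s.strip()]
    elif isinstance(value, (list, tuple, set)):
        items = list(value)
    else:
        raise TypeError("unsupported categorical spec: %r" % (value,))
    # Assumption: digit-only strings are column indices
    return [int(v) if isinstance(v, str) and v.isdigit() else v for v in items]


print(normalize_categorical(0))                          # [0]
print(normalize_categorical(",".join(["c1", "c2"])))     # ['c1', 'c2']
print(normalize_categorical("0,3"))                      # [0, 3]
```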
