The parameter early_stopping_rounds is ignored when it is set via the parameters dictionary, but it works fine when it is passed explicitly in the call to lgb.train. I think this should either be fixed, or the documentation needs a note that this parameter is ignored in the Python interface (like the note for num_iterations in http://lightgbm.readthedocs.io/en/latest/Parameters.html).
Here is a code example:
import lightgbm as lgb
import numpy as np
n = 1000
X = np.random.randint(1000, size=(n, 2))
y = X[:, 0] + X[:, 1] + np.random.normal(size=n)
# First half of the rows for training, second half for validation
Xtrain, ytrain = X[:int(.5 * n), :], y[:int(.5 * n)]
Xval, yval = X[int(.5 * n):, :], y[int(.5 * n):]
ds_train = lgb.Dataset(Xtrain, ytrain)
ds_val = lgb.Dataset(Xval, yval)
params = {'objective': 'regression_l2', 'metric': 'rmse', 'learning_rate': .01, 'early_stopping_rounds': 10}
# early_stopping_rounds set only in the params dict: ignored
gbm = lgb.train(params, ds_train, valid_sets=[ds_val, ds_train], categorical_feature=[1], verbose_eval=20, num_boost_round=1000)
print("=============================================")
# early_stopping_rounds passed explicitly to lgb.train: works
gbm = lgb.train(params, ds_train, valid_sets=[ds_val, ds_train], categorical_feature=[1], verbose_eval=20, num_boost_round=1000, early_stopping_rounds=10)
OK, we will add a note to the docs.
@wxchan
Besides updating the docs, I think we can also add warnings in the Python/R packages if the user passes these parameters in the dict.
Another solution is to read these parameters from the dict.
BTW, some other parameters have the same problem, e.g. categorical_feature.
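The two suggestions above (warn, or read the value from the dict) could be combined on the user side roughly as follows. This is only a sketch: the helper name `extract_train_kwargs` and the key list `TRAIN_ONLY_KEYS` are hypothetical and not part of LightGBM, and it assumes the version discussed in this thread, where `lgb.train` accepts `early_stopping_rounds` as a keyword argument.

```python
import warnings

# Hypothetical list of keys that lgb.train expects as keyword arguments
# instead of reading them from the params dict (assumption for illustration).
TRAIN_ONLY_KEYS = ("early_stopping_rounds",)


def extract_train_kwargs(params):
    """Split params into (clean_params, train_kwargs) without mutating params."""
    clean_params = dict(params)
    train_kwargs = {}
    for key in TRAIN_ONLY_KEYS:
        if key in clean_params:
            # Move the value out of the dict and tell the user about it.
            train_kwargs[key] = clean_params.pop(key)
            warnings.warn(
                f"'{key}' was found in params; passing it to train() "
                f"as a keyword argument instead."
            )
    return clean_params, train_kwargs


# Intended use (not executed here, requires lightgbm and real datasets):
# clean_params, train_kwargs = extract_train_kwargs(params)
# gbm = lgb.train(clean_params, ds_train, valid_sets=[ds_val], **train_kwargs)
```

Returning a copy rather than popping from the caller's dict keeps the original `params` reusable across several calls.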
@wxchan can you help with this?
@guolinke already working on this.
@guolinke there is a test case called 'test_categorical_handle' where you specify 'categorical_column' in the dict, which is not working right now. Should I keep it in the dict?
@wxchan why can't it work?
refer to: https://github.com/Microsoft/LightGBM/pull/806/files#diff-c90239f8147ba65eec9e89412fe27c01R130
@guolinke I meant it wasn't being used before in that test case.
@wxchan BTW, here is another issue (from Gitter):
Hi, I am trying to do a manual train/test split in LightGBM.
Example code:
dataset = lgb.Dataset(data, label=labels, silent=True, free_raw_data=False)
lgb.train(
    params=LGB_PARAMS,
    num_boost_round=10,
    train_set=dataset.subset(train_idx),
    valid_sets=[dataset.subset(test_idx)],
    verbose_eval=True,
)
But I got errors like this:
  File "virtualenv_run/local/lib/python2.7/site-packages/lightgbm/engine.py", line 126, in train
    valid_data.set_reference(train_set)
  File "virtualenv_run/local/lib/python2.7/site-packages/lightgbm/basic.py", line 1011, in set_reference
    raise LightGBMError("Cannot set reference after freed raw data, set free_raw_data=False when construct Dataset to avoid this.")
lightgbm.basic.LightGBMError: Cannot set reference after freed raw data, set free_raw_data=False when construct Dataset to avoid this.
It works OK if I skip the valid_sets argument. Is this a bug, or did I do something wrong?
@wxchan sorry, I didn't get your point. Do you mean categorical_column cannot be in the dict now?
@guolinke yes, before #806.
@wxchan
Do you mean that we can use categorical_column in the dict only after #806?
But the test passes before #806.
@guolinke it won't throw an error; it's ignored. Actually, in the current version categorical_column only supports a list or set, and using an int will throw an error. So it must have been ignored.
@wxchan
Before #806, categorical_column: 0 in the dict was converted to categorical_column=0, which is accepted by the LightGBM DLL. Otherwise the test could not pass, since the prediction needs to equal the label.
@guolinke OK, I got it. So what should the logic be if categorical_feature is set in both the dict and the argument?
I think we should use the one in the dict and give a warning (since the argument has a default value, it is hard to tell whether it was set by the user or not).
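The precedence rule proposed here (the dict wins, with a warning) might look roughly like this. It is a sketch under assumptions: the function name `resolve_categorical_feature` and the alias tuple are hypothetical, and `"auto"` is assumed to be the argument's default value.

```python
import warnings

# Aliases assumed for illustration only; the Parameters docs list the real ones.
CATEGORICAL_ALIASES = ("categorical_feature", "categorical_column")


def resolve_categorical_feature(params, categorical_feature="auto"):
    """Prefer the value from params; warn if the argument also looks user-set."""
    argument_is_default = (
        isinstance(categorical_feature, str) and categorical_feature == "auto"
    )
    for alias in CATEGORICAL_ALIASES:
        if alias in params:
            if not argument_is_default:
                warnings.warn(
                    f"categorical_feature was given both in params ('{alias}') "
                    f"and as an argument; using the value from params."
                )
            return params[alias]
    # Nothing in the dict: fall back to the argument (possibly the default).
    return categorical_feature
```

Note the limitation raised in the comment: a user who explicitly passes the default value cannot be told apart from one who passed nothing, so no warning is issued in that case.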
BTW, do you have any idea about the issue in this comment: https://github.com/Microsoft/LightGBM/issues/793#issuecomment-321169954 ?
@guolinke I am trying to reproduce it but haven't managed to yet. I think it should be fixed after the reference chain change. Is he using the latest version?
@fcchou can you try the latest code?
@guolinke so actually we don't need to check for categorical_column in the params dict and pull it out. It works perfectly fine.
@guolinke The latest code works. Thanks! Looking forward to seeing the next Python release come out with all the bug fixes.
@wxchan maybe we can handle all of these: int, str, multi-str (','.join(...)), list of str, and list of int.
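Accepting all of those input forms could be done with a small normalizer along these lines. This is a sketch, not the package's implementation: the name `normalize_categorical` is hypothetical, and converting purely numeric strings to integer column indices is an assumption made here for illustration.

```python
def _parse_item(item):
    """Turn one entry into an int index or a str column name."""
    if isinstance(item, int):
        return item
    if isinstance(item, str):
        text = item.strip()
        # Assumption: a purely numeric string denotes a column index.
        return int(text) if text.isdecimal() else text
    raise TypeError(f"Unsupported categorical entry: {item!r}")


def normalize_categorical(value):
    """Normalize int, str, comma-joined str, or a collection into a list."""
    if isinstance(value, int):
        return [value]
    if isinstance(value, str):
        # Covers both a single name and a ','.join(...) of several entries.
        return [_parse_item(part) for part in value.split(",") if part.strip()]
    if isinstance(value, (list, tuple, set)):
        return [_parse_item(item) for item in value]
    raise TypeError(f"Unsupported categorical_feature type: {type(value).__name__}")
```

For example, `normalize_categorical("0,1,2")` and `normalize_categorical([0, 1, 2])` give the same list, so downstream code only has to handle one shape.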