I have been looking at some of the examples for cross-validation, and it looks like you never consider the number of boosters as a parameter. Is there a specific reason for that? Would it be of any value if n_estimators were also one dimension in the grid search?
Is the param nround
in the R package what you are looking for?
Thanks for replying, but no, nround is not exactly what I want. nround is an argument to the classifier itself rather than part of params, and I wonder why that is. Let me explain with some code snippets.
Say in Python I define my search space like this
from scipy.stats import randint, uniform, beta

param_dist = {
    'max_depth': randint(2, 8),
    'gamma': uniform(0.2, 0.6),
    'subsample': beta(10, 1),
}
and then do a randomized grid search like this
import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV

clf = xgb.XGBClassifier(n_estimators=20)
n_iter_search = 100
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, scoring='roc_auc',
                                   verbose=10)
random_search.fit(X_train, y_train)
Now for this grid search my number of estimators is always fixed at 20. I want to know whether it makes sense to make n_estimators one more dimension of the search space, something like this
param_dist = {
    'max_depth': randint(2, 8),
    'gamma': uniform(0.2, 0.6),
    'subsample': beta(10, 1),
    'n_estimators': randint(20, 100),
}
clf = xgb.XGBClassifier()
n_iter_search = 100
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, scoring='roc_auc',
                                   verbose=10)
random_search.fit(X_train, y_train)
From my point of view it is a bit different from the other params: the number of trees is not a parameter of a single decision tree.
So if {tree parameters, eta} are fixed, we can choose the nround that maximizes the test performance in CV.
I think the same thing applies to sklearn, etc. nround is tied to the other parameter settings, so a good practice is usually to fix an nround you think is reasonable and then select the other parameters.
You can also try the early_stopping feature in xgboost, which basically selects the best round for you in a more efficient way than grid-search CV.
I've seen your slides suggest 100-1000 rounds "depending on datasize". How do you select a value for nrounds (and early stopping rounds) based on data size?