I have been looking at some of the examples for cross-validation, and it looks like you never consider the number of boosters as a parameter. Is there a specific reason for that? Would it be of any value if n_estimators were also one dimension in the grid search?
Is the param nround
in the R package what you are looking for?
Thanks for replying, but no, nround is not exactly what I want. nround is an argument to the classifier itself rather than part of params, and I wonder why that is. Let me explain with some code snippets.
Say in Python I define my search space like this
from scipy.stats import randint, uniform, beta

param_dist = {
    'max_depth': randint(2, 8),
    'gamma': uniform(0.2, 0.6),
    'subsample': beta(10, 1),
}
and then do a randomized grid search like this
import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV

clf = xgb.XGBClassifier(n_estimators=20)
n_iter_search = 100
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, scoring='roc_auc',
                                   verbose=10)
random_search.fit(X_train, y_train)
Now for this grid search my number of estimators is always fixed at 20. I want to know whether it makes sense to make n_estimators one more dimension of the search space, something like this
param_dist = {
    'max_depth': randint(2, 8),
    'gamma': uniform(0.2, 0.6),
    'subsample': beta(10, 1),
    'n_estimators': randint(20, 100),
}
clf = xgb.XGBClassifier()
n_iter_search = 100
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, scoring='roc_auc',
                                   verbose=10)
random_search.fit(X_train, y_train)
From my point of view it is a bit different from the other params: the number of trees is not a parameter of a single decision tree.
So if {tree parameters, eta} are fixed, we can choose the nround that maximizes the test performance in CV.
I think the same thing applies to sklearn, etc. nround is tied to the other parameter settings, so a good practice is usually to fix an nround you think is reasonable and then select the other parameters.
You can also try the early_stopping feature in xgboost, which basically selects the best round for you in a more efficient way than grid-search CV.
I've seen your slides suggest 100-1000 rounds "depending on datasize". How do you select a value for nrounds (and early stopping rounds) based on data size?