I am not quite sure how xgboost works in theory, but since xgboost is a tree-based classifier, is it OK to assume that no normalization of the features is needed?
No, you do not have to normalize the features.
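As a quick illustration (my own sketch, not from the thread): because tree splits only depend on the ordering of feature values, rescaling a feature by a positive constant should leave the fitted model essentially unchanged, up to small numerical differences from histogram binning.

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# fit once on the raw features
model_raw = xgb.XGBRegressor(random_state=0).fit(X, y)

# fit again after blowing up one feature by a large constant factor
X_scaled = X.copy()
X_scaled[:, 0] *= 1e6
model_scaled = xgb.XGBRegressor(random_state=0).fit(X_scaled, y)

# predictions should be (near-)identical, since the feature ordering is preserved
diff = np.abs(model_raw.predict(X) - model_scaled.predict(X_scaled)).max()
print('max |prediction difference| after feature scaling:', diff)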
I think I understand that in principle there is no need for normalization when boosting trees.
However, scaling the target y has quite a noticeable impact, especially with 'reg:gamma', but also (to a lesser extent) with 'reg:linear' (the default). What is the reason for this?
Example for the Boston Housing dataset:
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston
boston = load_boston()
y = boston['target']
X = boston['data']
for scale in np.logspace(-6, 6, 7):
    # train on the rescaled target, then map predictions back to the original scale
    xgb_model = xgb.XGBRegressor().fit(X, y / scale)
    predictions = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, predictions), scale))
2.3432734454908335 (scale=1e-06)
2.343273977065266 (scale=0.0001)
2.3432793874455315 (scale=0.01)
2.290595204136888 (scale=1.0)
2.528513393507719 (scale=100.0)
7.228978353091473 (scale=10000.0)
272.29640759874474 (scale=1000000.0)
The impact of scaling y is far larger when using 'reg:gamma':
for scale in np.logspace(-6, 6, 7):
    xgb_model = xgb.XGBRegressor(objective='reg:gamma').fit(X, y / scale)
    predictions = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, predictions), scale))
591.6509503519147 (scale=1e-06)
545.8298971540023 (scale=0.0001)
37.68688286293508 (scale=0.01)
4.039819858716935 (scale=1.0)
2.505477263590776 (scale=100.0)
198.94093800190453 (scale=10000.0)
592.1469169959003 (scale=1000000.0)
@tqchen Reading your great Introduction to Boosted Trees, it is not clear to me why feature scaling is unnecessary in mathematical terms.
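For what it's worth, here is a sketch of the usual argument (my own summary, not from the linked introduction): a tree split only compares a feature value against a threshold, so the model depends on each feature only through the ordering of its values. For any strictly increasing transform \phi (feature scaling is the special case \phi(x) = (x - \mu) / \sigma with \sigma > 0), the threshold \phi(t) induces exactly the same partition of the data:

    x_{ij} \le t \iff \phi(x_{ij}) \le \phi(t)

Hence the set of candidate splits, the greedy split search, and the resulting ensemble are unchanged by rescaling the features, which is why normalization has no effect in theory (only minor numerical differences, e.g. from histogram binning, in practice).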