XGBoost: Is normalization necessary?

Created on 17 Jun 2015 · 3 comments · Source: dmlc/xgboost

I am not quite sure how xgboost works in theory, but since it is a tree-based classifier, is it OK to assume that no feature normalization is needed?

All 3 comments

No, you do not have to normalize the features.

I think I understand that in principle there is no need for normalization when boosting trees.
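
A quick empirical check of that claim (a minimal sketch; the column index, the factor 1e6, and the fixed random_state are arbitrary choices of mine): rescaling a single feature should leave the fitted model essentially unchanged, because tree splits depend only on the ordering of feature values.

import numpy as np
import xgboost as xgb
from sklearn.datasets import load_boston

boston = load_boston()
X, y = boston['data'], boston['target']

# Multiply one column by a large positive constant; splits are order-based,
# so the trees built on the rescaled data should be equivalent.
X_scaled = X.copy()
X_scaled[:, 0] *= 1e6

model_a = xgb.XGBRegressor(random_state=0).fit(X, y)
model_b = xgb.XGBRegressor(random_state=0).fit(X_scaled, y)

diff = np.abs(model_a.predict(X) - model_b.predict(X_scaled))
print(diff.max())  # expect ~0, up to floating-point noise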

However, scaling the target y has a noticeable impact, especially with 'reg:gamma', but also (to a lesser extent) with 'reg:linear' (the default). What is the reason for this?

Example for the Boston Housing dataset:

import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston

boston = load_boston()
y = boston['target']
X = boston['data']

# Fit on a rescaled target, then map the predictions back before scoring.
for scale in np.logspace(-6, 6, 7):
    xgb_model = xgb.XGBRegressor().fit(X, y / scale)
    predictions = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, predictions), scale))

2.3432734454908335 (scale=1e-06)
2.343273977065266 (scale=0.0001)
2.3432793874455315 (scale=0.01)
2.290595204136888 (scale=1.0)
2.528513393507719 (scale=100.0)
7.228978353091473 (scale=10000.0)
272.29640759874474 (scale=1000000.0)
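
One plausible contributor (my guess; nothing in this thread confirms it) is that neither the constant initial prediction base_score, which defaults to 0.5, nor the L2 penalty lambda on the leaf weights is expressed relative to the scale of y. A quick way to probe the first of the two is to start boosting from the mean of the scaled target instead:

# Hypothetical probe: replace the default base_score=0.5 with the mean of
# the scaled target, so boosting starts near the data at every scale.
for scale in np.logspace(-6, 6, 7):
    base = float(np.mean(y / scale))
    xgb_model = xgb.XGBRegressor(base_score=base).fit(X, y / scale)
    predictions = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, predictions), scale))

If base_score were the whole story, the errors should flatten out across scales; whatever differences remain would point at the regularization terms.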

The impact of scaling y is much larger with 'reg:gamma':

# Same experiment, but with the gamma regression objective.
for scale in np.logspace(-6, 6, 7):
    xgb_model = xgb.XGBRegressor(objective='reg:gamma').fit(X, y / scale)
    predictions = xgb_model.predict(X) * scale
    print('{} (scale={})'.format(mean_squared_error(y, predictions), scale))

591.6509503519147 (scale=1e-06)
545.8298971540023 (scale=0.0001)
37.68688286293508 (scale=0.01)
4.039819858716935 (scale=1.0)
2.505477263590776 (scale=100.0)
198.94093800190453 (scale=10000.0)
592.1469169959003 (scale=1000000.0)
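
My guess as to why 'reg:gamma' reacts so much more strongly (an assumption on my part; I am reading this off the usual gamma deviance with a log link, not off the xgboost source): the per-example loss is, up to constants,

\[
\ell(y, f) = y e^{-f} + f, \qquad \mu = e^{f},
\]

with gradient $1 - y e^{-f}$ and Hessian $y e^{-f}$. Both depend on the magnitude of y relative to the current prediction $e^{f}$, so when y is rescaled far away from the initial prediction, the Hessians are either tiny or huge and the boosting steps are badly calibrated. Squared error, by contrast, has a constant Hessian of 1, which fits the much gentler degradation of 'reg:linear' above.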

@tqchen Reading your great Introduction to Boosted Trees, it is not clear to me, in mathematical terms, why feature scaling is not necessary.
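
My best attempt at the argument, using the split-gain formula from that tutorial: a tree only ever tests conditions of the form $x_j < t$, and any strictly increasing rescaling $\phi$ of feature $j$ turns each candidate split into $\phi(x_j) < \phi(t)$, which separates exactly the same examples. The gain of a split,

\[
\mathrm{Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma,
\]

depends only on the gradient sums $G$ and Hessian sums $H$ over the examples that fall left and right, never on the raw feature values, so feature scaling cannot change which trees get built. By the same token, rescaling the target does change $G$ and $H$ while $\lambda$ and $\gamma$ stay fixed, which would be consistent with the sensitivity to scaling y observed above.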
