LightGBM: Simple data transformation test

Created on 20 Mar 2019 · 7 comments · Source: microsoft/LightGBM

Hello!

I have applied several simple transformations to the feature matrix in order to
check the stability of training.
The transformations:

  • add identical constant to all features
  • add different constants to all features
  • multiply features by identical constant
  • multiply features by different constants
  • permute data axis
  • permute columns axis

I have tried to minimise randomisation in the training process by setting feature_fraction and bagging_fraction to one. According to the "textbook" gradient boosted trees algorithm, these transformations should not change the final result, yet the predictions on the original and transformed data match only after the column permutation.

Can you please explain why LightGBM's fitting can vary after linear data transformations?

Environment info

Operating System: RedHat
CPU/GPU model: CPU
C++/Python/R version: Python
LightGBM version or commit hash: 2.2.3

Reproducible examples

import lightgbm as lgb
import numpy as np

def add_constant(x_train, x_test, common=True):
    if common:
        rand_constant = np.random.randint(-1000, 1000)
    else:
        rand_constant = np.random.randint(
            -1000, 1000, size=(1, x_train.shape[1]))
    return x_train + rand_constant, x_test + rand_constant

def mult_constant(x_train, x_test, common=True):
    if common:
        rand_constant = np.random.randint(-1000, 1000)
        if rand_constant == 0:
            rand_constant = 1
    else:
        rand_constant = np.random.randint(
            -1000, 1000, size=(1, x_train.shape[1]))
        rand_constant[rand_constant == 0] = 1
    return x_train * rand_constant, x_test * rand_constant

def permute(x_train, x_test, data=True):
    if data:
        indices = np.random.permutation(np.arange(x_train.shape[0]))
        return x_train[indices].copy(), x_test.copy()
    else:
        indices = np.random.permutation(np.arange(x_train.shape[1]))
        return x_train[:, indices].copy(), x_test[:, indices].copy()

def original(x_train, x_test):
    return x_train.copy(), x_test.copy()

transformations = [
    original,
    add_constant,
    lambda x,y: add_constant(x, y, False),
    mult_constant,
    lambda x,y: mult_constant(x, y, False),
    permute,
    lambda x,y: permute(x, y, False),
]

n_dots = 10000
n_features = 300
x_train = np.random.randn(n_dots, n_features)
x_test = np.random.randn(n_dots, n_features)

y_train = np.random.randn(n_dots)
y_test = np.random.randn(n_dots)

params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression_l2',
    'num_leaves': 32,
    'learning_rate': 0.2,
    'feature_fraction': 1.00,
    'bagging_fraction': 1.00,
    'bagging_freq': 1,
    'verbose': 0,
    'num_threads': 20,
    'min_data_in_leaf': 10,
    'use_missing': False,
    'seed': 0
}

total_gbms = []
total_preds = []
for trans in transformations:
    print(trans.__name__)
    x_tr_, x_te_ = trans(x_train, x_test)
    lgb_train = lgb.Dataset(
        x_tr_,
        label=y_train,
        free_raw_data=True,
    )
    lgb_test = lgb.Dataset(
        x_te_, label=y_test, reference=lgb_train, free_raw_data=True)

    evals_result = {}
    gbm = lgb.train(
        params,
        lgb_train,
        num_boost_round=100,
        valid_sets=[lgb_train, lgb_test],
        evals_result=evals_result,
        verbose_eval=False)
    total_gbms.append(gbm)
    total_preds.append((gbm.predict(x_tr_), gbm.predict(x_te_)))
    del lgb_train, lgb_test
for i in range(1, len(total_preds)):
    print(np.std(total_preds[0][0] - total_preds[i][0]))

question

Most helpful comment

@Qiuyan918
The binning algorithm in LightGBM is a greedy equal-density solution that processes values from smallest to largest.
Therefore, when you reverse the values, it produces different binning results, and hence different split points.

All 7 comments

The first thing I would check is whether you've set num_boost_round high enough to achieve convergence in all cases. I'm not actually that familiar with LightGBM yet (I've mostly been using CatBoost), but 100 doesn't seem like many iterations. If lgb.train() has early stopping enabled but in some (or all) scenarios it's still hitting the limit of 100, that would be a red flag.

The problem is that you should get exactly the same results when a constant is added to the feature vector, regardless of num_boost_round. On each boosting step the exact GBDT algorithm looks for the split that minimises variance, so if the mutual order of the feature values is unchanged, the chosen split stays the same from the very first iteration. So my question is: what kind of heuristic does LightGBM use that can produce such a difference?
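The invariance claim above can be checked with a minimal numerical sketch of exact variance-based splitting (an illustration only, not LightGBM's implementation): the split that minimises weighted child variance depends on the feature values only through their ordering, so any order-preserving transformation yields the same partition of samples.

```python
import numpy as np

def best_split_partition(x, y):
    """Indices sent left by the exact variance-minimising split on feature x."""
    order = np.argsort(x, kind="stable")
    ys = y[order]
    n = len(y)
    best_score, best_k = np.inf, 1
    for k in range(1, n):  # candidate split between sorted positions k-1 and k
        # weighted sum of child variances; only the label order matters
        score = ys[:k].var() * k + ys[k:].var() * (n - k)
        if score < best_score:
            best_score, best_k = score, k
    return frozenset(order[:best_k].tolist())

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = rng.standard_normal(50)

# Order-preserving transformations leave the exact split unchanged:
assert best_split_partition(x, y) == best_split_partition(x + 123.0, y)
assert best_split_partition(x, y) == best_split_partition(3.0 * x, y)
```

This is exactly why any discrepancy must come from a pre-processing step that is *not* order-invariant, which points at the histogram binning discussed below in the thread.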

@qashqay654

  1. For the row permutation, you should permute the labels as well.
  2. For the add/mul constant cases, the root cause is the "binning" algorithm in LightGBM; refer to https://github.com/Microsoft/LightGBM/blob/7a89d0054bcf0db9f96af3c53a078b6aa3152efe/src/io/bin.cpp#L207-L401

More specifically, since real-world data often contains many sparse (zero) values, LightGBM always treats the zero feature value as its own unique bin. The binning algorithm therefore first processes the negative values, then the zero values, and then the positive values. Adding or multiplying by a constant changes the value ranges and results in different bin assignments.

In short, to test add_constant you should ensure the original values are all positive (or all negative) and that the constants have the same sign as the original values.
For mult_constant, you should ensure the constants are positive.
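The effect of the dedicated zero bin can be reproduced with a toy binner (a hypothetical simplification for illustration; the real implementation in bin.cpp differs, e.g. in how the bin budget is allocated): shifting a mixed-sign feature so that every value becomes positive removes the zero bin and changes the bin layout entirely.

```python
import numpy as np

def sketch_bins(values, max_bins):
    """Toy equal-density binner with a dedicated zero bin (illustration only).

    Negatives and positives are binned separately, mirroring the
    negative -> zero -> positive processing order described above.
    """
    neg, pos = values[values < 0], values[values > 0]
    bounds = [0.0] if len(neg) or np.any(values == 0) else []
    for part in (neg, pos):
        if len(part):
            qs = np.linspace(0, 1, max_bins // 2 + 1)[1:-1]
            bounds.extend(np.quantile(part, qs))
    return np.sort(np.unique(bounds))

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)

m0 = np.digitize(x, sketch_bins(x, 16))              # mixed signs: zero bin used
m1 = np.digitize(x + 5.0, sketch_bins(x + 5.0, 16))  # all positive: no zero bin

# The shift moves every value across zero, so the bin layout -- and hence the
# grouping of samples into bins -- changes:
print(len(np.unique(m0)), len(np.unique(m1)))
```

Since the tree only ever sees bin indices, a different grouping of samples into bins can produce different splits, even though the split-finding itself is order-invariant.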

@guolinke
Hi, thanks.
Following your responses, I have made several changes to the reproducible example:

  1. The values in the training data, and the constants in add_constant and mult_constant, are now all positive.
  2. I added another function named reverse:
def reverse(x_train, x_test):
    return 1/x_train, 1/x_test

Indeed, the predictions after the add_constant and mult_constant transformations now match the original predictions. However, the reverse function still changes the predictions. May I ask why this happens?
In my view, the reverse transformation reverses the order of the values in X, but that should not change the split points of the features. Also, since all the values are positive, reversing their order should not affect the binning.

Edited Reproducible Examples

import lightgbm as lgb
import numpy as np

def add_constant(x_train, x_test, common=True):
    if common:
        rand_constant = np.random.randint(0, 1000)
    else:
        rand_constant = np.random.randint(
            0, 1000, size=(1, x_train.shape[1]))
    return x_train + rand_constant, x_test + rand_constant

def mult_constant(x_train, x_test, common=True):
    if common:
        rand_constant = np.random.randint(0, 1000)
        if rand_constant == 0:
            rand_constant = 1
    else:
        rand_constant = np.random.randint(
            0, 1000, size=(1, x_train.shape[1]))
        rand_constant[rand_constant == 0] = 1
    return x_train * rand_constant, x_test * rand_constant

def permute(x_train, x_test, data=True):
    if data:
        indices = np.random.permutation(np.arange(x_train.shape[0]))
        return x_train[indices].copy(), x_test.copy()
    else:
        indices = np.random.permutation(np.arange(x_train.shape[1]))
        return x_train[:, indices].copy(), x_test[:, indices].copy()

def reverse(x_train, x_test): # newly added
    return 1/x_train, 1/x_test

def original(x_train, x_test):
    return x_train.copy(), x_test.copy()

transformations = [
    original,
    add_constant,
    lambda x,y: add_constant(x, y, False),
    mult_constant,
    lambda x,y: mult_constant(x, y, False),
    reverse
]

n_dots = 10000
n_features = 300
x_train = np.absolute(np.random.randn(n_dots, n_features))
x_test = np.absolute(np.random.randn(n_dots, n_features))

y_train = np.random.randn(n_dots)
y_test = np.random.randn(n_dots)

params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression_l2',
    'num_leaves': 32,
    'learning_rate': 0.2,
    'feature_fraction': 1.00,
    'bagging_fraction': 1.00,
    'bagging_freq': 1,
    'verbose': 0,
    'num_threads': 20,
    'min_data_in_leaf': 10,
    'use_missing': False,
    'seed': 0
}

total_gbms = []
total_preds = []
for trans in transformations:
    print(trans.__name__)
    x_tr_, x_te_ = trans(x_train, x_test)
    lgb_train = lgb.Dataset(
        x_tr_,
        label=y_train,
        free_raw_data=True,
    )
    lgb_test = lgb.Dataset(
        x_te_, label=y_test, reference=lgb_train, free_raw_data=True)

    evals_result = {}
    gbm = lgb.train(
        params,
        lgb_train,
        num_boost_round=100,
        valid_sets=[lgb_train, lgb_test],
        evals_result=evals_result,
        verbose_eval=False)
    total_gbms.append(gbm)
    total_preds.append((gbm.predict(x_tr_), gbm.predict(x_te_)))
    del lgb_train, lgb_test

for i in range(1, len(total_preds)):
    print(np.std(total_preds[0][0] - total_preds[i][0]))

@Qiuyan918
The binning algorithm in LightGBM is a greedy equal-density solution that processes values from smallest to largest.
Therefore, when you reverse the values, it produces different binning results, and hence different split points.
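The greedy sweep can be imitated with a small sketch (a hypothetical simplification, not the code in bin.cpp): because the sweep runs smallest-to-largest, the "leftover" samples that don't fill a whole bin end up at the top of the range. Reversing the value order (as 1/x does for positive x) puts the leftover at the other end, so the samples are regrouped even though their relative order is only flipped.

```python
import numpy as np

def greedy_density_bins(values, max_bins):
    """Greedy equal-density sweep from smallest to largest value (a sketch)."""
    vals, counts = np.unique(values, return_counts=True)
    target = len(values) / max_bins  # desired samples per bin
    bounds, acc = [], 0
    for v, c in zip(vals, counts):
        acc += c
        if acc >= target:  # close the current bin at this value
            bounds.append(v)
            acc = 0
    return np.array(bounds)

def partition(values, bounds):
    """Canonical grouping of sample indices induced by the bin bounds."""
    ids = np.digitize(values, bounds, right=True)
    return {frozenset(np.flatnonzero(ids == b).tolist()) for b in np.unique(ids)}

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal(1000)) + 0.1  # strictly positive feature

p_orig = partition(x, greedy_density_bins(x, 16))
p_rev = partition(1 / x, greedy_density_bins(1 / x, 16))

print(p_orig == p_rev)  # the induced groupings differ
```

With 1000 samples and 16 bins the target density is 62.5, so bins close every 63 samples and the remainder lands in the last bin: at the top of the range for x, but at the bottom for 1/x. Different groupings feed different histograms to the tree learner, hence different splits.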

Hi @guolinke

Is binning applied per feature, or across all features per observation? I.e., if I have 200 features, will there be 200 different binning schemes?

@vishalbajaj2000 the binning is column-wise and independent across features.
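A minimal illustration of column-wise binning, using plain quantiles as a stand-in for LightGBM's binner: each feature gets its own set of boundaries on its own scale, so 200 features would indeed mean 200 independent schemes.

```python
import numpy as np

rng = np.random.default_rng(0)
# three features on very different scales
X = rng.standard_normal((1000, 3)) * np.array([1.0, 10.0, 100.0])

# One independent set of equal-density boundaries per column:
per_column_bounds = [np.quantile(X[:, j], np.linspace(0, 1, 17)[1:-1])
                     for j in range(X.shape[1])]

for j, b in enumerate(per_column_bounds):
    print(f"feature {j}: {len(b) + 1} bins spanning [{b[0]:.2f}, {b[-1]:.2f}]")
```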
