Operating System: Linux Mint 18.1 (Serena)
CPU: AMD A10-7300, x86_64
Python version: 3.5.2
lightgbm.__version__: '2.1.1'
LightGBM is not using all available features
print(train.shape, test.shape, y.shape)
(307493, 439) (48744, 439) (307493,)
As seen above, there are 439 features in the train data, but LightGBM uses only 428 of them, as shown in the log message:
[LightGBM] [Info] Number of positive: 24823, number of negative: 282670
[LightGBM] [Info] Total Bins 57256
[LightGBM] [Info] Number of data: 307493, number of used features: 428
cat_feature_names = ['A', 'B', ...]  # around 40 categorical features
params = {
    'task': 'train',
    'num_leaves': 32,
    'min_data_in_leaf': 420,
    'application': 'binary',
    'boosting': 'gbdt',
    'metric': 'auc',
    'learning_rate': 0.01,
    'min_child_weight': 18,
    'lambda_l1': 1.5,
    'lambda_l2': 1,
    'num_threads': 3,
}
import lightgbm as lgb

dataset = lgb.Dataset(train, y)
model = lgb.train(params, dataset, verbose_eval=1000,
                  categorical_feature=cat_feature_names)
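To see which features the trained booster actually uses, one rough check (assuming train is a pandas DataFrame, so the model keeps its column names) is to count the splits per feature; features with zero splits were never used, which is a superset of the features LightGBM disabled at Dataset construction:

# Rough diagnostic: features the booster never split on.
split_counts = model.feature_importance(importance_type='split')
unused = [name for name, n_splits in zip(model.feature_name(), split_counts)
          if n_splits == 0]
print('%d features were never split on' % len(unused))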
feature_fraction defaults to 1.0, so why is it not using all the features?
The number of features used seems to be inversely related to the size of min_data_in_leaf.
If I reduce it all the way down to 1, it uses 438 features (1 fewer than the actual count). Anything greater than 1 and I lose more features, but it never uses all of them.
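One way to anticipate which columns are affected at a given min_data_in_leaf is to count, per column, how many rows differ from the most frequent value; a column with fewer such rows than min_data_in_leaf cannot host a split that leaves enough data on both sides. This is only an approximation of LightGBM's bin-level filtering, and it again assumes train is a pandas DataFrame:

min_data_in_leaf = 420  # same value as in params above

def minority_count(col):
    # rows that differ from the column's most frequent value (NaN included)
    top_freq = col.value_counts(dropna=False).iloc[0]
    return len(col) - top_freq

likely_filtered = [c for c in train.columns
                   if minority_count(train[c]) < min_data_in_leaf]
print('%d columns look unsplittable at min_data_in_leaf=%d'
      % (len(likely_filtered), min_data_in_leaf))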
@quakig LightGBM automatically disables features that cannot be split on, such as features where almost all values are zero (or identical). min_data_in_leaf controls this filtering.
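You can see this in isolation with a minimal, self-contained sketch on synthetic data (illustrative only; the exact cutoff LightGBM applies is derived from min_data_in_leaf on a sampled histogram, so treat the reported count as approximate):

import numpy as np
import lightgbm as lgb

rng = np.random.RandomState(0)
X = np.column_stack([
    rng.randn(1000),                    # well-spread feature: kept
    np.zeros(1000),                     # constant feature: always disabled
    np.r_[np.ones(3), np.zeros(997)],   # only 3 non-zero rows: filtered
])                                      # whenever min_data_in_leaf > 3
y = rng.randint(0, 2, 1000)

ds = lgb.Dataset(X, y)
booster = lgb.train({'objective': 'binary', 'min_data_in_leaf': 20},
                    ds, num_boost_round=1)
# The construction log should report "number of used features: 1".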