Catboost: Categorical variables dominating feature importance

Created on 16 Aug 2018 · 3 Comments · Source: catboost/catboost

Problem: I am solving a regression problem using the CatBoost regressor. Every time, the categorical features dominate the feature importance and the numerical variables are overshadowed. If I discretize a numerical variable into categories, that feature also moves to the top of the feature importance. Can anyone explain why this might be happening? Is there a parameter I can tune so that the numerical features are not overshadowed?
catboost version: 0.9.1.1
Operating System: Windows 10
CPU: CPU

All 3 comments

Try setting the L2 regularization parameter (l2_leaf_reg) to a value in the range 30-50 and see if it helps to decrease the importance of your categorical features and improve the score on your validation set.
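A minimal sketch of what such a sweep could look like, assuming a recent catboost release and that the training and validation data are already split. The helper names `l2_candidates` and `sweep_l2`, the step of 5, and the iteration count are illustrative assumptions, not something given in this thread.

```python
import math


def l2_candidates(low=30, high=50, step=5):
    """Return the l2_leaf_reg values to try, including both ends of the range."""
    return list(range(low, high + 1, step))


def sweep_l2(X_train, y_train, X_val, y_val, cat_features, values=None):
    """Fit one CatBoostRegressor per l2_leaf_reg value; return {value: validation RMSE}."""
    # Imported lazily so the helper above can be used without catboost installed.
    from catboost import CatBoostRegressor

    results = {}
    for l2 in values or l2_candidates():
        model = CatBoostRegressor(
            l2_leaf_reg=l2,      # L2 regularization coefficient
            iterations=300,      # arbitrary choice for this sketch
            random_seed=0,
            verbose=False,
        )
        model.fit(X_train, y_train, cat_features=cat_features)
        preds = model.predict(X_val)
        # Plain RMSE on the held-out set, computed by hand.
        squared_errors = [(p - t) ** 2 for p, t in zip(preds, y_val)]
        results[l2] = math.sqrt(sum(squared_errors) / len(squared_errors))
    return results
```

Calling `sweep_l2(...)` with your own data returns one validation RMSE per candidate value, which makes it easy to see whether stronger regularization changes anything at all.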

I tried the L2 parameter, but it did not have any effect. When I used the one_hot_max_size parameter, however, the importance of my numerical features increased and the score on the validation set also improved.
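For anyone wanting to reproduce this comparison: one_hot_max_size makes CatBoost one-hot encode categorical features that have at most that many distinct values, instead of encoding them with target statistics. The sketch below assumes a recent catboost release; the helper names `rank_importances` and `importances_for` and the iteration count are hypothetical and chosen only for illustration.

```python
def rank_importances(names, scores):
    """Pair feature names with importance scores, highest score first."""
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


def importances_for(X, y, cat_features, one_hot_max_size):
    """Train a regressor with the given one_hot_max_size and return ranked importances."""
    # Imported lazily so rank_importances can be used without catboost installed.
    from catboost import CatBoostRegressor, Pool

    pool = Pool(X, y, cat_features=cat_features)
    model = CatBoostRegressor(
        one_hot_max_size=one_hot_max_size,  # one-hot encode low-cardinality categoricals
        iterations=300,                     # arbitrary choice for this sketch
        random_seed=0,
        verbose=False,
    )
    model.fit(pool)
    return rank_importances(model.feature_names_, model.get_feature_importance(pool))
```

Running `importances_for` twice on the same data, once with a small value and once with a value at least as large as the cardinality of your categorical columns, shows how the ranking shifts between the two encodings.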

dlihhats, can you answer this?
