Xgboost: What does the error "colsample_bytree is too small that no feature can be included" mean?

Created on 20 Jan 2015 · 11 Comments · Source: dmlc/xgboost

I get the following error on one of my datasets (using the R package):
"colsample_bytree is too small that no feature can be included"

I think it is related to this line:
https://github.com/tqchen/xgboost/blob/9b3a601ede47aa156789a96b6346f0bc4e784f56/src/tree/updater_colmaker-inl.hpp#L156

My dataset has 593 000 rows (18 categorical features, expanded into thousands of dummy features).
The output column has 903 true values; all other rows are false.

Does the error mean there are too few true values relative to the size of the dataset?

I use the R package compiled from yesterday's source code. For the other output columns on the same dataset, it works.

Kind regards,
Michaël

All 11 comments

This means that colsample_bytree times the number of active features in your dataset is smaller than 1.
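The arithmetic behind that statement can be sketched as follows. This is a minimal Python illustration, not the xgboost source: the function name `features_kept` is hypothetical, and truncating the product to an integer is an assumption about how it is turned into a feature count.

```python
def features_kept(colsample_bytree, n_active_features):
    """Number of features a tree may use under column subsampling.

    Assumption: the product is truncated to an integer, so any value
    below 1 becomes 0 and triggers the error.
    """
    n = int(colsample_bytree * n_active_features)
    if n <= 0:
        raise ValueError(
            "colsample_bytree=%g is too small that no feature can be included"
            % colsample_bytree
        )
    return n
```

Under this reading, `features_kept(0.1, 5)` fails because 0.1 times 5 is 0.5, which truncates to 0, and `features_kept(1.0, 0)` fails whatever the ratio is.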

I am not sure I understand.

I see this in the code:

// whether to subsample columns during tree construction
  float colsample_bytree;

So I suppose this variable is used to subsample the data (which, as I understand it, is a way to increase the difference between trees and reduce the overall variance of the model).
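The code comment quoted above says the parameter subsamples columns. A minimal sketch of that idea, assuming each tree draws its own random subset of feature indices (the helper `sample_columns` is hypothetical and is not xgboost's implementation):

```python
import random


def sample_columns(feature_ids, colsample_bytree, rng):
    """Return a random subset of feature ids for one tree."""
    # Assumption: the subset size is the truncated product of the
    # ratio and the number of available features.
    n = int(colsample_bytree * len(feature_ids))
    shuffled = list(feature_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n])


rng = random.Random(0)
features = list(range(10))
# Each tree is grown on its own random half of the columns.
subsets = [sample_columns(features, 0.5, rng) for _ in range(3)]
```

With a ratio of 1, every tree sees every column, so the subsampling has no effect.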

I have not used this option in my R command, and nothing indicates that a different default value is used in the xgboost R source code, so it should still be 1.

I don't know what an active feature is. Is it a feature I use in my dataset? If so, I have plenty of features.

So how could 1 times anything > 1 be < 1?
Is my understanding of active features correct?

Just to explain my issue: the code is executed in a for loop. Each iteration uses a new output column, but the dataset stays the same. I get this error message for only one of these outputs, and that output is neither all False nor all True. So I know my dataset works (because of the other iterations in the for loop), and as far as I can tell there is nothing special about the output that fails.

If you have more than one feature in the dataset, this should not happen. Do you have some script to reproduce the error? You can try to minimize the dataset you have.

You can also change that line to

        utils::Check(n > 0, "colsample_bytree=%g is too small that no feature can be included", param.colsample_bytree);

to see the value of colsample_bytree at that point.

Tianqi

Thanks for the idea (I am not very familiar with C/C++).

I get this:
colsample_bytree=1 is too small that no feature can be included

Any other idea?
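If the message appears with colsample_bytree=1, the earlier explanation implies that the number of active features seen at that point was 0. One way to check this on the data side is to count the columns that contain at least one entry. The sketch below is an illustration only: the helper `count_active_columns` is hypothetical, and treating "active" as "the column has at least one stored entry" is an assumption, not something confirmed in this thread.

```python
def count_active_columns(triplets):
    """Count columns with at least one stored entry.

    `triplets` is an iterable of (row, col, value) for a sparse matrix.
    Assumption: a feature is "active" when its column has a stored entry.
    """
    return len({col for _row, col, _value in triplets})


# A matrix with entries in columns 0 and 2 has two active columns.
entries = [(0, 0, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
n_active = count_active_columns(entries)

# An empty matrix has none, so 1 * n_active_empty < 1 even with a ratio of 1.
n_active_empty = count_active_columns([])
```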

I have no idea so far. It would be great if you could try to make a script that reproduces the error.

Can you give me your email address? I can send you a script + the matrix in R format so you can have a look by yourself. Dataset is confidential, can't share it in public unfortunately.

tqchen at cs dot washington dot edu

Sincerely,

Tianqi Chen
Computer Science & Engineering, University of Washington

Ok thanks for the email. I will clean the script and include data loading and so on.

BTW, this won't fix the issue, but after displaying the message, R freezes and you have to kill the R process to get it back. I don't know how utils::Check works, but it seems to me that when the check fails, xgboost should stop by itself instead of hanging forever.

OK, I found the error. It was not related to Xgboost. Sorry for the false alarm.

No problem, I am glad that you found it.

Hi, could you maybe let us know what the problem was?

I'm facing the same issue, and it would be nice to know.

Kind regards,
Theodore.
