As far as I understand, the random forest (rf) mode differs from a genuine rf in three key aspects:
How realistic would it be to add a "bagging_with_replacement" option? If set to True, the rows would be subsampled with replacement, mimicking the idea of bagging. This might even be an interesting option for non-rf applications.
Related issue #883.
bootstrap seems to be a suitable name. Row subsampling would then be performed if either bagging_fraction < 1 or bootstrap = True.
It is not trivial to have this in the core algorithm.
However, a simple solution is to use weights, that is, giving weight 0 to the non-sampled data, 1 to the "one-sample" data, and k to the "k-sample" data...
It is easy to have this in the Python package, since you can change the weights on each iteration.
Good hint. I was actually not aware that case weights could be updated during training. The Poisson distribution with mean 1 will provide an efficient and approximately correct weight distribution.
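To illustrate the Poisson idea, here is a minimal standard-library sketch comparing exact bootstrap counts (how many times each row is drawn when sampling n rows with replacement) with independent Poisson(1) weights. The function names are hypothetical and nothing here is LightGBM API; it only shows that both schemes leave roughly a fraction exp(-1) of the rows with weight 0.

```python
import math
import random
from collections import Counter


def bootstrap_counts(n_rows, rng):
    """Exact bootstrap: draw n_rows indices with replacement and count each row."""
    drawn = Counter(rng.choices(range(n_rows), k=n_rows))
    return [drawn.get(i, 0) for i in range(n_rows)]


def poisson1_weights(n_rows, rng):
    """Approximate bootstrap: one independent Poisson(1) draw per row (Knuth's method)."""
    threshold = math.exp(-1.0)
    weights = []
    for _ in range(n_rows):
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        weights.append(k)
    return weights


def zero_fraction(weights):
    """Share of rows that are left out (weight 0)."""
    return sum(1 for w in weights if w == 0) / len(weights)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 10_000
    exact = bootstrap_counts(n, rng)
    approx = poisson1_weights(n, rng)
    # Both fractions should be close to exp(-1), i.e. about 0.37.
    print(zero_fraction(exact), zero_fraction(approx), math.exp(-1.0))
    # The exact bootstrap always sums to n; the Poisson weights only do so on average.
    print(sum(exact), sum(approx))
```

The Poisson version needs no coordination between rows, which is why it is cheap; the price is that the total weight is only equal to n in expectation.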
I am reopening this as
I am still interested in this feature in order to be able to emulate random forests. Together with the relatively new "colsample_bynode", it would be very close to a native random forest.
Sampling with replacement should be computationally more efficient than without.
Closed in favor of being in #2302. We decided to keep all feature requests in one place.
Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.
Hi @guolinke,
However, a simple solution is to use weights, that is, giving weight 0 to the non-sampled data, 1 to the "one-sample" data, and k to the "k-sample" data...
It is easy to have this in the Python package, since you can change the weights on each iteration.
I understand that this is an old and closed issue, but may I ask you to elaborate on this solution a little bit more? How can one change sample weights for each tree in the random forest?
One solution could be using callbacks I suppose, is it the only way?
Hi @rdbuf, yeah, a callback is the most convenient way to do this, I think.
I see, thanks :)
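For later readers, a rough sketch of what such a callback might look like. It assumes the callback receives an env object whose env.model is the Booster, that the Booster exposes its training Dataset as train_set, and that the Dataset has num_data() and set_weight(). The factory name make_poisson_weight_callback is hypothetical. Whether re-set weights are actually picked up on every iteration may depend on the LightGBM version and boosting mode (rf mode in particular), and may require the Dataset to have been created with weights in the first place; none of that is verified here.

```python
import math
import random


def make_poisson_weight_callback(seed=0):
    """Build a callback that redraws Poisson(1) row weights before each iteration.

    Hypothetical helper: it only relies on env.model.train_set offering
    num_data() and set_weight(), which is an assumption about the library.
    """
    rng = random.Random(seed)
    threshold = math.exp(-1.0)

    def draw_poisson1():
        # Knuth's method for a single Poisson(1) draw.
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        return k

    def _callback(env):
        train_set = env.model.train_set
        n_rows = train_set.num_data()
        # Weight k means "this row was drawn k times"; 0 means left out.
        train_set.set_weight([float(draw_poisson1()) for _ in range(n_rows)])

    # Ask the training loop to run this before the iteration, not after it.
    _callback.before_iteration = True
    _callback.order = 0
    return _callback
```

It would then be passed as callbacks=[make_poisson_weight_callback(seed=1)] to the training call. It is worth checking on a small dataset that the trees really change when the weights do.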