For some objective types, LightGBM transforms the raw score (sum of tree outputs) to produce the model output. For example, for Poisson, Gamma, and Tweedie regressions, the raw score is passed through the exp function. However, when the boosting type is rf (random forest), these transforms don't appear to be applied; instead, the output of the model is just the average of the tree outputs.
Similarly, when reg_sqrt is true, the raw score is squared to produce the model output. However, when either the boosting type is rf OR the objective type is Poisson, Gamma, Tweedie, or Huber, this output transform does not appear to be applied.
For binary classification, again if the boosting type is rf, it appears that the model outputs the average of the tree outputs, but does not apply the 1 / (1 + Math.Exp(-Sigmoid * RawScore)) transform to produce a probability.
Is this to be expected? Shouldn't the choice of reg_sqrt and whether or not boosting type rf is used be independent of the model output meaning?
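To make the question concrete, here is a minimal sketch of the three output transforms described above. The `Objective` enum and the free function `ConvertOutput` are hypothetical names chosen for illustration; this is not LightGBM's actual implementation, only the mapping as I understand it from the observed outputs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical names for illustration only; not LightGBM's API.
enum class Objective { Regression, RegressionSqrt, Poisson, Gamma, Tweedie, Binary };

// Maps a raw score (sum of tree outputs) to the model output.
// `sigmoid` is the scaling parameter of the binary objective (assumed positive).
double ConvertOutput(Objective objective, double raw_score, double sigmoid = 1.0) {
  switch (objective) {
    case Objective::Poisson:
    case Objective::Gamma:
    case Objective::Tweedie:
      // Log link: the raw score is exponentiated.
      return std::exp(raw_score);
    case Objective::RegressionSqrt:
      // reg_sqrt: the raw score is squared, as described above.
      return raw_score * raw_score;
    case Objective::Binary:
      // Logistic transform to a probability.
      assert(sigmoid > 0.0);
      return 1.0 / (1.0 + std::exp(-sigmoid * raw_score));
    case Objective::Regression:
      return raw_score;
  }
  return raw_score;  // Not reached for valid enum values.
}
```

With boosting type rf, the observed output looks like the averaged raw score with none of these branches applied.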
PS: Just a guess, but looking at the generated C++ code for the tree:
    void GBDT::Predict(const double* features, double* output, const PredictionEarlyStopInstance* early_stop) const {
      PredictRaw(features, output, early_stop);
      if (average_output_) {
        for (int k = 0; k < num_tree_per_iteration_; ++k) {
          output[k] /= num_iteration_for_pred_;
        }
      } else if (objective_function_ != nullptr) {
        objective_function_->ConvertOutput(output, output);
      }
    }
It seems like the else if should just be an if, as otherwise if averaging is done (as is the case for random forest), the ConvertOutput methods are never called?
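As a rough sketch of the change being suggested, the following stand-alone function averages and then converts, using a plain `if` for the second step. `PredictSketch` and its parameters are hypothetical and only mimic the shape of the method quoted above; it is not a patch against LightGBM. It also assumes the trees were trained on the transformed scale, which may not hold for RF mode.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical stand-in for the prediction step; not LightGBM code.
// `raw` holds the per-class sum of tree outputs.
std::vector<double> PredictSketch(std::vector<double> raw,
                                  int num_iterations,
                                  bool average_output,
                                  const std::function<double(double)>& convert) {
  assert(num_iterations > 0);
  if (average_output) {
    // Random forest mode: average the tree outputs.
    for (double& value : raw) value /= num_iterations;
  }
  // Plain `if` rather than `else if`: the transform also runs after averaging.
  if (convert) {
    for (double& value : raw) value = convert(value);
  }
  return raw;
}
```

For example, with a raw sum of 6.0 over 3 iterations and a squaring transform, this returns 4.0, whereas the `else if` version would stop at the average of 2.0.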
@guolinke Any comment on this, please?
@mjmckp sorry, I am on vacation.
In this PR: https://github.com/Microsoft/LightGBM/pull/1637, RF fits the targets directly by regression. Therefore, there is no need for the label transform.
And I think reg_sqrt is not used in RF mode either.
I think we may need a better solution for RF mode: use the gradients from the objective function here: https://github.com/Microsoft/LightGBM/blob/master/src/boosting/rf.hpp#L80-L110, and enable ConvertOutput in prediction.
@mjmckp I am afraid I cannot fix this for RF any time soon.
Would you like to contribute this?
I could give it a go, if you could provide a rough description of what changes are required and where?
@mjmckp thanks so much!
you need to:
ping @mjmckp
any progress?
@mjmckp if you don't have time for this, I can fix it soon.
@guolinke That would be great, thanks. This isn't something that I desperately need; it just came up when I was writing the unit tests for https://github.com/rca22/LightGBM.Net. However, I would like to understand the LightGBM codebase better in order to contribute in the future.