Lightgbm: Implement easy access to single-tree prediction in fitted LGBM model

Created on 8 May 2020 · 6 comments · Source: microsoft/LightGBM

This has been mentioned in #845. However, the suggested solution there is not working. Here I would like to re-emphasize the need and elaborate on the desired feature.

Summary

In scikit-learn it is very easy to access, via `model.estimators_`, the prediction of every single tree in the ensemble, meaning the standalone prediction of each tree regardless of all the other trees (not a cumulative prediction). In LightGBM (I am mainly concerned with regression) this is difficult or even impossible so far. In #845 it was suggested to achieve this via `booster.dump_model`, leaf-index prediction, etc., but I have not managed to make that work: the values associated with the leaves seem to be mean-corrected, or to reflect only the incremental change relative to the previous tree. Even taking all of that into account, it is still a cumulative prediction and therefore yields very narrow prediction distributions.
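For comparison, this is the kind of per-tree access scikit-learn offers (a minimal sketch; the dataset and hyperparameters are illustrative, not from the issue):

```python
# Per-tree predictions in scikit-learn via model.estimators_.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, n_features=10, random_state=0)
model = GradientBoostingRegressor(n_estimators=10, random_state=0).fit(X, y)

# model.estimators_ is an array of shape (n_estimators, 1) holding one
# DecisionTreeRegressor per boosting round; each tree can be queried on its
# own, independently of the rest of the ensemble.
per_tree = np.array([tree[0].predict(X) for tree in model.estimators_])
print(per_tree.shape)  # (10, 100): one raw tree output per tree per row
```

Note that for gradient boosting these raw outputs are residual fits rather than predictions of `y` itself, but the point is that each tree is individually addressable.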

Motivation

It would be very useful to have this feature because in certain use cases it is important to get an idea of the distribution of predictions across all the trees (is it wide or narrow? is it skewed?). In a sense, it can be interpreted as a posterior distribution over the target variable being predicted (in LightGBM regression). This is relevant both for classical GBM regression and for classical RF regression.

Description

As in scikit-learn, there should be an `estimators_`-like object with a `.predict(X)` method that returns the prediction of every single tree for every row in `X`. It should be easily accessible, not hidden, and it should automatically account for whether `boost_from_average` was used. A clear distinction should be made between cumulative prediction (currently available via `.predict(num_iteration=i)`) and independent per-tree prediction (every single tree on its own), which I suggest implementing as a new feature. One could imagine a `cumulative=True` flag on `.predict()`; when set to `False`, each tree would answer independently of the others.

References

https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/tree/_classes.py#L395


All 6 comments

Any update on this? We are facing similar issues.

@shiyu1994 can you help to check this?

Maybe we can add a predict_with_tree(tree_id=i) method for Booster. I'll handle this.

Will adding a start_iteration parameter to the existing predict method be enough? I think it would then be possible to select one tree with the help of num_iteration and start_iteration. Also, it would be consistent with the API of the save_model method (and some others).

https://github.com/microsoft/LightGBM/blob/b299de3f7248627f43c956ea60e5dc8d9bbcba1b/python-package/lightgbm/basic.py#L2809-L2851

https://github.com/microsoft/LightGBM/blob/b299de3f7248627f43c956ea60e5dc8d9bbcba1b/python-package/lightgbm/basic.py#L2611-L2633

I've done the implementation as @StrikerRUS suggested. If boost_from_average is enabled, the average score is integrated into the first tree, so booster.predict(data, start_iteration=0, num_iteration=1) will return the score of the first tree with the average value added. Does that meet your request? @pransito

Hello Juline,

There was some response from the LightGBM developers; see the email below and the ticket on GitHub.

Is this what we were missing?

Hope you are doing well!

Regards

On Tue, 4 Aug 2020 at 07:57, shiyu1994 notifications@github.com wrote:

> I've done the implementation as @StrikerRUS suggested. If boost_from_average is enabled, the average score will be integrated into the first tree. So booster.predict(data, start_iteration=0, num_iteration=1), it will provide the score of the first tree with average value added. Does that meet your request? @pransito


Francisco J. Navarro-Brull
