Lightgbm: Probabilistic Forecasting

Created on 1 Jul 2020  ·  11 Comments  ·  Source: microsoft/LightGBM

We know that LightGBM currently supports quantile regression, which is great. However, quantile regression can be an inefficient way to gauge prediction uncertainty, because a new model needs to be built for every quantile, and in theory each of those models may have its own set of optimal hyperparameters. That becomes unwieldy from a production standpoint: if you're interested in multiple quantiles, you can end up with many models. Another limitation of most machine learning models is that they produce only point predictions, whereas businesses are often interested in the probability distribution around a given prediction. There are various methods to do this with neural networks, and only recently have new ways emerged to address it with tree-based models.

The NGBoost library has attempted to do this; see:

https://stanfordmlgroup.github.io/projects/ngboost/

Additionally, there has been a paper on adapting XGBoost to do this (in a different manner), although the author has not yet posted an implementation.

https://github.com/StatMixedML/XGBoostLSS

Something to consider as a feature, as it would make LightGBM far more valuable in regression scenarios.

Most helpful comment

FWIW - Catboost has recently rolled out support for something like this as well, in version 0.24 via RMSEWithUncertainty. I don't know how they implemented it yet.

Thanks to GitHub we can find the corresponding commit: https://github.com/catboost/catboost/commit/af88523dcb4dbaececac1891b26316a9cc23c384.

All 11 comments

Seconded - I think this would make LightGBM incredibly more useful for the same regression problems it is used to tackle now, as well as additional problems that require a more probabilistic approach.

Some of the ngboost team's ideas for next steps, on predicting joint probability distributions, as they mention in their slides ( https://drive.google.com/file/d/183BWFAdFms81MKy6hSku8qI97OwS_JH_/view ), are particularly interesting as well:

Demonstrate use for joint-outcomes regression (e.g. "what's the probability that it rains >3 inches and is >15C tomorrow?")

Thanks @MotoRZR for referring to my repo https://github.com/StatMixedML. In fact, I am currently also working on an extension of LightGBM to probabilistic forecasting; see the repo here: https://github.com/StatMixedML/LightGBMLSS

This would be a wonderful addition. FWIW - Catboost has recently rolled out support for something like this as well, in version 0.24 via RMSEWithUncertainty. I don't know how they implemented it yet.

> FWIW - Catboost has recently rolled out support for something like this as well, in version 0.24 via RMSEWithUncertainty. I don't know how they implemented it yet.

Thanks to GitHub we can find the corresponding commit: https://github.com/catboost/catboost/commit/af88523dcb4dbaececac1891b26316a9cc23c384.

@kmedved Thanks for pointing towards the RMSEWithUncertainty! Very interesting, even though I am not sure how it is implemented exactly.

The fact that RMSE is used as the loss function makes me doubt that it is a truly probabilistic approach. The splitting procedures used internally to construct trees can detect changes in the mean only, so standard implementations of machine learning models cannot recognize distributional changes (e.g., a change in variance), even when these are related to covariates. Since RMSE is a loss function that is minimized when the estimator is the mean, I am not sure how it deals with changes in variance, skewness, etc.
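A small numeric illustration of that point: a squared-error loss is minimized by the mean alone, so two datasets with the same mean but very different spreads are indistinguishable to it.

```python
import numpy as np

rng = np.random.default_rng(1)
low_var = rng.normal(5.0, 0.5, size=2000)   # same mean, small spread
high_var = rng.normal(5.0, 3.0, size=2000)  # same mean, large spread

def best_constant(y):
    """Grid-search the constant c minimizing mean((y - c)^2)."""
    grid = np.linspace(0.0, 10.0, 201)
    losses = ((y[:, None] - grid[None, :]) ** 2).mean(axis=0)
    return grid[np.argmin(losses)]

c1, c2 = best_constant(low_var), best_constant(high_var)
# Both optima land near 5.0: the loss "sees" the mean, never the variance.
```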

That's a good note @StatMixedML, although I am actually not totally sure they're using plain RMSE as a loss function (despite the name). An explainer notebook is coming, but if you try it out, the validation loss on the model does not match (or even resemble) RMSE. I've put together an example Colab notebook here, where on the CA housing dataset, without any tuning, the RMSE loss is 0.5166607793912465, while the RMSEWithUncertainty loss is 0.06206563547371406. So they seem to be using some custom scoring for the RMSEWithUncertainty loss.
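CatBoost's internals aren't confirmed here, but one plausible explanation for such a scale mismatch is a likelihood-based score rather than an RMSE: a Gaussian negative log-likelihood, for example, lives on a completely different numeric scale from RMSE even for identical residuals (the mu/sigma values below are placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(0, 1, 1000)
mu = np.zeros_like(y)      # hypothetical predicted means
sigma = np.ones_like(y)    # hypothetical predicted std devs

rmse = np.sqrt(np.mean((y - mu) ** 2))
# Mean Gaussian negative log-likelihood for the same predictions:
nll = np.mean(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2))
# rmse is near 1.0 while nll is near 1.42: the two numbers are not comparable,
# and NLL-style scores shrink rapidly when the fitted sigma is small.
```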

@kmedved Thanks for the interesting comparison! Indeed, it seems as if RMSEWithUncertainty != RMSE.

Anyway, I am not sure how one would evaluate RMSEWithUncertainty within a probabilistic framework, e.g., how well the forecast uncertainty is calibrated, using something like a scoring rule. I agree that mostly point forecasts are used, but a good estimate of the uncertainty is at least as important as, if not more important than, the point estimate. I can't remember where I found the quote, but it is a nice one:

It’s better to be approximately right than exactly wrong.

The first half, of course, relates to a probabilistic forecast, whereas the second half aims at point forecasts.
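To make the scoring-rule idea concrete: the CRPS (continuous ranked probability score) for a Gaussian predictive distribution has a well-known closed form, so calibration checks of a probabilistic forecast are cheap to compute:

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS for a N(mu, sigma^2) forecast of observation y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# A sharp, well-centered forecast scores better than a vague one for the same outcome:
y_obs = 0.1
sharp = crps_gaussian(y_obs, mu=0.0, sigma=0.5)
vague = crps_gaussian(y_obs, mu=0.0, sigma=5.0)
```

Unlike RMSE, this rewards both accuracy of the location and honesty of the stated uncertainty.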

Keep in mind that LightGBM already includes Quantile Regression. Even though it may not enjoy the probabilistic properties of a true bayesian forecast, it is still the most used method nowadays for variance forecast estimation.

Quantile regression is fine if you're only interested in specific quantiles. If you want the full distribution, it's not as useful. Also, with quantile regression it's inefficient to maintain a model for each quantile, each with its own set of hyperparameters.

With neural nets there are various ways to do this like:
- variational inference, i.e., have two targets for the network, the mean and the standard deviation
- bayesian regression
- a dropout-based approach: http://mlg.eng.cam.ac.uk/yarin/PDFs/NIPS_2015_deep_learning_uncertainty.pdf
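A crude two-stage stand-in for the first idea above (two targets: mean and standard deviation), with linear least squares playing the role of the network — one fit for the mean, a second fit for the log-spread of the residuals:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 4000)
y = 2.0 * x + rng.normal(0.0, 0.1 + 0.5 * x)   # noise level grows with x

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # target 1: conditional mean
resid = y - X @ beta
# target 2: conditional spread, via the log absolute residuals
gamma, *_ = np.linalg.lstsq(X, np.log(np.abs(resid) + 1e-9), rcond=None)
# beta[1] recovers the slope (~2); gamma[1] > 0 says uncertainty grows with x.
```

A real two-output network would fit both targets jointly under a likelihood loss, but the shape of the output — a mean and a spread per input — is the same.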

With GBDTs there are not as many tools; only recently has NGBoost come on the scene. It seems like StatMixedML also has something in the works. Quantile regression is OK, but not a magic-bullet solution, for the reasons mentioned.

This article also points out some of the flaws of quantile regression. Not specifically LightGBM related but still relevant.

https://medium.com/@qucit/a-simple-technique-to-estimate-prediction-intervals-for-any-regression-model-2dd73f630bcb

@MotoRZR yes, I recall reading that article a while ago.

IMO, the most interesting approach is the one where the parameters of a distribution are estimated (that is, the first one you mentioned two messages above). In fact, that distribution-parameter estimation method is what Amazon uses in their DeepAR paper (which happens to be the default model in their AWS Forecast service).
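The payoff of the distribution-parameter approach in a forecasting setting: once a model emits (mu, sigma) per future step, any quantile or interval can be read off from simulated sample paths — one set of parameters serves every quantile (the parameter values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
horizon_mu = np.array([10.0, 11.0, 12.5])     # hypothetical predicted means
horizon_sigma = np.array([1.0, 1.5, 2.0])     # hypothetical predicted std devs

# Draw Monte Carlo sample paths from the predicted per-step distributions.
paths = rng.normal(horizon_mu, horizon_sigma, size=(10_000, 3))
p10, p50, p90 = np.quantile(paths, [0.1, 0.5, 0.9], axis=0)
# Any other quantile, interval, or exceedance probability comes for free.
```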

However, I am not sure whether this should be added as a new objective. It is relatively easy to get it up and running with the existing API.

MC dropout for boosted trees is something I have been thinking about. However, at least in neural nets, MC dropout generates distributions that usually end up having low variance (and therefore narrow prediction intervals) compared to more conventional Bayesian inference methods. But perhaps it is worth exploring for GBDTs...
