I noticed there's an objective for root mean square error but not for mean absolute error? I'm happy to put together a PR for this if you think this would be useful.
Actually what I think would be even more useful is to generalize this to quantile loss – happy to include that too. My use case is that I want to predict a 90% interval. This could be done by training two separate predictors with quantile loss at 0.05 and 0.95 (rough sketch below).
Let me know if you think this is a bad idea and I won't attempt it :)
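For concreteness, a rough sketch of the two-predictor idea as a custom objective (the data is synthetic just to make it self-contained, and the constant hessian is only a stand-in, more on that below):

```python
import numpy as np
import xgboost as xgb

def pinball_objective(alpha):
    """Quantile (pinball) loss for xgb.train's `obj` argument."""
    def objective(preds, dtrain):
        err = dtrain.get_label() - preds
        # derivative of the pinball loss w.r.t. the prediction:
        # -alpha where we under-predict, (1 - alpha) where we over-predict
        grad = np.where(err > 0, -alpha, 1.0 - alpha)
        # the true second derivative is zero almost everywhere,
        # so use a constant stand-in here
        hess = np.ones_like(preds)
        return grad, hess
    return objective

# synthetic data just so the example runs end to end
rng = np.random.RandomState(0)
X = rng.rand(500, 5)
y = X.sum(axis=1) + rng.randn(500)
dtrain = xgb.DMatrix(X, label=y)

# one model per tail quantile gives a 90% interval
lower = xgb.train({"max_depth": 3}, dtrain, num_boost_round=100, obj=pinball_objective(0.05))
upper = xgb.train({"max_depth": 3}, dtrain, num_boost_round=100, obj=pinball_objective(0.95))
```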
This will definitely be useful. scikit-learn's gradient boosting implementation has squared, absolute, Huber and quantile losses for regression tasks. It would be nice if xgboost also supported them.
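For reference, in scikit-learn this is just a constructor argument, e.g.:

```python
from sklearn.ensemble import GradientBoostingRegressor

# quantile loss, with alpha picking the quantile; two models give a 90% interval
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95)
```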
I started adding this and it seems pretty straightforward. There is one problem though. xgboost computes the second derivative and for quantile loss it's a Dirac delta, i.e. zero everywhere except an infinite spike around x=0.
I'm not sure how the second derivative (Hessian) is used – it doesn't seem to be used in sklearn's implementation.
Closing this issue for now – will try to put together a pull request with the implementation. But would love to hear if you have any thoughts about what to do with the Hessian.
I looked at this some more and I have implemented something.
The problem is again with the Hessian. From what I understand, xgboost uses Newton's method to find the optimum, which is why the second derivative is needed.
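(For context on why the Hessian matters: each leaf weight comes from a second-order expansion of the loss, roughly `w* = -sum(g_i) / (sum(h_i) + lambda)` over the instances in the leaf. If every `h_i` is zero, the step degenerates to `-sum(g_i) / lambda`, and blows up when lambda is 0.)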
I see several options. One is to use a smooth function like log(exp(-x) + exp(x)), which is similar to the absolute value. I can put together a PR for Huber loss for now, but I'm not super excited about it since it requires a parameter delta. But maybe let's start there?
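A sketch of what that Huber objective could look like in the same custom-objective form (`delta` being the extra parameter):

```python
import numpy as np

def huber_objective(delta):
    """Huber loss as a custom objective for xgb.train (sketch)."""
    def objective(preds, dtrain):
        r = dtrain.get_label() - preds
        quadratic = np.abs(r) <= delta
        # gradient w.r.t. the prediction: -r inside the band, -delta*sign(r) outside
        grad = np.where(quadratic, -r, -delta * np.sign(r))
        # second derivative: 1 inside the band, 0 outside; the zero region
        # is the part the hessian discussion above is worried about
        hess = quadratic.astype(float)
        return grad, hess
    return objective
```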
Closing for now. Also note that the #736 refactor gives a plugin system (https://github.com/dmlc/xgboost/tree/brick/plugin) for adding new losses and metrics.
@erikbern This is exactly what I was looking for. Is there a Huber loss equivalent for the tilted absolute loss case (i.e. when you want quantiles other than the median)?
Even if you did get a second derivative, is there a batch processing method in XGBoost? Otherwise computing the Hessian might be prohibitive.
I implemented Huber but it has the same convergence problems, since its second derivative isn't continuous.
I think the only solution would be to support a first-order method that doesn't require the Hessian.
The twice-differentiable log-cosh loss seems like a reasonably good option for xgboost's second-order framework.
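A sketch in the same custom-objective form; with x = pred - label, the gradient is tanh(x) and the hessian is 1 - tanh(x)^2, both smooth:

```python
import numpy as np

def logcosh_objective(preds, dtrain):
    """log(cosh(pred - label)) as a custom objective for xgb.train (sketch)."""
    x = preds - dtrain.get_label()
    grad = np.tanh(x)               # smooth, but saturates at +/-1 for large |x|
    hess = 1.0 - np.tanh(x) ** 2    # sech^2(x); positive, but tends to 0 for large |x|
    return grad, hess
```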
Hello @tqchen,
Are plugins still the recommended way to add custom loss functions?
Are plugins only usable through the C++ frontend?
If yes, I guess I won't be able to use my custom loss in R.
I assume I should be using the example of a custom objective for R instead.
The log-cosh objective still suffers from the gradient and Hessian being constant for very large off-target predictions, which results in no splits being made. Here is a blog post on the topic, with a possible solution and an implementation in Python: http://www.bigdatarepublic.nl/regression-prediction-intervals-with-xgboost/