The current docs on using custom objective functions with the scikit-learn API, as found here, state:
A custom objective function can be provided for the objective parameter. In this case, it should have the signature
objective(y_true, y_pred) -> grad, hess or
objective(y_true, y_pred, group) -> grad, hess:

y_true : array-like of shape = [n_samples]
    The target values.
y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
    The predicted values.
group : array-like
    Group/query data, used for ranking task.
grad : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
    The value of the first order derivative (gradient) for each sample point.
hess : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
    The value of the second order derivative (Hessian) for each sample point.

For binary task, the y_pred is margin.
For multi-class task, the y_pred is grouped by class_id first, then grouped by row_id.
If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i]
and you should group grad and hess in this way as well.
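For reference, a minimal sketch of what that signature looks like for a regression task might be the following (the squared-error loss, the function name, and the commented usage line are illustrative, not taken from the docs):

```python
import numpy as np

def squared_error_objective(y_true, y_pred):
    """Illustrative custom objective matching the documented signature.

    For the loss L = 0.5 * (y_pred - y_true)**2, the first derivative with
    respect to the prediction is (y_pred - y_true) and the second derivative
    is 1 for every sample.
    """
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess

# e.g. lightgbm.LGBMRegressor(objective=squared_error_objective)
```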
As far as I can tell, there are two issues with this documentation:
If I am correct in my thinking, I am happy to pick this up.
@hsorsky The gradient is the derivative of the loss with respect to the prediction. There is an example here: https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
For classification tasks, the preds argument represents the _logit_ of the final predictions (i.e. it should have values in the interval (-infinity, infinity), not [0, 1]).
The hessian is the second derivative of the loss with respect to the prediction (so it is a vector, not a matrix). Note that "hessian" is sometimes used differently elsewhere in machine learning, to mean the matrix of second derivatives of the loss function with respect to the _model parameters_ (e.g. the weights in a neural network), whereas here we mean the second derivative of the loss function with respect to the _predicted value_. (See also https://github.com/microsoft/LightGBM/issues/1230#issuecomment-581577525.)
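To make those two points concrete, here is a sketch of a binary log-loss objective written under the assumptions described above (the function name is illustrative; the key step is converting the raw score to a probability with the sigmoid):

```python
import numpy as np

def binary_logloss_objective(y_true, y_pred):
    """Binary log loss where y_pred is the raw score (logit), not a probability.

    With p = sigmoid(y_pred) and L = -[y*log(p) + (1 - y)*log(1 - p)]:
        dL/dy_pred    = p - y        (gradient: one value per sample)
        d2L/dy_pred^2 = p * (1 - p)  ("hessian": also one value per sample)
    """
    p = 1.0 / (1.0 + np.exp(-y_pred))  # map raw score to probability
    grad = p - y_true
    hess = p * (1.0 - p)
    return grad, hess
```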
PR to improve the docs is welcome!
Is this the case in multi-class classification? Or is it assumed that softmax is applied to the output values that live in (-inf, inf)?
Multi-class classification is similar to binary: the preds argument that LightGBM will pass to the custom objective is the raw score, without softmax applied.
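Under that assumption, a multi-class log-loss objective might be sketched as follows. The function name and the diagonal hessian approximation are illustrative, and the reshaping follows the flattened, class-major layout described in the docs quoted above; some versions of the scikit-learn interface may pass a 2-D array instead, in which case the reshaping is unnecessary.

```python
import numpy as np

def softmax_objective(y_true, y_pred):
    """Illustrative multi-class log-loss objective on raw (pre-softmax) scores.

    Assumes the flattened, class-major layout from the docs quoted above:
    y_pred[j * num_data + i] holds the raw score of class j for sample i.
    """
    num_data = y_true.shape[0]
    num_class = y_pred.size // num_data
    scores = y_pred.reshape(num_class, num_data).T  # -> (num_data, num_class)

    # numerically stable softmax over the class dimension
    scores = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)

    # one-hot encode the integer class labels
    y_onehot = np.zeros_like(p)
    y_onehot[np.arange(num_data), y_true.astype(int)] = 1.0

    grad = p - y_onehot   # dL/dscore_j = p_j - 1{j is the true class}
    hess = p * (1.0 - p)  # diagonal second derivative, one value per class

    # return grad and hess in the same class-major layout as y_pred
    return grad.T.reshape(-1), hess.T.reshape(-1)
```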
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
Removed the awaiting-response tag to prevent the bot from closing the issue. Reopened for Hacktoberfest.