The current docs on using custom objective functions with the scikit-learn API, as found here, state:
A custom objective function can be provided for the objective parameter. In this case, it should have the signature
objective(y_true, y_pred) -> grad, hess or
objective(y_true, y_pred, group) -> grad, hess:

y_true : array-like of shape = [n_samples]
    The target values.
y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
    The predicted values.
group : array-like
    Group/query data, used for ranking task.
grad : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
    The value of the first order derivative (gradient) for each sample point.
hess : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
    The value of the second order derivative (Hessian) for each sample point.

For binary task, the y_pred is margin.
For multi-class task, the y_pred is grouped by class_id first, then grouped by row_id.
If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i]
and you should group grad and hess in this way as well.
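For reference, a minimal sketch of what that signature looks like for a regression task might be the following (the squared-error loss, the function name, and the commented usage line are illustrative, not taken from the docs):

```python
import numpy as np

def squared_error_objective(y_true, y_pred):
    """Illustrative custom objective matching the documented signature.

    For the loss L = 0.5 * (y_pred - y_true)**2, the first derivative with
    respect to the prediction is (y_pred - y_true) and the second derivative
    is 1 for every sample.
    """
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    return grad, hess

# e.g. lightgbm.LGBMRegressor(objective=squared_error_objective)
```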
As far as I can tell, there are two issues with this documentation:
If I am correct in my thinking, I am happy to pick this up.
@hsorsky The gradient is the derivative of the loss with respect to the prediction. There is an example here: https://github.com/microsoft/LightGBM/blob/e83042f20633d7f74dda0d18624721447a610c8b/examples/python-guide/advanced_example.py#L136
For classification tasks, the preds argument represents the _logit_ of the final predictions (i.e. it should have values in the interval (-infinity, infinity), not [0, 1]).
The hessian is the second derivative of the loss with respect to the prediction (so it is a vector, not a matrix). Note that "hessian" is sometimes used differently elsewhere in machine learning, to mean the matrix of second derivatives of the loss function with respect to the _model parameters_ (e.g. the weights in a neural network), whereas here we mean the second derivative of the loss function with respect to the _predicted value_. (See also https://github.com/microsoft/LightGBM/issues/1230#issuecomment-581577525.)
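To make those two points concrete, here is a sketch of a binary log-loss objective written under the assumptions described above (the function name is illustrative; the key step is converting the raw score to a probability with the sigmoid):

```python
import numpy as np

def binary_logloss_objective(y_true, y_pred):
    """Binary log loss where y_pred is the raw score (logit), not a probability.

    With p = sigmoid(y_pred) and L = -[y*log(p) + (1 - y)*log(1 - p)]:
        dL/dy_pred    = p - y        (gradient: one value per sample)
        d2L/dy_pred^2 = p * (1 - p)  ("hessian": also one value per sample)
    """
    p = 1.0 / (1.0 + np.exp(-y_pred))  # map raw score to probability
    grad = p - y_true
    hess = p * (1.0 - p)
    return grad, hess
```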
PR to improve the docs is welcome!
Is this the case in multi-class classification? Or is it assumed that softmax is applied to the output values that live in (-inf, inf)?
Multi-class classification is similar to binary: the preds argument that LightGBM will pass to the custom objective is the raw score, without softmax applied.
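Under that assumption, a multi-class log-loss objective might be sketched as follows. The function name and the diagonal hessian approximation are illustrative, and the reshaping follows the flattened, class-major layout described in the docs quoted above; some versions of the scikit-learn interface may pass a 2-D array instead, in which case the reshaping is unnecessary.

```python
import numpy as np

def softmax_objective(y_true, y_pred):
    """Illustrative multi-class log-loss objective on raw (pre-softmax) scores.

    Assumes the flattened, class-major layout from the docs quoted above:
    y_pred[j * num_data + i] holds the raw score of class j for sample i.
    """
    num_data = y_true.shape[0]
    num_class = y_pred.size // num_data
    scores = y_pred.reshape(num_class, num_data).T  # -> (num_data, num_class)

    # numerically stable softmax over the class dimension
    scores = scores - scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    p /= p.sum(axis=1, keepdims=True)

    # one-hot encode the integer class labels
    y_onehot = np.zeros_like(p)
    y_onehot[np.arange(num_data), y_true.astype(int)] = 1.0

    grad = p - y_onehot   # dL/dscore_j = p_j - 1{j is the true class}
    hess = p * (1.0 - p)  # diagonal second derivative, one value per class

    # return grad and hess in the same class-major layout as y_pred
    return grad.T.reshape(-1), hess.T.reshape(-1)
```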
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
Removed the awaiting-response tag to prevent the bot from closing the issue. Reopened for Hacktoberfest.