Xgboost: Missing example of multiclass custom objective

Created on 16 Mar 2017 · 10 comments · Source: dmlc/xgboost

I am trying to implement my own objective function for multiclass classification.

For binary classification, I found help here https://github.com/dmlc/xgboost/issues/15.

My loss function, however, does not have only one partial derivative.

What is the expected format of grad and hess returned for multiclass case?

All 10 comments

You can look at the C++ implementation for hints: https://github.com/dmlc/xgboost/blob/master/src/objective/multiclass_obj.cc. To your specific question, grad and hess are both N*K vectors where N=#data, K=#classes.

I created a gist with an implementation of multiclass classification via a custom objective, see https://gist.github.com/ytsaig/a596feec53d2a024ac69f5ae5a83d8f7
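
For readers landing here with the same question about the expected format, here is a minimal sketch of the shape convention only (not the gist itself), assuming predictions arrive as an (n_samples, n_classes) array of softprob p-values; the gradient p - y and the diagonal Hessian 2*p*(1 - p) mirror the built-in objective linked above:

import numpy as np

# Toy sizes for illustration only.
n_samples, n_classes = 4, 3

# (N, K) predicted probabilities and (N, K) one-hot labels.
preds = np.full((n_samples, n_classes), 1.0 / n_classes)
labels_onehot = np.eye(n_classes)[[0, 2, 1, 0]]

grad = preds - labels_onehot          # per-row softmax gradient
hess = 2.0 * preds * (1.0 - preds)    # diagonal Hessian approximation, as in multiclass_obj.cc

# xgboost expects flat arrays of length N*K, laid out row-major:
# all K entries for sample 0, then all K entries for sample 1, and so on.
grad, hess = grad.flatten(), hess.flatten()
print(grad.shape, hess.shape)         # (12,) (12,) i.e. (N * K,)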

@ytsaig Thanks a lot for sharing the gist code. However, I want to design a custom objective function that incorporates a cost matrix. For example, when the model labels a class 1 sample as class 0, it should receive a large penalty (say, weight 100), but when it labels a class 0 sample as class 1, it should only receive a small penalty (say, weight 1). I don't know how to do this; would you mind helping me solve it?

@ytsaig

@ytsaig My email is [email protected]. Hoping for your kind help and response.

@binghesam you could probably use a weight vector that assigns different weights to different classes, so that classification errors on class 0 carry a higher weight, similar to https://stackoverflow.com/questions/42191362/loss-function-design-to-incorporate-different-weight-for-false-positive-and-fals. Or you could look for a softmax-type loss that is not symmetric across classes, though I don't know of one.
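
One concrete way to act on that per-class weight suggestion, without writing a custom objective at all, is to attach per-sample weights to the DMatrix (weights scale each sample's gradient and Hessian contribution). A minimal sketch, with the data and the 100:1 weighting purely illustrative:

import numpy as np
import xgboost as xgb

# Hypothetical data: X is any feature matrix, y holds labels in {0, 1, 2}.
X = np.random.rand(100, 5)
y = np.random.randint(0, 3, size=100)

# Give class-0 samples a much larger weight, so errors on them cost more;
# per-sample weights scale that sample's gradient and Hessian contributions.
sample_weight = np.where(y == 0, 100.0, 1.0)

dtrain = xgb.DMatrix(X, label=y, weight=sample_weight)
params = {"objective": "multi:softprob", "num_class": 3}
booster = xgb.train(params, dtrain, num_boost_round=10)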

@ytsaig Many thanks for your response; over the past few days I have followed your answers and advice and tried it out. However, looking at your gist code at
https://gist.github.com/ytsaig/a596feec53d2a024ac69f5ae5a83d8f7
you only give the grad and hess, and I cannot quite work out the original cost function before any grad and hess are taken.
Would you mind showing the custom cost function in this issue, or sending it to my email address?

I have been stuck on this problem for a few days. Sorry for the trouble.
@ytsaig

The loss function in that example is the softmax (cross-entropy) loss, commonly used for multi-class classification. It's implemented in the gist starting at line 8. For more details on the calculation of its gradient, see for example https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/.
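
For reference, the standard softmax cross-entropy algebra behind that gradient, with z the raw scores, p the softmax probabilities, and y a one-hot label:

p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}, \qquad
L(z, y) = -\sum_k y_k \log p_k, \qquad
\frac{\partial L}{\partial z_k} = p_k - y_k, \qquad
\frac{\partial^2 L}{\partial z_k^2} = p_k (1 - p_k).

(The built-in objective in multiclass_obj.cc linked above uses 2 p_k (1 - p_k) rather than p_k (1 - p_k) for the diagonal Hessian.)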

Am I correct in understanding that the derivatives are with respect to z (like in the softmax function), even though the softprob example here directly gives the p-values?

To clarify:

The gradient is dL/dz = dL/dp * dp/dz, with L the loss function and z the value that goes into the softmax function?

(this leads to grad = preds - labels like here https://stackoverflow.com/questions/39093683/how-is-the-gradient-and-hessian-of-logarithmic-loss-computed-in-the-custom-objec?noredirect=1&lq=1)

I'm asking because my link deals with binary classification (and logloss), and it seems unlikely that the multiclass derivative (and softmax) should be exactly the same. Plus I don't know what the code expects exactly (i.e. dL/dz or dL/dp).

Thanks in advance,

JR

@binghesam I am also trying to customize the mlogloss objective function by adding a cost matrix. If you already have an implementation you can share, it would be very helpful. Thanks in advance.

@ShaniCohen

I eventually came up with:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Cost matrix laid out like a confusion matrix:
# row = true class, column = predicted class.
COST_MATRIX = np.matrix([[10, 10, 20],
                         [10,  1, 10],
                         [20, 10, 10]])

def custom_obj(preds, dtrain):
    # preds: (n_samples, n_classes) class probabilities; dtrain: xgb.DMatrix
    labels = dtrain.get_label()
    labels = OneHotEncoder(sparse=False).fit_transform(labels.reshape(-1, 1))

    # labels.dot(COST_MATRIX) picks out the cost-matrix row of each sample's true
    # class, so every entry of that sample's gradient/Hessian is scaled accordingly.
    grad = np.asarray(np.multiply((preds - labels), np.dot(labels, COST_MATRIX)))
    hess = np.asarray(np.multiply((2.0 * preds * (1.0 - preds)), np.dot(labels, COST_MATRIX)))

    return grad.flatten(), hess.flatten()

Where my COST_MATRIX is formatted the same as the confusion matrix (hence labels dot COST_MATRIX and not the other way around).

Good luck!

Johan
PS: It's still unclear to me where the 2 comes from in the hessian
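
For completeness, a hypothetical sketch of wiring an objective like custom_obj above into training; the data, parameter values, and the assumption that preds reach the objective as an (n_samples, n_classes) probability array are illustrative only and can differ across xgboost versions:

import numpy as np
import xgboost as xgb

# Hypothetical 3-class data set, just to make the sketch self-contained.
X = np.random.rand(200, 5)
y = np.random.randint(0, 3, size=200)
dtrain = xgb.DMatrix(X, label=y)

params = {"num_class": 3, "max_depth": 3, "eta": 0.1}

# NOTE (assumption): custom_obj above expects preds as an (n_samples, n_classes)
# array of probabilities. Depending on the xgboost version, a custom objective may
# instead receive raw margins and/or a flattened array, in which case you would
# need to reshape and apply a softmax before computing grad and hess.
booster = xgb.train(params, dtrain, num_boost_round=20, obj=custom_obj)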
