Catboost: How to define a custom loss function?

Created on 22 Sep 2017  ·  3 Comments  ·  Source: catboost/catboost

How do I define a custom loss function in CatBoost? For example, how do I define the squared error as the cost function to optimize?

E.g., if 'SE' represents squared error and I write:
CatBoostRegressor(learning_rate=1, depth=6, loss_function='RMSE', custom_loss='SE')
then the loss_function='RMSE' parameter will not take effect during model training, right?

Most helpful comment

loss_function is the name of the function that is optimized. custom_loss is a list of functions whose values you can monitor or use, for example, to run the overfitting detector.
So the function that is optimized is always the one specified in the loss_function field.

https://tech.yandex.com/catboost/doc/dg/concepts/loss-functions-docpage/ lists the supported loss functions. SE is not in the list, so CatBoost cannot calculate it and the call will fail.
But if you wrote MAE there instead, CatBoost would optimize RMSE and also calculate the MAE value, draw graphs for it, and write it to a file (see https://tech.yandex.com/catboost/doc/dg/concepts/output-data_error-functions-docpage/).
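
As a minimal sketch of the distinction above, the parameters below ask CatBoost to optimize RMSE while also reporting MAE. The parameter names loss_function and custom_loss are the ones used in this thread; the build_model helper is a hypothetical name introduced for illustration, and catboost is imported lazily so the parameter dictionary can be inspected without the library installed.

```python
# Parameters that optimize RMSE and additionally report MAE during training.
params = {
    "loss_function": "RMSE",   # the objective that is actually optimized
    "custom_loss": ["MAE"],    # extra metrics that are only calculated and logged
    "learning_rate": 1,
    "depth": 6,
}


def build_model(model_params):
    """Hypothetical helper: create a regressor from the parameters above."""
    # Imported here so the dictionary above is usable without catboost installed.
    from catboost import CatBoostRegressor
    return CatBoostRegressor(**model_params)
```

Swapping "MAE" for a name that is not a supported metric, such as "SE", is what would make the call fail.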

You can also define your own function to monitor or to optimize. Here is an example:
https://tech.yandex.com/catboost/doc/dg/concepts/python-usages-examples-docpage/#custom-objective-function
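
For the squared-error case asked about in the question, a custom objective might look like the sketch below. It assumes the interface from the documentation example linked above, where the object exposes calc_ders_range(approxes, targets, weights) and returns one (first derivative, second derivative) pair per object. The class name SquaredErrorObjective is hypothetical, and the derivatives are those of -(target - approx)^2 / 2 with respect to approx, on the assumption that CatBoost maximizes the objective.

```python
class SquaredErrorObjective(object):
    """Hypothetical custom objective: squared error, written as a function to maximize."""

    def calc_ders_range(self, approxes, targets, weights):
        # approxes, targets and weights are indexed containers of floats;
        # weights may be None.
        assert len(approxes) == len(targets)
        if weights is not None:
            assert len(weights) == len(approxes)

        result = []
        for i in range(len(targets)):
            # First and second derivatives of -(target - approx)^2 / 2.
            der1 = targets[i] - approxes[i]
            der2 = -1.0

            if weights is not None:
                der1 *= weights[i]
                der2 *= weights[i]

            result.append((der1, der2))
        return result
```

Such an object would be passed as loss_function=SquaredErrorObjective(). As far as I know, an eval_metric such as 'RMSE' should be given alongside it, because CatBoost cannot derive a metric from a custom objective.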

Please read the documentation; all of this is covered there.

@annaveronika I'm sorry. I read the documentation on custom functions, but I'm still not sure. What is approxes? In the following code, approxes[0] seems to hold values that differ from the values predicted by the model.

import math
from catboost import Pool, CatBoostClassifier

class LoglossMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return True

    def evaluate(self, approxes, target, weight):
        # approxes is a list of indexed containers (containers with only __len__ and __getitem__ defined),
        # one container per approx dimension. Each container contains floats.
        # weight is a one-dimensional indexed container.
        # target is a one-dimensional indexed container of floats.

        # The weight parameter can be None.
        # Returns a pair (error, weights sum).

        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

        error_sum = 0.0
        weight_sum = 0.0

        for i in range(len(approx)):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += w * (target[i] * approx[i] - math.log(1 + math.exp(approx[i])))

        return error_sum, weight_sum
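
On why approxes[0] differs from the model's predictions: for Logloss, the values passed to evaluate are, as far as I understand, raw scores (log-odds) rather than probabilities, so they need a sigmoid before they match the predicted probabilities. The sketch below uses made-up numbers and hypothetical helper names to show the conversion, and checks that the expression used in evaluate equals the usual log-likelihood written in terms of probabilities.

```python
import math


def sigmoid(raw):
    """Convert a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-raw))


def log_likelihood_from_raw(target, raw):
    # Same per-object expression as in LoglossMetric.evaluate.
    return target * raw - math.log(1 + math.exp(raw))


def log_likelihood_from_prob(target, prob):
    # Textbook Bernoulli log-likelihood in terms of the probability.
    return target * math.log(prob) + (1 - target) * math.log(1 - prob)


raw_scores = [-2.0, 0.0, 1.5]   # hypothetical contents of approxes[0]
targets = [0.0, 1.0, 1.0]

probabilities = [sigmoid(a) for a in raw_scores]

# Both forms agree once the raw score is mapped through the sigmoid.
for t, a, p in zip(targets, raw_scores, probabilities):
    assert abs(log_likelihood_from_raw(t, a) - log_likelihood_from_prob(t, p)) < 1e-12
```

A raw score of 0.0 corresponds to a probability of 0.5, which is why the numbers in approxes can look unrelated to the probabilities the model reports.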

@annaveronika Sorry for the frequent comments. I solved it with the help of the link below. It turns out that approxes holds the model's predicted values. For normalized Gini, the following function seems to work without problems.

Start with the 🐱boost | Kaggle https://www.kaggle.com/hireme/start-with-the-boost/code

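import numpy as np


def gini(actual, pred, cmpcol=0, sortcol=1):
    assert len(actual) == len(pred)
    # Columns: actual value, prediction, original index (used as a tie-breaker).
    # np.float was removed in NumPy 1.24; the builtin float is equivalent.
    data = np.asarray(np.c_[actual, pred, np.arange(len(actual))], dtype=float)
    # Sort by prediction in descending order, breaking ties by original index.
    data = data[np.lexsort((data[:, 2], -1 * data[:, 1]))]
    totalLosses = data[:, 0].sum()
    giniSum = data[:, 0].cumsum().sum() / totalLosses

    giniSum -= (len(actual) + 1) / 2.
    return giniSum / len(actual)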


def gini_normalized(a, p):
    return gini(a, p) / gini(a, a)


class GiniMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return True

    def evaluate(self, approxes, target, weight):
        # approxes is a list of indexed containers (containers with only __len__ and __getitem__ defined),
        # one container per approx dimension. Each container contains floats.
        # weight is a one-dimensional indexed container.
        # target is a one-dimensional indexed container of floats.

        # The weight parameter can be None.
        # Returns a pair (error, weights sum).

        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

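        # Normalized Gini is computed over the whole dataset at once, so the weight sum
        # is fixed at 1.0 and get_final_error returns the value unchanged.
        # Note that per-object weights are ignored here.
        error_sum = gini_normalized(target, approx)
        weight_sum = 1.0

        return error_sum, weight_sum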
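
One likely reason the values in approxes work directly here: normalized Gini depends only on the ordering of the predictions, so raw scores and their sigmoid-transformed probabilities give the same value. The check below is a small sketch with made-up data; it uses a compact restatement of the gini functions above so that it runs on its own.

```python
import numpy as np


def gini(actual, pred):
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Sort by prediction in descending order, breaking ties by original index.
    order = np.lexsort((np.arange(len(actual)), -pred))
    sorted_actual = actual[order]
    gini_sum = sorted_actual.cumsum().sum() / sorted_actual.sum()
    gini_sum -= (len(actual) + 1) / 2.0
    return gini_sum / len(actual)


def gini_normalized(actual, pred):
    return gini(actual, pred) / gini(actual, actual)


# Hypothetical labels and raw scores (log-odds).
targets = np.array([0, 0, 1, 0, 1, 1], dtype=float)
raw_scores = np.array([-1.2, -0.3, 0.1, 0.4, 0.9, 2.0])
probabilities = 1.0 / (1.0 + np.exp(-raw_scores))

gini_raw = gini_normalized(targets, raw_scores)
gini_prob = gini_normalized(targets, probabilities)

# The sigmoid is monotonic, so the ranking and therefore the Gini value is unchanged.
assert abs(gini_raw - gini_prob) < 1e-12
```

Metrics that depend on the actual predicted values, such as Logloss, do not share this property and need the raw scores converted first.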