How do I define a custom loss function in CatBoost? For example, I want to define the squared error as the cost function to optimize.
For example, if 'SE' represents squared error and I write:
CatBoostRegressor(learning_rate=1, depth=6, loss_function='RMSE', custom_loss = 'SE')
then the loss_function='RMSE' parameter will not take effect during model training, right?
loss_function is the name of the function that is optimized. custom_loss is a list of additional functions whose values you can monitor, or use for the overfitting detector, for example.
So the function being optimized is always the one given in the loss_function field.
https://tech.yandex.com/catboost/doc/dg/concepts/loss-functions-docpage/ lists the supported loss functions. SE is not in that list, so CatBoost cannot calculate it and the call will fail.
But if you wrote MAE there, it would optimize RMSE and also calculate the MAE value, plot it, and write it to a file (see https://tech.yandex.com/catboost/doc/dg/concepts/output-data_error-functions-docpage/).
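To make the split concrete, here is a minimal sketch of the parameters described above. It only builds a dict and trains nothing. It assumes the Python parameter name custom_metric, which as far as I know is the documented name, with custom_loss accepted as an alias.

```python
# Parameters only; nothing is trained here.
params = {
    "learning_rate": 1,
    "depth": 6,
    "loss_function": "RMSE",   # the objective that training optimizes
    "custom_metric": ["MAE"],  # reported during training, never optimized
}

# With CatBoost installed, the dict would be passed as:
#   from catboost import CatBoostRegressor
#   model = CatBoostRegressor(**params)
```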
You can also define your own function to monitor or to optimize. Here is an example:
https://tech.yandex.com/catboost/doc/dg/concepts/python-usages-examples-docpage/#custom-objective-function
Please read the documentation; all of this is covered there.
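For the squared error case from the original question, a user-defined objective might look like the sketch below. It assumes the calc_ders_range interface used in the linked documentation example, and it assumes CatBoost's convention of maximizing the objective, so the derivatives are those of -0.5 * (target - approx)^2. The class name SquaredErrorObjective is made up for illustration.

```python
class SquaredErrorObjective(object):
    """Hypothetical custom objective for squared error (illustrative name)."""

    def calc_ders_range(self, approxes, targets, weights):
        # approxes, targets, weights are indexed containers of floats;
        # weights can be None.
        assert len(approxes) == len(targets)
        if weights is not None:
            assert len(weights) == len(approxes)

        result = []
        for i in range(len(targets)):
            # First and second derivatives of -0.5 * (target - approx)^2
            # with respect to approx (assumed sign convention: maximize).
            der1 = targets[i] - approxes[i]
            der2 = -1.0
            if weights is not None:
                der1 *= weights[i]
                der2 *= weights[i]
            result.append((der1, der2))
        return result


# Quick check on plain lists, without CatBoost.
ders = SquaredErrorObjective().calc_ders_range([1.0, 2.0], [3.0, 2.0], None)
```

If the interface assumption holds, an instance would be passed as loss_function=SquaredErrorObjective(), together with a built-in eval_metric such as 'RMSE'.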
@annaveronika I'm sorry. I read the documentation on custom functions, but I was not sure. What is approxes? In the following code, approxes[0] seems to contain values different from the values predicted by the model.
import math
from catboost import Pool, CatBoostClassifier

class LoglossMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return True

    def evaluate(self, approxes, target, weight):
        # approxes is a list of indexed containers (containers with only
        # __len__ and __getitem__ defined), one container per approx dimension.
        # Each container contains floats.
        # weight is a one-dimensional indexed container.
        # target is a one-dimensional indexed container of floats.
        # weight parameter can be None.
        # Returns pair (error, weights sum)
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])
        approx = approxes[0]
        error_sum = 0.0
        weight_sum = 0.0
        for i in range(len(approx)):  # range instead of xrange for Python 3
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += w * (target[i] * approx[i] - math.log(1 + math.exp(approx[i])))
        return error_sum, weight_sum
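On the approxes question: as far as I understand, for a Logloss classifier the values passed to evaluate are raw scores (log-odds), not probabilities, which would explain why they differ from the predicted probabilities. A minimal sketch under that assumption, with the helper name raw_to_probability made up for illustration:

```python
import math


def raw_to_probability(raw):
    """Map a raw score (log-odds) to a probability with the logistic function."""
    # Two branches keep math.exp from overflowing for large |raw|.
    if raw >= 0:
        return 1.0 / (1.0 + math.exp(-raw))
    e = math.exp(raw)
    return e / (1.0 + e)


raw_approxes = [-2.0, 0.0, 2.0]
probabilities = [raw_to_probability(a) for a in raw_approxes]
```

If the assumption is right, the raw values should match what the model returns with prediction_type='RawFormulaVal'.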
@annaveronika Sorry for the frequent comments. I solved it with the help of the link below. I found that approxes holds the predicted values. For normalized Gini, the following metric seems to work without problems.
Start with the 🐱boost | Kaggle https://www.kaggle.com/hireme/start-with-the-boost/code
import numpy as np

def gini(actual, pred, cmpcol=0, sortcol=1):
    assert (len(actual) == len(pred))
    # Columns: actual, pred, original index. np.float was removed from NumPy; use float.
    data = np.asarray(np.c_[actual, pred, np.arange(len(actual))], dtype=float)
    # Sort by prediction descending, ties broken by original index.
    data = data[np.lexsort((data[:, 2], -1 * data[:, 1]))]
    totalLosses = data[:, 0].sum()
    giniSum = data[:, 0].cumsum().sum() / totalLosses
    giniSum -= (len(actual) + 1) / 2.
    return giniSum / len(actual)

def gini_normalized(a, p):
    return gini(a, p) / gini(a, a)

class GiniMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return True

    def evaluate(self, approxes, target, weight):
        # approxes is a list of indexed containers (containers with only
        # __len__ and __getitem__ defined), one container per approx dimension.
        # Each container contains floats.
        # weight is a one-dimensional indexed container.
        # target is a one-dimensional indexed container of floats.
        # weight parameter can be None.
        # Returns pair (error, weights sum)
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])
        approx = approxes[0]
        # Gini is computed over the whole set at once, so the weight sum is
        # fixed at 1.0 and get_final_error returns the value unchanged.
        weight_sum = 1.0
        error_sum = gini_normalized(target, approx)
        return error_sum, weight_sum
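Coming back to the 'SE' from the original question: the same three-method interface can also report a squared error metric. This is a minimal sketch, with the class name SquaredErrorMetric made up for illustration. It returns False from is_max_optimal because a smaller error is better.

```python
class SquaredErrorMetric(object):
    """Hypothetical custom metric: weighted mean squared error."""

    def get_final_error(self, error, weight):
        # Weighted sum of squared errors divided by the sum of weights.
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        # Lower squared error is better.
        return False

    def evaluate(self, approxes, target, weight):
        # Same container conventions as the metrics above; weight can be None.
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])
        approx = approxes[0]
        error_sum = 0.0
        weight_sum = 0.0
        for i in range(len(approx)):
            w = 1.0 if weight is None else weight[i]
            weight_sum += w
            error_sum += w * (approx[i] - target[i]) ** 2
        return error_sum, weight_sum


# Quick check on plain lists, without CatBoost.
metric = SquaredErrorMetric()
error_sum, weight_sum = metric.evaluate([[1.0, 2.0, 4.0]], [1.0, 3.0, 2.0], None)
mean_squared_error = metric.get_final_error(error_sum, weight_sum)
```

As with the other metric classes in this thread, I would expect an instance to be passed as eval_metric=SquaredErrorMetric(); the optimized function is still whatever loss_function says.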