Keras: Getting nan for custom metric while training

Created on 19 Jan 2017 · 5 comments · Source: keras-team/keras

I implemented Mean Average Precision (MAP@all) in TensorFlow like this:

import tensorflow as tf

def mean_avg_prec_tf(y_true, y_pred):
    dims = tf.shape(y_true)
    n = dims[0]  # batch size
    k = dims[1]  # number of labels / ranks

    # indices of the predictions sorted by descending score
    _, top_idx = tf.nn.top_k(y_pred, k)

    y_true = tf.to_float(y_true)
    top_idx = tf.to_float(top_idx)

    # pack labels and ranking indices into one tensor of shape [n, 2, k]
    # (tf.concat(dim, values) is the pre-1.0 argument order)
    label_idx = tf.concat(1, [y_true, top_idx])
    label_idx = tf.reshape(label_idx, [n, 2, k])

    def avg_prec(label_idx):
        label = label_idx[0]
        idx = tf.to_int32(label_idx[1])
        # reorder the ground-truth labels by predicted rank
        ordered_pred = tf.gather(label, idx)
        # precision at each rank, kept only at the ranks that are hits
        prec = ordered_pred * tf.cumsum(ordered_pred)
        prec /= tf.to_float(tf.range(1, k + 1))
        # average precision: normalize by the number of relevant labels
        prec = tf.reduce_sum(prec) / tf.reduce_sum(ordered_pred)
        return prec

    precs = tf.map_fn(avg_prec, label_idx)
    return tf.reduce_mean(precs)

This gives me NaN on the training set during training, but the correct value on the validation set. Any idea how I can fix this?

All 5 comments

Make sure you have taken care of every denominator. Division by zero might be the root cause.
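To see why: a batch row with no positive labels makes both `tf.reduce_sum(prec)` and `tf.reduce_sum(ordered_pred)` zero, and 0/0 evaluates to NaN. A minimal sketch of the same arithmetic, using NumPy only for illustration:

```python
import numpy as np

# A row of ground-truth labels with no hits, reordered by predicted rank.
ordered_pred = np.zeros(5)

# Same arithmetic as avg_prec: precision at each rank, then normalize.
prec = ordered_pred * np.cumsum(ordered_pred)
prec = prec / np.arange(1, 6)

with np.errstate(invalid="ignore"):
    ap = prec.sum() / ordered_pred.sum()  # 0.0 / 0.0

print(ap)  # nan
```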

Thanks, yes that makes sense. I'll see if I can fix that.

Added a buffer to the denominator:

def mean_avg_prec_tf(y_true, y_pred):
    dims = tf.shape(y_true)
    n = dims[0]
    k = dims[1]

    _, top_idx = tf.nn.top_k(y_pred, k)

    y_true = tf.to_float(y_true)
    top_idx = tf.to_float(top_idx)

    label_idx = tf.concat(1, [y_true, top_idx])
    label_idx = tf.reshape(label_idx, [n, 2, k])

    def avg_prec(label_idx):
        label = label_idx[0]
        idx = label_idx[1]
        idx = tf.to_int32(idx)
        ordered_pred = tf.gather(label, idx)
        prec = ordered_pred * tf.cumsum(ordered_pred)
        prec /= tf.to_float(tf.range(1, k + 1))
        # guard against rows with no positive labels (0 / 0 -> nan)
        s = tf.reduce_sum(ordered_pred) + 1e-12
        prec = tf.reduce_sum(prec) / s
        return prec

    precs = tf.map_fn(avg_prec, label_idx)
    # average only over rows with a nonzero AP, again with a guarded denominator
    return tf.reduce_sum(precs) / (tf.to_float(tf.count_nonzero(precs)) + 1e-12)

This is working for now. Not sure if there is a clever way to solve this.
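For reference, the effect of the epsilon guard on an all-zero row can be checked with the same arithmetic in NumPy (a sketch, not the exact code path above; `avg_prec_safe` is a hypothetical helper name):

```python
import numpy as np

def avg_prec_safe(ordered_pred, eps=1e-12):
    """Average precision for one ranked row, with a guarded denominator."""
    k = len(ordered_pred)
    prec = ordered_pred * np.cumsum(ordered_pred)
    prec = prec / np.arange(1, k + 1)
    # eps keeps an all-zero row at 0.0 instead of 0/0 = nan
    return prec.sum() / (ordered_pred.sum() + eps)

print(avg_prec_safe(np.zeros(5)))             # 0.0
print(avg_prec_safe(np.array([1., 1., 0.])))  # close to 1.0
```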

Closing the issue. Thanks for the help.

You can use K.epsilon() instead of 1e-12.

Thanks. My K.epsilon() is set to 1e-07, plus I wanted to keep this entirely in TensorFlow.

