Keras: Using outputs with missing values

Created on 3 Feb 2016 · 14 Comments · Source: keras-team/keras

I'm trying to train a multi-task regression model, but my outputs are not complete (in fact I only have on average <1% of the values per training instance). I expected mean squared error over the non-null outputs to be a reasonable objective; however, with Keras's built-in mean squared error objective the cost comes out as NaN, since the NaNs propagate.

Are there any plans to support this sort of thing (or is it already supported somehow and I missed it)?

If not, does anyone have an idea for a hack? I tried writing a new cost function, like:

def mean_squared_error(y_true, y_pred):
    # index out the entries where y_true is not NaN, then average the squared error
    return K.mean(K.square(y_pred - y_true)[(1 - T.isnan(y_true)).nonzero()], axis=-1)

(sorry for mixing keras and theano!)

This evaluates correctly on 1D vectors with NaNs in y_true, but it doesn't work within Keras, even with the batch size set to 1. My next plan was to set the NaNs in y_true equal to y_pred, but I'm not experienced enough with Theano.
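For reference, that substitution idea (replacing NaN targets with the corresponding predictions so they contribute zero error and zero gradient) might look roughly like this with the Theano backend; mse_ignore_nan is just an illustrative name, not part of Keras:

import theano.tensor as T
from keras import backend as K

def mse_ignore_nan(y_true, y_pred):
    # wherever y_true is NaN, compare the prediction against itself, so the
    # squared error (and its gradient) at those entries is exactly zero;
    # note the mean still divides by the full output width
    y_true_patched = T.switch(T.isnan(y_true), y_pred, y_true)
    return K.mean(K.square(y_pred - y_true_patched), axis=-1)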

Most helpful comment

Did anyone try to use something like this:

def custom_error_function(y_true, y_pred):
    bool_finite = T.is_finite(y_true)
    return K.mean(K.square(T.boolean_mask(y_pred, bool_finite) - T.boolean_mask(y_true, bool_finite)), axis=-1)

It works for me on a small test dataset with np.nan values in y_true.

All 14 comments

Why not use an if in your cost function, e.g. if the output contains a NaN then the loss is 0 (so there is no gradient and the weights stay intact)?

Hi all,

I came back to this after spending a little time with theano and working on other things. I think I fixed it for this use case using switch:

# perhaps something like this could be added to the backend API?
from keras import backend as K
import theano.tensor as T

K.is_nan = T.isnan                   # tf.is_nan
K.logical_not = lambda x: 1 - x      # tf.logical_not

def squared_error_mv(y_true, y_pred):
    # sum the squared error only where y_true is not NaN; masked entries contribute 0
    return K.sum(K.switch(K.logical_not(K.is_nan(y_true)), K.square(y_pred - y_true), 0), axis=-1)

Allowed me to train a model with missing values.

This unfortunately didn't work with tensorflow:
ValueError: Shapes (?, ?) and () must have the same rank

I don't think this sort of behaviour should be added to the cost functions by default, but perhaps there is a sensible way of including it somewhere in the library?

Ah @pommedeterresautee, just saw you commented (I've been without internet, so I couldn't submit this comment before), thanks! This is basically what you were suggesting; I initially couldn't find the correct Theano methods to do this (or couldn't get them to work correctly!). I did get the slicing technique working, but it was crazy slow, as one would expect.

@richlewis42 : Could it be that you're getting the value error because your if and else options are not of the same size?

K.square(y_pred - y_true), 0
The zero only has shape (), while the calculated value has shape (?, ?). Perhaps something like tf.zeros() to build a tensor of zeros to use in the else case?
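A sketch of that suggestion, assuming the TF 1.x names (tf.is_nan, tf.zeros_like); masking the difference before squaring also keeps NaNs out of the gradient:

import tensorflow as tf
from keras import backend as K

def squared_error_mv_tf(y_true, y_pred):
    finite_mask = tf.logical_not(tf.is_nan(y_true))
    # zero out the difference where the target is missing; the else branch now
    # has the same shape and rank as the then branch, so TF no longer complains
    diff = tf.where(finite_mask, y_pred - y_true, tf.zeros_like(y_true))
    return K.sum(K.square(diff), axis=-1)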

I am interested in this too, but for a recurrent network (return_sequences=True). I do not have labels for some elements in the sequences.
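One built-in option for the sequence case is Keras's per-timestep weighting via sample_weight_mode='temporal'; a rough sketch, assuming the missing labels are marked with NaN and then zero-filled:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential([
    LSTM(16, return_sequences=True, input_shape=(None, 4)),
    TimeDistributed(Dense(1)),
])
model.compile(optimizer='rmsprop', loss='mse', sample_weight_mode='temporal')

# y has shape (batch, timesteps, 1); build a (batch, timesteps) weight matrix
# that is 0 wherever the label is missing and 1 elsewhere
x = np.random.rand(8, 10, 4)
y = np.random.rand(8, 10, 1)
y[2, 5, 0] = np.nan                      # pretend some labels are missing
weights = (~np.isnan(y[..., 0])).astype('float32')
y = np.nan_to_num(y)                     # replace NaN so the loss stays finite
model.fit(x, y, sample_weight=weights, epochs=1, verbose=0)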

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Did anyone try to use something like this:

def custom_error_function(y_true, y_pred):
    bool_finite = T.is_finite(y_true)
    return K.mean(K.square(T.boolean_mask(y_pred, bool_finite) - T.boolean_mask(y_true, bool_finite)), axis=-1)

It works for me on a small test dataset with np.nan values in y_true.

2019: This would still be amazing to have :) Did anyone manage to implement this for TensorFlow?
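For reference, a direct TensorFlow translation of the boolean_mask idea above could look like this (a sketch; note that boolean_mask flattens the tensor, so the mean is taken over all surviving entries rather than per row):

import tensorflow as tf

def masked_mse(y_true, y_pred):
    # keep only the entries whose target is finite (drops NaN and inf)
    finite = tf.math.is_finite(y_true)      # tf.is_finite on older 1.x versions
    y_true_m = tf.boolean_mask(y_true, finite)
    y_pred_m = tf.boolean_mask(y_pred, finite)
    # a mean over all surviving entries; it returns NaN if a batch
    # contains no observed targets at all
    return tf.reduce_mean(tf.square(y_pred_m - y_true_m))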

In Keras you can set different weights for each output.
A solution would be to set the weight to zero for the outputs with missing values.
(And probably replace the NaNs with zero, just in case.)

@organic-chemistry I'm not aware that this functionality exists in Keras... There are only class_weights and sample_weights. Do you have a code sample?

Yes, you are right, I was thinking of loss_weights.
You can do it by creating a loss that depends on a weight input,
rather than using a switch. You define two models:

from keras.models import Model
from keras.layers import Input, Dense
import keras.backend as K

inp = Input(shape=(3,))

# per-sample weight inputs: 1 where the target is observed, 0 where it is missing
w1 = Input(shape=(1,))
w2 = Input(shape=(1,))

out1 = Dense(1)(inp)
out2 = Dense(1)(inp)

def weighted_loss(weight):

    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true) * weight, axis=-1)
    return loss

# one model for prediction (no weight inputs) and one for training (with weights)
model = Model(inputs=inp, outputs=[out1, out2])

modelw = Model(inputs=[inp, w1, w2], outputs=[out1, out2])
modelw.compile(optimizer='rmsprop',
               loss=[weighted_loss(w1), weighted_loss(w2)])

Then for example to train it:

import numpy as np

inp_v = np.array([[0,0,0], [1,1,1], [0,1,1], [1,0,0], [1,1,0]])

out1_v = np.array([0, 0, 0, 0, 0])
out2_v = np.array([1, np.nan, np.nan, 1, 1])

# build the masks before touching the NaNs: weight 1 where the label exists, 0 where it is missing
w1_v = np.array(~np.isnan(out1_v), dtype=np.int)
w2_v = np.array(~np.isnan(out2_v), dtype=np.int)

# replace NaN by any finite value (here 10000), just in case; those entries get weight 0 anyway
out1_v[np.isnan(out1_v)] = 10000
out2_v[np.isnan(out2_v)] = 10000

modelw.fit([inp_v, w1_v, w2_v], [out1_v, out2_v], epochs=1000, verbose=False)

And then to predict: model.predict(inp_v)

[array([[0.00049998],
        [0.00199989],
        [0.00149992],
        [0.00099995],
        [0.00149993]], dtype=float32), array([[1.0005   ],
        [0.8288187],
        [0.8283187],
        [1.0009999],
        [1.0014999]], dtype=float32)]

Cool, thanks for that example! When we're already writing a custom loss function, we could also detect NaNs directly, right? Basically like your code, but creating the weights dynamically based on a NaN mask.

AFAIK, that's what previous posters here tried to build and so far nobody has succeeded for TensorFlow.
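A sketch of that dynamic-mask idea for the TensorFlow backend (the name dynamically_masked_mse is illustrative); it averages over the observed entries only, with an epsilon guard for samples whose targets are all missing:

import tensorflow as tf
from keras import backend as K

def dynamically_masked_mse(y_true, y_pred):
    # mask built on the fly: 1.0 where the target is present, 0.0 where it is NaN/inf
    finite = tf.math.is_finite(y_true)
    mask = tf.cast(finite, K.floatx())
    y_true_clean = tf.where(finite, y_true, tf.zeros_like(y_true))
    sq_err = K.square(y_pred - y_true_clean) * mask
    # average over the observed entries only; the epsilon guards against
    # samples where every target is missing
    return K.sum(sq_err, axis=-1) / (K.sum(mask, axis=-1) + K.epsilon())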

Hey @cpury , sorry for not replying until now!

In the end I did get this working with TF (back in 2016!). I don't have access to the code right now, but I think it worked mostly like above:

>>> import tensorflow as tf
>>> from numpy import array, nan

>>> def mse_mv(y_true, y_pred):
...     per_instance = tf.where(tf.is_nan(y_true),
...                             tf.zeros_like(y_true),
...                             tf.square(tf.subtract(y_pred, y_true)))
...     return tf.reduce_mean(per_instance, axis=0)

>>> y_true = array([[ 1.,  2.],
...                 [ 2.,  3.],
...                 [nan,  4.],
...                 [ 5.,  6.]])

>>> y_pred = array([[ 1.,  2.],
...                 [ 2.,  4.],
...                 [ 42,  4.],
...                  [ 3.,  7.]])

>>> loss = mse_mv(y_true, y_pred)
>>> with tf.Session().as_default():
...    loss.eval()
array([1. , 0.5])

Dunno if this will work with keras but I guess so.

Edit: fixed it to be mean squared
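For plugging it into Keras, it may be safer to reduce over the last axis so the loss returns one value per sample, as Keras expects; a sketch along the same lines (untested on every version), which also patches the targets first so no NaNs leak into the gradient:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

def mse_mv_keras(y_true, y_pred):
    # replace NaN targets with the prediction (zero error, zero gradient there),
    # then average over the feature axis to get one loss value per sample
    y_true_patched = tf.where(tf.is_nan(y_true), y_pred, y_true)
    return tf.reduce_mean(tf.square(y_pred - y_true_patched), axis=-1)

model = Sequential([Dense(8, activation='relu', input_shape=(3,)), Dense(2)])
model.compile(optimizer='adam', loss=mse_mv_keras)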

This solution did not work for me on 2.2, so, inspired by the previous solution and assuming the true target values have missing entries, I suggest this simple code, which replaces the NaN values with the prediction values:

import tensorflow as tf
from tensorflow.keras import losses

class MeanSquaredErrorLossThatIgnoresNaN(losses.MeanSquaredError):
    def __init__(self, *args, **kwargs):
        losses.MeanSquaredError.__init__(self, *args, **kwargs)

    def __call__(self, y_true, y_pred, sample_weight=None):
        y_true = tf.where(tf.math.is_nan(y_true), y_pred, y_true)
        return losses.MeanSquaredError.__call__(self, y_true, y_pred, sample_weight=sample_weight)

Usage:

model.compile(..., loss=MeanSquaredErrorLossThatIgnoresNaN())

Am I doing something terribly wrong in your opinion?
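One caveat worth noting: the substituted entries contribute zero error but still count in the per-row mean, so rows with many missing targets get diluted. A quick sanity check of the class above (a sketch, assuming TF 2.x eager execution):

import numpy as np
import tensorflow as tf

loss_fn = MeanSquaredErrorLossThatIgnoresNaN()   # class defined above

y_true = tf.constant([[1.0, np.nan], [2.0, 3.0]])
y_pred = tf.constant([[3.0, 5.0],    [2.0, 4.0]])

# finite loss despite the NaN target; with the default reduction this should
# come out around 1.25 (row means 2.0 and 0.5, averaged over the batch)
print(float(loss_fn(y_true, y_pred)))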
