Keras: Using outputs with missing values

Created on 3 Feb 2016 · 14 Comments · Source: keras-team/keras

I'm trying to train a multi-task regression model, but my outputs are not complete (in fact I only have on average <1% of the values per training instance). I expected mean squared error over the non-null outputs to be a reasonable objective; however, with Keras's built-in mean squared error objective the cost comes out as NaN, since the NaNs propagate.

Are there any plans to support this sort of thing (or is it already supported somehow and I missed it)?

If not, does anyone have an idea for a hack? I tried writing a new cost function, like:

def mean_squared_error(y_true, y_pred):
    # index out the entries where y_true is not NaN, then average the squared error
    return K.mean(K.square(y_pred - y_true)[(1 - T.isnan(y_true)).nonzero()], axis=-1)

(sorry for mixing keras and theano!)

This evaluates correctly on 1D vectors with NaNs in y_true, but it doesn't work within Keras, even with the batch size set to 1. My next plan was to set the NaNs in y_true equal to y_pred, but I'm not experienced enough with Theano.
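For reference, that substitution idea (replacing NaN targets with the corresponding predictions so they contribute zero error and zero gradient) might look roughly like this with the Theano backend; mse_ignore_nan is just an illustrative name, not part of Keras:

import theano.tensor as T
from keras import backend as K

def mse_ignore_nan(y_true, y_pred):
    # wherever y_true is NaN, compare the prediction against itself, so the
    # squared error (and its gradient) at those entries is exactly zero;
    # note the mean still divides by the full output width
    y_true_patched = T.switch(T.isnan(y_true), y_pred, y_true)
    return K.mean(K.square(y_pred - y_true_patched), axis=-1)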

Most helpful comment

Did anyone try to use something like this:

def custom_error_function(y_true, y_pred):
    bool_finite = T.is_finite(y_true)
    return K.mean(K.square(T.boolean_mask(y_pred, bool_finite) - T.boolean_mask(y_true, bool_finite)), axis=-1)

It works for me on a small test dataset with np.nan values in y_true.

All 14 comments

Why not use an if in your cost function, e.g. if the output contains a NaN then the loss is 0 (so there is no gradient and the weights stay intact)?

Hi all,

I came back to this after spending a little time with theano and working on other things. I think I fixed it for this use case using switch:

# perhaps something like this could be added to the backend API?
from keras import backend as K
import theano.tensor as T

K.is_nan = T.isnan                   # tf.is_nan
K.logical_not = lambda x: 1 - x      # tf.logical_not

def squared_error_mv(y_true, y_pred):
    # sum the squared error only where y_true is not NaN; masked entries contribute 0
    return K.sum(K.switch(K.logical_not(K.is_nan(y_true)), K.square(y_pred - y_true), 0), axis=-1)

Allowed me to train a model with missing values.

This unfortunately didn't work with tensorflow:
ValueError: Shapes (?, ?) and () must have the same rank

I don't think this sort of behaviour should be added to the cost functions by default, but perhaps there is a sensible way of including it somewhere in the library?

Ah @pommedeterresautee, just saw you commented (I've been without internet, so I couldn't submit this comment before), thanks! This is basically what you were suggesting; I initially couldn't find the correct Theano methods to do this (or couldn't get them to work correctly!). I did get the slicing technique working, but it was crazy slow, as one would expect.

@richlewis42 : Could it be that you're getting the value error because your if and else options are not of the same size?

K.square(y_pred - y_true), 0
The zero only has shape (), while the calculated value has shape (?, ?). Perhaps something like tf.zeros() to build a tensor of zeros to use in the else case?
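A sketch of that suggestion, assuming the TF 1.x names (tf.is_nan, tf.zeros_like); masking the difference before squaring also keeps NaNs out of the gradient:

import tensorflow as tf
from keras import backend as K

def squared_error_mv_tf(y_true, y_pred):
    finite_mask = tf.logical_not(tf.is_nan(y_true))
    # zero out the difference where the target is missing; the else branch now
    # has the same shape and rank as the then branch, so TF no longer complains
    diff = tf.where(finite_mask, y_pred - y_true, tf.zeros_like(y_true))
    return K.sum(K.square(diff), axis=-1)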

I am interested in this too, but for a recurrent network (return_sequences=True). I do not have labels for some elements in the sequences.
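One built-in option for the sequence case is Keras's per-timestep weighting via sample_weight_mode='temporal'; a rough sketch, assuming the missing labels are marked with NaN and then zero-filled:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential([
    LSTM(16, return_sequences=True, input_shape=(None, 4)),
    TimeDistributed(Dense(1)),
])
model.compile(optimizer='rmsprop', loss='mse', sample_weight_mode='temporal')

# y has shape (batch, timesteps, 1); build a (batch, timesteps) weight matrix
# that is 0 wherever the label is missing and 1 elsewhere
x = np.random.rand(8, 10, 4)
y = np.random.rand(8, 10, 1)
y[2, 5, 0] = np.nan                      # pretend some labels are missing
weights = (~np.isnan(y[..., 0])).astype('float32')
y = np.nan_to_num(y)                     # replace NaN so the loss stays finite
model.fit(x, y, sample_weight=weights, epochs=1, verbose=0)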

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Did anyone try to use something like this:

def custom_error_function(y_true, y_pred):
    bool_finite = T.is_finite(y_true)
    return K.mean(K.square(T.boolean_mask(y_pred, bool_finite) - T.boolean_mask(y_true, bool_finite)), axis=-1)

It works for me on a small test dataset with np.nan values in y_true.

2019: This would still be amazing to have :) Did anyone manage to implement this for TensorFlow?
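For reference, a direct TensorFlow translation of the boolean_mask idea above could look like this (a sketch; note that boolean_mask flattens the tensor, so the mean is taken over all surviving entries rather than per row):

import tensorflow as tf

def masked_mse(y_true, y_pred):
    # keep only the entries whose target is finite (drops NaN and inf)
    finite = tf.math.is_finite(y_true)      # tf.is_finite on older 1.x versions
    y_true_m = tf.boolean_mask(y_true, finite)
    y_pred_m = tf.boolean_mask(y_pred, finite)
    # a mean over all surviving entries; it returns NaN if a batch
    # contains no observed targets at all
    return tf.reduce_mean(tf.square(y_pred_m - y_true_m))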

In Keras you can set different weights for each output.
A solution would be to set the weight to zero for the outputs with missing values.
(And probably replace the NaNs with zero, just in case.)

@organic-chemistry I'm not aware that this functionality exists in Keras... There are only class_weights and sample_weights. Do you have a code sample?

Yes, you are right, I was thinking of loss_weights.
You can do it by creating a loss that depends on a weight input,
rather than using a switch. You define two models:

from keras.models import Model
from keras.layers import Input, Dense
import keras.backend as K

inp = Input(shape=(3,))

# per-sample weight inputs: 1 where the target is observed, 0 where it is missing
w1 = Input(shape=(1,))
w2 = Input(shape=(1,))

out1 = Dense(1)(inp)
out2 = Dense(1)(inp)

def weighted_loss(weight):

    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true) * weight, axis=-1)
    return loss

# one model for prediction (no weight inputs) and one for training (with weights)
model = Model(inputs=inp, outputs=[out1, out2])

modelw = Model(inputs=[inp, w1, w2], outputs=[out1, out2])
modelw.compile(optimizer='rmsprop',
               loss=[weighted_loss(w1), weighted_loss(w2)])

Then for example to train it:

import numpy as np

inp_v = np.array([[0,0,0], [1,1,1], [0,1,1], [1,0,0], [1,1,0]])

out1_v = np.array([0, 0, 0, 0, 0])
out2_v = np.array([1, np.nan, np.nan, 1, 1])

# build the masks before touching the NaNs: weight 1 where the label exists, 0 where it is missing
w1_v = np.array(~np.isnan(out1_v), dtype=np.int)
w2_v = np.array(~np.isnan(out2_v), dtype=np.int)

# replace NaN by any finite value (here 10000), just in case; those entries get weight 0 anyway
out1_v[np.isnan(out1_v)] = 10000
out2_v[np.isnan(out2_v)] = 10000

modelw.fit([inp_v, w1_v, w2_v], [out1_v, out2_v], epochs=1000, verbose=False)

And then to predict: model.predict(inp_v)

[array([[0.00049998],
        [0.00199989],
        [0.00149992],
        [0.00099995],
        [0.00149993]], dtype=float32), array([[1.0005   ],
        [0.8288187],
        [0.8283187],
        [1.0009999],
        [1.0014999]], dtype=float32)]

Cool, thanks for that example! When we're already writing a custom loss function, we could also detect NaNs directly, right? Basically like your code, but creating the weights dynamically based on a NaN mask.

AFAIK, that's what previous posters here tried to build and so far nobody has succeeded for TensorFlow.
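A sketch of that dynamic-mask idea for the TensorFlow backend (the name dynamically_masked_mse is illustrative); it averages over the observed entries only, with an epsilon guard for samples whose targets are all missing:

import tensorflow as tf
from keras import backend as K

def dynamically_masked_mse(y_true, y_pred):
    # mask built on the fly: 1.0 where the target is present, 0.0 where it is NaN/inf
    finite = tf.math.is_finite(y_true)
    mask = tf.cast(finite, K.floatx())
    y_true_clean = tf.where(finite, y_true, tf.zeros_like(y_true))
    sq_err = K.square(y_pred - y_true_clean) * mask
    # average over the observed entries only; the epsilon guards against
    # samples where every target is missing
    return K.sum(sq_err, axis=-1) / (K.sum(mask, axis=-1) + K.epsilon())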

Hey @cpury , sorry for not replying until now!

In the end I did get this working with TF (back in 2016!). I don't have access to the code right now, but I think it worked mostly like above:

>>> import tensorflow as tf
>>> from numpy import array, nan

>>> def mse_mv(y_true, y_pred):
...     per_instance = tf.where(tf.is_nan(y_true),
...                             tf.zeros_like(y_true),
...                             tf.square(tf.subtract(y_pred, y_true)))
...     return tf.reduce_mean(per_instance, axis=0)

>>> y_true = array([[ 1.,  2.],
...                 [ 2.,  3.],
...                 [nan,  4.],
...                 [ 5.,  6.]])

>>> y_pred = array([[ 1.,  2.],
...                 [ 2.,  4.],
...                 [ 42,  4.],
...                  [ 3.,  7.]])

>>> loss = mse_mv(y_true, y_pred)
>>> with tf.Session().as_default():
...    loss.eval()
array([1. , 0.5])

Dunno if this will work with keras but I guess so.

Edit: fixed it to be mean squared
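For plugging it into Keras, it may be safer to reduce over the last axis so the loss returns one value per sample, as Keras expects; a sketch along the same lines (untested on every version), which also patches the targets first so no NaNs leak into the gradient:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

def mse_mv_keras(y_true, y_pred):
    # replace NaN targets with the prediction (zero error, zero gradient there),
    # then average over the feature axis to get one loss value per sample
    y_true_patched = tf.where(tf.is_nan(y_true), y_pred, y_true)
    return tf.reduce_mean(tf.square(y_pred - y_true_patched), axis=-1)

model = Sequential([Dense(8, activation='relu', input_shape=(3,)), Dense(2)])
model.compile(optimizer='adam', loss=mse_mv_keras)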

This solution did not work for me on 2.2, so, inspired by the previous solution and assuming the true target values have missing entries, I suggest this simple code, which replaces the NaN values with the prediction values:

import tensorflow as tf
from tensorflow.keras import losses

class MeanSquaredErrorLossThatIgnoresNaN(losses.MeanSquaredError):
    def __init__(self, *args, **kwargs):
        losses.MeanSquaredError.__init__(self, *args, **kwargs)

    def __call__(self, y_true, y_pred, sample_weight=None):
        y_true = tf.where(tf.math.is_nan(y_true), y_pred, y_true)
        return losses.MeanSquaredError.__call__(self, y_true, y_pred, sample_weight=sample_weight)

Usage:

model.compile(..., loss=MeanSquaredErrorLossThatIgnoresNaN())

Am I doing something terribly wrong in your opinion?
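One caveat worth noting: the substituted entries contribute zero error but still count in the per-row mean, so rows with many missing targets get diluted. A quick sanity check of the class above (a sketch, assuming TF 2.x eager execution):

import numpy as np
import tensorflow as tf

loss_fn = MeanSquaredErrorLossThatIgnoresNaN()   # class defined above

y_true = tf.constant([[1.0, np.nan], [2.0, 3.0]])
y_pred = tf.constant([[3.0, 5.0],    [2.0, 4.0]])

# finite loss despite the NaN target; with the default reduction this should
# come out around 1.25 (row means 2.0 and 0.5, averaged over the batch)
print(float(loss_fn(y_true, y_pred)))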
