Please make sure that the boxes below are checked before you submit your issue. Thank you!
- [x] Check that you are up-to-date with the master branch of Keras. You can update with:
  `pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps`
- [x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
  `pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps`
- [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
Hi guys,
I am wondering if any of you have implemented SSIM (the structural similarity index) to be used as an objective. It is often used for measuring the similarity between two images x and y. Its formulation is as follows:

SSIM(x, y) = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2) / ((mu_x^2 + mu_y^2 + c1) * (sigma_x^2 + sigma_y^2 + c2))

where mu_x, mu_y are the (local window) means, sigma_x^2, sigma_y^2 the variances, sigma_xy the covariance, and c1, c2 small constants that stabilize the division.
I think this must be easy to implement using the generic functions of the backends (Theano or TF), but I am not familiar enough with them.
hi!
I made a custom loss function, but it requires TF 0.11rc. It is the DSSIM, i.e. (1 - SSIM) / 2.
https://gist.github.com/Dref360/a48feaecfdb9e0609c6a02590fd1f91b
I got great results with it on image analysis tasks (background extraction, for example).
Thank you very much. I am currently using Theano, but I can use your code as a basis for mine.
I didn't understand the need for tf.extract_image_patches, though: couldn't I just take the means and variances of the true and predicted images over the batch?
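Roughly, I was picturing something like this untested sketch, using only generic backend ops, with the statistics taken globally instead of over local windows:

```python
import keras.backend as K

def naive_ssim(y_true, y_pred):
    # Global (whole-batch) statistics only -- not the windowed SSIM from the paper.
    c1 = 0.01 ** 2
    c2 = 0.03 ** 2
    u_true = K.mean(y_true)
    u_pred = K.mean(y_pred)
    var_true = K.var(y_true)
    var_pred = K.var(y_pred)
    covar = K.mean(y_true * y_pred) - u_true * u_pred
    num = (2 * u_true * u_pred + c1) * (2 * covar + c2)
    denom = (K.square(u_true) + K.square(u_pred) + c1) * (var_true + var_pred + c2)
    return num / denom
```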
Yeah, but it would be less precise. The mean of an 80x80 image is a lot less meaningful than the means of small 5x5 patches. Theano has something like images2neibs that replaces this operation.
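For reference, images2neibs flattens each patch into one row of its output, so per-patch statistics are just reductions over the last axis. A tiny shape check (assuming Theano is installed; sizes are hypothetical):

```python
import numpy as np
import theano
from theano import tensor as T

imgs = T.tensor4('imgs')  # (batch, channels, rows, cols)
patches = T.nnet.neighbours.images2neibs(imgs, (4, 4))  # each row = one flattened 4x4 patch
f = theano.function([imgs], patches.shape)
print(f(np.zeros((10, 3, 80, 80), dtype=theano.config.floatX)))
# -> [12000 16]: 10 * 3 * (80/4) * (80/4) patches of 16 pixels each
```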
Thanks @Dref360 for the code sample and the point towards the comparable Theano function.
Below is the implementation for Theano that follows your code, or so I believe. However, I've seen little movement with the Keras optimizers and DSSIM as implemented below. The loss changes so little that the backpropagated gradients are nearly zero; hundreds of thousands of examples barely get it to budge.
I tested this with multiple optimizers (AdaDelta and SGD, mostly) with the same result. I ended up having to inflate the loss from the DSSIM function, via either a power function or a multiplier of 10^6 or greater, to get any movement in minimizing the loss.
Have you seen this as well, or do you have any thoughts on this?
```python
import numpy as np
from theano import tensor as T  # for the nnet module
import keras.backend as K

def loss_DSSIM_theano(y_true, y_pred):
    # Expected net output is of shape [batch_size, row, col, image_channels]
    # e.g. [10, 480, 640, 3] for a batch of 10 640x480 RGB images.
    # We need to shuffle this to [batch_size, image_channels, row, col].
    y_true = y_true.dimshuffle([0, 3, 1, 2])
    y_pred = y_pred.dimshuffle([0, 3, 1, 2])
    # There are additional parameters for this function.
    # Note: some of the 'modes' for edge behavior do not yet have a gradient
    # definition in the Theano tree and cannot be used for learning.
    patches_true = T.nnet.neighbours.images2neibs(y_true, [4, 4])
    patches_pred = T.nnet.neighbours.images2neibs(y_pred, [4, 4])

    u_true = K.mean(patches_true, axis=-1)
    u_pred = K.mean(patches_pred, axis=-1)
    var_true = K.var(patches_true, axis=-1)
    var_pred = K.var(patches_pred, axis=-1)
    std_true = K.sqrt(var_true)
    std_pred = K.sqrt(var_pred)

    c1 = 0.01 ** 2
    c2 = 0.03 ** 2
    ssim = (2 * u_true * u_pred + c1) * (2 * std_pred * std_true + c2)
    denom = (u_true ** 2 + u_pred ** 2 + c1) * (var_pred + var_true + c2)
    ssim /= K.clip(denom, K.epsilon(), np.inf)
    # ssim = tf.select(tf.is_nan(ssim), K.zeros_like(ssim), ssim)
    return K.mean((1.0 - ssim) / 2.0)
```
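For completeness, wiring this into a model is the same as for any built-in loss. A minimal toy sketch with a hypothetical model and dummy data (assumes the Theano backend with image_data_format='channels_last', and rows/cols divisible by the 4x4 patch size):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

# Hypothetical toy image-to-image model, just to show how the custom loss is wired in.
model = Sequential([
    Conv2D(3, (3, 3), padding='same', activation='sigmoid',
           input_shape=(64, 64, 3)),
])
model.compile(optimizer='adadelta', loss=loss_DSSIM_theano)

x = np.random.rand(8, 64, 64, 3).astype('float32')
model.fit(x, x, batch_size=4, epochs=1)  # identity/denoising-style toy target
```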
Your implementation is so much nicer than mine :+1:
I had the issue with the loss not moving at all and getting stuck in a local minimum. I fixed it by adding a simple weighted (y - yi)^2 term, which solved my problem. As for your inflating factor: it's weird, because DSSIM should be between 0 and 0.5. Maybe the output of images2neibs is not the same as the TF function's...
I meant that I had to inflate the loss (in the same way you did, with an additional factor) to get movement, not that the loss exploded on its own; it is indeed locked within [0, 0.5].
I've started thinking that DSSIM isn't great for the problem I'm trying to solve, which would explain the lack of optimization, but I think this implementation should be correct.
@fchollet Do you have any requirements for adding loss functions? In other words, do you want to stick with the loss functions Keras has so far, with no additions, or is there a chance to add something like this? SSIM (as the DSSIM loss) is pretty heavily used in image comparison, more so than MSE pixel differences, for many applications.
Relevant article
https://arxiv.org/pdf/1511.08861v2.pdf
@Dref360 Thanks for the link, very interesting article.
Currently I am working on another project, but I still want to try some SSIM-based metrics during training (somewhat similar to what the authors report in the article).
Btw, I was wondering: how difficult would it be to use a combination of different loss functions (similar to what is done in the article)? For example:

```python
def loss_mix(y_true, y_pred):
    return (alpha * mean_absolute_error(y_true, y_pred)
            + beta * mean_squared_error(y_true, y_pred)
            + gamma * loss_DSSIM(y_true, y_pred))
```

Just combining three losses and using different weights for each one?
That's what I did: (DSSIM + L1) / 2. If you want the weights to be user-defined, you could create a functor:

```python
from keras.losses import mean_absolute_error, mean_squared_error

class superLoss():
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __call__(self, y_true, y_pred):
        # loss_DSSIM_theano is the DSSIM defined above; swap in your own version.
        return (self.a * mean_absolute_error(y_true, y_pred)
                + self.b * mean_squared_error(y_true, y_pred)
                + self.c * loss_DSSIM_theano(y_true, y_pred))
```
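Then you just compile with an instance, e.g. `model.compile(optimizer='adam', loss=superLoss(0.5, 0.3, 0.2))` (the weights here are arbitrary).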
@patyork I've tried the exact code you shared, but I always get 'nan' when computing the loss.
Maybe some of the functions have changed in recent versions of Theano?
The patches_* variables probably need to be reshaped to (..., 4, 4) in this case; I think they are currently (..., 16), which will probably mess up the variance calculations. Hopefully the supporting functions for this will be merged into the Keras backend soon, and this loss will be available for both backends via the contrib library.
@patyork Thanks for the code! I'd like to add one little thing: you can reference nnet directly from K like this:
K.T.nnet.neighbours.images2neibs
Therefore you don't need this extra import:
from theano import tensor as T
@ogurets That is true; however, I prefer not to use K functions that are not available in both Theano and TensorFlow. The import makes it explicit that this is Theano-only, whereas K.T appears to be backend-independent if you don't know that T.nnet... is part of Theano.
As I said, hopefully the "neighbors" code in #5248 is merged and this can become backend independent.
@patyork It was not an error with the shapes. I don't know why, but K.sqrt was returning NaNs. I tried taking the abs of the variance, and it didn't work. The solution I found was adding an eps before taking the sqrt. The code I am currently using is:
```python
from theano import tensor as T
import keras.backend as K

def loss_DSSIM_theano(y_true, y_pred):
    # There are additional parameters for this function.
    # Note: some of the 'modes' for edge behavior do not yet have a gradient
    # definition in the Theano tree and cannot be used for learning.
    patches_true = T.nnet.neighbours.images2neibs(y_true, [4, 4])
    patches_pred = T.nnet.neighbours.images2neibs(y_pred, [4, 4])

    u_true = K.mean(patches_true, axis=-1)
    u_pred = K.mean(patches_pred, axis=-1)
    var_true = K.var(patches_true, axis=-1)
    var_pred = K.var(patches_pred, axis=-1)
    eps = 1e-9
    std_true = K.sqrt(var_true + eps)
    std_pred = K.sqrt(var_pred + eps)

    c1 = 0.01 ** 2
    c2 = 0.03 ** 2
    ssim = (2 * u_true * u_pred + c1) * (2 * std_pred * std_true + c2)
    denom = (u_true ** 2 + u_pred ** 2 + c1) * (var_pred + var_true + c2)
    ssim /= denom  # no need for clipping, c1 and c2 make the denominator non-zero
    return K.mean((1.0 - ssim) / 2.0)
```
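A quick sanity check on random arrays, using the function and imports above (Theano backend). Note that without the dimshuffle this version seems to expect the spatial dims last, e.g. (batch, channels, rows, cols):

```python
import numpy as np

a = K.variable(np.random.rand(2, 3, 64, 64).astype('float32'))  # (batch, channels, rows, cols)
b = K.variable(np.random.rand(2, 3, 64, 64).astype('float32'))
print(K.eval(loss_DSSIM_theano(a, b)))  # roughly some value in (0, 0.5]
print(K.eval(loss_DSSIM_theano(a, a)))  # ~0.0, and no NaNs
```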
That's pretty weird. I don't really see how the square root could cause NaNs/Infs in this case.
Note: you can use K.epsilon() for consistency (there's already an epsilon defined in Keras).
I've had this issue with sqrt; I believe the issue is the gradient at zero. Adding K.epsilon() is the best solution, I think.
That would be an issue, I'd say. I think the issue here, though, is that the variance might become a very small negative number in float32 representation where it should be zero; sqrt(-n) pops out a NaN. Adding an epsilon pushes it back into the positive domain (given that the epsilon is large enough, 1e-10 or so).
Good catch, anyway. This is planned to go into the contrib repo once the PR adding a backend-agnostic "extract_patches" method is merged.
Edit: actually, if the NaNs happen when/after training, it probably is the gradient. I had assumed the NaNs were occurring in general, before any training. Regardless, adding epsilon is the correct solution.
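A tiny standalone check of that behavior (assumes Theano is installed): the gradient of sqrt(x) is 1/(2*sqrt(x)), which blows up at zero and is NaN for the tiny negative values that float rounding can produce, while adding an epsilon keeps everything finite.

```python
import numpy as np
import theano
import theano.tensor as T

x = T.dvector('x')
g = theano.grad(T.sqrt(x).sum(), x)  # d/dx sqrt(x) = 1 / (2 * sqrt(x))
f = theano.function([x], g)
print(f(np.array([0.0, -1e-12, 1e-9])))         # [inf, nan, ~1.6e4]
print(f(np.array([0.0, -1e-12, 1e-9]) + 1e-9))  # all finite after adding eps
```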
SSIM has been merged to keras-contrib. You may close your issue
Hello, I want to implement complex-wavelet SSIM (CW-SSIM) using Keras; does anyone have an idea to help me?
Does anyone have an MS-SSIM implementation in Keras?
@patyork, sigma_xy is the covariance of x and y, and it is not equal to sigma_x multiplied by sigma_y.
The equation for covariance is Cov(x, y) = E(xy) - E(x)E(y). So the implementation should be:
```python
import numpy as np
from theano import tensor as T  # for the nnet module
import keras.backend as K

def loss_DSSIM_theano(y_true, y_pred):
    # Expected net output is of shape [batch_size, row, col, image_channels]
    # e.g. [10, 480, 640, 3] for a batch of 10 640x480 RGB images.
    # We need to shuffle this to [batch_size, image_channels, row, col].
    y_true = y_true.dimshuffle([0, 3, 1, 2])
    y_pred = y_pred.dimshuffle([0, 3, 1, 2])
    # There are additional parameters for this function.
    # Note: some of the 'modes' for edge behavior do not yet have a gradient
    # definition in the Theano tree and cannot be used for learning.
    patches_true = T.nnet.neighbours.images2neibs(y_true, [4, 4])
    patches_pred = T.nnet.neighbours.images2neibs(y_pred, [4, 4])

    u_true = K.mean(patches_true, axis=-1)
    u_pred = K.mean(patches_pred, axis=-1)
    var_true = K.var(patches_true, axis=-1)
    var_pred = K.var(patches_pred, axis=-1)
    covar_true_pred = K.mean(y_true * y_pred, axis=-1) - u_true * u_pred

    c1 = 0.01 ** 2
    c2 = 0.03 ** 2
    ssim = (2 * u_true * u_pred + c1) * (2 * covar_true_pred + c2)
    denom = (u_true ** 2 + u_pred ** 2 + c1) * (var_pred + var_true + c2)
    ssim /= K.clip(denom, K.epsilon(), np.inf)
    # ssim = tf.select(tf.is_nan(ssim), K.zeros_like(ssim), ssim)
    return K.mean((1.0 - ssim) / 2.0)
```
@temperatemarine Good catch, but I believe the covariance should be computed on the patches, like the means and variances:
`covar_true_pred = K.mean(patches_true * patches_pred, axis=-1) - u_true * u_pred`
Yes, covariance should be consistent with means and variances.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
@bernardohenz I am just wondering: for using SSIM as a loss layer, we should not need to define its gradient manually?
@fqassemi As long as you use basic operators (+ - * /) and/or backend operators, the backend (Theano or TF) computes the gradient by itself.
Also, as indicated by @Dref360, SSIM is implemented in keras-contrib (https://github.com/farizrahman4u/keras-contrib/blob/master/keras_contrib/losses/dssim.py).
I am trying to use DSSIM; however, what is the right shape for y_pred and y_true? For a start, I just want to try DSSIM between two input images.
@fqassemi Here is some sample code comparing DSSIM:
```python
import numpy as np
from keras_contrib.losses import DSSIMObjective
from keras import backend as K

# Shape should be (batch, x, y, channels)
imga = np.random.normal(size=(1, 256, 256, 3))
imgb = np.random.normal(size=(1, 256, 256, 3))

loss_func = DSSIMObjective()
resulting_loss1 = K.eval(loss_func(K.variable(imga), K.variable(imgb)))
resulting_loss2 = K.eval(loss_func(K.variable(imga), K.variable(imga)))

print("Loss for different images: %.2f" % resulting_loss1)
print("Loss for same image: %.2f" % resulting_loss2)
```
It is quite tricky because the function is implemented by the backend, so you must evaluate it.
@bernardohenz Thanks. I forgot about the eval part! BTW, I think it should also be mentioned that the shape should be (batch, w, h, channels). Additionally, at line 39, y_pred should be replaced with y_true in `self.__int_shape(y_pred)[1:]` (it probably does not change anything, but it is more consistent).
I am going to combine DSSIM and some other function into one loss function. To see the value and effect of each component, I define each one as a metric as well, but that computes each component twice: once in the loss function and again in the metric function. Is there a way to compute each component only once and use its value both as a metric and in the loss function?
Thanks
@sjsy I've never written a metric function, but I think they are handled as two different things. In other words, you compute the loss function in order to backpropagate the gradients, and you compute the metric function in order to show the user the desired metric.
I don't think you can compute them only once (unless you change a bit of the Keras workflow).
Note: if your 'other function' is quite heavy, I suggest removing it from the metrics; you do not need to compute it on each iteration, you can compute it in the test phase.
@bernardohenz Thanks. Computing metrics in test phase is a good idea sometimes.
Another solution is using a multi-output model, as suggested in this link.
Hi everyone, I'm a beginner at deep learning and I have to use the SSIM loss function. Could you please help me? I use Keras.
Is it possible to use this implementation with tensorflow code?
Thanks.
I am also trying to customise a loss function; for that I need SSIM and MS-SSIM as objective loss functions.
Kindly send me the code.
@DIPTIMISHRA I suggest you take a look at the DSSIM implementation in keras-contrib: [DSSIM](https://github.com/keras-team/keras-contrib/blob/master/keras_contrib/losses/dssim.py). This implementation is up to date and works on current versions of Keras.
@bernardohenz But this implementation uses `K.reshape(y_true, [-1] + list(self.__int_shape(y_pred)[1:]))`, which prevents any use with a "None" size (for example, in order to use your convnet on any image size). So I can't use it, sadly. If someone could fix it, that would be really nice :smile: (I'm too lazy to do it myself).
@aviallon Actually, I don't know why that line is needed. You could try commenting it out and reinstalling the package.
Yeah. I actually ended up taking the answers from this issue and commented out the first four lines, since I use TensorFlow with ROCm (and it sure rocks).
Thanks. I think I'll try to modify it, and then make a nice patch for others.
Why not use tensorflow's ssim function?
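For reference, a minimal TF-only sketch of that approach (assumes TF >= 1.8 for tf.image.ssim, channels-last images scaled to [0, 1]):

```python
import tensorflow as tf

def dssim_tf(y_true, y_pred):
    # (1 - SSIM) / 2, the same DSSIM convention used in this thread.
    # max_val must match the data range of the images.
    return (1.0 - tf.image.ssim(y_true, y_pred, max_val=1.0)) / 2.0
```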
@isaacgerg Because I want my code to be portable to other backends as well, so I do not use any TensorFlow function directly.