In the documentation, the objective functions below are given. I wonder if Keras has maximum likelihood as an objective function. If not, what is the easiest way to implement our own objective function?
mean_squared_error, mean_absolute_error, mean_absolute_percentage_error, mean_squared_logarithmic_error, squared_hinge, hinge, binary_crossentropy, categorical_crossentropy, sparse_categorical_crossentropy, kullback_leibler_divergence, poisson, cosine_proximity
Thanks.
Many of the objective functions built into Keras are obtained by maximum likelihood, e.g. the mean squared error loss function is the maximum likelihood objective under the assumption that the data are normally distributed about their mean with the same variance everywhere. Maximum likelihood estimation yields a different objective function depending on how you assume (i) the mean is related to the feature $x$ and (ii) the data are distributed about the mean.
That is, there is no such thing as a general "maximum likelihood objective function".
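To make the MSE example concrete (a small derivation I'm adding here, assuming i.i.d. Gaussian noise with fixed variance $\sigma^2$): if $y_i \sim \mathcal{N}(\hat{y}_i, \sigma^2)$ for $i = 1, \dots, n$, the negative log likelihood is
$$-\log L = \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \frac{n}{2}\log(2\pi\sigma^2),$$
and since the second term and the $1/\sigma^2$ factor are constants with respect to the predictions, minimizing the negative log likelihood is exactly minimizing the mean squared error.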
Implement your own loss as, e.g.,

import tensorflow as tf

def custom_loss(y_true, y_pred):
    # element-wise squared error; Keras reduces it over the batch
    return tf.square(y_true - y_pred)

and pass it while compiling: model.compile(loss=custom_loss, ...)
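In the same spirit, here is a minimal sketch of a maximum-likelihood style loss (my own example, not built into Keras), assuming Gaussian noise with a fixed standard deviation sigma supplied through a closure:

import numpy as np
from keras import backend as K

def gaussian_nll(sigma=1.0):
    # returns a loss equal to -log N(y_true | y_pred, sigma^2)
    def loss(y_true, y_pred):
        return (K.square(y_true - y_pred) / (2.0 * sigma ** 2)
                + 0.5 * np.log(2.0 * np.pi * sigma ** 2))
    return loss

# model.compile(loss=gaussian_nll(sigma=1.0), optimizer='adam')

With sigma fixed this is just MSE plus a constant, but the same closure pattern lets you plug in any log likelihood you can write with backend ops.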
I disagree with @CorySimon; it is evident that you can implement a maximum likelihood objective function for any important distribution (e.g. the Gaussian distribution).
Notice that MSE is not the likelihood function of a general Gaussian: it is the maximum likelihood objective of a fixed-variance, diagonal Gaussian.
In many cases you want to learn a distribution (e.g. Gaussian MLP, variational autoencoders), and you would like an objective function that takes a single target (your data) and multiple outputs, e.g. the mean and variance in the case of a Gaussian MLP.
I don't think this is straightforward in Keras right now, if I'm not wrong. If I am, please suggest a clean way to implement it.
As far as I know, Keras requires targets and model outputs to have the same dimensionality.
The only ways I managed to solve this were either to use the mean as the network output and pass the tensor representing the standard deviation as a class member, or to add a sampling layer as the output and pass both mean and standard deviation as class members.
Needless to say, in plain TensorFlow it is easy to do...
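One workaround I can sketch (my own example, not from this thread, and assuming a Keras 2 build that skips the target/output shape check for custom losses): make the network output two units per target, mean and log variance, and slice them apart inside the loss, so y_true can stay one-dimensional.

from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

def heteroscedastic_gaussian_nll(y_true, y_pred):
    # negative Gaussian log likelihood up to an additive constant
    mu = y_pred[:, 0:1]        # predicted mean
    log_var = y_pred[:, 1:2]   # predicted log variance (log for numerical stability)
    return K.mean(0.5 * log_var
                  + 0.5 * K.square(y_true - mu) / K.exp(log_var))

inputs = Input(shape=(10,))                   # 10 is an assumed feature size
hidden = Dense(32, activation='relu')(inputs)
outputs = Dense(2)(hidden)                    # [mean, log variance]
model = Model(inputs, outputs)
model.compile(loss=heteroscedastic_gaussian_nll, optimizer='adam')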
https://www.quora.com/What-are-the-differences-between-maximum-likelihood-and-cross-entropy-as-a-loss-function
"for a probability model to be trained on i.i.d training examples, maximize likelihood → minimize negative log likelihood → minimize cross entropy."
Is it even possible in principle with the Keras API? All you can pass to the objective function are y_pred and y_true, while for MLE you may need extra parameters. For example, a Normal-residual MLE needs y_pred, y_true, and sigma, your current estimate of the standard deviation.
I really want to be able to make my own custom loss functions with distributions. Just imagine the possibilities!!!
@arose13 yes, it is totally possible, as @liangbright has pointed out. Actually there's already a package to do that, called Edward. Check out the examples here: http://edwardlib.org/tutorials/supervised-regression
I guess you can do it like below
from keras.layers import Input, Dense, Layer
from keras.models import Model
import tensorflow_probability as tfp

tfd = tfp.distributions

class NegativeLogLikelihood(Layer):
    def __init__(self, sigma, **kwargs):
        # fixed standard deviation of the Normal noise model
        self.sigma = sigma
        super(NegativeLogLikelihood, self).__init__(**kwargs)

    def call(self, inputs):
        y_true, y_pred = inputs
        dist = tfd.Normal(loc=y_pred, scale=self.sigma)
        # per-sample negative log likelihood, emitted as a model output
        return -1.0 * dist.log_prob(y_true)

def identity_loss(y_true, loss):
    # the model output already is the loss, so just pass it through
    return loss

input_x = Input(shape=...)
input_y = Input(shape=(1,))
net = input_x  # or any stack of hidden layers built on input_x
out = Dense(1)(net)
loss = NegativeLogLikelihood(1.0)([input_y, out])
model = Model(inputs=[input_x, input_y], outputs=[out, loss])
model.compile(loss=identity_loss, loss_weights=[0., 1.], ...)
model.fit([x, y.reshape((-1, 1))], [y, y], ...)
Just calculate the negative log likelihood inside your network and pass it to a loss function that does nothing but return it.
If you don't like having 2 outputs (out for predict, loss for fit), maybe 2 networks (one for fit, one for predict) with shared weights will work? See the sketch below.
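A minimal sketch of that shared-weights idea (my addition; x_new stands for an assumed array of new inputs): because out is built on input_x, a second Model over the same layers reuses the trained weights and does not need the dummy y input.

predict_model = Model(inputs=input_x, outputs=out)  # shares layers/weights with the training model
y_hat = predict_model.predict(x_new)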