Keras: Variational Autoencoder example not working correctly

Created on 1 Aug 2016 · 6 comments · Source: keras-team/keras

I am having issues getting the sample variational autoencoder to work. Running the script verbatim from the site, I keep getting results where all locations in the latent space produce the same output, and the distribution of test images over the latent space is nowhere near as spread out as expected. Running on the latest versions of Theano and Keras.

[image: generated]
[image: latent_space]

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • [x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

All 6 comments

This behaviour also puzzled me for a while. Since I found that the standard autoencoder with a linear encoder works (meaning that the 'recognition model' in the VAE becomes Gaussian with zero std), my suspicion fell on the Gaussian sampling strategy.

So I changed the line of code generating the epsilon by decreasing the standard deviation, as follows:
epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., std=0.1)

and this makes the VAE behave correctly in this particular case -- other network parameters, learning settings, and/or datasets might need a different value.

Oh, you would also need to rescale z_sample when generating the digit samples:
z_sample = np.array([[xi, yi]]) * 0.1
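
For context, here is a minimal sketch of how that line sits inside the example's sampling function. The names z_mean and z_log_var are assumed from the example script; older copies use z_log_sigma, so adjust the last line accordingly:

from keras import backend as K

batch_size = 100   # assumed to match the example's settings
latent_dim = 2

def sampling(args):
    z_mean, z_log_var = args
    # Reparameterization trick: z = mu + sigma * epsilon, with epsilon
    # drawn at the reduced std of 0.1 instead of the default 1.0
    epsilon = K.random_normal(shape=(batch_size, latent_dim),
                              mean=0., std=0.1)
    return z_mean + K.exp(z_log_var / 2) * epsilon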

I believe this broke when commit f6bcaffe was brought in, which makes the reconstruction loss excessively small by removing the multiplication by original_dim on line 42. The problem should be fixed by bringing the multiplication back:

xent_loss = original_dim * objectives.binary_crossentropy(x, x_decoded_mean)

or by updating the whole calculation as:

xent_loss = K.sum(K.binary_crossentropy(x_decoded_mean, x), axis=-1)

I feel the current implementation is redundant: it takes the mean of the loss and then multiplies it by the number of dimensions again. Why not just take the sum over the whole loss?
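
For reference, a minimal sketch of the full loss with the multiplication restored; the names x, x_decoded_mean, z_mean, z_log_var, and original_dim are assumed to come from the surrounding example script:

from keras import backend as K
from keras import objectives

def vae_loss(x, x_decoded_mean):
    # Per-pixel mean cross-entropy, scaled back up by original_dim so the
    # reconstruction term is not dwarfed by the KL term
    xent_loss = original_dim * objectives.binary_crossentropy(x, x_decoded_mean)
    # KL divergence between the approximate posterior and a unit Gaussian
    kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return xent_loss + kl_loss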

Thanks,

Thanks @tushuhei for the better answer. Didn't realise that before. I've reverted to the default std and used your suggested loss function. It works like a charm!

Bug can be closed then.

@stuartlynn @ghif In line 69 of the example, shouldn't the epsilon (noise) be set to zero when plotting test examples in the latent space? (Or plot them many times to show the spread.)
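
One way to avoid the noise entirely when plotting is to map test inputs through z_mean instead of the sampled z -- a sketch, assuming the example's x, z_mean, batch_size, x_test, and y_test:

from keras.models import Model
import matplotlib.pyplot as plt

# Deterministic encoder: it outputs the posterior mean rather than a
# sample, so repeated plots of the same input land on the same point
encoder = Model(x, z_mean)
x_test_encoded = encoder.predict(x_test, batch_size=batch_size)
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1], c=y_test)
plt.colorbar()
plt.show()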

> This behaviour also puzzled me for a while. Since I found that the standard autoencoder with a linear encoder works (meaning that the 'recognition model' in the VAE becomes Gaussian with zero std), my suspicion fell on the Gaussian sampling strategy.
>
> So I changed the line of code generating the epsilon by decreasing the standard deviation, as follows:
> epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., std=0.1)
>
> and this makes the VAE behave correctly in this particular case -- other network parameters, learning settings, and/or datasets might need a different value.
>
> Oh, you would also need to rescale z_sample when generating the digit samples:
> z_sample = np.array([[xi, yi]]) * 0.1

Thank you! I met the same problem in my own project, and this method really helped!
