Keras: No effect of pretraining using autoencoder

Created on 12 Feb 2016 · 16 Comments · Source: keras-team/keras

Hi, I may be doing something wrong here, but I am trying to follow the steps in the documentation.

Consider this code:

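# Imports assumed for the Keras 0.3-era API used below (not shown in the original post)
from keras.datasets import mnist
from keras.layers.core import AutoEncoder, Dense
from keras.models import Sequential
from keras.optimizers import Adam
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()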

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)

Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

topology = [len(X_train[0]), 200, 100, len(Y_train[0])]
batch_size = 128
nb_pretraining_epochs = 0  # varied between runs
nb_epoch = 2



autoencoder_0 = Sequential()

encoder_0 = Sequential([Dense(output_dim = 200, input_dim = 784, activation = "softplus")])
decoder_0 = Sequential([Dense(output_dim = 784, input_dim = 200)])

autoencoder_0.add(AutoEncoder(encoder = encoder_0, decoder = decoder_0, output_reconstruction = True))
autoencoder_0.compile(optimizer = Adam(lr = 0.001), loss = 'mse')
autoencoder_0.fit(X_train, X_train, batch_size = batch_size, nb_epoch = nb_pretraining_epochs)

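# Wrap the trained encoder in its own model to compute the 200-dimensional codes
# that serve as training data for the second autoencoder.
temp_0 = Sequential()
temp_0.add(encoder_0)
temp_0.compile(loss = 'mse', optimizer = Adam(lr = 0.001))

X_train_1 = temp_0.predict(X_train)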

autoencoder_1 = Sequential()
encoder_1 = Sequential([Dense(input_dim = 200, output_dim = 100, activation = "softplus")])
decoder_1 = Sequential([Dense(input_dim = 100, output_dim = 200)])

autoencoder_1.add(AutoEncoder(encoder = encoder_1, decoder = decoder_1, output_reconstruction = True))
autoencoder_1.compile(optimizer = Adam(0.001), loss = 'mse')
autoencoder_1.fit(X_train_1, X_train_1, batch_size = batch_size, nb_epoch =  nb_pretraining_epochs)

# Stack both pretrained encoders and put a softmax classifier on top.
model = Sequential()
model.add(encoder_0)
model.add(encoder_1)
model.add(Dense(input_dim = 100, output_dim = 10, init = 'zero', activation = 'softmax'))


adam = Adam(lr = 0.001)
model.compile(optimizer = adam, loss = "categorical_crossentropy")

model.fit(X = X_train,
          y = Y_train,
          batch_size = batch_size,
          nb_epoch = nb_epoch,
          show_accuracy = True,
          verbose = 1,
          validation_data = (X_test, Y_test),
          shuffle = True)

Now, no matter how I set nb_pretraining_epochs, the accuracy is the same after 2 fine-tuning epochs. What am I doing wrong? Am I passing the weights from the encoders correctly?

All 16 comments

Why did you expect pre-training to help at all?

I was checking some earlier issues and found out that you are not a fan of pretraining.

You don't like the greedy layer-wise approach? How do you train deep models then?

@snurkabill have you verified that the autoencoder actually learns something useful? I.e., does the loss drop as expected during training? You could also try plotting the "receptive fields" of the hidden units and verifying that they look somewhat like the weights here (or more like "strokes" in the case of the MNIST set).
However, if you later want to train a full-scale CNN on some real _image_ dataset, just taking a pretrained VGGNet/GoogLeNet and fine-tuning it might end up giving a better result.
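
For anyone wanting to try the "receptive field" check, here is a minimal sketch using only NumPy. It assumes the encoder's input weights form a 784 x n_hidden matrix (as returned by `get_weights()[0]` for a dense layer fed with flattened 28 x 28 images); the helper name `tile_receptive_fields` and the random stand-in matrix are made up for illustration.

```python
import numpy as np

def tile_receptive_fields(W, side=28, n_cols=5):
    """Arrange each column of W (shape: side*side x n_hidden) as a side x side tile."""
    n_hidden = W.shape[1]
    n_rows = int(np.ceil(n_hidden / float(n_cols)))
    grid = np.zeros((n_rows * side, n_cols * side), dtype=W.dtype)
    for j in range(n_hidden):
        r, c = divmod(j, n_cols)
        # Column j holds the incoming weights of hidden unit j, one per input pixel.
        grid[r * side:(r + 1) * side, c * side:(c + 1) * side] = W[:, j].reshape(side, side)
    return grid

# Random stand-in for trained encoder weights (hypothetical data, 10 hidden units).
W = np.random.RandomState(0).normal(size=(784, 10))
grid = tile_receptive_fields(W)
print(grid.shape)
```

The resulting 2-D array can be passed to any image viewer, for example matplotlib's `imshow`, to see whether the units have picked up stroke-like patterns.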

@stes I know how autoencoders work and I can model them. I was also able to learn many layers with them and then fine-tune the whole machine, but not in Keras. Here, no matter how many pretraining epochs I run, two fine-tuning epochs end up the same. It's weird. I'm not sure whether I am doing something wrong or not.

I suspect that somewhere in the course of adding your encoder to a new model, build gets called, which re-initializes the weights. If #1721 gets accepted, this will definitely be the case (it was somewhat inconsistent before). You might try extracting the weights from the trained encoder, then restoring them after you've added the encoder to your next model.

I had some time today, so I dug deeper and found an error:

autoencoder_0 = Sequential()
encoder_0 = Dense(output_dim = 10, input_dim = 784, activation = "sigmoid")
decoder_0 = Dense(output_dim = 784, input_dim = 10)
print "initialized first_encoder: "
print encoder_0.get_weights()[0]
print "----------------------------------------------------"
print encoder_0.get_weights()[1]
print "\n\n\n"
autoencoder_0.add(encoder_0)
autoencoder_0.add(decoder_0)
autoencoder_0.compile(optimizer = Adam(lr = 0.001), loss = 'mse')
autoencoder_0.fit(X_train, X_train, batch_size = batch_size, nb_epoch = nb_pretraining_epochs)
print "trained first_encoder: "
print autoencoder_0.get_weights()[0]
print "----------------------------------------------------"
print autoencoder_0.get_weights()[1]

The line
print encoder_0.get_weights()[0]
prints the same values as the line
print autoencoder_0.get_weights()[0]

Oddly,
print autoencoder_0.get_weights()[1] (the biases) does show updated values.

@fchollet, this is a serious issue. I can't find where the bug is.

@snurkabill
You should set

encoder_0.build = lambda: None
encoder_1.build = lambda: None

after autoencoder training.
Otherwise, when you call add, as mentioned by @neggert, the weights are re-initialized.

But the problem is not the add method. I have a reproducer for this. It clearly shows that the very first weights are not updated.

import numpy as np

from keras.datasets import mnist
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam
from keras.utils import np_utils

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)

Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

batch_size = 128
nb_pretraining_epochs = 1
nb_epoch = 1

autoencoder_0 = Sequential()
encoder_0 = Dense(output_dim = 10, input_dim = 784, activation = "sigmoid")
decoder_0 = Dense(output_dim = 784, input_dim = 10)
print "initialized first_encoder: "
print encoder_0.get_weights()[0]
print "----------------------------------------------------"
print encoder_0.get_weights()[1]
print "\n\n\n"
autoencoder_0.add(encoder_0)
autoencoder_0.add(decoder_0)
autoencoder_0.compile(optimizer = Adam(lr = 0.001), loss = 'mse')
print "initialized: "
weights = autoencoder_0.get_weights()
print weights[0]
print "----------------------------------------------------"
print weights[1]
print "----------------------------------------------------"
print weights[2]
print "----------------------------------------------------"
print weights[3]
print "----------------------------------------------------"
print "\n\n\n"

autoencoder_0.fit(X_train, X_train, batch_size = batch_size, nb_epoch = nb_pretraining_epochs)
weights = autoencoder_0.get_weights()
print weights[0]
print "----------------------------------------------------"
print weights[1]
print "----------------------------------------------------"
print weights[2]
print "----------------------------------------------------"
print weights[3]
print "----------------------------------------------------"
print "\n\n\n"

Topology: 784 - 10 - 784.
weights[0] are the input weights, weights[1] are the hidden biases,
weights[2] are the output weights, and weights[3] are the output biases.

You can clearly see that weights[1] and weights[3] start as zeros (zero biases) and contain different numbers after training. The same can be said about weights[2]. weights[0] stays the same the whole time.
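
A claim like this can also be checked numerically instead of by reading printed arrays. The following is a minimal sketch, assuming the two snapshots are lists of NumPy arrays such as those returned by `get_weights()` before and after `fit`. The helper `max_abs_change` and the synthetic arrays are hypothetical stand-ins, not output from the run above.

```python
import numpy as np

def max_abs_change(before, after):
    """Return the largest absolute element-wise difference for each weight array."""
    return [float(np.max(np.abs(a - b))) for b, a in zip(before, after)]

# Synthetic stand-ins for two get_weights() snapshots: [W, b] of a 784 -> 10 layer.
rng = np.random.RandomState(0)
before = [np.zeros((784, 10)), np.zeros(10)]
after = [before[0].copy(), before[1] + 0.1]
# Only interior rows change; the first rows, which a printout shows, stay zero.
after[0][100:700] = rng.normal(scale=0.01, size=(600, 10))

changes = max_abs_change(before, after)
print(changes[0] > 0)                                # the matrix did change somewhere
print(np.array_equal(before[0][:3], after[0][:3]))   # yet its first rows are identical
```

A nonzero entry in `changes` means that array was updated somewhere, even if the rows visible in a printout look identical.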

The problem persists even for a very simple MNIST example:

from keras.layers import Activation  # assumed import; not shown in the original snippet

autoencoder_0 = Sequential()
autoencoder_0.add(Dense(10, init = 'zero', input_shape = (784, )))
autoencoder_0.add(Activation('softmax'))
autoencoder_0.compile(optimizer = Adam(lr = 0.001), loss = 'categorical_crossentropy')
print "initialized: "
weights = autoencoder_0.get_weights()
print weights[0]
print "----------------------------------------------------"
print weights[1]
print "----------------------------------------------------"
print "\n\n\n"

autoencoder_0.fit(X = X_train,
                  y = Y_train,
                  validation_data = (X_test, Y_test),
                  show_accuracy = True,
                  batch_size = batch_size,
                  nb_epoch = nb_pretraining_epochs)
weights = autoencoder_0.get_weights()
print weights[0]
print "----------------------------------------------------"
print weights[1]
print "----------------------------------------------------"
print "\n\n\n"

Output:

initialized: 
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
----------------------------------------------------
[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
----------------------------------------------------




Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 0s - loss: 0.6616 - acc: 0.8541 - val_loss: 0.3786 - val_acc: 0.9035
Epoch 2/5
60000/60000 [==============================] - 0s - loss: 0.3590 - acc: 0.9034 - val_loss: 0.3174 - val_acc: 0.9143
Epoch 3/5
60000/60000 [==============================] - 0s - loss: 0.3177 - acc: 0.9123 - val_loss: 0.2971 - val_acc: 0.9158
Epoch 4/5
60000/60000 [==============================] - 0s - loss: 0.2982 - acc: 0.9171 - val_loss: 0.2845 - val_acc: 0.9203
Epoch 5/5
60000/60000 [==============================] - 0s - loss: 0.2867 - acc: 0.9206 - val_loss: 0.2771 - val_acc: 0.9230
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]
----------------------------------------------------
[-0.24177188  0.37566799 -0.00340281 -0.20320819  0.10031621  0.48326358
 -0.05111289  0.28642136 -0.63981533 -0.1139671 ]
----------------------------------------------------

How could I get a 92% success rate with zero weights? :) I believe the first weights are not being reloaded from the back end after updates (so the back end has the right values, but the front end does not).

You can output the values of weights[0].mean(), weights[0].max(), and weights[0].min().

weights[0].mean()
-0.053590346
weights[0].max()
0.73021531
weights[0].min()
-0.71965355
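
The same kind of check can be reproduced without Keras. This is a small sketch on a synthetic 784 x 10 matrix (the values are made up): NumPy abbreviates large arrays when converting them to text, so a matrix whose nonzero entries sit in the middle rows prints as all zeros, while summary statistics and `np.count_nonzero` reveal the rest.

```python
import numpy as np

# Synthetic weight matrix: zero everywhere except rows 300..399.
W = np.zeros((784, 10), dtype="float32")
W[300:400] = 0.5

# Large arrays are abbreviated with "..." when converted to text,
# so only the first and last few rows and columns are shown.
text = np.array2string(W)
print("..." in text)
print("0.5" in text)

# Summary statistics look at every element, not just the printed corners.
print(W.min(), W.max(), W.mean())
print(np.count_nonzero(W))

# Indices of the rows that contain at least one nonzero weight.
nonzero_rows = np.flatnonzero(np.any(W != 0, axis=1))
print(nonzero_rows.min(), nonzero_rows.max())
```

To print the full matrix instead, `np.set_printoptions(threshold=...)` can be given a value larger than the number of elements.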

You are right, thanks. Now I understand why the printed values haven't changed.

Hi @snurkabill !

Sorry for coming back to this topic, but I am having a similar problem: my autoencoder does not seem to help the final result at all. I am doing an unsupervised training step to get some weights and then using those weights in my supervised model, but the result seems to be the same with and without the pre-training step. Were you able to make it work?

Thanks in advance.
Rafael

Hello @rafaelpossas ,

Yes, I was. We can discuss this offline if you wish.

Hi @snurkabill !

I have EXACTLY the same problem that you described and that @rafaelpossas described in more detail. Can you give me any idea of how to fix this?

Thanks!

I don't speak English, but you are allowed to do whatever you want to help me. My phone number is 554899809-2063.

@mayaramorais @gabrielam2018 Hello there, the problem was in the printing. The pretty-printer shows only the first 5 or so columns of the first 5 or so rows (considering an MNIST instance), and those positions correspond to pixels that are always background. No gradient reaches the weights attached to such pixels, so they never change. That was my problem.
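
To illustrate why those weights never move, here is a minimal sketch on synthetic data. For a dense layer, the gradient of the loss with respect to W[i, j] is a sum over samples of input i times the output error of unit j, so an input that is zero in every sample leaves its entire row of W untouched. The sketch uses softmax regression with plain gradient descent as a simplification (the original used Adam, which likewise leaves a weight alone if its gradient has always been exactly zero). All sizes, labels, and the learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features, n_classes = 256, 20, 3

X = rng.rand(n_samples, n_features)
X[:, :5] = 0.0                        # features 0..4 are always zero, like blank corner pixels
labels = rng.randint(n_classes, size=n_samples)
Y = np.eye(n_classes)[labels]         # one-hot targets

W = np.zeros((n_features, n_classes))  # zero initialization, as with init='zero'
b = np.zeros(n_classes)

for _ in range(50):
    logits = X.dot(W) + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    error = (probs - Y) / n_samples    # gradient of mean cross-entropy w.r.t. logits
    W -= 0.5 * X.T.dot(error)          # dL/dW[i, j] = sum_n X[n, i] * error[n, j]
    b -= 0.5 * error.sum(axis=0)

print(np.all(W[:5] == 0))   # rows fed by always-zero inputs never move
print(np.any(W[5:] != 0))   # the remaining rows do
```

With zero initialization, the rows of the weight matrix tied to always-zero inputs stay exactly zero, which matches a printout that only shows the corner entries.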
