Keras: Classification using stacked autoencoders

Created on 25 May 2017 · 23 comments · Source: keras-team/keras

Hello,

I have been using sklearn, but I want to build a classifier using stacked autoencoders to compare the results with my already implemented "classical" deep classifier (3 deep layers with ReLU and an output layer with softmax ...). sklearn does not have much support for autoencoders ...

I saw this link — https://blog.keras.io/building-autoencoders-in-keras.html — and it is well explained.
But something critical is missing: the classification part. They explain how to encode/decode, but how to introduce classification into that methodology is not presented.

As far as I know, to use classification with autoencoders we must:

1- pre-train the autoencoder NN - unsupervised (the input is also the output)
2- slice the autoencoder NN in half at the last encoder layer (before the decoding starts - the highest-abstraction layer)
3- freeze the weights of the encoders (so post-training does not mess them up)
4- add/concat a new NN on top of the last encoding layer (I think a pipeline would be applied here ...)
5- (post-)train this concatenated NN with a softmax output layer for classification ...

How could we do this with Keras? (using high-level code so it can be readable :-) )
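For reference, a minimal sketch of how steps 1-5 might map onto the Keras functional API (layer sizes and names are illustrative, not a definitive recipe):

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))
enc1 = Dense(256, activation='relu')            # keep references to the Layer objects
enc2 = Dense(64, activation='relu')
code = enc2(enc1(inp))                          # step 2: the last encoding layer
recon = Dense(784, activation='sigmoid')(Dense(256, activation='relu')(code))

autoencoder = Model(inp, recon)
autoencoder.compile(optimizer='adam', loss='mse')
# step 1: unsupervised pre-training (the input is also the target)
# autoencoder.fit(X_train, X_train, epochs=10, batch_size=128)

for layer in (enc1, enc2):                      # step 3: freeze the encoder weights
    layer.trainable = False

out = Dense(10, activation='softmax')(code)     # step 4: new head on the code layer
classifier = Model(inp, out)                    # shares the (frozen) encoder layers
classifier.compile(optimizer='adam', loss='categorical_crossentropy',
                   metrics=['accuracy'])
# step 5: supervised (post-)training of the softmax head
# classifier.fit(X_train, Y_train, epochs=10, batch_size=128)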

One example with MNIST digits (real classification, not just encode/decode) would be top.

I believe using high-abstraction information, extracted by encoding the raw inputs, to do classification will improve results (mainly in problems with lots of inputs).

Thanks in advance.

stale


All 23 comments

I would also be interested.

Would love an example of this!

I have made this in DL4J, but using Keras I would feel more "secure" ...

Here it is in DL4J:

https://feupload.fe.up.pt/get/goR1aE5OkrFR5y5

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Bump. Has anyone ever made any progress on this? I'm sure a lot of people would be interested.

Regards,
Theodore.

Hello,

Here is one simple, generic (dense layers) example with MNIST (for images it is ideal to use Conv2D autoencoders ...):

https://github.com/rjpg/bftensor/blob/349f0c3aad5e2768e94f4f7b741f5f7ff4328d64/Autoencoder/src/AutoEncoderMNIST.py

Also, after the split you may want to set the encoding layers to trainable=False ... if you want to use the encoder in its "pure state", without refining it in the second training stage ...
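For instance, a minimal sketch of that freezing step, assuming you kept a reference to the encoder Layer object (names here are illustrative):

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(784,))
enc = Dense(64, activation='relu')           # the Layer object, not its output tensor
code = enc(inp)
out = Dense(10, activation='softmax')(code)

enc.trainable = False                        # freeze before compiling the classifier
classifier = Model(inp, out)
classifier.compile(optimizer='adam', loss='categorical_crossentropy',
                   metrics=['accuracy'])     # the trainable flag takes effect at compile time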

Thanks @rjpg I'll take a look.

I am using a 'manual' way right now: actually training the autoencoders one after the other and then bringing the weights into a new model (something like the sketch below). It works, but it'd be amazing to automate this a bit. I'd love it if there were plans for a class in Keras for this. Not sure if there is value in it for other people though.

Edit: you seem to not be using the encoded layers. Is that by design, or should 'encoded' be passed on to y?

Kind regards,
Theodore.
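A minimal sketch of the 'manual' weight transfer mentioned above (layer shapes and indices are illustrative; this assumes the two encoders have identical shapes):

from keras.layers import Input, Dense
from keras.models import Model

# 1) Train an autoencoder (fit call omitted here).
inp = Input(shape=(784,))
enc = Dense(128, activation='relu')(inp)
dec = Dense(784, activation='sigmoid')(enc)
ae = Model(inp, dec)
ae.compile(optimizer='adam', loss='mse')
# ae.fit(X_train, X_train, epochs=10, batch_size=128)

# 2) Build a fresh classifier with the same encoder shape and copy the weights in.
inp2 = Input(shape=(784,))
enc2 = Dense(128, activation='relu')(inp2)
out = Dense(10, activation='softmax')(enc2)
clf = Model(inp2, out)
clf.layers[1].set_weights(ae.layers[1].get_weights())  # layer 0 is the InputLayer
clf.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])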

Yes, it seems to have a bug. Thanks for looking into it.

I will commit the correct version.

Hello again,

I have put the supposedly right line in the middle of the encoder:

y = Dense(height * width//256, activation='relu')(encoded)

but doing this the accuracy drops a lot ... 0.11 (??)

Maybe I am doing something wrong (?)

Or autoencoders for classification are not that good (?????? probably I am doing something wrong in this script ...)

Well ... I have changed reduced.compile(...) to be above model.fit(...), above the autoencoder training, and the results are better. It seems to learn ...

But if we comment out model.fit(...) - i.e., do not train the autoencoder - the results are similar ...

I think when we do model.compile it builds a completely new model, and when we do reduced.compile() the encoding layers will not be connected in any way with the model (the autoencoder layers).

If anyone can answer this question, that would be nice.

from keras.datasets import mnist
from keras.models import Model
from keras.layers import Input, Dense
from keras.utils import np_utils

num_train = 60000
num_test = 10000

height, width, depth = 28, 28, 1 # MNIST images are 28x28
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(num_train, height * width)
X_test = X_test.reshape(num_test, height * width)
X_train = X_train.astype('float32') 
X_test = X_test.astype('float32')

X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

input_img = Input(shape=(height * width,))

x = Dense(height * width, activation='relu')(input_img)

encoded1 = Dense(height * width//2, activation='relu')(x)
encoded2 = Dense(height * width//8, activation='relu')(encoded1)

y = Dense(height * width//256, activation='relu')(encoded2)

decoded = Dense(height * width//8, activation='relu')(y)
decoded = Dense(height * width//2, activation='relu')(decoded)

z = Dense(height * width, activation='sigmoid')(decoded)
model = Model(input_img, z)

out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)


model.compile(optimizer='adadelta', loss='mse') # reconstruction loss for the autoencoder

# To actually freeze the encoder here, set trainable=False on the Layer
# objects (keep references to the Dense layers); the names below are
# output tensors, so these lines would have no effect:
#x.trainable=False
#encoded1.trainable=False
#encoded2.trainable=False
#y.trainable=False
reduced.compile(loss='categorical_crossentropy',
          optimizer='adam', 
          metrics=['accuracy']) 


model.fit(X_train, X_train,
      epochs=2,
      batch_size=128,
      shuffle=True,
      validation_data=(X_test, X_test))

#for layer in model.layers:
#weights = model.layers[1].get_weights() # list of numpy arrays
#print(weights)

mid = Model(input_img, y)  # encoder-only model, sharing the trained layers
reduced_representation = mid.predict(X_test)  # encoded features (not used below)



reduced.fit(X_train, Y_train,
      epochs=2,
      batch_size=128,
      shuffle=True,
      validation_data=(X_test, Y_test))

#weights = model.layers[1].get_weights() # list of numpy arrays
#print(weights)


scores = reduced.evaluate(X_test, Y_test, verbose=1) 
print("Accuracy: ", scores[1])


Hello,

Here is the new, correct version:

https://github.com/rjpg/bftensor/blob/master/Autoencoder/src/AutoEncoderMNIST.py

We just need to be sure the encoder = Model(input_img, y) definition comes before autoencoder.compile().

It reaches Accuracy: 0.9747 ... not bad considering this is just a generic example using Dense instead of Conv2D, without dropout and without a learning-rate schedule.
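One way to see what is (and is not) connected: two Models built over the same layer instances share those layers' weights, regardless of which model is compiled first. A small self-contained check (toy shapes, random data):

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(4,))
h = Dense(3, activation='relu')(inp)        # shared encoder layer
recon = Dense(4, activation='sigmoid')(h)   # decoder head
clf = Dense(2, activation='softmax')(h)     # classifier head

autoencoder = Model(inp, recon)
classifier = Model(inp, clf)

w_before = classifier.layers[1].get_weights()[0].copy()
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(np.random.rand(32, 4), np.random.rand(32, 4),
                epochs=1, verbose=0)
w_after = classifier.layers[1].get_weights()[0]
print(np.allclose(w_before, w_after))       # False: training moved the shared layer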

Hi @rjpg, thanks for looking into this! I will give it a try on my dataset. It looks much better than how I've been using the functional API so far. As for accuracy, I haven't been able to get much so far, but my images are quite large (still uncertain how much one can resize, and how the input size affects layer sizes).

Kind regards,
Theodore.

If you are using images, instead of using Dense layers use a Conv2D autoencoder; then, for classification, flatten and use Dense. You should also compare the results without an autoencoder (with a direct application of LeNet ...).
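A rough sketch of that Conv2D suggestion (a minimal outline with illustrative filter counts, not tuned):

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Dense
from keras.models import Model

inp = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)          # 7x7x8 code

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(inp, decoded)                          # pre-train with (X, X)
autoencoder.compile(optimizer='adadelta', loss='mse')

flat = Flatten()(encoded)                                  # classifier head on the code
out = Dense(10, activation='softmax')(flat)
classifier = Model(inp, out)                               # fine-tune with (X, Y)
classifier.compile(optimizer='adam', loss='categorical_crossentropy',
                   metrics=['accuracy'])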

Yes, that is the second step. I am learning through examples and wanted to see how the simple autoencoder would do. Not that good so far; it might be my way of modelling, or just a limitation for larger images. I tried feeding it features extracted from the images, with pretty bad results as well.

I already had a mini run with a CAE; it seemed to reconstruct the image quite nicely, so I would think accuracy will be much higher.

I've had to deal with this problem, so I defined a class to help me do it. It saves the encoding layers' weights between encoders. You can use every layer as a separate model.

import numpy as np
from keras.layers import Input, Dense, GaussianNoise
from keras.models import Model

"""
Import as ->
from define_VAE import VAE
deep_VAE = VAE(args and kwargs)
deep_VAE.fit(training data, bath size, epochs per layer)

After training the last encoder will be in
deep_VAE.encoder
And the decoder in
deep_VAE.decoder
All other encoders will be in the lists
deep_VAE.encoder_models, as compiled models
deep_VAE.encoders as tensors.

Tweak with the noise_stdev parameter if the mean squared error is too high.
Add BatchNormalization or Convolution layers if you need to.
I've commented where it's safe to do so.

"""

class VAE(object):

    def __init__(self,
                 layer_sizes=[26, 16, 8, 4],
                 input_shape=32,
                 noise_stdev=0.1,
                 batch_size=1024,
                 optimizer='rmsprop', activation='sigmoid'):

        self.layer_sizes = [input_shape] + layer_sizes
        self.encoders = []; self.decoders = []; self.models = []
        self.encoder_models = []; self.decoder_memory = {}
        self._deflayers(input_shape, noise_stdev,
                        opt=optimizer, act=activation)

    def _deflayers(self, input_shape,
                   noise_stdev, opt, act):
        """
        Defines and compiles the models;
        this function is called when the class is instantiated.
        The part of this function before the loop defines the Input and adds noise.
        If you need to handle the input differently, add it here.
        In the loop, one encoder layer is added per entry in layer_sizes, and a
        matching decoder back to the input size is built and compiled as a model.
        """

        input_layer = Input(shape=(input_shape,))
        # If you're processing images, add a convolution layer here.
        self.encoders.append(
            GaussianNoise(noise_stdev)(input_layer)
        )
        for i, ls in enumerate(self.layer_sizes[1:], 1):

            added_layer = Dense(ls, activation=act, kernel_initializer='random_uniform',
                                bias_initializer='zeros', name='encoder_N{}'.format(i))(self.encoders[-1])
            # Add batch normalization/custom layers/activations for the encoder here
            self.encoders.append(added_layer)
            new_decoder = self._define_decoder(added_layer, i, act=act)

            self.decoders.append(
                Dense(input_shape, activation='linear', kernel_initializer='random_uniform',
                      bias_initializer='zeros', name='decoder_out')(new_decoder)
            )
            model = Model(inputs=input_layer, outputs=self.decoders[-1])
            # This is where you can change the optimizer/error/regularizer
            model.compile(optimizer=opt,
                          loss='mean_squared_error',
                          metrics=['accuracy'])
            self.models.append(model)
            self.encoder_models.append(Model(inputs=input_layer, outputs=self.encoders[-1]))
        self.encoder = Model(inputs=input_layer, outputs=self.encoders[-1])
        self.encoder.compile(optimizer=opt,
                             loss='mean_squared_error',
                             metrics=['accuracy'])

        self.decoder = self.models[-1]

        self.input_layer = input_layer

    def _define_decoder(self, decoder, i, act):
        """
        Takes in the first decoding layer
        and adds layers to it until the last layer.
        For each layer, check the dictionary: if the layer is there already,
        reuse it; if not, create it and save it in the dictionary.
        """
        for k in range(1, i):
            dcd = self.decoder_memory.get((self.layer_sizes[i-k], i), False)

            if dcd is False:
                dcd = Dense(self.layer_sizes[i-k], activation=act, kernel_initializer='random_uniform',
                            bias_initializer='zeros', name='decoder_N{}'.format(i-k))
                # Add batch normalization/custom layers/activations for the decoder here
                self.decoder_memory.update({(self.layer_sizes[i-k], i): dcd})

            decoder = dcd(decoder)

        return decoder

    def fit(self, X, batch_size=1024, epochs=[60, 60, 60, 100], v=0):
        # Greedy layer-wise training: fit each progressively deeper autoencoder in turn.
        for model, epoch in zip(self.models, epochs):
            model.fit(X, X, epochs=epoch, shuffle=True,
                      verbose=v, batch_size=batch_size)

    def compress(self, X):
        return self.encoder.predict(X)

    def evaluate(self, X):
        return self.encoder.evaluate(X, X)

    def set_encoder_models(self):
        self.encoder_models = [Model(
            inputs=self.input_layer,
            outputs=enc) for enc in self.encoders]


if __name__ == '__main__':

    deep_VAE = VAE()

    for model in deep_VAE.models:
        print(model.summary())

    print([type(model) for model in deep_VAE.models])
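A hypothetical usage sketch on random data (shapes match the defaults above; tiny epoch counts just to exercise the code):

import numpy as np

X = np.random.rand(2048, 32).astype('float32')        # toy data, input_shape=32
deep_VAE = VAE(layer_sizes=[26, 16, 8, 4], input_shape=32)
deep_VAE.fit(X, batch_size=256, epochs=[2, 2, 2, 2])  # one epoch count per stacked layer
codes = deep_VAE.compress(X)
print(codes.shape)                                    # -> (2048, 4)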

Thanks for sharing Simeon! Will definitely give this a try (maybe right now).

Edit: Sorry for the late question, but is there any reason this is called VAE? I don't see variational inference in there, or maybe I'm missing something.

Thank you. Tell me if you find it useful so I can clean it up and share it in a repo.


No, there is no specific reason to call it a VAE, sorry. I was using an auto-encoder for pre-training with this code.




Is it a stacked auto-encoder?

Yes, it is a stack of 2 Dense layers for the encoder and 2 for the decoder.


Thank you for your response.
Do you know how to apply a stacked auto-encoder with logistic regression classification, and how to calculate probabilities?
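For what it's worth, a softmax Dense layer on top of the encoded features is exactly multinomial logistic regression, and predict already returns class probabilities. A minimal sketch, assuming the names from the MNIST script above (mid is the encoder model, Y_* are one-hot labels):

from keras.layers import Input, Dense
from keras.models import Model

Z_train = mid.predict(X_train)                         # encoded features
Z_test = mid.predict(X_test)

code = Input(shape=(Z_train.shape[1],))
out = Dense(num_classes, activation='softmax')(code)   # = multinomial logistic regression
logreg = Model(code, out)
logreg.compile(optimizer='adam', loss='categorical_crossentropy',
               metrics=['accuracy'])

logreg.fit(Z_train, Y_train, epochs=10, batch_size=128)
proba = logreg.predict(Z_test)                         # each row: P(class k | x), k = 0..9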

