Keras: Multiple outputs, same hidden structure

Created on 21 Dec 2015 · 6 comments · Source: keras-team/keras

Hello people,

I have been working on a regression problem with LSTMs in which I predict a vector of output coefficients from an input sequence. What I want to do is build multiple output layers on top of the same hidden structure, so that I train a different output layer depending on the input sequence (inputs always have the same dimension). In other words, when I backprop through one output layer, all the other output layers should remain static. Is there any way to freeze the other N-1 output layers and only fit a batch of data to the Nth layer, iterating this process over many batches and the N layers? Any clues, please?

Thank you

All 6 comments

Hi,
EDIT: Depending on what exactly you are trying to achieve, the Graph() model could be the preferred way to go:

# Keras 0.x-era imports
from keras.models import Graph, Sequential
from keras.layers import containers
from keras.layers.core import Dense, Dropout
from keras.layers.recurrent import LSTM

# sequence_length and element_dimensionality describe your input sequences
# The graph model is capable of handling multiple inputs and outputs:
model = Graph()
model.add_input("sequence_input", input_shape=(sequence_length, element_dimensionality))

# add all the layers that are shared across your N models
# for readability I add them into the Sequential() container first
shared_model = containers.Sequential()
shared_model.add(LSTM(output_dim=100, input_dim=element_dimensionality, return_sequences=False))
shared_model.add(Dropout(0.5))
# other shared layers ...

# add these layers to the graph model and connect them to the sequence input
model.add_node(shared_model, name="shared_layers", input="sequence_input")

# now add your output layers
model.add_node(Dense(10, activation="softmax"), name="output1", input="shared_layers", create_output=True)
model.add_node(Dense(2, activation="linear"), name="output2", input="shared_layers", create_output=True)
# ...

# compile the model, potentially with different loss functions per output
model.compile("rmsprop", {"output1": "categorical_crossentropy", "output2": "mse"})

# and train them all at once:
for X_batch, Y1_batch, Y2_batch in your_data:
    model.train_on_batch({"sequence_input": X_batch, "output1": Y1_batch, "output2": Y2_batch})
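
A side note: as far as I know, compiling with a dict of losses like this trains everything jointly, so each train_on_batch call updates the shared layers and all output heads at once. If I remember the 0.x Graph API correctly, predictions likewise come back as a dict keyed by output name, roughly:

# a minimal sketch; X_test is an assumed array of shape
# (n_samples, sequence_length, element_dimensionality)
predictions = model.predict({"sequence_input": X_test})
y1_pred = predictions["output1"]  # (n_samples, 10) softmax probabilities
y2_pred = predictions["output2"]  # (n_samples, 2) regression values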

Alternatively, my previous answer: as far as I know, you could build _N_ different models that share the same layers up to your output layers, e.g.:

Note: I did not run the code, but I think something along these lines is possible with Keras (at least with the Graph() model).

# compose a "submodel" that contains all the layers that are shared across your N models
shared_model = Sequential()
# add all layers up until your last layers here, like so:
shared_model.add(LSTM(output_dim=100, input_dim=20, return_sequences=False))
shared_model.add(Dropout(0.5))
# other shared layers ...

# now combine the shared "submodel" with the individual output
# layers by creating separate "final" models:
model1 = Sequential()
model1.add(shared_model)
model1.add(Dense(10, activation="softmax"))

model2 = Sequential()
model2.add(shared_model)
model2.add(Dense(2, activation="linear"))
# ...

# build your N models
model1.compile("rmsprop", "categorical_crossentropy")
model2.compile("rmsprop", "mse")
# ...

# and train them alternatingly:
for X_batch, Y1_batch, Y2_batch in your_data:  # add more targets here if you have more output heads
    model1.train_on_batch(X_batch, Y1_batch)
    model2.train_on_batch(X_batch, Y2_batch)
    # ...

Since the first layers are the same in all models, their weights are effectively shared across these models.
This alternative lets you train the individual output layers independently, i.e. with different numbers of training batches and in whatever order you choose.
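
For instance (a rough sketch; the batch sources batches_for_output1 and batches_for_output2 are assumed iterators, they are not defined above), you could give one output several updates for every update of the other:

# give the first output several updates per update of the second output
for i, (X_batch, Y1_batch) in enumerate(batches_for_output1):
    model1.train_on_batch(X_batch, Y1_batch)
    if i % 3 == 0:  # update the second output less often
        X2_batch, Y2_batch = next(batches_for_output2)
        model2.train_on_batch(X2_batch, Y2_batch)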

Is this what you were looking for?

Well, the second example is how I was thinking of proceeding, but the first one you posted leaves me with a doubt: when you specify multiple outputs in the graph and feed the model the dictionary of inputs/outputs, are the outputs trained separately (thus freezing the other outputs), or are they all back-propagated at once, updating the weights of all outputs? I have used the Graph before to combine probabilistic outputs and regression outputs with different objective functions, but not to train different output layers separately.
Thanks a lot!
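
For what it's worth, in the later functional API (not the 0.x Graph API above) layers have a trainable flag that can be switched off before compiling; a minimal sketch with purely illustrative names:

# a minimal sketch of freezing one output head via `trainable`
# (later Keras functional API; all names here are illustrative)
from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(20,))
shared = Dense(100, activation="relu")(inp)

head1 = Dense(10, activation="softmax", name="head1")
head2 = Dense(2, activation="linear", name="head2")
head2.trainable = False  # freeze head2's weights before compiling

out1 = head1(shared)
out2 = head2(shared)

model = Model(input=inp, output=[out1, out2])
model.compile(optimizer="rmsprop",
              loss=["categorical_crossentropy", "mse"])
# head2's weights now stay fixed, though its loss still flows back into the
# shared layers; to update only one head plus the shared layers, compiling a
# per-head model (as in the next comments) is cleaner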

Hi @santi-pdp, I have exactly the same question as you. Which model (Sequential or Graph) did you finally pick? Did you find a good way to freeze the rest of the outputs and only back-propagate through one output? Thank you!

Here is how I did it. Is this correct?

    # `inp` is the model's input tensor and `out` the output of the preceding
    # layers; `n_cls` is the number of classes (both defined earlier, not shown)

    # hidden layer
    out = Dense(128, activation='relu', name='fcc')(out)

    # regression output
    out_reg = Dense(1, name='out_reg')(out)

    # classification output
    out_cls = Dense(n_cls, activation='softmax', name='out_cls')(out)

    # model trained on both outputs
    model = Model(input=inp, output=[out_reg, out_cls])
    model.compile(loss=['mean_squared_error', 'categorical_crossentropy'], optimizer='adam', metrics=['accuracy'])

    # regression model
    model_reg = Model(input=inp, output=out_reg)
    model_reg.compile(loss='mean_squared_error', optimizer='adam')

    # classification model
    model_cls = Model(input=inp, output=out_cls)    
    model_cls.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
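
With the three models above sharing the same layers, one possible way to do the selective training discussed earlier (a rough sketch; the batch variables are assumed) is to call the per-head models alternately, since each of them only contains the shared layers plus its own output Dense:

    # alternating per-head updates with the separately compiled models
    for X_batch, y_reg_batch, y_cls_batch in training_batches:
        # updates the shared layers and out_reg only
        model_reg.train_on_batch(X_batch, y_reg_batch)
        # updates the shared layers and out_cls only
        model_cls.train_on_batch(X_batch, y_cls_batch)

    # or train both objectives jointly with the combined model:
    # model.train_on_batch(X_batch, [y_reg_batch, y_cls_batch])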

@mbalty What are the results you achieve? Can you share the full model (inputs, etc)?

@mbalty was your implementation correct? Did you get the intended results?

