I am still confused about the difference between Dense and TimeDistributedDense, even though there are already some similar questions asked here and here. People discuss it a lot, but there is no commonly agreed conclusion.
And even though, here, @fchollet stated that:
TimeDistributedDense applies the same Dense (fully-connected) operation to every timestep of a 3D tensor.
I still need a detailed illustration of exactly what the difference between them is.
The typical use case of TimeDistributedDense is for processing the output of an Embedding layer or a recurrent layer with return_sequences=True. Then you can transform the hidden representation at each timestep before applying further processing (like pooling or another recurrent layer).
I got an example from here. I am wondering what the difference is if I change the following TimeDistributedDense into Dense:
model = Sequential()
model.add(LSTM(hidden_neurons, input_dim=in_out_neurons, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributedDense(in_out_neurons))
model.add(Activation("linear"))
It won't compile, because the dimensions don't match up. Dense expects a 2-dimensional input (batch_size, features), whereas the output of an LSTM with return_sequences=True is 3-dimensional (batch_size, timesteps, features).
I did not mean purely changing the name from TimeDistributedDense to Dense. Can we change the layer from TimeDistributedDense to Dense while changing the dimensions as well? What would be the real difference in structure and in results?
That is:
A Dense layer deals with a 2D tensor and outputs a 2D tensor.
A TimeDistributedDense layer deals with a 3D tensor and outputs a 3D tensor.
The inner operation is the same, y = f(Wx + b), where f is the activation function and W and b are the weight and bias. In TimeDistributedDense, the operation is applied at every timestep.
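A small sketch of the shape difference (assuming a recent Keras version, where the old TimeDistributedDense layer is written as TimeDistributed(Dense(...))):
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

# 2D case: Dense maps (batch_size, features) -> (batch_size, units)
x2d = Input(shape=(30,))               # (None, 30)
y2d = Dense(20)(x2d)                   # (None, 20)

# 3D case: TimeDistributed(Dense) maps (batch_size, timesteps, features)
# -> (batch_size, timesteps, units), reusing the same W and b at every timestep
x3d = Input(shape=(10, 30))            # (None, 10, 30)
y3d = TimeDistributed(Dense(20))(x3d)  # (None, 10, 20)

print(Model(x2d, y2d).summary())
print(Model(x3d, y3d).summary())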
By every timestep, do you mean every unfolded unit of the recurrent layer? Or does each timestep actually mean each char in one sentence? Also, what are the pros and cons of TimeDistributedDense? Does it add a lot of computation time?
Yes, a timestep means every unfolded unit of the RNN.
For example, with sequence = [A, B, C, D, E]:
A is at time=1
B is at time=2
...
E is at time=5
If you want to apply y = f(Wx + b) at each timestep of your input (i.e. your input is a 3D tensor), TimeDistributedDense is your only choice, so there are no pros and cons to weigh.
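In plain numpy terms, "applied at every timestep" just means the same W and b are reused for each step of the sequence. A minimal sketch (the shapes and activation here are illustrative assumptions):
import numpy as np

timesteps, in_dim, out_dim = 5, 30, 20
x = np.random.randn(timesteps, in_dim)  # one sequence of 5 timesteps: [A, B, C, D, E]
W = np.random.randn(in_dim, out_dim)    # a single weight matrix, shared across all timesteps
b = np.random.randn(out_dim)

# apply y = f(Wx + b) independently at each timestep
y = np.stack([np.tanh(x[t] @ W + b) for t in range(timesteps)])
print(y.shape)  # (5, 20): one transformed vector per timestep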
I thought y = f(Wx + b) was applied at each timestep even when using Dense, since it is fully connected. So does Dense actually only apply the operation to the last timestep? Can I say that Dense is used in many-to-one or one-to-one cases, and TimeDistributedDense is used in many-to-many and one-to-many cases?
I think you should take a look at the Keras documentation carefully, and perhaps also the Theano documentation, because there is a big difference between Dense and TimeDistributedDense.
Dense only receives a 2D tensor, which means there is NO time dimension, i.e. a 2D -> 2D conversion.
TimeDistributedDense only receives a 3D tensor, which includes a time dimension, i.e. a 3D -> 3D conversion.
Q: So does Dense actually only apply the operation to the last timestep?
A: No, there is no time dimension in a Dense layer.
Q: Is Dense used in many-to-one or one-to-one cases?
A: It is one-to-one.
Q: And is TimeDistributedDense used in many-to-many and one-to-many cases?
A: It is many-to-many.
So, the lstm_text_generation example is actually a one-to-one case.
print('Build model...')
model = Sequential()
model.add(LSTM(512, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
If you mean the Dense layer, that is the one-to-one case: the previous LSTM layer (with return_sequences=False) returns a 2D tensor, which is the final state of the LSTM, and the Dense layer outputs a 2D tensor, a probability distribution (softmax) over the whole vocabulary.
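To make the shapes concrete, here is a sketch of the same architecture with example values for maxlen and len(chars) (both values are illustrative assumptions):
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

maxlen, n_chars = 40, 57  # illustrative values only

model = Sequential()
model.add(LSTM(512, return_sequences=True, input_shape=(maxlen, n_chars)))  # (None, 40, 512)
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))  # (None, 512): only the final state
model.add(Dropout(0.2))
model.add(Dense(n_chars))                     # (None, 57): one score per character
model.add(Activation('softmax'))              # (None, 57): probability distribution
print(model.summary())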
Thanks a lot, @ymcui. I am wondering whether you could take a look at another post of mine: Some interesting results of using this lstm_text_generation example. Need reasonable explanations. That would be very helpful.
Hi,
I want to train a simple neural network with data of shape (11, 501, 40).
I set the input_shape of the Dense layer to (11, 501, 40) as well, but it is not working.
Kindly guide me. The code and error are given below:
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
path="D:/DECASE2017/CNN/all_data_partial.npy"
data=np.load(open(path,'rb'))
X=np.array((data[:11,:501,:40]))# all channels,all rows and 40 columns
Y=np.array((data[:11,:501,40]))# all channels, all rows and only one column no. 40 (e.g class_label)
model = Sequential()
model.add(Dense(12,input_shape=(11,501,40),init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='relu'))
model.add(Dense(1, init='uniform', activation='sigmoid'))
#compile a model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#fit the model
model.fit(X,Y,nb_epoch=1, batch_size=10)
#evaluate
score = model.evaluate(X,Y)
print("%s: %.2f%%" %(model.metrics_names[1], score[1]*100))
ValueError: Error when checking input: expected dense_80_input to have 4 dimensions, but got array with shape (11, 501, 40)
Thank you.
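For reference, the error arises because input_shape excludes the batch dimension: with input_shape=(11, 501, 40), Keras expects 4D input of shape (batch, 11, 501, 40). A minimal sketch of one possible fix, assuming the 11 samples are the batch, each of the 501 rows carries one binary label, and a Keras version where Dense acts on the last axis of a 3D input:
# reuses X of shape (11, 501, 40) and Y of shape (11, 501) defined above
model = Sequential()
model.add(Dense(12, input_shape=(501, 40), activation='relu'))  # (None, 501, 12)
model.add(Dense(8, activation='relu'))                          # (None, 501, 8)
model.add(Dense(1, activation='sigmoid'))                       # (None, 501, 1)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y[..., None], epochs=1, batch_size=10)             # Y reshaped to (11, 501, 1)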
@fluency03 In your model, why do you add an activation layer with a linear activation at the end, i.e. model.add(Activation("linear"))? Does it have any effect?
For a regression-type problem, what dimensions should I use to run my code?
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import tensorflow as tf
from matplotlib import pyplot
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.preprocessing import StandardScaler
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
seed = 7
np.random.seed(seed)
from scipy.io import loadmat
dataset = loadmat('matlab2.mat')
Bx=basantix[:, 50001:99999]
Bx=np.transpose(Bx)
Fx=fx[:, 50001:99999]
Fx=np.transpose(Fx)
from sklearn.cross_validation import train_test_split
Bx_train, Bx_test, Fx_train, Fx_test = train_test_split(Bx, Fx, test_size=0.2, random_state=0)
scaler = StandardScaler() # Class is create as Scaler
scaler.fit(Bx_train) # Then object is created or to fit the data into it
Bx_train = scaler.transform(Bx_train)
Bx_test = scaler.transform(Bx_test)
def base_model():
    model = Sequential()
    model.add(Dense(49999, input_shape=(20,), activation='relu'))
    model.add(Dense(20))
    model.add(Dense(49998, init='normal', activation='relu'))
    model.add(Dense(49998, init='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
scale = StandardScaler()
Bx = scale.fit_transform(Bx)
Bx = scale.fit_transform(Bx)
clf = KerasRegressor(build_fn=base_model, nb_epoch=100, batch_size=5,verbose=0)
clf.fit(Bx,Fx)
res = clf.predict(Bx)
clf.score(Fx,res)
Kindly provide an exact solution.
@around1991 "It won't compile... Dense expects a 2-dimensional input..." This snippet compiles fine:
from keras.layers import TimeDistributed, Dense, Input, Conv1D, MaxPooling1D, Flatten
from keras.models import Model
inputs = Input(shape=(10, 30))
x = Dense(20)(inputs)
x = Conv1D(40, 5)(x)
x = MaxPooling1D(5)(x)
x = Flatten()(x)
x = Dense(3)(x)
model = Model(inputs, x)
print(model.summary())
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['acc'])
and here is the model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 10, 30) 0
_________________________________________________________________
dense_1 (Dense) (None, 10, 20) 620
_________________________________________________________________
conv1d_1 (Conv1D) (None, 6, 40) 4040
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1, 40) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 40) 0
_________________________________________________________________
dense_2 (Dense) (None, 3) 123
=================================================================
Total params: 4,783
Trainable params: 4,783
Non-trainable params: 0
_________________________________________________________________
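For reference, the parameter counts follow directly from the shapes: the first Dense maps 30 -> 20 features per timestep, so 30×20 + 20 = 620; the Conv1D has 40 filters of width 5 over 20 input channels, so 5×20×40 + 40 = 4040; and the final Dense maps 40 -> 3, so 40×3 + 3 = 123. In particular, the Dense applied to the 3D input uses a single 30×20 weight matrix shared across all 10 timesteps.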
This version uses TimeDistributed and has the same model summary:
from keras.layers import TimeDistributed, Dense, Input, Conv1D, MaxPooling1D, Flatten
from keras.models import Model
inputs = Input(shape=(10, 30))
x = TimeDistributed(Dense(20))(inputs)
x = Conv1D(40, 5)(x)
x = MaxPooling1D(5)(x)
x = Flatten()(x)
x = Dense(3)(x)
model = Model(inputs, x)
print(model.summary())
model.compile(loss='categorical_crossentropy',
optimizer='rmsprop',
metrics=['acc'])
Here is the summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 10, 30) 0
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 20) 620
_________________________________________________________________
conv1d_1 (Conv1D) (None, 6, 40) 4040
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 1, 40) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 40) 0
_________________________________________________________________
dense_2 (Dense) (None, 3) 123
=================================================================
Total params: 4,783
Trainable params: 4,783
Non-trainable params: 0
_________________________________________________________________
@fluency03 Did you figure out the answer to your question? I am still confused by the above example! Dense does accept 3D input; it is simply a matrix multiplication (plus a bias term), and there is nothing wrong with (?, 10, 30) x (30, 20) ---> (?, 10, 20) (the matrix has 30x20 = 600 params). This matrix multiplication is nothing but applying a fully connected (30x20) layer to each of the 10 30-dimensional vectors of the input, which seems to be the same as what TimeDistributed does!
@rmanak I think that although the Dense layer accepts 3D input, it flattens the first two dimensions, whereas TimeDistributed(Dense) won't flatten the first two dimensions (batch_size, time_steps), so the temporal information is preserved and not mixed.
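One way to settle this is to check empirically. A quick sketch (assuming a recent Keras version) that copies the weights of a Dense layer into a TimeDistributed(Dense) and compares the outputs on the same 3D input:
import numpy as np
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

x = np.random.randn(4, 10, 30)  # (batch_size, timesteps, features)

inp = Input(shape=(10, 30))
m_dense = Model(inp, Dense(20)(inp))
m_td = Model(inp, TimeDistributed(Dense(20))(inp))
m_td.set_weights(m_dense.get_weights())  # use the same W and b in both models

out_dense = m_dense.predict(x)  # (4, 10, 20)
out_td = m_td.predict(x)        # (4, 10, 20)
# expected True in versions where Dense operates on the last axis of N-D input
print(np.allclose(out_dense, out_td, atol=1e-5))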