Keras: How to implement a conv layer with different filter sizes (Zhang & Wallace 2015)?

Created on 9 May 2017 · 12 comments · Source: keras-team/keras

Hello,

I'm trying to reproduce the CNN architecture proposed in this paper: a single convolutional layer with two filters of each of several filter sizes, followed by global max-pooling and dropout:
[Screenshot: architecture diagram from the paper]

Is there a way to implement this architecture in Keras?

Best,
ben0it8



All 12 comments

Apply different convolutional layers on the same input and merge their outputs?

I'm in the middle of figuring this out myself. Here's what I think is necessary.

1) You'll need to replicate your inputs across each of the input "channels" (i.e. for each filter width).
2) You do a "concatenate" merge after the GlobalMaxPooling1D on the Conv1D layer outputs (the diagram appears to show two "merges", but I don't believe that's necessary).

Have a look at the following for inspiration:
https://gist.github.com/ameasure/944439a04546f4c02cb9
https://statcompute.wordpress.com/2017/01/08/an-example-of-merge-layer-in-keras/

Let me know if you've made any progress, and I'll do the same.

Here's what I ended up doing, which appears to be doing the right thing, but I'm still new enough to Keras that I haven't figured out how to introspect this properly to make sure...

submodels = []
for kw in (3, 4, 5):    # kernel sizes
    submodel = Sequential()
    submodel.add(Embedding(len(word_index) + 1,
                           EMBEDDING_DIM,
                           weights=[embedding_matrix],
                           input_length=MAX_SEQUENCE_LENGTH,
                           trainable=False))
    submodel.add(Conv1D(FILTERS,
                        kw,
                        padding='valid',
                        activation='relu',
                        strides=1))
    submodel.add(GlobalMaxPooling1D())
    submodels.append(submodel)
big_model = Sequential()
big_model.add(Merge(submodels, mode="concat"))
big_model.add(Dense(HIDDEN_DIMS))
big_model.add(Dropout(P_DROPOUT))
big_model.add(Activation('relu'))
big_model.add(Dense(1))
big_model.add(Activation('sigmoid'))
print('Compiling model')
big_model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
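On the introspection question: two quick ways to verify what a merged model is actually doing are `model.summary()` and pushing a dummy batch through `predict()`. The sketch below builds a tiny stand-in model (placeholder layer sizes, not the hyperparameters from the snippet above) so the checks are self-contained; it assumes the Keras 2 `tensorflow.keras` import path and `Concatenate` layer rather than the older `Merge`.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

# Tiny two-branch merged model, standing in for the bigger net above.
a = Input(shape=(4,))
b = Input(shape=(4,))
merged = Concatenate()([Dense(3)(a), Dense(3)(b)])
model = Model([a, b], Dense(1, activation='sigmoid')(merged))

# 1) Layer-by-layer output shapes and parameter counts:
model.summary()

# 2) Push a dummy batch through and check the output shape. Note the
#    list of input arrays, one per input branch -- the same reason the
#    big model's fit() needs [x_train, x_train, x_train].
dummy = np.zeros((2, 4))
print(model.predict([dummy, dummy]).shape)  # (2, 1)
```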

I was trying to fit your implementation but got:
ValueError: The model expects 3 input arrays, but only received one array. Found: array with shape (48943, 300)

Any idea?

Yes, this is what I meant about "replicating the inputs"...sorry, I should have included the fit() call to clarify.

hist = big_model.fit([x_train, x_train, x_train],
                     y_train,
                     batch_size=BATCH_SIZE,
                     epochs=EPOCHS,
                     validation_data=([x_val, x_val, x_val], y_val),
                     callbacks=callbacks)

You can see I have x_train and x_val as my training/validation inputs. Because I'm using three different filter sizes, the net expects three separate input streams. Turning my inputs into a list of NUM_KERNEL_SIZES copies of the same array handles that.

Thank you for sharing that!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

The same problem seems to be addressed and solved in this issue using the Graph model.


Closing as this is resolved


When I try to compile the code @fmailhot posted above, I get an error saying there is no layer named Merge().

I had the same problem with the Merge() function
I solved it by downgrading keras:
pip uninstall keras
pip install keras==2.1.2
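Rather than downgrading, the same architecture can be expressed without the removed Merge layer by using the Keras 2 functional API: give all three convolutional branches a single shared Input and join their pooled outputs with Concatenate. The sketch below is a port of the snippet above under that assumption; the sizes (VOCAB, EMBEDDING_DIM, etc.) are illustrative placeholders, not the thread's constants, and the pretrained embedding weights are omitted.

```python
from tensorflow.keras.layers import (Input, Embedding, Conv1D,
                                     GlobalMaxPooling1D, Concatenate,
                                     Dense, Dropout)
from tensorflow.keras.models import Model

# Placeholder hyperparameters (stand-ins for the thread's constants).
MAX_SEQUENCE_LENGTH, VOCAB, EMBEDDING_DIM, FILTERS, HIDDEN_DIMS = 100, 5000, 64, 32, 64

inp = Input(shape=(MAX_SEQUENCE_LENGTH,))
emb = Embedding(VOCAB, EMBEDDING_DIM, trainable=False)(inp)

# One Conv1D + GlobalMaxPooling1D branch per kernel size, all reading
# the same embedded input.
branches = []
for kw in (3, 4, 5):
    x = Conv1D(FILTERS, kw, padding='valid', activation='relu')(emb)
    branches.append(GlobalMaxPooling1D()(x))

x = Concatenate()(branches)          # replaces Merge(..., mode="concat")
x = Dense(HIDDEN_DIMS, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(1, activation='sigmoid')(x)

big_model = Model(inp, out)
big_model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
```

A side benefit of the shared Input: fit() now takes a single x_train array instead of [x_train, x_train, x_train], since the branching happens inside the graph.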
