Keras: Conditional Random Field on top of output layer?

Created on 13 Oct 2015 · 5 comments · Source: keras-team/keras

Has anybody tried putting a conditional random field (CRF) on top of the output layer for sequence modeling in Keras?

Suggestions or advice are welcome!

All 5 comments

Can you give more details on what you are thinking? Are you talking about using a linear-chain CRF on the outputs of a CNN (or RNN)? Or something more complicated? The goals of an LC-CRF and an RNN are very similar. Moreover, under certain conditions an RNN is almost the same as a continuous latent-variable CRF.

@colincsl Sorry for the lack of information.

What I am trying to do is apply a sentence-level (sequence-wise) log-likelihood on top of RNN layers, just like SENNA.
It is quite similar to a linear-chain CRF.

Check the ideas here: "Sentence-Level Log-Likelihood", section 3.4.2 of the SENNA paper (Collobert et al., "Natural Language Processing (Almost) from Scratch").

The Keras code would probably look something like this:

from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.layers.core import Activation

model = Sequential()
model.add(Embedding(vocab_size, 256, input_length=maxlen))
model.add(LSTM(output_dim=128, return_sequences=True))
model.add(CHAIN(output_dim=num_class))  # hypothetical CRF-like layer; no such layer exists in Keras
model.add(Activation('softmax'))
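For reference, the sentence-level objective from section 3.4.2 of the paper is roughly the following (notation adapted, not quoted; $A$ is a learned matrix of tag-transition scores and $f_\theta(x)_{t,k}$ is the network's score for tag $k$ at position $t$):

$$ s(x, y) = \sum_{t=1}^{T} \left( A_{y_{t-1},\, y_t} + f_\theta(x)_{t,\, y_t} \right), \qquad \log p(y \mid x) = s(x, y) - \log \sum_{y'} \exp s(x, y') $$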

Is there an easy way to implement it using only the Keras API?

If I understand correctly, you would have to implement this as a new layer (see SimpleRNN for reference). It shouldn't be too hard, but you will have to write a small amount of Theano code in the _step function.
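Something along these lines, as a very rough sketch (not actual Keras internals; num_class, the variable names, and the _step body are all illustrative assumptions): the scan step below runs the forward recursion of a linear-chain CRF, so the final alphas give the log-partition function over all tag sequences.

import numpy as np
import theano
import theano.tensor as T

num_class = 5  # illustrative; the number of output tags

# Unary (emission) scores would come from the RNN outputs; the pairwise
# transition scores are the extra weights the new layer would learn.
emissions = T.matrix('emissions')  # shape: (timesteps, num_class)
transitions = theano.shared(
    np.zeros((num_class, num_class), dtype=theano.config.floatX),
    name='transitions')

def _step(emit_t, alpha_tm1, trans):
    # alpha_tm1[i] = log-sum of the scores of all tag paths ending in tag i
    scores = alpha_tm1.dimshuffle(0, 'x') + trans + emit_t.dimshuffle('x', 0)
    m = scores.max(axis=0)
    return T.log(T.sum(T.exp(scores - m), axis=0)) + m  # stable log-sum-exp

alphas, _ = theano.scan(
    fn=_step,
    sequences=emissions[1:],
    outputs_info=emissions[0],
    non_sequences=transitions)

log_Z = T.log(T.sum(T.exp(alphas[-1])))  # log partition function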

Note that the CHAIN function you use isn't too different from a typical RNN layer. You might want to look into the bidirectional RNN work that people are doing instead.

One of the other differences with typical CRFs is how they are trained. In the linear-chain case you first define an energy function that models your unary and pairwise terms. You then optimize your weights using this energy function. The details are beyond what I'm willing to write about here :).
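To make that slightly more concrete, continuing the hypothetical sketch above (emissions, transitions and log_Z as defined there): the score of the observed tag sequence is its unary scores plus its pairwise transition scores, and the sentence-level negative log-likelihood is what you would minimize.

# `tags` is the observed tag sequence for one training sentence
tags = T.ivector('tags')  # shape: (timesteps,), integer tag ids

unary_score = emissions[T.arange(emissions.shape[0]), tags].sum()
pairwise_score = transitions[tags[:-1], tags[1:]].sum()
gold_score = unary_score + pairwise_score  # s(x, y) in the paper's notation

# Sentence-level negative log-likelihood, minimized w.r.t. both the
# RNN weights (through `emissions`) and the `transitions` matrix.
nll = log_Z - gold_score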

Thank you for the comments.

As you pointed out, an RNN is probably able to capture output sequences (such as B-I-O tags) as well as a linear-chain CRF can, maybe even better.

I will compare the two approaches:

  • RNN or bidirectional RNN to capture sequence-wise outputs
  • Adding a sequence-wise cost to a normal RNN model to capture sequence-wise outputs

@hugman, did you have any success? If I understand RNNs and CRFs correctly, it is conditioning on the previous (output) label that CRFs do, but RNNs (usually) don't.

