Keras: Sequence To Sequence w/ Beam Search

Created on 13 Dec 2015 · 15 comments · Source: keras-team/keras

Currently, RNNs with return_sequences=True and TimeDistributedDense make a greedy decision at each step of the sequence. During prediction this is usually not optimal, since the best path may not be the greedy path. Beam search is often used to improve the accuracy of sequence-generating recurrent neural networks. See http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf for reference.

I'm interested in implementing a Beam Search mechanism/option for the predict method (or to make a new method) for these sequence-generating models. Checking in to see if there's been work around this and if anyone can suggest good general starting places.

A rough outline is as follows (if we're predicting character sequences): predict the probability distribution for the first character slot; then, for each possible first character, predict the probabilities for the second character slot; compute each path's total probability by summing its log probabilities; then prune down to the top K paths and their probabilities. If an end character is predicted, add that solution and its probability to an answers table. Next, predict the probabilities for the third character slot using the length-2 paths as starting points, and so on, continuing until the maximum possible length is reached. Finally, take the argmax of the answers table and return it. A sketch of this loop is below.
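
To make the outline concrete, here is a minimal sketch of that procedure in plain Python/NumPy. It assumes a hypothetical predict_proba(prefix) callable that returns the next-token distribution for a given prefix; start_token, end_token, and beam_width are illustrative parameters, not part of any existing Keras API.

```python
import numpy as np

def beam_search(predict_proba, start_token, end_token, beam_width=3, max_len=20):
    # Each beam entry is (token_sequence, cumulative_log_prob).
    beams = [([start_token], 0.0)]
    completed = []  # the "answers table": finished (sequence, score) pairs

    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            probs = predict_proba(seq)  # hypothetical: next-token distribution
            # Expand each surviving path by every possible next token.
            for tok, p in enumerate(probs):
                candidates.append((seq + [tok], score + np.log(p + 1e-12)))
        # Prune to the top-K paths by total log probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end_token:
                completed.append((seq, score))  # finished path goes to the answers table
            else:
                beams.append((seq, score))
        if not beams:
            break

    # Argmax over the answers table (falling back to unfinished beams).
    completed = completed or beams
    return max(completed, key=lambda c: c[1])[0]
```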

The API could be model.predict(x, beam_search=True) or model.predict_beam_search(x). The one uncertainty I have is how to know which layer to grab the probabilities from. Maybe the user has to specify which layer?
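
On the "which layer" question, one possible answer is to let the caller name the layer whose softmax output to read, and adapt the model to the predict_proba interface used in the sketch above. This is only a sketch: make_predict_proba, encode, and layer_name are hypothetical names, not existing Keras API (model.get_layer and K.function are real Keras calls, though).

```python
from keras import backend as K

def make_predict_proba(model, encode, layer_name=None):
    # `encode` is a user-supplied function turning a token prefix into a
    # model input batch; `layer_name` picks which layer's output to treat
    # as the probability distribution (both hypothetical, for illustration).
    out = model.get_layer(layer_name).output if layer_name else model.output
    step = K.function([model.input], [out])

    def predict_proba(prefix):
        # Run the encoded prefix through the network and keep the
        # distribution for the last timestep only.
        return step([encode(prefix)])[0][0, -1]

    return predict_proba
```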

Would love some thoughts + discussion!

All 15 comments

Thank you for pointing that out. I'm also interested in this, since I'm currently working on sequence labelling tasks where it is interesting and useful to apply beam search / Viterbi decoding during training and prediction.
My current results use greedy search, which is not really correct/optimal for these tasks. I would say we need a predict_sequence API that can be used with different types of search operations and objectives. If we are going to develop that, it would be worth building a flexible API that can be extended to beam search, Viterbi decoding, or CTC for speech recognition.
I'm interested in thoughts and discussion, in particular from people who worked on https://github.com/fchollet/keras/issues/395, and from @farizrahman4u, who has a nice implementation of seq-to-seq models: https://github.com/farizrahman4u/seq2seq

Thanks for the response! I agree that it should be a flexible API with different options.

I'm digging into the internals of TimeDistributedDense now. I'm trying to get an initial first pass working in the next couple of days, and will post updates to this thread.

I'm using the addition_rnn.py example as a playground to test it. Do I have to reinstall Keras after every change I make? That's my current workflow and it's very slow. Any suggestions?
EDIT: Found out about python setup.py develop. Workflow is much better.

@nicholaslocascio Do you have any updates on this? I'm highly interested in this as well and also consider digging into the code to implement this, but it would be helpful to know what you have already tried or found out.

@nicholaslocascio pinged you and @farizrahman4u on the seq2seq thread as well, and just found this one. Would love to follow updates if any.

In searches, I found an implementation of the concept (almost exactly that) here: https://github.com/ryankiros/skip-thoughts/blob/master/decoding/search.py

And a TF issue here: https://github.com/tensorflow/tensorflow/issues/654

Any updates on this?

My beam search for Keras RNNs: https://gist.github.com/udibr/67be473cf053d8c38730
I started with https://github.com/ryankiros/skip-thoughts/blob/master/decoding/search.py and made it more readable (at least to myself).

@udibr thanks for contributing! Though I believe a model using your code would have to predict just one character/word at a time, with 'return_sequences=False', instead of a traditional sequence-to-sequence model that would have 'return_sequences=True'.

But I guess this brings up an interesting question: is it really necessary to have Viterbi/beam search with 'return_sequences=True'?

The main pro of Viterbi/beam search with 'return_sequences=True' is speed. Instead of re-processing the entire input sequence over and over, as in the one-character-at-a-time approach, we just feed in the previously processed state and only have to compute how the new character changes that stored state. There are also potential speed-ups from implementing the search in the Theano graph instead of in Python.

Can anyone bring up real pros other than speed? I was mostly interested because every sequence-to-sequence paper mentions using beam search, so I was wondering whether this is done within the sequence processing (for Keras, 'return_sequences=True') or as an ad-hoc mechanism where the model predicts a single char/word at a time (à la 'return_sequences=False').

Nice, this looks really promising.

@nicholaslocascio most of the code I have seen predicts a single word at a time, especially with Keras, since you would have to modify the provided LSTM() so that the previous state can be used at the next timestep.
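
For later readers: newer Keras versions (2.x) added return_state and initial_state on recurrent layers, which make the one-step-at-a-time decoder possible without modifying LSTM itself. A minimal sketch, with illustrative layer sizes:

```python
from keras.layers import Input, LSTM, Dense
from keras.models import Model

latent_dim, num_tokens = 256, 128  # illustrative sizes, not from this thread

# A stand-alone decoder step: besides the one-hot token input, it takes
# the previous LSTM states as inputs and returns the updated states, so
# each call processes only the newest token rather than the full prefix.
token_in = Input(shape=(1, num_tokens))
state_h_in = Input(shape=(latent_dim,))
state_c_in = Input(shape=(latent_dim,))

lstm_out, state_h, state_c = LSTM(latent_dim, return_state=True)(
    token_in, initial_state=[state_h_in, state_c_in])
probs = Dense(num_tokens, activation='softmax')(lstm_out)

decoder_step = Model([token_in, state_h_in, state_c_in],
                     [probs, state_h, state_c])
```

In a beam search, each hypothesis then carries its own (h, c) state pair and feeds only its newest token through decoder_step, which is exactly the speed argument made above.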

Did anyone ever make this work with only minor modifications to Keras?

I am interested in using beam search for my seq2seq task. Do we have running code that can be plugged in directly?
Thanks for the information. Great work!

I implemented a toy beam search, for reference only: https://gist.github.com/xylcbd/0abee09de5ca6a0364c4de2aa46ef90f

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

No news since 2017?

Any updates on this? @nicholaslocascio ?
