Keras: I trained my model on the IMDB data set and saved it, how do I take new text fields and have the model predict the sentiment?

Created on 28 Apr 2017 · 6 comments · Source: keras-team/keras

I trained the model on the IMDB data set as shown in the example code https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py

and saved the resulting model with model.save as sent_model.h5.

I load the model with model = load_model("sent_model.h5") and print "model loaded" to the screen to confirm that it loaded without error.

Now I cannot work out how to import text from a date.csv and run a predict to get a sentiment analysis output. The first column in the CSV is a natural-language string, and I would like the model to analyse its sentiment using the training from the IMDB data set, but I cannot find any guides or tutorials on how to take the text input and process it so that the model can predict from it. Can anyone please help me out?

stale

DenymW

Most helpful comment

Since the IMDB dataset stores reviews as sequences of word indexes, you need to download the word index to convert your own data into word index sequences. You can do this with word_index = imdb.get_word_index() (assuming you've already done from keras.datasets import imdb). The word index is a Python dictionary that maps words to their indexes.

Now you can take your text, clean it up (using the Keras text preprocessing utilities to turn it into a list of lowercase words), and then convert the word list into a numpy array of word indexes, with 0 for unknown words. You could do this with something like np.array([word_index[word] if word in word_index else 0 for word in words]) for each string in your data file. After that, you should be able to run model.predict on a numpy array of all the index sequences in your dataset.
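A minimal sketch of the encoding step described above, with a few assumptions stated up front. imdb.load_data() by default reserves index 1 for a start marker and 2 for out-of-vocabulary words, and shifts every raw index from get_word_index() up by 3, so this sketch applies that offset and uses 2 rather than 0 for unknown words. The values num_words=20000 and maxlen=80 are taken from the imdb_lstm.py example. The names encode_text and predict_sentiment are hypothetical helpers, the regex tokenizer is a stand-in for Keras text preprocessing, and the pad_sequences import path is the one from Keras releases of that period, so it may differ in newer versions.

```python
import re

# Defaults of keras.datasets.imdb.load_data(), which the example script does not override.
START_CHAR = 1   # marks the start of every review
OOV_CHAR = 2     # stands in for out-of-vocabulary words
INDEX_FROM = 3   # raw word indexes are shifted up by this amount


def encode_text(text, word_index, num_words=20000):
    """Hypothetical helper: turn one raw string into a list of IMDB-style indexes."""
    # Simple tokenizer (an assumption): lowercase, keep letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    encoded = [START_CHAR]
    for word in words:
        raw = word_index.get(word)
        index = raw + INDEX_FROM if raw is not None else OOV_CHAR
        # Words the model never saw in training (index >= num_words) are also OOV.
        encoded.append(index if index < num_words else OOV_CHAR)
    return encoded


def predict_sentiment(model, texts, word_index, maxlen=80):
    """Hypothetical helper: encode, pad and score a list of strings."""
    # Imported lazily so encode_text can be tried without Keras installed.
    from keras.preprocessing.sequence import pad_sequences

    sequences = [encode_text(text, word_index) for text in texts]
    batch = pad_sequences(sequences, maxlen=maxlen)  # shape: (len(texts), maxlen)
    return model.predict(batch)
```

With the model loaded as in the question, usage would look roughly like predict_sentiment(model, list_of_strings, imdb.get_word_index()), which should return one score per input string.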

superMDguy on 28 Apr 2017

👍3

All 6 comments

Thank you, I could not find how to call the index, thank you so much :)

DenymW on 28 Apr 2017

@superMDguy Hi, I followed as best I could and managed to turn a sentence from the file into an np.array:
[0 6 396 32 605 289 10 0]
Running model.predict on the array returns:
[[0.44817322]
[0.62561291]
[0.29010171]
[0.61456531]
[0.11232003]
[0.01077116]
[0.58339131]
[0.44817322]]

Since this was supposed to predict the sentiment of the sentence, I thought it should return a single value for the sentence as opposed to a value per word. I have clearly done something wrong, but I am not sure what. Any help you can offer would be appreciated.

DenymW on 29 Apr 2017

It's expecting a numpy array of batches, so feed it in as np.array([words]) (or whatever you call your word sequence variable).
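To illustrate the shape difference, here is a small sketch using only numpy and the sequence quoted earlier in the thread. The likely explanation for the eight outputs is that the 1-D array was treated as eight separate samples of one word each; that interpretation is an assumption based on the output shown, not something confirmed in the thread.

```python
import numpy as np

# The encoded sentence quoted in the thread: 8 word indexes.
words = np.array([0, 6, 396, 32, 605, 289, 10, 0])
print(words.shape)  # (8,)

# Wrapping it in a list adds a leading batch axis: 1 sample of 8 timesteps.
batch = np.array([words])
print(batch.shape)  # (1, 8)

# Equivalent ways to add the batch axis.
same_a = words[np.newaxis, :]
same_b = words.reshape(1, -1)
print(same_a.shape == batch.shape == same_b.shape)  # True
```

Passing batch to model.predict should then give a single score for the whole sentence instead of one per word.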

-Matthew

superMDguy on 29 Apr 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale[bot] on 28 Jul 2017

I followed the approach you described to convert my sentence into the IMDB encoding, but it returns a number for each letter, whereas in IMDB there is a number for each word.
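A likely cause, sketched below under assumptions: if the list comprehension loops over the raw string rather than a list of words, Python iterates character by character, so the lookup runs once per letter. The toy word_index values here are made up for illustration, and str.split() stands in for a proper tokenizer such as Keras's text_to_word_sequence.

```python
sentence = "this film was great"
word_index = {"this": 11, "film": 19, "was": 13, "great": 84}  # made-up toy values

# Looping over the string itself walks through characters, one lookup per letter.
per_char = [word_index.get(ch, 0) for ch in sentence]

# Splitting first yields whole words, which is what the word index is keyed on.
words = sentence.lower().split()
per_word = [word_index.get(word, 0) for word in words]

print(len(per_char))  # one entry per character, spaces included
print(per_word)       # one entry per word
```

The real IMDB word index may contain single letters as entries, which would explain getting a nonzero number for individual letters instead of zeros.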