I am trying to retrain the im2txt model on a dataset other than MSCOCO. I have created the TFRecords: 256 train, 4 val and 8 test shards. When I run train.py, I get the following error:
INFO:tensorflow:Starting Session.
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: indices[1,2] = 19523 is not in [0, 12000)
[[Node: seq_embedding/embedding_lookup = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@seq_embedding/map"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](seq_embedding/map/read, batch_and_pad:1)]]
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: indices[1,4] = 22622 is not in [0, 12000)
[[Node: seq_embedding/embedding_lookup = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@seq_embedding/map"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](seq_embedding/map/read, batch_and_pad:1)]]
(the two warnings above repeat several times)
Any ideas? Thank you
TensorFlow version: 1.0.1
It looks as if the vocabulary size is set to 12000 in https://github.com/tensorflow/models/blob/master/im2txt/im2txt/configuration.py#L53. Do you need to increase that for your dataset?
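The bounds check behind that warning can be sketched without TensorFlow. `tf.nn.embedding_lookup` (the `Gather` op in the log) indexes rows of an embedding table, so every token id must lie in `[0, vocab_size)`; a vocabulary larger than the configured size produces ids that fall outside the table. A minimal NumPy sketch, using the numbers from the log above:

```python
import numpy as np

# Numbers taken from the log: the table has 12000 rows, but the
# tokenizer produced the word id 19523 for some caption.
vocab_size = 12000
embedding_dim = 512
embedding_map = np.zeros((vocab_size, embedding_dim))

caption_ids = [1, 19523, 42]  # example token ids from a caption

# Gather requires 0 <= id < vocab_size for every id; anything else
# triggers "indices[...] = 19523 is not in [0, 12000)".
out_of_range = [i for i in caption_ids if not 0 <= i < vocab_size]
print(out_of_range)
```

Raising `vocab_size` in configuration.py so it covers the largest id in your vocabulary file removes the failing ids.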
@michaelisard That was the problem. I have changed the vocabulary size and the model is training. Thank you so much for the help.
Hello,
you have to create the TFRecord files again after changing the
vocabulary size.
Then you can train without encountering the error.
Hope this helps!
Obi
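The advice above can be turned into a quick sanity check before training: after regenerating the TFRecords, confirm that the largest token id they contain is below the `vocab_size` in configuration.py. A sketch with a hypothetical helper (not part of im2txt):

```python
def check_vocab(caption_id_lists, vocab_size):
    """Hypothetical pre-flight check: caption_id_lists is an iterable of
    token-id lists decoded from the TFRecords; raise if any id would
    fall outside the embedding table."""
    max_id = max(i for ids in caption_id_lists for i in ids)
    if max_id >= vocab_size:
        raise ValueError(
            "max token id %d >= vocab_size %d; regenerate the TFRecords "
            "or increase vocab_size" % (max_id, vocab_size))
    return max_id
```

If this raises, the records and the configuration were built from different vocabularies, which is exactly the mismatch behind the `Gather` warnings.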
2017-05-03 16:01 GMT+02:00 DaveyTao notifications@github.com:
@obip https://github.com/obip Hello, friend: I encountered the same
problem. When I increased the vocabulary size, I got an error:
Invalid argument: Assign requires shapes of both tensors to match. lhs
shape= [40000] rhs shape= [12000]. Have you encountered this problem?
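This second error is a checkpoint mismatch rather than a data problem: the saved checkpoint holds variables sized for the old 12000-word vocabulary, while the rebuilt graph allocates them for 40000 words, and a full restore tries to assign one onto the other. A pure-Python sketch of the situation and of one workaround, assuming you are willing to re-initialise the mismatched variables (the names below are illustrative):

```python
import numpy as np

# Illustrative variable names; shapes follow the error message.
checkpoint = {"seq_embedding/map": np.zeros((12000, 512)),
              "lstm/kernel": np.zeros((1024, 2048))}
graph_vars = {"seq_embedding/map": np.zeros((40000, 512)),
              "lstm/kernel": np.zeros((1024, 2048))}

# Restore only the variables whose shapes agree; everything else
# (here, the resized embedding) keeps its fresh initialisation.
restorable = {name: v for name, v in checkpoint.items()
              if graph_vars[name].shape == v.shape}
print(sorted(restorable))
```

In TensorFlow this filtering is typically done by passing the matching subset as the `var_list` of the `Saver` used for restoring; alternatively, starting training from scratch avoids the mismatch entirely.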
@obip Hi, did you use a checkpoint from a model pre-trained on the MSCOCO dataset, or did you start from scratch? I am trying to fine-tune the pre-trained model with new data and am not sure what to do with the word_counts.txt file: should I use only the one for the new vocabulary, or merge it with the original one?
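If merging is the route taken, word_counts.txt is (assuming the usual im2txt format) one `word count` pair per line, so the two files can be combined by summing counts. A sketch with a hypothetical helper:

```python
from collections import Counter

def merge_word_counts(old_lines, new_lines):
    """Hypothetical helper: merge two word_counts.txt files, given as
    iterables of 'word count' lines, summing counts per word."""
    counts = Counter()
    for line in list(old_lines) + list(new_lines):
        word, n = line.split()
        counts[word] += int(n)
    # Sort by frequency, matching the descending-count layout of the file.
    return ["%s %d" % (w, n) for w, n in counts.most_common()]
```

Note that merging changes the vocabulary size and the word-to-id mapping, so the TFRecords would need to be regenerated against the merged file, as Obi described above.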