models/img2txt
Could someone release a pre-trained model for the img2txt model trained on COCO? It would be great for anyone here who doesn't yet have the computational resources to do a full training run. Thanks!
@cshallue: could you comment on this? Thanks.
+1
Sorry, we're not releasing a pre-trained version of this model at this time.
Here are links to a pre-trained model:
@psycharo thanks for sharing! Perhaps you could also share your word_counts.txt file. Different versions of the tokenizer can yield different results, so your model is specific to the word_counts.txt file that you used.
@psycharo My model is still training on our GPU instance; it looks like it will take another two weeks to finish. I would appreciate it if you would also release the fine-tuned model.
@psycharo Thanks for sharing your checkpoint!
When I try to use it I'm getting the error: "ValueError: No checkpoint file found in: None".
I don't have any trouble running run_inference on my own checkpoint files, but I can't run it on yours. I've tried lots of things: adding a trailing "/", using absolute paths, relative paths, ..... Nothing seems to work.
Suggestions welcomed.
@cshallue - Any thoughts?
Thanks all.
user123@myhost:~$ ls -l /tmp/checkpoint_tmp/
total 175356
-rw-r--r-- 1 user123 user123 19629588 Oct 15 07:04 graph.pbtxt
-rw-r--r-- 1 user123 user123 149088120 Oct 15 07:04 model.ckpt-2000000
-rw-r--r-- 1 user123 user123 10675545 Oct 15 07:04 model.ckpt-2000000.meta
-rw-rw-r-- 1 user123 user123 156438 Oct 15 07:08 word_counts.txt
user123@myhost:~$ /data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference --checkpoint_path=/tmp/checkpoint_tmp --vocab_file=/tmp/checkpoint_tmp/word_counts.txt --input_files=${IMAGE_FILE}
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 49, in main
FLAGS.checkpoint_path)
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 118, in build_graph_from_config
return self._create_restore_fn(checkpoint_path, saver)
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 92, in _create_restore_fn
raise ValueError("No checkpoint file found in: %s" % checkpoint_path)
ValueError: No checkpoint file found in: None
user123@myhost:~$
@ProgramItUp Try the following: --checkpoint_path=/tmp/checkpoint_tmp/model.ckpt-2000000
When you pass a directory, it looks for a "checkpoint state" file in that directory, which is an index of all checkpoints in the directory. Your directory doesn't have a checkpoint state file, but you can just pass it the explicit filename.
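Roughly, the resolution logic looks like this (a minimal sketch, assuming TF 0.x/1.x; the path is illustrative):

import tensorflow as tf

# When given a directory, the inference wrapper resolves it through the
# directory's "checkpoint" state file. With no state file present,
# latest_checkpoint() returns None, which is what produces the
# "No checkpoint file found in: None" error above.
path = "/tmp/checkpoint_tmp"
if tf.gfile.IsDirectory(path):
    path = tf.train.latest_checkpoint(path)
print(path)  # None here; pass the explicit file ".../model.ckpt-2000000" instead.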
Getting better, but...
Traceback (most recent call last):
File "/home/gamma/bin/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/home/gamma/bin/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 53, in main
vocab = vocabulary.Vocabulary(FLAGS.vocab_file)
File "/home/gamma/bin/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/vocabulary.py", line 50, in __init__
assert start_word in reverse_vocab
AssertionError
Looks like the word_counts.txt file above is not formatted as expected:
b'a' 969108
b'</S>' 586368
b'<S>' 586368
b'.' 440479
b'on' 213612
b'of' 202290
b'the' 196219
b'in' 182598
b'with' 152984
...
vocabulary.py expects:
a 969108
</S> 586368
<S> 586368
. 440479
on 213612
of 202290
the 196219
in 182598
with 152984
...
A quick fix is to reformat the word_counts.txt in that way. Or, you could replace line 49 of vocabulary.py with
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]
In the long run, I'll come up with a way to make sure word_counts.txt is output the same for everyone.
It works!
http://stablescoop.horseradionetwork.com/wp-content/uploads/2013/10/ep271.jpg
Captions for image cb340488986cc40f8ec610348b7f5a24.jpg:
0) a woman is standing next to a horse . (p=0.000726)
1) a woman is standing next to a horse (p=0.000638)
2) a woman is standing next to a brown horse . (p=0.000373)
@PredragBoksic great!
@psycharo , what version of python did you use to generate the word_counts.txt file?
I expect the script to output lines of the form:
a 969108
</S> 586368
<S> 586368
not:
b'a' 969108
b'</S>' 586368
b'<S>' 586368
I didn't generate the word_counts.txt file. I changed line 49 as you suggested, to:
""" WORKAROUND for vocabulary file """
"""reverse_vocab = [line.split()[0] for line in reverse_vocab]"""
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]
I have Python 2.7.12 on KUbuntu 16.04 with CUDA 8.0, cuDNN 5.1 and a GTX 970. I wouldn't know how to do it in Python, because I usually program in Java. Do you need some code to change that file?
@PredragBoksic I'm asking the creator of that file. You can just keep using the workaround :)
@cshallue python 3.5. I had to make a couple of dirty hacks to make it work on that version of Python, which is why word_counts.txt looks different.
@psycharo How many hours did this take to train? I think people would appreciate what you shared even more if you mentioned this.
@PredragBoksic
Initial training took about 2-3 days; finetuning for 1M iterations took around 5-6 days. I used a single GPU, a Tesla P100.
@cshallue Thanks for the prompt replies. Your suggestions worked.
I was not able to follow the full execution path of the code. Where would be the right place to put a bit of error checking to make sure that the files passed via --checkpoint_path, --vocab_file and --input_files exist, and to throw an error if they don't?
In the case of the checkpoint file, it would be helpful to throw an error if the "checkpoint state" file is not found. Where would this happen?
Thanks.
There are already error checks for all those things.
If no checkpoint state or no checkpoint file is found in --checkpoint_path, it will fail the check here.
If --vocab_file doesn't exist it will fail the check here.
If no files match --input_files then you will get the message "Running caption generation on 0 files matching..." and inference will exit: see here.
I did not notice any meaningful error messages, for example when the image file was missing. I suppose this functionality will be completed in the future.
@cshallue: I am running the finetuning step of the optimization. What I noticed is that the loss is not changing much for the initial 22000 steps; it is pretty much stuck at 2.40.
I have attached the log file, captured by piping stderr to a text file. Is the loss going to go down significantly in the remaining iterations? Or am I missing some "gotcha"?
log_finetune.txt
@siavashk The loss reported by the training script is expected to be really noisy: it reports on single batches of only 32 examples.
Are you running the evaluation script on the validation files? We expect to see validation perplexity decreasing slowly. It decreases slowly because the model is already near optimal and because we use a smaller learning rate during finetuning.
@cshallue Maybe I am overly anxious; 22000 steps is about 1% of the optimization. I am just worried because it has been three weeks since I started training this model, and it seems it is going to take another two weeks to converge.
I am not running the validation script, since training itself is taking so long (it's been three weeks now and I am at 1 million iterations). I thought running an additional validation step would make this even longer.
You won't be able to tell much from the training losses for a single batch any more. They will keep jumping around.
You could always just use the model in its current form. It will probably be sensible. There is not much improvement after 1M steps of fine tuning.
Or you could use the model shared in this thread above.
@siavashk Do we need to rerun the pre-training step if we use the word_counts.txt file from @psycharo, or what is the correct workflow here?
@hholst80, I don't think you need to pre-train. Here is how I used @psycharo's pre-trained model:
@psycharo Thanks a lot! That saves a lot of time! Would you please share the latest checkpoint that you have?
I have almost finished the training (3,000,000 steps), if somebody is interested. I'll train something with Inception-ResNet-v2 later.
@TRGNN
Could you please share the trained model after you finish it? I think a great number of people are interested and would appreciate it. Thanks!
@TRGNN
Could you please also post your model trained with Inception-ResNet-v2? Thank you very much.
You people (all) are fabulous!
I needed to edit the TensorFlow-provided im2txt scripts (and add to my $PYTHONPATH -- py27 venv -- via a *.pth file), as the paths in the scripts in the GitHub-cloned repo (https://github.com/tensorflow/models/tree/master/im2txt) were not working for me. I did all of this without the use of Bazel -- just straight-up edits in an editor and runs in a terminal (Linux).
I downloaded psycharo's pretrained model (thank you very much!), edited the vocabulary.py file as suggested by cshallue and -- presto! -- I'm successfully classifying images! :-)
Thank you to all involved. :-)
@victoriastuart, I believe that Bazel works well if you clone the entire repository, enter the appropriate folder and run Bazel from within that folder. It's counterintuitive.
@PredragBoksic: ahh, good to know - thanks! I'm new to the Bazel ecosystem, and the instructions on the im2txt site are not as clear as they could be, in my opinion. Anyway, it's working and, even better, I learned a lot while sorting it out! ;-)
@siavashk Hi: when running the evaluation script, what are the train and eval directories?
@ProgramItUp Hi: before running the script with the pre-trained model, what do I need to do?
Thanks @psycharo @cshallue !!!
I could successfully run the model thanks to you guys 😄
I'm on Python 3.5 and the fix:
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]
did not work for me. The bytes were not being decoded into strings, so I was getting the same assertion error. The following does work for me:
reverse_vocab = [eval(line.split()[0]).decode() for line in reverse_vocab]
Thanks for the model @psycharo !
@TRGNN
Sir, did you ever post the model you trained anywhere on the internet?
It's great of you to share, @psycharo.
It would be great if it supported TensorFlow Serving, just like the Inception model.
I may open another issue to track this if anyone is interested.
@outcastrift I just uploaded it. Sorry for the delay, guys. The archive contains the model checkpoint at 3,000,000 steps (1,000,000 without and 2,000,000 with Inception training) and the word_counts file.
https://drive.google.com/open?id=0B_qCJ40uBfjEWVItOTdyNUFOMzg
@TRGNN cool man
Hi, I've run into a problem while running my demo:
CRITICAL:tensorflow:Vocab file /home/ubuntu/nmodels-master/im2txt/im2txt/data/word_counts.txt not found.
Traceback (most recent call last):
File "/home/ubuntu/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 83, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/home/ubuntu/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 53, in main
vocab = vocabulary.Vocabulary(FLAGS.vocab_file)
File "/home/ubuntu/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/vocabulary.py", line 48, in __init__
reverse_vocab = list(f.readlines())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 128, in readlines
self._preread_check()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 73, in _preread_check
compat.as_bytes(self.__name), 1024 * 512, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/ubuntu/nmodels-master/im2txt/im2txt/data/word_counts.txt
However, word_counts.txt is in that directory... I don't know how to deal with it. Please help me. Thank you!
@adaxidedakaonang Yesterday I encountered the same thing.
There seems to be a bug reading command-line arguments. This is a real hack, but one thing you can do for your demo is hard-code the FLAGS variables in im2txt/run_inference.py with the full paths and file names, as sketched below. Note: there are multiple copies of run_inference.py after running Bazel; change the original file, do not try changing a copy in bazel-bin.
Make sense?
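For example (a sketch only; the paths are illustrative and should match your own layout), replace the empty flag defaults near the top of im2txt/run_inference.py:

# Illustrative hard-coded defaults for the three flags run_inference.py defines.
tf.flags.DEFINE_string("checkpoint_path",
                       "/home/ubuntu/models-master/im2txt/im2txt/model/model.ckpt-3000000",
                       "Model checkpoint file or directory containing one.")
tf.flags.DEFINE_string("vocab_file",
                       "/home/ubuntu/models-master/im2txt/im2txt/data/word_counts.txt",
                       "Text file containing the vocabulary.")
tf.flags.DEFINE_string("input_files",
                       "/home/ubuntu/models-master/im2txt/im2txt/data/dog.jpg",
                       "File pattern or comma-separated list of image files.")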
I found the mistake; it's:
tensorflow.python.framework.errors_impl.NotFoundError: /home/ubuntu/nmodels-master/im2txt/im2txt/data/word_counts.txt
It should be 'models' rather than 'nmodels'; I typed an extra 'n' by mistake! I'm very grateful to you. Thanks!
Now I've hit a problem running this command via os.popen. This is my code:
import os
import subprocess

def getMessage(img):
    jpg = img
    cmdline = '''
    CHECKPOINT_DIR="${HOME}/models-master/im2txt/im2txt/model/model.ckpt-3000000" & \
    VOCAB_FILE="${HOME}/models-master/im2txt/im2txt/data/word_counts.txt" & \
    IMAGE_FILE="%s" & \
    bazel build -c opt im2txt/run_inference & \
    export CUDA_VISIBLE_DEVICES="" & \
    bazel-bin/im2txt/run_inference \
        --checkpoint_path=/home/ubuntu/models-master/im2txt/im2txt/model/model.ckpt-3000000 \
        --vocab_file=/home/ubuntu/models-master/im2txt/im2txt/data/word_counts.txt \
        --input_files=%s
    ''' % (jpg, jpg)
    print os.popen(cmdline)

getMessage(img='/home/ubuntu/models-master/im2txt/im2txt/data/dog.jpg')
I think you can see what this code means: I am trying to run these commands in a shell.
Note the last few lines: when I run them in the shell directly, they look like this:
bazel-bin/im2txt/run_inference \
--checkpoint_path=${CHECKPOINT_DIR} \
--vocab_file=${VOCAB_FILE} \
--input_files=${IMAGE_FILE}
However, when I run it via os.popen, it becomes this:
bazel-bin/im2txt/run_inference --checkpoint_path=${HOME}/models-master/im2txt/im2txt/model/model.ckpt-3000000 --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}
Do you know how to write it in 'cmdline'? Thank you~
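(One way this could be written, a sketch not from the thread: build the argument list directly in Python and skip the shell variables entirely, so nothing depends on os.popen expanding ${...}. The paths are illustrative, and it assumes bazel build has already been run once.)

import subprocess

def get_message(img):
    # Pass explicit arguments instead of relying on shell variable expansion.
    cmd = [
        "bazel-bin/im2txt/run_inference",
        "--checkpoint_path=/home/ubuntu/models-master/im2txt/im2txt/model/model.ckpt-3000000",
        "--vocab_file=/home/ubuntu/models-master/im2txt/im2txt/data/word_counts.txt",
        "--input_files=%s" % img,
    ]
    # check_output raises if run_inference exits non-zero, and returns its stdout.
    return subprocess.check_output(cmd)

print(get_message('/home/ubuntu/models-master/im2txt/im2txt/data/dog.jpg'))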
If anyone gets the error AttributeError: 'module' object has no attribute 'BasicLSTMCell', you can reset your git HEAD to the commit below. The models repo has undergone lots of changes since December 2016.
$ git reset --hard 9997b250
@mathieuarbezhermoso vs @psycharo: whose model should I use?
Does anyone have a recent Inception-trained model?
I don't have Teslas, so I can't train.
Hi,
You just need a decent NVIDIA card for training. No need for a Tesla; a GeForce is good.
You can use the Inception V3 checkpoint I shared. I'll provide a link as soon as I'm home, if needed.
I downloaded this model: http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
but I could not find the word counts for it.
Please share the link after you get home.
Thanks.
tf.train.latest_checkpoint returns None.
I downloaded the @psycharo model into im2txt/model/train/.
What's wrong?
Hi,
I am trying to convert @mathieuarbezhermoso's checkpoint into Const ops using freeze_graph.py. It needs a .pb file as input_graph, so how can I generate it?
Or if anyone has it, I'd be thankful.
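(A sketch of one possible route, not a confirmed recipe: freeze_graph.py can take a text GraphDef as --input_graph, and training already writes a graph.pbtxt into the train directory; alternatively, you could serialize the inference graph yourself with tf.train.write_graph. Paths are illustrative.)

import tensorflow as tf

with tf.Session() as sess:
    # ... build the im2txt inference graph here ...
    # Write a text GraphDef that freeze_graph.py can consume via
    # --input_graph, together with --input_checkpoint and
    # --output_node_names (the inference graph's output op is named "softmax").
    tf.train.write_graph(sess.graph_def, "/tmp/im2txt", "graph.pbtxt")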
Hi!
Did anybody get the following error when trying to run the checkpoint with TensorFlow 1.0?
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/weights" not found in checkpoint files im2txt/im2txt_pretrained2/model.ckpt-3000000
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
Thanks!
@tintelle Looks like the default variable names for the BasicLSTMCell were changed in TensorFlow 1.0, and they no longer match the checkpoint. See the following thread for a pointer on renaming variables in checkpoints:
http://stackoverflow.com/questions/37086268/rename-variable-scope-of-saved-model-in-tensorflow
@cshallue I'd really, really appreciate it if you could post the steps to rename the variables with respect to @tintelle's post. I'm not able to follow the solution mentioned in that Stack Overflow thread.
Something like this should work:
import tensorflow as tf

new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(path_to_checkpoint)
for old_name in reader.get_variable_to_shape_map():
    new_name = ...  # Rename as desired
    new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
    sess.run(init)
    saver.save(sess, path_to_new_checkpoint)
This guide walks you through the major changes in the API and how to automatically upgrade your programs for TensorFlow 1.0. This guide not only steps you through the changes but also explains why we've made them.
https://www.tensorflow.org/install/migration
I haven't had time to look into renaming the variables yet; I fixed it in the end by installing TensorFlow 0.12 in a separate environment. You can download the binary here: https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0-py2-none-any.whl and install it in your virtual environment like this:
export TF_BINARY_URL=path/to/tensorflow0.12
pip install --ignore-installed --upgrade $TF_BINARY_URL
@cshallue I ran into the same trouble as @tintelle. What puzzles me is how I can find the old and new variable names for the BasicLSTMCell.
Thank you very much!
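(One way to discover both sides of the mapping, a sketch assuming TF 1.x: print the names stored in the checkpoint, then compare them against the names your freshly built graph expects.)

import tensorflow as tf

# Names as stored in the checkpoint (the "old" names):
reader = tf.train.NewCheckpointReader("/path/to/model.ckpt-2000000")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)

# Names the current graph expects (the "new" names): after building the
# inference graph, list them with
#   for v in tf.global_variables():
#       print(v.op.name)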
@mathieuarbezhermoso - Hey buddy. Thanks a ton for sharing your trained model. If you get a chance to do further training after 3 million steps, would you mind sharing it?
Can someone help me figure out this issue? The problem is at tf.concat(initial_state, 1, name="initial_state"). I run run_inference.py and get this error.
Traceback (most recent call last):
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/run_inference.py", line 85, in
tf.app.run()
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/run_inference.py", line 50, in main
FLAGS.checkpoint_path)
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 115, in build_graph_from_config
self.build_model(model_config)
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/inference_wrapper.py", line 38, in build_model
model.build()
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/show_and_tell_model.py", line 359, in build
self.build_model()
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/show_and_tell_model.py", line 269, in build_model
tf.concat(initial_state, 1, name="initial_state")
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1075, in concat
dtype=dtypes.int32).get_shape(
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
They changed the ordering of the parameters of the concat function between TensorFlow versions. I'm using version 0.12, where it should be: tf.concat(1, initial_state, name="initial_state"). It's complaining because it's expecting an int, but got the initial_state instead.
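(For reference, a sketch of the two signatures; initial_state stands in for the values being concatenated.)

# TF 0.12 and earlier: tf.concat(concat_dim, values, name=...)
tf.concat(1, initial_state, name="initial_state")

# TF 1.0 and later: tf.concat(values, axis, name=...)
tf.concat(initial_state, axis=1, name="initial_state")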
@tintelle I fixed the issue by installing version 0.12 and resetting my git HEAD with
$ git reset --hard 9997b250
Thank you for your quick reply.
Hi!
Did anybody release a pre-trained model for TensorFlow 1.0?
Thanks very much!!!
@bis-carbon - This method worked for me to fix the problem of "Tensor name "lstm/basic_lstm_cell/biases" not found in checkpoint files": I downgraded to TensorFlow 0.12 and reset to 9997b25. Now I have captions!
@liyd I guess the pre-trained models above work as well for 1.0. I didn't try im2txt, but it did work for inception.
There is a branch named update-models-1.0. Check out that branch and try it 😄
It seems like im2txt was also updated to TF 1.0 on that branch.
@tae-jun thank you for your help!
I encountered the same problem as @tintelle. I think this problem is due to the default variable names in the model parameters having been changed in TensorFlow 1.0, not due to the source code, so update-models-1.0 may not work.
@bis-carbon Would you please explain why we need to edit the Inception files, when we are using the im2txt model?
Just wondering if anyone solved
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/weights" not found in checkpoint files im2txt/im2txt_pretrained2/model.ckpt-3000000
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
in Tensorflow v1.0
(tensorflow)user@main:~$ ls -l im2txt/model/train/
total 175200
-rw-r--r-- 1 user user 19629588 Oct 6 04:08 graph.pbtxt
-rw-r--r-- 1 user user 149088120 Oct 12 09:39 model.ckpt-2000000
-rw-r--r-- 1 user user 10675545 Oct 12 09:39 model.ckpt-2000000.meta
When I put only "graph.pbtxt", "model.ckpt-2000000", and "model.ckpt-2000000.meta" in the train directory, I get the error "No checkpoint file found in: None". So I tried "--checkpoint_path=im2txt/model/train/model.ckpt-2000000", but "bash: --checkpoint_path=im2txt/model/train/model.ckpt-2000000: No such file or directory" appeared (bash treated the flag as a command on its own). What should I do? Did I do something wrong?
Actually, I had tried to train it myself, but I thought it would take too much time, so I stopped around the 16,700th step. Then I noticed there were many files such as "checkpoint", "model.ckpt-16588.data-00000-of-00001", etc.
(tensorflow)user@main:~$ ls -l im2txt/model/train/
total 1184160
-rw-rw-r-- 1 user user 457 3월 5 17:50 checkpoint
-rw-rw-r-- 1 user user 4056412 2월 26 23:39 events.out.tfevents.1488119947.main
-rw-rw-r-- 1 user user 4056412 3월 3 14:29 events.out.tfevents.1488518929.main
-rw-rw-r-- 1 user user 4056412 3월 3 14:39 events.out.tfevents.1488519554.main
-rw-rw-r-- 1 user user 265001124 3월 5 17:58 events.out.tfevents.1488526808.main
-rw-r--r-- 1 user user 19629588 10월 6 04:08 graph.pbtxt
-rw-rw-r-- 1 user user 149002244 3월 5 17:10 model.ckpt-16588.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:10 model.ckpt-16588.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:10 model.ckpt-16588.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:20 model.ckpt-16634.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:20 model.ckpt-16634.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:20 model.ckpt-16634.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:30 model.ckpt-16682.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:30 model.ckpt-16682.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:30 model.ckpt-16682.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:40 model.ckpt-16730.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:40 model.ckpt-16730.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:40 model.ckpt-16730.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:50 model.ckpt-16777.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:50 model.ckpt-16777.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:50 model.ckpt-16777.meta
-rw-r--r-- 1 user user 149088120 10월 12 09:39 model.ckpt-2000000
-rw-r--r-- 1 user user 10675545 10월 12 09:39 model.ckpt-2000000.meta
Generating captions does work when I put them in the train directory. However, it seems to use only the 16,700-step checkpoint I trained myself: it shows the same inaccurate results regardless of whether files such as "model.ckpt-2000000" are present or not.
That's why I am confused. Do I additionally need some appropriate files, such as a "checkpoint" state file?
Any help would be appreciated.
Thanks.
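(One hedged suggestion, not from the thread: run_inference resolves a directory through its "checkpoint" state file, and yours was written by your own 16.7k-step run, so it points at those checkpoints. Assuming TF 1.x, you could point the state file at the downloaded checkpoint instead.)

import tensorflow as tf

# Rewrite the "checkpoint" state file so latest_checkpoint() resolves to the
# downloaded 2M checkpoint rather than the locally trained one.
tf.train.update_checkpoint_state(
    save_dir="im2txt/model/train",
    model_checkpoint_path="im2txt/model/train/model.ckpt-2000000")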
A few people are having trouble using @psycharo 's trained checkpoint since the release of TF 1.0. Here are the steps you can follow.
Upgrade to the latest version of TensorFlow and fetch the latest version of this repository.
Firstly, the word_counts.txt file provided above was generated with Python 3, so it wrote all the words like this: b'word'. You may need to rewrite that file. The following code worked for me on Python 2.7; you may have to tweak it if you are using something different.
OLD_VOCAB_FILE = "word_counts.txt"
NEW_VOCAB_FILE = "word_counts2.txt"

with open(OLD_VOCAB_FILE) as f:
    lines = list(f.readlines())

def clean_line(line):
    tokens = line.split()
    return "%s %s" % (eval(tokens[0]), tokens[1])

newlines = [clean_line(line) for line in lines]

with open(NEW_VOCAB_FILE, "w") as f:
    for line in newlines:
        f.write(line + "\n")
Now we need to rename 2 of the variables in the checkpoint file.
OLD_CHECKPOINT_FILE = ".../model.ckpt-2000000"
NEW_CHECKPOINT_FILE = ".../model-renamed.ckpt-2000000"

import tensorflow as tf

vars_to_rename = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
    if old_name in vars_to_rename:
        new_name = vars_to_rename[old_name]
    else:
        new_name = old_name
    new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
    sess.run(init)
    saver.save(sess, NEW_CHECKPOINT_FILE)
Now it should work!
CHECKPOINT_PATH=".../model-renamed.ckpt-2000000"
VOCAB_FILE=".../word_counts2.txt"
IMAGE_FILE=".../COCO_val2014_000000224477.jpg"
# Build the inference binary.
bazel build -c opt im2txt/run_inference
# Run inference to generate captions.
bazel-bin/im2txt/run_inference \
--checkpoint_path=${CHECKPOINT_PATH} \
--vocab_file=${VOCAB_FILE} \
--input_files=${IMAGE_FILE}
@cshallue I'm following your steps. In the last step (build the inference binary and run inference to generate captions), why do I get this error?
INFO: Found 1 target...
Target //im2txt:run_inference up-to-date:
bazel-bin/im2txt/run_inference
INFO: Elapsed time: 0.290s, Critical Path: 0.02s
INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: /home/ljf/LiJunFeng/im2txt/word_counts2.txt
INFO:tensorflow:Created vocabulary with 11520 words
* Error in `/usr/bin/python2.7': double free or corruption (!prev): 0x0000000000d49010 *
Aborted (core dumped)
EDIT: After changing my Python version (2.7.6 -> 3.4.3), I solved that problem. But then I got a new error, like @lanewinfield:
DataLossError (see above for traceback): Unable to open table file /home/ljf/LiJunFeng/im2txt/model.ckpt-2000000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_354 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_354/tensor_names, save/RestoreV2_354/shape_and_slices)]]
EDIT: I solved that problem too. Although only these files exist (model-new.ckpt-2000000.data-00000-of-00001, model-new.ckpt-2000000.index, model-new.ckpt-2000000.meta), you can still pass model-new.ckpt-2000000 as the checkpoint path.
@cshallue Running your script to port the checkpoint file generates three files: index, data and meta. Using the data file (with and without renaming) as the checkpoint for the tutorial does not work.
I'm so close!
Would anybody be willing to upload @psycharo's checkpoints with @cshallue's updates? Running the update script ("Now we need to rename 2 of the variables in the checkpoint file.") throws an error for me, most likely because I'm running everything on a Raspberry Pi.
(For reference, this is what I'm getting:)
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
EDIT: Never mind, I installed TensorFlow on my main machine and was able to run the command without issue. But then I ran into the same issue as @iamgroot42: three separate files, none of which work.
And just to be clear, these are the files it's generating:
model-new.ckpt-2000000.meta
checkpoint
model-new.ckpt-2000000.data-00000-of-00001
model-new.ckpt-2000000.index
@lanewinfield What is your error? Do you think your issue and mine are the same?
@FengLoveSS my error is different, but it's because I'm attempting to use the files exported by that script. Here's the meat of it:
DataLossError (see above for traceback): Unable to open table file /home/pi/mirror/models/model-new.ckpt-2000000: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
but that's just me renaming the model-new.ckpt-2000000.data-00000-of-00001 file to model-new.ckpt-2000000 and trying that. I assumed it wouldn't work, and it doesn't.
If I use @mathieuarbezhermoso's 3m checkpoint, I get this error:
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/weights" not found in checkpoint files /home/pi/mirror/models/model.ckpt-3000000
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
(And if I try @cshallue's script on it, it has the same issue as the 2m checkpoint—four files output that won't work)
@lanewinfield Don't rename or do anything to the files. Just pass the common prefix as the input to the evaluation script (model-new.ckpt-2000000 in your case).
@iamgroot42 Have you run into this situation?
INFO:tensorflow:Running caption generation on 0 files matching
But my path is right..
@FengLoveSS Nope. Most probably your file path is wrong. Are you sure the path you provided is correct? Try an absolute path to the image and see if it works?
@iamgroot42 My original path was wrong. I was really careless.
@iamgroot42 Thanks for your help. That was the ticket (well, just using the directory, as the checkpoint file points to the others).
I would like to release my trained model. Can someone let me know which files I need to share with the community?
I use TF 1.0 GPU version.
@cshallue
## UPDATE: The 2M finetuned model checkpoint is now available!
Here's a version trained on the latest TF 1.0 on a GPU.
https://github.com/KranthiGV/Pretrained-Show-and-Tell-model
I will release the finetuned version in a few days.
Open an issue on the repository page or email me at kranthi.[email protected] in case you have a problem setting it up.
Thank you
When trying to upgrade a checkpoint file for compatibility with TF 1.0 using the code by @cshallue above, use relative paths. Absolute paths produce an error at
tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
OLD_CHECKPOINT_FILE = "./model.ckpt-2000000"
NEW_CHECKPOINT_FILE = "./model.ckpt-2000000"
I trained with TF 1.0.1 and Python 2.7, without finetuning.
https://github.com/withyou1771/im2txt
Captions for image 1.jpg:
0) a cat laying on top of a grass covered field . (p=0.002806)
1) a black and white cat laying on top of a grass covered field . (p=0.000498)
2) a black and white cat laying on top of a green field . (p=0.000412)
I have released a version trained on the latest TF 1.0 on a GPU.
It has both 1M without finetuning and 2M with finetuning model checkpoints.
https://github.com/KranthiGV/Pretrained-Show-and-Tell-model
Open an issue on the repository page or email me at kranthi.[email protected] in case you have a problem setting it up.
Thank you!
@begongyal The latest version of TensorFlow creates three files per checkpoint by default. Please do not delete or remove anything in your train_dir. Using:
tf.flags.DEFINE_string("train_dir", "YOUR DIR OF TRAIN (seems to be ~/im2txt/model/train/)",
                       "Directory for saving and loading model checkpoints.")
in im2txt/train.py, I managed to get rid of the error.
Hi, does anyone know how to convert the above pre-trained models into protobuf models (.pb)?
I want to use them with TensorFlow Mobile.
Also, since I did not train the models, I need some information such as the following:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/ios/camera/CameraExampleViewController.mm
From line 37 to line 44:
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 224;
const int wanted_input_height = 224;
const int wanted_input_channels = 3;
const float input_mean = 117.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "input";
const std::string output_layer_name = "softmax1";
Thanks in advance!
@psycharo Thanks for sharing your checkpoint! excellent work!!
I have successfully used the 1M model (model.ckpt-1000000). However, I'm still struggling to use the fine-tuned 2M or 3M models posted here. I've tried the solutions already discussed, but with no luck.
I'm using TensorFlow 1.3 (GPU), CUDA 8, cuDNN 5.1 (I have yet to try downgrading to TF 1.0; could that work?).
When using the fine-tuned 2M model posted by @psycharo, I get the errors discussed earlier:
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/kernel" not found in checkpoint files /home/ubuntu/im2txt/data/model.ckpt-2000000
I can fix this issue by running the following code:
OLD_CHECKPOINT_FILE = "model.ckpt-2000000"
NEW_CHECKPOINT_FILE = "model2.ckpt-2000000"

import tensorflow as tf

vars_to_rename = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
    if old_name in vars_to_rename:
        new_name = vars_to_rename[old_name]
    else:
        new_name = old_name
    new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
    sess.run(init)
    saver.save(sess, NEW_CHECKPOINT_FILE)
However, when I try to run the evaluation using the new model2, I get the following error:
NotFoundError (see above for traceback): Key lstm/basic_lstm_cell/kernel not found in checkpoint
Here is the full stack trace:
INFO:tensorflow:Loading model from checkpoint: /home/ubuntu/im2txt/data/model2.ckpt-2000000
INFO:tensorflow:Restoring parameters from /home/ubuntu/im2txt/data/model2.ckpt-2000000
2017-09-07 14:38:17.078647: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key lstm/basic_lstm_cell/kernel not found in checkpoint
2017-09-07 14:38:17.100193: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key lstm/basic_lstm_cell/bias not found in checkpoint
Traceback (most recent call last):
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 89, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 66, in main
restore_fn(sess)
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 96, in _restore_fn
saver.restore(sess, checkpoint_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1560, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key lstm/basic_lstm_cell/kernel not found in checkpoint
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
Caused by op u'save/RestoreV2_381', defined at:
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 89, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 52, in main
FLAGS.checkpoint_path)
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 116, in build_graph_from_config
saver = tf.train.Saver()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1140, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1172, in build
filename=self._filename)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 688, in build
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Key lstm/basic_lstm_cell/kernel not found in checkpoint
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
Does anyone have any idea how to make the fine-tuned model work?
Thank you.
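(A guess based on the "Not found: Key lstm/basic_lstm_cell/kernel" messages above, not a verified fix: TF 1.2 or thereabouts renamed the RNN-cell variables from weights/biases to kernel/bias, so under TF 1.3 the rename map probably needs to target the newer names. A sketch:)

vars_to_rename = {
    # Target the post-TF-1.2 names instead of the TF 1.0 weights/biases.
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
}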
If someone is looking for the reformatted word_counts file, here it is: words_count.txt
No, what I want to ask you about is your GitHub project "Implementation of GoogLeNet by chainer": in googlenet/googlenet.py, what is "nutszebra_chainer"? It has confused me for a long time. Thank you again! Here is your code website: https://github.com/nutszebra/googlenet/blob/master/googlenet.py
Has anyone figured out how to export the im2txt trained model as a TensorFlow SavedModelBundle to be served by Tensorflow Serving?
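(No complete recipe appears in this thread, but here is a minimal sketch of the TF 1.x SavedModel export API, assuming the im2txt inference graph and its checkpoint have already been restored into sess; a real Serving deployment would also need a signature_def_map describing the input and output tensors.)

import tensorflow as tf

export_dir = "/tmp/im2txt_export/1"  # illustrative path; Serving expects a version subdirectory
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING])
builder.save()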
Has anyone met the problem UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte?
Traceback (most recent call last):
File "/Users/hanyu/Downloads/models-master2/research/im2txt/im2txt/run_inference.py", line 153, in
im2txt()
File "/Users/hanyu/Downloads/models-master2/research/im2txt/im2txt/run_inference.py", line 140, in im2txt
image = f.read()
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 125, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 93, in _prepare_value
return compat.as_str_any(val)
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 106, in as_str_any
return as_str(value)
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 84, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I don't know why, when I run the command:
bazel-bin/im2txt/train --input_file_pattern="${MSCOCO_DIR}/train-?????-of-00256" --inception_checkpoint_file="${INCEPTION_CHECKPOINT}" --train_dir="${MODEL_DIR}/train" --train_inception=false --number_of_steps=1000000
I get this error:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcufft.so.8.0. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_fft.cc:344] Unable to load cuFFT DSO.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
* Error in `/usr/bin/python': double free or corruption (!prev): 0x000000000231f8e0 *
I don't understand this error.
@yh0903 To solve the unicode error, make sure the file is read in binary mode in run_inference.py:
with tf.gfile.GFile(filename, "rb") as f:
@psycharo Hi, thank you for providing such a great model. I want to ask you a question: have you noticed how the performance changes when you finetune the model? Does the performance (CIDEr or BLEU) increase throughout, or does it first drop a little and then gradually increase?
Hi all,
I'm trying to use the pretrained models by @psycharo. When I test the model to get the softmax output and LSTM states, I get the error "Key lstm/logits/biases not found in checkpoint".
TensorFlow version is 1.0.1, Python 2.7.
This is console output:
universal@universal-ubuntu:~/anaconda3/envs/MyGAN$ python test.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: universal-ubuntu
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: universal-ubuntu
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 384.111.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.111 Tue Dec 19 23:51:45 PST 2017
GCC version: gcc version 4.9.3 (Ubuntu 4.9.3-13ubuntu2)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.111.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 384.111.0
INFO:tensorflow:Loading model from checkpoint: /home/universal/anaconda3/envs/MyGAN/im2txt/model/pre-trained/model-new-renamed.ckpt-2000000
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key lstm/logits/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key lstm/logits/weights not found in checkpoint
Traceback (most recent call last):
File "test.py", line 185, in
restore_fn(sess)
File "test.py", line 64, in _restore_fn
saver.restore(sess, checkpoint_path)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1428, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key lstm/logits/biases not found in checkpoint
[[Node: save/RestoreV2_379 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_379/tensor_names, save/RestoreV2_379/shape_and_slices)]]
Caused by op u'save/RestoreV2_379', defined at:
File "test.py", line 173, in
restore_fn = _create_restore_fn(checkpoint_path) # (inception_variables, inception_checkpoint_file)
File "test.py", line 55, in _create_restore_fn
saver = tf.train.Saver()
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1040, in __init__
self.build()
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1070, in build
restore_sequentially=self._restore_sequentially)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 675, in build
restore_sequentially, reshape)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 402, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 242, in restore_op
[spec.tensor.dtype])[0])
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 668, in restore_v2
dtypes=dtypes, name=name)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
NotFoundError (see above for traceback): Key lstm/logits/biases not found in checkpoint
[[Node: save/RestoreV2_379 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_379/tensor_names, save/RestoreV2_379/shape_and_slices)]]
And this is my code for testing:
`
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os.path
import time
import numpy as np
import tensorflow as tf
import image_embedding
import image_processing
import inputs as input_ops
tf.logging.set_verbosity(tf.logging.INFO)
# Dimensions of Inception v3 input images.
image_height = 299
image_width = 299
image_format = "jpeg"
train_inception=False
embedding_size = 512
vocab_size = 12000
num_lstm_units = 512
# To match the "Show and Tell" paper we initialize all variables with a
# random uniform initializer.
# Scale used to initialize model variables.
initializer_scale = 0.08
initializer = tf.random_uniform_initializer(
minval=-initializer_scale,
maxval=initializer_scale)
# Collection of variables from the inception submodel.
inception_variables = []
inception_checkpoint_file="/home/universal/anaconda3/envs/MyGAN/im2txt/model/inception_v3.ckpt"
checkpoint_path="/home/universal/anaconda3/envs/MyGAN/im2txt/model/pre-trained/model-new-renamed.ckpt-2000000"
def _create_restore_fn(checkpoint_path):
"""Creates a function that restores a model from checkpoint.
Args:
checkpoint_path: Checkpoint file or a directory containing a checkpoint
file.
saver: Saver for restoring variables from the checkpoint file.
Returns:
restore_fn: A function such that restore_fn(sess) loads model variables
from the checkpoint file.
Raises:
ValueError: If checkpoint_path does not refer to a checkpoint file or a
directory containing a checkpoint file.
"""
saver = tf.train.Saver()
if tf.gfile.IsDirectory(checkpoint_path):
checkpoint_path = tf.train.latest_checkpoint(checkpoint_path)
if not checkpoint_path:
raise ValueError("No checkpoint file found in: %s" % checkpoint_path)
def _restore_fn(sess):
tf.logging.info("Loading model from checkpoint: %s", checkpoint_path)
saver.restore(sess, checkpoint_path)
tf.logging.info("Successfully loaded checkpoint: %s",
os.path.basename(checkpoint_path))
return _restore_fn
def process_image(encoded_image, thread_id=0):
"""Decodes and processes an image string.
Args:
encoded_image: A scalar string Tensor; the encoded image.
thread_id: Preprocessing thread id used to select the ordering of color
distortions.
Returns:
A float32 Tensor of shape [height, width, 3]; the processed image.
"""
return image_processing.process_image(encoded_image,
is_training=False,
height=image_height,
width=image_width,
thread_id=thread_id,
image_format=image_format)
g = tf.Graph()
with g.as_default():
  image_feed = tf.placeholder(dtype=tf.string, shape=[], name="image_feed")
  input_feed = tf.placeholder(dtype=tf.int64,
                              shape=[None],  # batch_size
                              name="input_feed")

  # Process the image and insert batch dimensions (build_inputs).
  images = tf.expand_dims(process_image(image_feed), 0)
  input_seqs = tf.expand_dims(input_feed, 1)

  # Build the image model subgraph and generate image embeddings.
  inception_output = image_embedding.inception_v3(
      images,
      trainable=train_inception,
      is_training=False)
  inception_variables = tf.get_collection(
      tf.GraphKeys.GLOBAL_VARIABLES, scope="InceptionV3")

  # Map inception output into embedding space.
  with tf.variable_scope("image_embedding") as scope:
    image_embeddings = tf.contrib.layers.fully_connected(
        inputs=inception_output,
        num_outputs=embedding_size,
        activation_fn=None,
        weights_initializer=initializer,
        biases_initializer=None,
        scope=scope)

  # Save the embedding size in the graph.
  tf.constant(embedding_size, name="embedding_size")

  with tf.variable_scope("seq_embedding"), tf.device("/cpu:0"):
    embedding_map = tf.get_variable(
        name="map",
        shape=[vocab_size, embedding_size],
        initializer=initializer)
    seq_embeddings = tf.nn.embedding_lookup(embedding_map, input_seqs)

  # This LSTM cell has biases and outputs tanh(new_c) * sigmoid(o), but the
  # modified LSTM in the "Show and Tell" paper has no biases and outputs
  # new_c * sigmoid(o).
  lstm_cell = tf.contrib.rnn.BasicLSTMCell(
      num_units=num_lstm_units, state_is_tuple=True)

  with tf.variable_scope("lstm", initializer=initializer) as lstm_scope:
    # Feed the image embeddings to set the initial LSTM state.
    zero_state = lstm_cell.zero_state(
        batch_size=image_embeddings.get_shape()[0], dtype=tf.float32)
    _, initial_state = lstm_cell(image_embeddings, zero_state)

    # Allow the LSTM variables to be reused.
    lstm_scope.reuse_variables()

    # In inference mode, use concatenated states for convenient feeding and
    # fetching.
    tf.concat(axis=1, values=initial_state, name="initial_state")

    # Placeholder for feeding a batch of concatenated states.
    state_feed = tf.placeholder(dtype=tf.float32,
                                shape=[None, sum(lstm_cell.state_size)],
                                name="state_feed")
    state_tuple = tf.split(value=state_feed, num_or_size_splits=2, axis=1)

    # Run a single LSTM step.
    lstm_outputs, state_tuple = lstm_cell(
        inputs=tf.squeeze(seq_embeddings, axis=[1]),
        state=state_tuple)

    # Concatenate the resulting state.
    tf.concat(axis=1, values=state_tuple, name="state")

  # Stack batches vertically.
  lstm_outputs = tf.reshape(lstm_outputs, [-1, lstm_cell.output_size])

  with tf.variable_scope("logits") as logits_scope:
    logits = tf.contrib.layers.fully_connected(
        inputs=lstm_outputs,
        num_outputs=vocab_size,
        activation_fn=None,
        weights_initializer=initializer,
        scope=logits_scope)

  # Keep a Python handle on the softmax op so it can be fetched below;
  # without this assignment, the sess.run([softmax, ...]) call at the
  # bottom raises a NameError.
  softmax = tf.nn.softmax(logits, name="softmax")

  restore_fn = _create_restore_fn(checkpoint_path)  # (inception_variables, inception_checkpoint_file)

g.finalize()
input_files = "/media/universal/264CB8084CB7D0B3/MSCOCO/raw-data/train2014/COCO_train2014_000000000009.jpg"
filenames = []
for file_pattern in input_files.split(","):
  filenames.extend(tf.gfile.Glob(file_pattern))

with tf.Session(graph=g) as sess:
  # Load the model from checkpoint.
  restore_fn(sess)
  for filename in filenames:
    with tf.gfile.GFile(filename, "rb") as f:
      image = f.read()

    # partial_captions_list = partial_captions.extract()
    # input_feed = np.array([c.sentence[-1] for c in partial_captions_list])

    # Test feeding a batch of inputs and LSTM states to get softmax output
    # and LSTM states (build_inputs). The *_value names avoid shadowing the
    # placeholder handles defined above.
    input_feed_value = np.random.randint(0, 10, size=3)
    state_feed_value = np.random.rand(3, 1024)
    feed_dict = {"input_feed:0": input_feed_value,
                 "lstm/state_feed:0": state_feed_value,
                 "image_feed:0": image}
    softmax_out, lstm_outputs_out = sess.run([softmax, lstm_outputs],
                                             feed_dict=feed_dict)
    print(lstm_outputs_out)
""""""
`
What has gone wrong?
Does any checkpoint file actually contain these variables? (The sketch below is how I would check.)
When I generate captions by running the run_inference.py file, everything is OK. But I need to create my own model based on im2txt, so I want to know how it works.
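A minimal sketch of how to list everything stored in a checkpoint, using tf.train.NewCheckpointReader; the checkpoint path is a placeholder for wherever your model.ckpt-* file lives:

# Sketch: list every variable (name and shape) stored in a checkpoint file.
# "/path/to/model.ckpt-2000000" is a placeholder path.
import tensorflow as tf

reader = tf.train.NewCheckpointReader("/path/to/model.ckpt-2000000")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
  print(name, shape)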
Thank you in advance
Hello,
I am running the script "bazel-bin\im2txt\run_inference --checkpoint_path=${CHECKPOINT_DIR} --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}" using Python 3.5.2 under Windows 7.
Python crashes with the message "Python has stopped working". Could you please advise what is wrong?
Thank you
@victoriastuart
@cshallue
I am running the script "bazel-bin\im2txt\run_inference --checkpoint_path=${CHECKPOINT_DIR} --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}" using Python 3.5.2 (downloaded with Anaconda 3) under Windows 7.
Python crashes with the message "Python has stopped working". Could you please advise what is wrong?
I only need to caption some images using a pre-trained model.
Thank you
@JZakraoui
I work in Linux, not Windows. ;-)
I don't know your level of experience, but as a general suggestion I would recommend reading up on creating and using Python virtual environments (venv) anytime you are installing and working with new software/projects. In my opinion, it will save you a lot of headaches in the long run (preserving, e.g., your system and its "base" Python installation).
Not to be dismissive, but "Python crashes with the message 'Python has stopped working'", by itself, is not very helpful.
Again -- as a general practice -- whenever you describe a problem, include the exact error message and the preceding 10 or 50 or 100 lines of code/messages (whatever concisely encapsulates the issue, in your opinion), plus relevant system details: operating system (as you did), programming language/environment, program versions ... anything relevant.
Not to ask the obvious, but did you Google this issue? Although often very arcane, error messages usually indicate the precise nature of the issue, so searching on that topic leads to greater understanding of the problem.
Again (my opinion), indicating that you tried to understand your problem and that you searched for a solution carries much weight when finally asking for help.
NEVER give up! Seriously: we ALL start somewhere! Things that seem really complicated at the time often seem much less complicated in hindsight, with acquired knowledge and experience.
Just my thoughts; I do hope you sort this out! Post back here with additional detail, and perhaps someone can help. :-)
@victoriastuart thank you
@psycharo @KranthiGV @cshallue
I am running the script
bazel-bin\im2txt\run_inference --checkpoint_path=%CHECKPOINT_PATH% --vocab_file=%VOCAB_FILE% --input_files=%IMAGE_FILE%
Python 3.5.5
tensorflow 1.8.0
windows 7(64 bit), CPU
@psycharo's pre-trained model
I got the following error:
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "lstm/basic_lstm_cell/bias" not found in checkpoint files C:\Users\USER\Documents\models\pretrained1\model.ckpt-2000000
Any advice? Thank you!
@JZakraoui It seems like the variable names for basic_lstm_cell were changed again. You can rename the variables as pointed out by @cshallue. I'm copying his code here; note the variable names:
import tensorflow as tf

OLD_CHECKPOINT_FILE = ".../model.ckpt-2000000"
NEW_CHECKPOINT_FILE = ".../model.ckpt-2000000"

vars_to_rename = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
  if old_name in vars_to_rename:
    new_name = vars_to_rename[old_name]
  else:
    new_name = old_name
  new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
  sess.run(init)
  saver.save(sess, NEW_CHECKPOINT_FILE)
It works for me with
Python 3.6.4
tensorflow 1.7.0
@psycharo's pre-trained model
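To double-check that the rename took, you can ask the rewritten checkpoint for the new variable names; a minimal sketch, with the path elided as in the script above:

# Sketch: verify the renamed LSTM variables exist in the new checkpoint.
import tensorflow as tf

reader = tf.train.NewCheckpointReader(".../model.ckpt-2000000")
for name in ("lstm/basic_lstm_cell/kernel", "lstm/basic_lstm_cell/bias"):
  print(name, "found" if reader.has_tensor(name) else "MISSING")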
Can confirm that @vpaharia's latest fix works. Steps to follow on Python 3.5.5, TF-gpu 1.8 (the first line is the Python 3 fix for the vocabulary loader; see the sketch below):
reverse_vocab = [eval(line.split()[0]).decode() for line in reverse_vocab]
python3 im2txt/run_inference.py --checkpoint_path=models/model.ckpt-2000000 --vocab_file=models/word_counts.txt --input_files=images/image1.jpg
In my case I created a models directory where I extracted @psycharo's pre-trained model, and I also put the above-mentioned rename script in this directory to fix the checkpoint (replacing the paths with ./model.ckpt-2000000). I hope this helps others, so that they don't have to look through all the posts :)
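For context, that reverse_vocab line replaces the plain line.split()[0] lookup in im2txt's vocabulary loader; the eval(...).decode() is needed when word_counts.txt was written under Python 3, so each line starts with a bytes literal such as b'a' 969108. A minimal standalone sketch of the patched loading, assuming that file format and a placeholder path:

# Sketch: load a word_counts.txt whose lines look like:  b'a' 969108
# "word_counts.txt" is a placeholder path.
with open("word_counts.txt") as f:
  reverse_vocab = [eval(line.split()[0]).decode() for line in f]
vocab = dict(zip(reverse_vocab, range(len(reverse_vocab))))
print(reverse_vocab[:5])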
@cshallue Thank you so much for your help!
Here is a 5000000 step model using TF 1.9:
https://github.com/Gharibim/Tensorflow_im2txt_5M_Step
Hey, thank you for the checkpoint files! I was wondering if anyone has managed to use one of them to fine-tune the model with new data? What would the word counts file need to look like? Does the newly created word counts file from the new dataset need to be merged with the one from MSCOCO?
I am currently running the fine-tuning with a merged word counts file (merged along the lines of the sketch below) but am encountering two problems:
1.) The captions after 20,000 steps just consist of the same word repeated over and over, despite a very small loss of 0.2:
1) day day day day day day day day day day day day . <S> . <S> <S> . (p=0.011221)
2.) I let the model fine-tune overnight and somehow only the last 5 checkpoints got saved. Does anyone know how to prevent the overwriting of checkpoints and keep all of them?
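For reference, this is roughly how I merged the two word_counts.txt files: summing the per-word counts and re-sorting. The file names are placeholders, and whether summing is actually the right approach is exactly part of my question:

# Sketch: merge two word_counts.txt files by summing per-word counts.
# Both input file names are placeholders.
from collections import Counter

counts = Counter()
for path in ("mscoco_word_counts.txt", "new_data_word_counts.txt"):
  with open(path) as f:
    for line in f:
      word, count = line.split()
      counts[word] += int(count)

with open("merged_word_counts.txt", "w") as f:
  for word, count in counts.most_common():
    f.write("%s %d\n" % (word, count))

On 2.), I suspect tf.train.Saver's default of max_to_keep=5 is why only the last 5 checkpoints survive; if I read configuration.py correctly, im2txt exposes this as max_checkpoints_to_keep in TrainingConfig.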
Thank you in advance!
With @vpaharia's rename script above I could load the checkpoint file too. Notice the original variable names/values in the checkpoint file model.ckpt-2000000:
tensor_name: lstm/BasicLSTMCell/Linear/Bias
[-0.89432126 -0.34625703 0.16128121 ... 0.48277333 -0.5986251
1.2891939 ]
tensor_name: lstm/BasicLSTMCell/Linear/Matrix
[[ 0.16781631 -0.04221911 0.24709763 ... 0.04963883 -0.08704979
0.03227773]
The captions after 20,000 steps just consist of the same word repeated over and over, despite a very small loss of 0.2
I've noticed that a small number of steps generates pretty bad results. I get one-word responses and I'm around 366906 steps. I'm going to continue running and see how the results improve.
I additionally had to follow the comment here: https://github.com/tensorflow/models/issues/7204#issuecomment-513319623
Hey! Can you please tell us if the results improved, and how many steps it took? I would be really grateful if you could share your log file as well! Thank you in advance!