models/img2txt
Could someone release a pre-trained model for the img2txt model trained on COCO? It would be great for anyone here who doesn't yet have the computational resources to do a full training run. Thanks!
@cshallue: could you comment on this? Thanks.
+1
Sorry, we're not releasing a pre-trained version of this model at this time.
Here are links to a pre-trained model:
@psycharo thanks for sharing! Perhaps you could also share your word_counts.txt file. Different versions of the tokenizer can yield different results, so your model is specific to the word_counts.txt file that you used.
@psycharo My model is still training on our GPU instance; it looks like it will take another two weeks to finish. I would appreciate it if you would also release the fine-tuned model.
@psycharo Thanks for sharing your checkpoint!
When I try to use it I'm getting the error: "ValueError: No checkpoint file found in: None".
I don't have any trouble running run_inference on my own checkpoint files, but I can't run it on yours. I've tried lots of things: adding a trailing "/", using absolute paths, relative paths, ..... Nothing seems to work.
Suggestions welcomed.
@cshallue - Any thoughts?
Thanks all.
user123@myhost:~$ ls -l /tmp/checkpoint_tmp/
total 175356
-rw-r--r-- 1 user123 user123 19629588 Oct 15 07:04 graph.pbtxt
-rw-r--r-- 1 user123 user123 149088120 Oct 15 07:04 model.ckpt-2000000
-rw-r--r-- 1 user123 user123 10675545 Oct 15 07:04 model.ckpt-2000000.meta
-rw-rw-r-- 1 user123 user123 156438 Oct 15 07:08 word_counts.txt
user123@myhost:~$ /data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference --checkpoint_path=/tmp/checkpoint_tmp --vocab_file=/tmp/checkpoint_tmp/word_counts.txt --input_files=${IMAGE_FILE}
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 49, in main
FLAGS.checkpoint_path)
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 118, in build_graph_from_config
return self._create_restore_fn(checkpoint_path, saver)
File "/data/home/user123/tensorflow_models/models/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 92, in _create_restore_fn
raise ValueError("No checkpoint file found in: %s" % checkpoint_path)
ValueError: No checkpoint file found in: None
user123@myhost:~$
@ProgramItUp Try the following: --checkpoint_path=/tmp/checkpoint_tmp/model.ckpt-2000000
When you pass a directory, it looks for a "checkpoint state" file in that directory, which is an index of all checkpoints in the directory. Your directory doesn't have a checkpoint state file, but you can just pass it the explicit filename.
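Roughly, the resolution logic looks like this (a minimal sketch, assuming TF 0.x/1.x; the path is illustrative):

import tensorflow as tf

# When given a directory, the inference wrapper resolves it through the
# directory's "checkpoint" state file. With no state file present,
# latest_checkpoint() returns None, which is what produces the
# "No checkpoint file found in: None" error above.
path = "/tmp/checkpoint_tmp"
if tf.gfile.IsDirectory(path):
    path = tf.train.latest_checkpoint(path)
print(path)  # None here; pass the explicit file ".../model.ckpt-2000000" instead.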
Getting better, but...
Traceback (most recent call last):
File "/home/gamma/bin/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/home/gamma/bin/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 53, in main
vocab = vocabulary.Vocabulary(FLAGS.vocab_file)
File "/home/gamma/bin/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/vocabulary.py", line 50, in __init__
assert start_word in reverse_vocab
AssertionError
Looks like the word_counts.txt file above is not formatted as expected:
b'a' 969108
b'</S>' 586368
b'<S>' 586368
b'.' 440479
b'on' 213612
b'of' 202290
b'the' 196219
b'in' 182598
b'with' 152984
...
vocabulary.py expects:
a 969108
</S> 586368
<S> 586368
. 440479
on 213612
of 202290
the 196219
in 182598
with 152984
...
A quick fix is to reformat the word_counts.txt in that way. Or, you could replace line 49 of vocabulary.py with
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]
In the long run, I'll come up with a way to make sure word_counts.txt is output the same for everyone.
It works!
http://stablescoop.horseradionetwork.com/wp-content/uploads/2013/10/ep271.jpg
Captions for image cb340488986cc40f8ec610348b7f5a24.jpg:
0) a woman is standing next to a horse . (p=0.000726)
1) a woman is standing next to a horse (p=0.000638)
2) a woman is standing next to a brown horse . (p=0.000373)
@PredragBoksic great!
@psycharo , what version of python did you use to generate the word_counts.txt file?
I expect the script to output lines of the form:
a 969108
</S> 586368
<S> 586368
not:
b'a' 969108
b'</S>' 586368
b'<S>' 586368
I didn't generate the word_counts.txt file. I changed line 49 as you suggested, to:
""" WORKAROUND for vocabulary file """
"""reverse_vocab = [line.split()[0] for line in reverse_vocab]"""
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]
I have Python 2.7.12 on KUbuntu 16.04 with CUDA 8.0, cuDNN 5.1 and a GTX 970. I wouldn't know how to do it in Python, because I usually program in Java. Do you need some code to change that file?
@PredragBoksic I'm asking the creator of that file. You can just keep using the workaround :)
@cshallue python 3.5. I had to make a couple of dirty hacks to make it work on that version of Python, which is why word_counts.txt looks different.
@psycharo How many hours did this take to train? I think people would appreciate what you shared even more if you mentioned this.
@PredragBoksic
Initial training took about 2-3 days; finetuning for 1M iterations took around 5-6 days. I used a single GPU, a Tesla P100.
@cshallue Thanks for the prompt replies. Your suggestions worked.
I was not able to follow the full execution path of the code. Where would be the right place to put a bit of error checking to make sure that the files passed via --checkpoint_path, --vocab_file and --input_files exist, and to throw an error if they don't?
In the case of the checkpoint file, it would be helpful to throw an error if the "checkpoint state" file is not found. Where would this happen?
Thanks.
There are already error checks for all those things.
If no checkpoint state or no checkpoint file is found in --checkpoint_path, it will fail the check here.
If --vocab_file doesn't exist it will fail the check here.
If no files match --input_files then you will get the message "Running caption generation on 0 files matching..." and inference will exit: see here.
I did not notice any meaningful error messages, for example when the image file was missing. I suppose this functionality will be completed in the future.
@cshallue: I am running the finetuning step of the optimization. What I noticed is that the loss is not changing much for the initial 22000 steps; it is pretty much stuck at 2.40.
I have attached the log file, captured by piping stderr to a text file. Is the loss going to go down significantly in the remaining iterations? Or am I missing some "gotcha"?
log_finetune.txt
@siavashk The loss reported by the training script is expected to be really noisy: it reports on single batches of only 32 examples.
Are you running the evaluation script on the validation files? We expect to see validation perplexity decreasing slowly. It decreases slowly because the model is already near optimal and because we use a smaller learning rate during finetuning.
@cshallue Maybe I am overly anxious; 22000 steps is about 1% of the optimization. I am just worried because it has been three weeks since I started training this model, and it seems it is going to take another two weeks to converge.
I am not running the validation script, since training itself is taking so long (it's been three weeks now and I am at 1 million iterations). I thought running an additional validation step would make this even longer.
You won't be able to tell much from the training losses for a single batch any more. They will keep jumping around.
You could always just use the model in its current form. It will probably be sensible. There is not much improvement after 1M steps of fine tuning.
Or you could use the model shared in this thread above.
@siavashk Do we need to rerun the pre-training step if we use the word_counts.txt file from @psycharo, or what is the correct workflow here?
@hholst80, I don't think you need to pre-train. Here is how I used @psycharo's pre-trained model:
@psycharo Thanks a lot! That saves a lot of time! Would you please share the latest checkpoint that you have?
I have almost finished the training (3,000,000 steps), if somebody is interested. I'll train something with Inception-ResNet-v2 later.
@TRGNN
Could you please share the trained model after you finish it? I think a great number of people are interested and would appreciate it. Thanks!
@TRGNN
Could you please also post your model trained with Inception-ResNet-v2? Thank you very much.
You people (all) are fabulous!
I needed to edit the TensorFlow-provided im2txt scripts (and add to my $PYTHONPATH -- py27 venv -- via a *.pth file), as the paths in the scripts in the GitHub-cloned repo (https://github.com/tensorflow/models/tree/master/im2txt) were not working for me. I did all of this without the use of Bazel -- just straight-up edits in an editor and runs in a terminal (Linux).
I downloaded psycharo's pretrained model (thank you very much!), edited the vocabulary.py file as suggested by cshallue and -- presto! -- I'm successfully classifying images! :-)
Thank you to all involved. :-)
@victoriastuart, I believe that Bazel works well if you clone the entire repository, enter the appropriate folder and run Bazel from within that folder. It's counterintuitive.
@PredragBoksic: ahh, good to know - thanks! I'm new to the Bazel ecosystem, and the instructions on the im2txt site are not as clear as they could be, in my opinion. Anyway, it's working and, even better, I learned a lot while sorting it out! ;-)
@siavashk Hi: when running the evaluation script, what are the train and eval directories?
@ProgramItUp Hi: before running the script with the pre-trained model, what do I need to do?
Thanks @psycharo @cshallue !!!
I could successfully run the model thanks to you guys 😄
I'm on Python 3.5 and the fix:
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]
did not work for me. The bytes were not being decoded into strings, so I was getting the same assertion error. The following does work for me:
reverse_vocab = [eval(line.split()[0]).decode() for line in reverse_vocab]
Thanks for the model @psycharo !
@TRGNN
Sir, did you ever post the model you trained anywhere on the internet?
It's great of you to share, @psycharo.
It would be great if it supported TensorFlow Serving, just like the Inception model.
I may open another issue to track this if anyone is interested.
@outcastrift I just uploaded it. Sorry for the delay, guys. The archive contains the model checkpoint at 3,000,000 steps (1,000,000 without and 2,000,000 with Inception training) and the word_counts file.
https://drive.google.com/open?id=0B_qCJ40uBfjEWVItOTdyNUFOMzg
@TRGNN cool man
Hi, I've run into a problem while running my demo:
CRITICAL:tensorflow:Vocab file /home/ubuntu/nmodels-master/im2txt/im2txt/data/word_counts.txt not found.
Traceback (most recent call last):
File "/home/ubuntu/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 83, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/home/ubuntu/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 53, in main
vocab = vocabulary.Vocabulary(FLAGS.vocab_file)
File "/home/ubuntu/models-master/im2txt/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/vocabulary.py", line 48, in __init__
reverse_vocab = list(f.readlines())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 128, in readlines
self._preread_check()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/lib/io/file_io.py", line 73, in _preread_check
compat.as_bytes(self.__name), 1024 * 512, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/ubuntu/nmodels-master/im2txt/im2txt/data/word_counts.txt
However, word_counts.txt is in that directory... I don't know how to deal with it. Please help me. Thank you!
@adaxidedakaonang Yesterday I encountered the same thing.
There seems to be a bug reading command-line arguments. This is a real hack, but one thing you can do for your demo is hard-code the FLAGS variables in im2txt/run_inference.py with the full paths and file names, as sketched below. Note: there are multiple copies of run_inference.py after running Bazel; change the original file, do not try changing a copy in bazel-bin.
Make sense?
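For example (a sketch only; the paths are illustrative and should match your own layout), replace the empty flag defaults near the top of im2txt/run_inference.py:

# Illustrative hard-coded defaults for the three flags run_inference.py defines.
tf.flags.DEFINE_string("checkpoint_path",
                       "/home/ubuntu/models-master/im2txt/im2txt/model/model.ckpt-3000000",
                       "Model checkpoint file or directory containing one.")
tf.flags.DEFINE_string("vocab_file",
                       "/home/ubuntu/models-master/im2txt/im2txt/data/word_counts.txt",
                       "Text file containing the vocabulary.")
tf.flags.DEFINE_string("input_files",
                       "/home/ubuntu/models-master/im2txt/im2txt/data/dog.jpg",
                       "File pattern or comma-separated list of image files.")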
I found the mistake; it's:
tensorflow.python.framework.errors_impl.NotFoundError: /home/ubuntu/nmodels-master/im2txt/im2txt/data/word_counts.txt
It should be 'models' rather than 'nmodels'; I typed an extra 'n' by mistake! I'm very grateful to you. Thanks!
Now I've hit a problem running this command via os.popen. This is my code:
import os
import subprocess

def getMessage(img):
    jpg = img
    cmdline = '''
    CHECKPOINT_DIR="${HOME}/models-master/im2txt/im2txt/model/model.ckpt-3000000" & \
    VOCAB_FILE="${HOME}/models-master/im2txt/im2txt/data/word_counts.txt" & \
    IMAGE_FILE="%s" & \
    bazel build -c opt im2txt/run_inference & \
    export CUDA_VISIBLE_DEVICES="" & \
    bazel-bin/im2txt/run_inference \
        --checkpoint_path=/home/ubuntu/models-master/im2txt/im2txt/model/model.ckpt-3000000 \
        --vocab_file=/home/ubuntu/models-master/im2txt/im2txt/data/word_counts.txt \
        --input_files=%s
    ''' % (jpg, jpg)
    print os.popen(cmdline)

getMessage(img='/home/ubuntu/models-master/im2txt/im2txt/data/dog.jpg')
I think you can see what this code means: I am trying to run these commands in a shell.
Note the last few lines: when I run them in the shell directly, they look like this:
bazel-bin/im2txt/run_inference \
--checkpoint_path=${CHECKPOINT_DIR} \
--vocab_file=${VOCAB_FILE} \
--input_files=${IMAGE_FILE}
However, when I run it via os.popen, it becomes this:
bazel-bin/im2txt/run_inference --checkpoint_path=${HOME}/models-master/im2txt/im2txt/model/model.ckpt-3000000 --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}
Do you know how to write it in 'cmdline'? Thank you~
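(One way this could be written, a sketch not from the thread: build the argument list directly in Python and skip the shell variables entirely, so nothing depends on os.popen expanding ${...}. The paths are illustrative, and it assumes bazel build has already been run once.)

import subprocess

def get_message(img):
    # Pass explicit arguments instead of relying on shell variable expansion.
    cmd = [
        "bazel-bin/im2txt/run_inference",
        "--checkpoint_path=/home/ubuntu/models-master/im2txt/im2txt/model/model.ckpt-3000000",
        "--vocab_file=/home/ubuntu/models-master/im2txt/im2txt/data/word_counts.txt",
        "--input_files=%s" % img,
    ]
    # check_output raises if run_inference exits non-zero, and returns its stdout.
    return subprocess.check_output(cmd)

print(get_message('/home/ubuntu/models-master/im2txt/im2txt/data/dog.jpg'))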
If anyone gets the error AttributeError: 'module' object has no attribute 'BasicLSTMCell', you can reset your git HEAD to the commit below. The models repo has undergone lots of changes since December 2016.
$ git reset --hard 9997b250
@mathieuarbezhermoso vs @psycharo: whose model should I use?
Does anyone have a recent Inception-trained model?
I don't have Teslas, so I can't train.
Hi,
You just need a decent NVIDIA card for training. No need for a Tesla; a GeForce is good.
You can use the Inception V3 checkpoint I shared. I'll provide a link as soon as I'm home, if needed.
I downloaded this model: http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
but I could not find the word counts for it.
Please share the link after you get home.
Thanks.
tf.train.latest_checkpoint returns None.
I downloaded the @psycharo model into im2txt/model/train/.
What's wrong?
Hi,
I am trying to convert @mathieuarbezhermoso's checkpoint into Const ops using freeze_graph.py. It needs a .pb file as input_graph, so how can I generate it?
Or if anyone has it, I'd be thankful.
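(A sketch of one possible route, not a confirmed recipe: freeze_graph.py can take a text GraphDef as --input_graph, and training already writes a graph.pbtxt into the train directory; alternatively, you could serialize the inference graph yourself with tf.train.write_graph. Paths are illustrative.)

import tensorflow as tf

with tf.Session() as sess:
    # ... build the im2txt inference graph here ...
    # Write a text GraphDef that freeze_graph.py can consume via
    # --input_graph, together with --input_checkpoint and
    # --output_node_names (the inference graph's output op is named "softmax").
    tf.train.write_graph(sess.graph_def, "/tmp/im2txt", "graph.pbtxt")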
Hi!
Did anybody get the following error when trying to run the checkpoint with TensorFlow 1.0?
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/weights" not found in checkpoint files im2txt/im2txt_pretrained2/model.ckpt-3000000
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
Thanks!
@tintelle Looks like the default variable names for the BasicLSTMCell were changed in TensorFlow 1.0, and they no longer match the checkpoint. See the following thread for a pointer on renaming variables in checkpoints:
http://stackoverflow.com/questions/37086268/rename-variable-scope-of-saved-model-in-tensorflow
@cshallue I'd really, really appreciate it if you could post the steps to rename the variables with respect to @tintelle's post. I'm not able to follow the solution mentioned in that Stack Overflow thread.
Something like this should work:
import tensorflow as tf

new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(path_to_checkpoint)
for old_name in reader.get_variable_to_shape_map():
    new_name = ...  # Rename as desired
    new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
    sess.run(init)
    saver.save(sess, path_to_new_checkpoint)
This guide walks you through the major changes in the API and how to automatically upgrade your programs for TensorFlow 1.0. This guide not only steps you through the changes but also explains why we've made them.
https://www.tensorflow.org/install/migration
I haven't had time to look into renaming the variables yet; I fixed it in the end by installing TensorFlow 0.12 in a separate environment. You can download the binary here: https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.0-py2-none-any.whl and install it in your virtual environment like this:
export TF_BINARY_URL=path/to/tensorflow0.12
pip install --ignore-installed --upgrade $TF_BINARY_URL
@cshallue I ran into the same trouble as @tintelle. What puzzles me is how I can find the old and new variable names for the BasicLSTMCell.
Thank you very much!
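(One way to discover both sides of the mapping, a sketch assuming TF 1.x: print the names stored in the checkpoint, then compare them against the names your freshly built graph expects.)

import tensorflow as tf

# Names as stored in the checkpoint (the "old" names):
reader = tf.train.NewCheckpointReader("/path/to/model.ckpt-2000000")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)

# Names the current graph expects (the "new" names): after building the
# inference graph, list them with
#   for v in tf.global_variables():
#       print(v.op.name)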
@mathieuarbezhermoso - Hey buddy. Thanks a ton for sharing your trained model. If you get a chance to do further training after 3 million steps, would you mind sharing it?
Can someone help me figure out this issue? The problem is at tf.concat(initial_state, 1, name="initial_state"). I run run_inference.py and get this error.
Traceback (most recent call last):
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/run_inference.py", line 85, in
tf.app.run()
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/run_inference.py", line 50, in main
FLAGS.checkpoint_path)
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 115, in build_graph_from_config
self.build_model(model_config)
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/inference_wrapper.py", line 38, in build_model
model.build()
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/show_and_tell_model.py", line 359, in build
self.build_model()
File "/home/bisrat/startupml/tensorflow_models/im2txt/im2txt/show_and_tell_model.py", line 269, in build_model
tf.concat(initial_state, 1, name="initial_state")
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1075, in concat
dtype=dtypes.int32).get_shape(
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/bisrat/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
They changed the ordering of the parameters of the concat function between TensorFlow versions. I'm using version 0.12, where it should be: tf.concat(1, initial_state, name="initial_state"). It's complaining because it's expecting an int, but got the initial_state instead.
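(For reference, a sketch of the two signatures; initial_state stands in for the values being concatenated.)

# TF 0.12 and earlier: tf.concat(concat_dim, values, name=...)
tf.concat(1, initial_state, name="initial_state")

# TF 1.0 and later: tf.concat(values, axis, name=...)
tf.concat(initial_state, axis=1, name="initial_state")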
@tintelle I fixed the issue by installing version 0.12 and resetting my git HEAD with
$ git reset --hard 9997b250
Thank you for your quick reply.
Hi!
Did anybody release a pre-trained model for TensorFlow 1.0?
Thanks very much!!!
@bis-carbon - This method worked for me to fix the problem of "Tensor name "lstm/basic_lstm_cell/biases" not found in checkpoint files": I downgraded to TensorFlow 0.12 and reset to 9997b25. Now I have captions!
@liyd I guess the pre-trained models above work as well for 1.0. I didn't try im2txt, but it did work for inception.
There is a branch named update-models-1.0. Check out that branch and try it 😄
It seems like im2txt was also updated to TF 1.0 on that branch.
@tae-jun thank you for your help!
I encountered the same problem as @tintelle. I think this problem is due to the default variable names in the model parameters having been changed in TensorFlow 1.0, not due to the source code, so update-models-1.0 may not work.
@bis-carbon Would you please explain why we need to edit the Inception files, when we are using the im2txt model?
Just wondering if anyone solved
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/weights" not found in checkpoint files im2txt/im2txt_pretrained2/model.ckpt-3000000
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
in Tensorflow v1.0
(tensorflow)user@main:~$ ls -l im2txt/model/train/
total 175200
-rw-r--r-- 1 user user 19629588 Oct 6 04:08 graph.pbtxt
-rw-r--r-- 1 user user 149088120 Oct 12 09:39 model.ckpt-2000000
-rw-r--r-- 1 user user 10675545 Oct 12 09:39 model.ckpt-2000000.meta
When I put only "graph.pbtxt", "model.ckpt-2000000", and "model.ckpt-2000000.meta" in the train directory, I get the error "No checkpoint file found in: None". So I tried "--checkpoint_path=im2txt/model/train/model.ckpt-2000000", but "bash: --checkpoint_path=im2txt/model/train/model.ckpt-2000000: No such file or directory" appeared (bash treated the flag as a command on its own). What should I do? Did I do something wrong?
Actually, I had tried to train it myself, but I thought it would take too much time, so I stopped around the 16,700th step. Then I noticed there were many files such as "checkpoint", "model.ckpt-16588.data-00000-of-00001", etc.
(tensorflow)user@main:~$ ls -l im2txt/model/train/
total 1184160
-rw-rw-r-- 1 user user 457 3월 5 17:50 checkpoint
-rw-rw-r-- 1 user user 4056412 2월 26 23:39 events.out.tfevents.1488119947.main
-rw-rw-r-- 1 user user 4056412 3월 3 14:29 events.out.tfevents.1488518929.main
-rw-rw-r-- 1 user user 4056412 3월 3 14:39 events.out.tfevents.1488519554.main
-rw-rw-r-- 1 user user 265001124 3월 5 17:58 events.out.tfevents.1488526808.main
-rw-r--r-- 1 user user 19629588 10월 6 04:08 graph.pbtxt
-rw-rw-r-- 1 user user 149002244 3월 5 17:10 model.ckpt-16588.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:10 model.ckpt-16588.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:10 model.ckpt-16588.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:20 model.ckpt-16634.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:20 model.ckpt-16634.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:20 model.ckpt-16634.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:30 model.ckpt-16682.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:30 model.ckpt-16682.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:30 model.ckpt-16682.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:40 model.ckpt-16730.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:40 model.ckpt-16730.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:40 model.ckpt-16730.meta
-rw-rw-r-- 1 user user 149002244 3월 5 17:50 model.ckpt-16777.data-00000-of-00001
-rw-rw-r-- 1 user user 16876 3월 5 17:50 model.ckpt-16777.index
-rw-rw-r-- 1 user user 2170369 3월 5 17:50 model.ckpt-16777.meta
-rw-r--r-- 1 user user 149088120 10월 12 09:39 model.ckpt-2000000
-rw-r--r-- 1 user user 10675545 10월 12 09:39 model.ckpt-2000000.meta
Generating captions does work when I put them in the train directory. However, it seems to use only the 16,700-step checkpoint I trained myself: it shows the same inaccurate results regardless of whether files such as "model.ckpt-2000000" are present or not.
That's why I am confused. Do I additionally need some appropriate files, such as a "checkpoint" state file?
Any help would be appreciated.
Thanks.
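(One hedged suggestion, not from the thread: run_inference resolves a directory through its "checkpoint" state file, and yours was written by your own 16.7k-step run, so it points at those checkpoints. Assuming TF 1.x, you could point the state file at the downloaded checkpoint instead.)

import tensorflow as tf

# Rewrite the "checkpoint" state file so latest_checkpoint() resolves to the
# downloaded 2M checkpoint rather than the locally trained one.
tf.train.update_checkpoint_state(
    save_dir="im2txt/model/train",
    model_checkpoint_path="im2txt/model/train/model.ckpt-2000000")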
A few people are having trouble using @psycharo 's trained checkpoint since the release of TF 1.0. Here are the steps you can follow.
Upgrade to the latest version of TensorFlow and fetch the latest version of this repository.
Firstly, the word_counts.txt file provided above was generated with Python 3, so it wrote all the words like this: b'word'. You may need to rewrite that file. The following code worked for me on Python 2.7; you may have to tweak it if you are using something different.
OLD_VOCAB_FILE = "word_counts.txt"
NEW_VOCAB_FILE = "word_counts2.txt"

with open(OLD_VOCAB_FILE) as f:
    lines = list(f.readlines())

def clean_line(line):
    tokens = line.split()
    return "%s %s" % (eval(tokens[0]), tokens[1])

newlines = [clean_line(line) for line in lines]

with open(NEW_VOCAB_FILE, "w") as f:
    for line in newlines:
        f.write(line + "\n")
Now we need to rename 2 of the variables in the checkpoint file.
OLD_CHECKPOINT_FILE = ".../model.ckpt-2000000"
NEW_CHECKPOINT_FILE = ".../model-renamed.ckpt-2000000"

import tensorflow as tf

vars_to_rename = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
    if old_name in vars_to_rename:
        new_name = vars_to_rename[old_name]
    else:
        new_name = old_name
    new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
    sess.run(init)
    saver.save(sess, NEW_CHECKPOINT_FILE)
Now it should work!
CHECKPOINT_PATH=".../model-renamed.ckpt-2000000"
VOCAB_FILE=".../word_counts2.txt"
IMAGE_FILE=".../COCO_val2014_000000224477.jpg"
# Build the inference binary.
bazel build -c opt im2txt/run_inference
# Run inference to generate captions.
bazel-bin/im2txt/run_inference \
--checkpoint_path=${CHECKPOINT_PATH} \
--vocab_file=${VOCAB_FILE} \
--input_files=${IMAGE_FILE}
@cshallue I'm following your steps. In the last step (build the inference binary and run inference to generate captions), why do I get this error?
INFO: Found 1 target...
Target //im2txt:run_inference up-to-date:
bazel-bin/im2txt/run_inference
INFO: Elapsed time: 0.290s, Critical Path: 0.02s
INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: /home/ljf/LiJunFeng/im2txt/word_counts2.txt
INFO:tensorflow:Created vocabulary with 11520 words
* Error in `/usr/bin/python2.7': double free or corruption (!prev): 0x0000000000d49010 *
Aborted (core dumped)
EDIT: After changing my Python version (2.7.6 -> 3.4.3), I solved that problem. But then I got a new error, like @lanewinfield:
DataLossError (see above for traceback): Unable to open table file /home/ljf/LiJunFeng/im2txt/model.ckpt-2000000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
[[Node: save/RestoreV2_354 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_354/tensor_names, save/RestoreV2_354/shape_and_slices)]]
EDIT: I solved that problem too. Although only these files exist (model-new.ckpt-2000000.data-00000-of-00001, model-new.ckpt-2000000.index, model-new.ckpt-2000000.meta), you can still pass model-new.ckpt-2000000 as the checkpoint path.
@cshallue Running your script to port the checkpoint file generates three files: index, data and meta. Using the data file (with and without renaming) as the checkpoint for the tutorial does not work.
I'm so close!
Would anybody be willing to upload @psycharo's checkpoints with @cshallue's updates? Running the update script ("Now we need to rename 2 of the variables in the checkpoint file.") throws an error for me, most likely because I'm running everything on a Raspberry Pi.
(For reference, this is what I'm getting:)
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
EDIT: Never mind, I installed TensorFlow on my main machine and was able to run the command without issue. But then I ran into the same issue as @iamgroot42: three separate files, none of which work.
And just to be clear, these are the files it's generating:
model-new.ckpt-2000000.meta
checkpoint
model-new.ckpt-2000000.data-00000-of-00001
model-new.ckpt-2000000.index
@lanewinfield What is your error? Do you think your issue and mine are the same?
@FengLoveSS my error is different, but it's because I'm attempting to use the files exported by that script. Here's the meat of it:
DataLossError (see above for traceback): Unable to open table file /home/pi/mirror/models/model-new.ckpt-2000000: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
but that's just me renaming the model-new.ckpt-2000000.data-00000-of-00001 file to model-new.ckpt-2000000 and trying that. I assumed it wouldn't work, and it doesn't.
If I use @mathieuarbezhermoso's 3m checkpoint, I get this error:
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/weights" not found in checkpoint files /home/pi/mirror/models/model.ckpt-3000000
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
(And if I try @cshallue's script on it, it has the same issue as the 2m checkpoint—four files output that won't work)
@lanewinfield Don't rename or do anything to the files. Just pass the common prefix as the input to the evaluation script (model-new.ckpt-2000000 in your case).
@iamgroot42 Have you run into this situation?
INFO:tensorflow:Running caption generation on 0 files matching
But my path is right..
@FengLoveSS Nope. Most probably your file path is wrong. Are you sure the path you provided is correct? Try an absolute path to the image and see if it works?
@iamgroot42 My original path was wrong. I was really careless.
@iamgroot42 Thanks for your help. That was the ticket (well, just using the directory, as the checkpoint file points to the others).
I would like to release my trained model. Can someone let me know which files I need to share with the community?
I use TF 1.0 GPU version.
@cshallue
## UPDATE: The 2M finetuned model checkpoint is now available!
Here's a version trained on the latest TF 1.0 on a GPU.
https://github.com/KranthiGV/Pretrained-Show-and-Tell-model
I will release the finetuned version in a few days.
Open an issue on the repository page or email me at kranthi.[email protected] in case you have a problem setting it up.
Thank you
When trying to upgrade a checkpoint file for compatibility with TF 1.0 using the code by @cshallue above, use relative paths. Absolute paths produce an error at
tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
OLD_CHECKPOINT_FILE = "./model.ckpt-2000000"
NEW_CHECKPOINT_FILE = "./model.ckpt-2000000"
I trained with TF 1.0.1 and Python 2.7, without finetuning.
https://github.com/withyou1771/im2txt
Captions for image 1.jpg:
0) a cat laying on top of a grass covered field . (p=0.002806)
1) a black and white cat laying on top of a grass covered field . (p=0.000498)
2) a black and white cat laying on top of a green field . (p=0.000412)
I have released a version trained on the latest TF 1.0 on a GPU.
It has both 1M without finetuning and 2M with finetuning model checkpoints.
https://github.com/KranthiGV/Pretrained-Show-and-Tell-model
Open an issue on the repository page or email me at kranthi.[email protected] in case you have a problem setting it up.
Thank you!
@begongyal The latest version of TensorFlow creates three files per checkpoint by default. Please do not delete or remove anything in your train_dir. Using:
tf.flags.DEFINE_string("train_dir", "YOUR DIR OF TRAIN (seems to be ~/im2txt/model/train/)",
                       "Directory for saving and loading model checkpoints.")
in im2txt/train.py, I managed to get rid of the error.
Hi, does anyone know how to convert the above pre-trained models into protobuf models (.pb)?
I want to use them with TensorFlow Mobile.
Also, since I did not train the models, I need some information such as the following:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/ios/camera/CameraExampleViewController.mm
From line 37 to line 44:
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 224;
const int wanted_input_height = 224;
const int wanted_input_channels = 3;
const float input_mean = 117.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "input";
const std::string output_layer_name = "softmax1";
Thanks in advance!
@psycharo Thanks for sharing your checkpoint! excellent work!!
I have successfully used the 1M model (model.ckpt-1000000). However, I'm still struggling to use the fine-tuned 2M or 3M models posted here. I've tried the solutions already discussed, but with no luck.
I'm using TensorFlow 1.3 (GPU), CUDA 8, cuDNN 5.1 (I have yet to try downgrading to TF 1.0; could that work?).
When using the fine-tuned 2M model posted by @psycharo, I get the errors discussed earlier:
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm_cell/kernel" not found in checkpoint files /home/ubuntu/im2txt/data/model.ckpt-2000000
I can fix this issue by running the following code:
OLD_CHECKPOINT_FILE = "model.ckpt-2000000"
NEW_CHECKPOINT_FILE = "model2.ckpt-2000000"

import tensorflow as tf

vars_to_rename = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
    if old_name in vars_to_rename:
        new_name = vars_to_rename[old_name]
    else:
        new_name = old_name
    new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
    sess.run(init)
    saver.save(sess, NEW_CHECKPOINT_FILE)
However, when I try to run the evaluation using the new model2, I get the following error:
NotFoundError (see above for traceback): Key lstm/basic_lstm_cell/kernel not found in checkpoint
Here is the full stack trace:
INFO:tensorflow:Loading model from checkpoint: /home/ubuntu/im2txt/data/model2.ckpt-2000000
INFO:tensorflow:Restoring parameters from /home/ubuntu/im2txt/data/model2.ckpt-2000000
2017-09-07 14:38:17.078647: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key lstm/basic_lstm_cell/kernel not found in checkpoint
2017-09-07 14:38:17.100193: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key lstm/basic_lstm_cell/bias not found in checkpoint
Traceback (most recent call last):
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 89, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 66, in main
restore_fn(sess)
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 96, in _restore_fn
saver.restore(sess, checkpoint_path)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1560, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key lstm/basic_lstm_cell/kernel not found in checkpoint
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
Caused by op u'save/RestoreV2_381', defined at:
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 89, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/run_inference.py", line 52, in main
FLAGS.checkpoint_path)
File "/home/ubuntu/bazel-bin/im2txt/run_inference.runfiles/im2txt/im2txt/inference_utils/inference_wrapper_base.py", line 116, in build_graph_from_config
saver = tf.train.Saver()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1140, in __init__
self.build()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1172, in build
filename=self._filename)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 688, in build
restore_sequentially, reshape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Key lstm/basic_lstm_cell/kernel not found in checkpoint
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_381/tensor_names, save/RestoreV2_381/shape_and_slices)]]
Does anyone have any idea how to make the fine-tuned model work?
Thank you.
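(A guess based on the "Not found: Key lstm/basic_lstm_cell/kernel" messages above, not a verified fix: TF 1.2 or thereabouts renamed the RNN-cell variables from weights/biases to kernel/bias, so under TF 1.3 the rename map probably needs to target the newer names. A sketch:)

vars_to_rename = {
    # Target the post-TF-1.2 names instead of the TF 1.0 weights/biases.
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
}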
If someone is looking for the reformatted word_counts file, here it is: words_count.txt
No, what I want to ask you about is your GitHub project "Implementation of GoogLeNet by chainer": in googlenet/googlenet.py, what is "nutszebra_chainer"? It has confused me for a long time. Thank you again! Here is your code website: https://github.com/nutszebra/googlenet/blob/master/googlenet.py
Has anyone figured out how to export the im2txt trained model as a TensorFlow SavedModelBundle to be served by Tensorflow Serving?
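(No complete recipe appears in this thread, but here is a minimal sketch of the TF 1.x SavedModel export API, assuming the im2txt inference graph and its checkpoint have already been restored into sess; a real Serving deployment would also need a signature_def_map describing the input and output tensors.)

import tensorflow as tf

export_dir = "/tmp/im2txt_export/1"  # illustrative path; Serving expects a version subdirectory
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING])
builder.save()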
Has anyone met the problem UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte?
Traceback (most recent call last):
File "/Users/hanyu/Downloads/models-master2/research/im2txt/im2txt/run_inference.py", line 153, in
im2txt()
File "/Users/hanyu/Downloads/models-master2/research/im2txt/im2txt/run_inference.py", line 140, in im2txt
image = f.read()
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 125, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 93, in _prepare_value
return compat.as_str_any(val)
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 106, in as_str_any
return as_str(value)
File "/Users/hanyu/anaconda/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 84, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I don't know why, when I run the command:
bazel-bin/im2txt/train --input_file_pattern="${MSCOCO_DIR}/train-?????-of-00256" --inception_checkpoint_file="${INCEPTION_CHECKPOINT}" --train_dir="${MODEL_DIR}/train" --train_inception=false --number_of_steps=1000000
I get this error:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcufft.so.8.0. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_fft.cc:344] Unable to load cuFFT DSO.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
* Error in `/usr/bin/python': double free or corruption (!prev): 0x000000000231f8e0 *
I don't understand this error.
@yh0903 To solve the unicode error, make sure the file is read in binary mode in run_inference.py:
with tf.gfile.GFile(filename, "rb") as f:
@psycharo Hi, thank you for providing such a great model. I want to ask you a question: have you noticed how the performance changes when you finetune the model? Does the performance (CIDEr or BLEU) increase throughout, or does it first drop a little and then gradually increase?
Hi all,
I'm trying to use the pretrained models by @psycharo. When I test the model to get the softmax output and LSTM states, I get the error "Key lstm/logits/biases not found in checkpoint".
TensorFlow version is 1.0.1, Python 2.7.
This is console output:
universal@universal-ubuntu:~/anaconda3/envs/MyGAN$ python test.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: universal-ubuntu
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: universal-ubuntu
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 384.111.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.111 Tue Dec 19 23:51:45 PST 2017
GCC version: gcc version 4.9.3 (Ubuntu 4.9.3-13ubuntu2)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.111.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 384.111.0
INFO:tensorflow:Loading model from checkpoint: /home/universal/anaconda3/envs/MyGAN/im2txt/model/pre-trained/model-new-renamed.ckpt-2000000
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key lstm/logits/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key lstm/logits/weights not found in checkpoint
Traceback (most recent call last):
File "test.py", line 185, in
restore_fn(sess)
File "test.py", line 64, in _restore_fn
saver.restore(sess, checkpoint_path)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1428, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key lstm/logits/biases not found in checkpoint
[[Node: save/RestoreV2_379 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_379/tensor_names, save/RestoreV2_379/shape_and_slices)]]
Caused by op u'save/RestoreV2_379', defined at:
File "test.py", line 173, in
restore_fn = _create_restore_fn(checkpoint_path) # (inception_variables, inception_checkpoint_file)
File "test.py", line 55, in _create_restore_fn
saver = tf.train.Saver()
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1040, in __init__
self.build()
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1070, in build
restore_sequentially=self._restore_sequentially)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 675, in build
restore_sequentially, reshape)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 402, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 242, in restore_op
[spec.tensor.dtype])[0])
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 668, in restore_v2
dtypes=dtypes, name=name)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/universal/anaconda3/envs/MyGAN/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
NotFoundError (see above for traceback): Key lstm/logits/biases not found in checkpoint
[[Node: save/RestoreV2_379 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_379/tensor_names, save/RestoreV2_379/shape_and_slices)]]
And this is my code for testing:
`
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import os.path
import time
import numpy as np
import tensorflow as tf
import image_embedding
import image_processing
import inputs as input_ops
tf.logging.set_verbosity(tf.logging.INFO)
# Dimensions of Inception v3 input images.
image_height = 299
image_width = 299
image_format = "jpeg"
train_inception=False
embedding_size = 512
vocab_size = 12000
num_lstm_units = 512
# To match the "Show and Tell" paper we initialize all variables with a
# random uniform initializer.
# Scale used to initialize model variables.
initializer_scale = 0.08
initializer = tf.random_uniform_initializer(
minval=-initializer_scale,
maxval=initializer_scale)
# Collection of variables from the inception submodel.
inception_variables = []
inception_checkpoint_file="/home/universal/anaconda3/envs/MyGAN/im2txt/model/inception_v3.ckpt"
checkpoint_path="/home/universal/anaconda3/envs/MyGAN/im2txt/model/pre-trained/model-new-renamed.ckpt-2000000"
def _create_restore_fn(checkpoint_path):
"""Creates a function that restores a model from checkpoint.
Args:
checkpoint_path: Checkpoint file or a directory containing a checkpoint
file.
saver: Saver for restoring variables from the checkpoint file.
Returns:
restore_fn: A function such that restore_fn(sess) loads model variables
from the checkpoint file.
Raises:
ValueError: If checkpoint_path does not refer to a checkpoint file or a
directory containing a checkpoint file.
"""
saver = tf.train.Saver()
if tf.gfile.IsDirectory(checkpoint_path):
checkpoint_path = tf.train.latest_checkpoint(checkpoint_path)
if not checkpoint_path:
raise ValueError("No checkpoint file found in: %s" % checkpoint_path)
def _restore_fn(sess):
tf.logging.info("Loading model from checkpoint: %s", checkpoint_path)
saver.restore(sess, checkpoint_path)
tf.logging.info("Successfully loaded checkpoint: %s",
os.path.basename(checkpoint_path))
return _restore_fn
def process_image(encoded_image, thread_id=0):
"""Decodes and processes an image string.
Args:
encoded_image: A scalar string Tensor; the encoded image.
thread_id: Preprocessing thread id used to select the ordering of color
distortions.
Returns:
A float32 Tensor of shape [height, width, 3]; the processed image.
"""
return image_processing.process_image(encoded_image,
is_training=False,
height=image_height,
width=image_width,
thread_id=thread_id,
image_format=image_format)
g = tf.Graph()
with g.as_default():
  image_feed = tf.placeholder(dtype=tf.string, shape=[], name="image_feed")
  input_feed = tf.placeholder(dtype=tf.int64,
                              shape=[None],  # batch_size
                              name="input_feed")

  # Process the image and insert batch dimensions (build_inputs).
  images = tf.expand_dims(process_image(image_feed), 0)
  input_seqs = tf.expand_dims(input_feed, 1)

  # Build the image model subgraph and generate image embeddings.
  inception_output = image_embedding.inception_v3(
      images,
      trainable=train_inception,
      is_training=False)
  inception_variables = tf.get_collection(
      tf.GraphKeys.GLOBAL_VARIABLES, scope="InceptionV3")

  # Map inception output into embedding space.
  with tf.variable_scope("image_embedding") as scope:
    image_embeddings = tf.contrib.layers.fully_connected(
        inputs=inception_output,
        num_outputs=embedding_size,
        activation_fn=None,
        weights_initializer=initializer,
        biases_initializer=None,
        scope=scope)

  # Save the embedding size in the graph.
  tf.constant(embedding_size, name="embedding_size")

  with tf.variable_scope("seq_embedding"), tf.device("/cpu:0"):
    embedding_map = tf.get_variable(
        name="map",
        shape=[vocab_size, embedding_size],
        initializer=initializer)
    seq_embeddings = tf.nn.embedding_lookup(embedding_map, input_seqs)

  # This LSTM cell has biases and outputs tanh(new_c) * sigmoid(o), but the
  # modified LSTM in the "Show and Tell" paper has no biases and outputs
  # new_c * sigmoid(o).
  lstm_cell = tf.contrib.rnn.BasicLSTMCell(
      num_units=num_lstm_units, state_is_tuple=True)

  with tf.variable_scope("lstm", initializer=initializer) as lstm_scope:
    # Feed the image embeddings to set the initial LSTM state.
    zero_state = lstm_cell.zero_state(
        batch_size=image_embeddings.get_shape()[0], dtype=tf.float32)
    _, initial_state = lstm_cell(image_embeddings, zero_state)

    # Allow the LSTM variables to be reused.
    lstm_scope.reuse_variables()

    # In inference mode, use concatenated states for convenient feeding and
    # fetching.
    tf.concat(axis=1, values=initial_state, name="initial_state")

    # Placeholder for feeding a batch of concatenated states.
    state_feed = tf.placeholder(dtype=tf.float32,
                                shape=[None, sum(lstm_cell.state_size)],
                                name="state_feed")
    state_tuple = tf.split(value=state_feed, num_or_size_splits=2, axis=1)

    # Run a single LSTM step.
    lstm_outputs, state_tuple = lstm_cell(
        inputs=tf.squeeze(seq_embeddings, axis=[1]),
        state=state_tuple)

    # Concatenate the resulting state.
    tf.concat(axis=1, values=state_tuple, name="state")

  # Stack batches vertically.
  lstm_outputs = tf.reshape(lstm_outputs, [-1, lstm_cell.output_size])

  with tf.variable_scope("logits") as logits_scope:
    logits = tf.contrib.layers.fully_connected(
        inputs=lstm_outputs,
        num_outputs=vocab_size,
        activation_fn=None,
        weights_initializer=initializer,
        scope=logits_scope)

  # Keep a Python handle on the softmax op so it can be fetched below;
  # without this assignment, the sess.run([softmax, ...]) call at the
  # bottom raises a NameError.
  softmax = tf.nn.softmax(logits, name="softmax")

  restore_fn = _create_restore_fn(checkpoint_path)  # (inception_variables, inception_checkpoint_file)

g.finalize()
input_files = "/media/universal/264CB8084CB7D0B3/MSCOCO/raw-data/train2014/COCO_train2014_000000000009.jpg"
filenames = []
for file_pattern in input_files.split(","):
  filenames.extend(tf.gfile.Glob(file_pattern))

with tf.Session(graph=g) as sess:
  # Load the model from checkpoint.
  restore_fn(sess)
  for filename in filenames:
    with tf.gfile.GFile(filename, "rb") as f:
      image = f.read()

    # partial_captions_list = partial_captions.extract()
    # input_feed = np.array([c.sentence[-1] for c in partial_captions_list])

    # Test feeding a batch of inputs and LSTM states to get softmax output
    # and LSTM states (build_inputs). The *_value names avoid shadowing the
    # placeholder handles defined above.
    input_feed_value = np.random.randint(0, 10, size=3)
    state_feed_value = np.random.rand(3, 1024)
    feed_dict = {"input_feed:0": input_feed_value,
                 "lstm/state_feed:0": state_feed_value,
                 "image_feed:0": image}
    softmax_out, lstm_outputs_out = sess.run([softmax, lstm_outputs],
                                             feed_dict=feed_dict)
    print(lstm_outputs_out)
""""""
`
What has gone wrong?
Does any checkpoint file actually contain these variables? (The sketch below is how I would check.)
When I generate captions by running the run_inference.py file, everything is OK. But I need to create my own model based on im2txt, so I want to know how it works.
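A minimal sketch of how to list everything stored in a checkpoint, using tf.train.NewCheckpointReader; the checkpoint path is a placeholder for wherever your model.ckpt-* file lives:

# Sketch: list every variable (name and shape) stored in a checkpoint file.
# "/path/to/model.ckpt-2000000" is a placeholder path.
import tensorflow as tf

reader = tf.train.NewCheckpointReader("/path/to/model.ckpt-2000000")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
  print(name, shape)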
Thank you in advance
Hello,
I am running the script "bazel-bin\im2txt\run_inference --checkpoint_path=${CHECKPOINT_DIR} --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}" using Python 3.5.2 under Windows 7.
Python crashes with the message "Python has stopped working". Could you please advise what is wrong?
Thank you
@victoriastuart
@cshallue
I am running the script "bazel-bin\im2txt\run_inference --checkpoint_path=${CHECKPOINT_DIR} --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}" using Python 3.5.2 (downloaded with Anaconda 3) under Windows 7.
Python crashes with the message "Python has stopped working". Could you please advise what is wrong?
I only need to caption some images using a pre-trained model.
Thank you
@JZakraoui
I work in Linux, not Windows. ;-)
I don't know your level of experience, but as a general suggestion I would recommend reading up on creating and using Python virtual environments (venv) anytime you are installing and working with new software/projects. In my opinion, it will save you a lot of headaches in the long run (preserving, e.g., your system and its "base" Python installation).
Not to be dismissive, but "Python crashes with the message 'Python has stopped working'", by itself, is not very helpful.
Again -- as a general practice -- whenever you describe a problem, include the exact error message and the preceding 10 or 50 or 100 lines of code/messages (whatever concisely encapsulates the issue, in your opinion), plus relevant system details: operating system (as you did), programming language/environment, program versions ... anything relevant.
Not to ask the obvious, but did you Google this issue? Although often very arcane, error messages usually indicate the precise nature of the issue, so searching on that topic leads to greater understanding of the problem.
Again (my opinion), indicating that you tried to understand your problem and that you searched for a solution carries much weight when finally asking for help.
NEVER give up! Seriously: we ALL start somewhere! Things that seem really complicated at the time often seem much less complicated in hindsight, with acquired knowledge and experience.
Just my thoughts; I do hope you sort this out! Post back here with additional detail, and perhaps someone can help. :-)
@victoriastuart thank you
@psycharo @KranthiGV @cshallue
I am running the script
bazel-bin\im2txt\run_inference --checkpoint_path=%CHECKPOINT_PATH% --vocab_file=%VOCAB_FILE% --input_files=%IMAGE_FILE%
Python 3.5.5
tensorflow 1.8.0
windows 7(64 bit), CPU
@psycharo's pre-trained model
I got the following error:
tensorflow.python.framework.errors_impl.NotFoundError: Tensor name "lstm/basic_lstm_cell/bias" not found in checkpoint files C:\Users\USER\Documents\models\pretrained1\model.ckpt-2000000
Any advice? Thank you!
@JZakraoui It seems like the variable names for basic_lstm_cell were changed again. You can rename the variables as pointed out by @cshallue. I'm copying his code here; note the variable names:
import tensorflow as tf

OLD_CHECKPOINT_FILE = ".../model.ckpt-2000000"
NEW_CHECKPOINT_FILE = ".../model.ckpt-2000000"

vars_to_rename = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
}
new_checkpoint_vars = {}
reader = tf.train.NewCheckpointReader(OLD_CHECKPOINT_FILE)
for old_name in reader.get_variable_to_shape_map():
  if old_name in vars_to_rename:
    new_name = vars_to_rename[old_name]
  else:
    new_name = old_name
  new_checkpoint_vars[new_name] = tf.Variable(reader.get_tensor(old_name))

init = tf.global_variables_initializer()
saver = tf.train.Saver(new_checkpoint_vars)

with tf.Session() as sess:
  sess.run(init)
  saver.save(sess, NEW_CHECKPOINT_FILE)
It works for me with
Python 3.6.4
tensorflow 1.7.0
@psycharo's pre-trained model
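To double-check that the rename took, you can ask the rewritten checkpoint for the new variable names; a minimal sketch, with the path elided as in the script above:

# Sketch: verify the renamed LSTM variables exist in the new checkpoint.
import tensorflow as tf

reader = tf.train.NewCheckpointReader(".../model.ckpt-2000000")
for name in ("lstm/basic_lstm_cell/kernel", "lstm/basic_lstm_cell/bias"):
  print(name, "found" if reader.has_tensor(name) else "MISSING")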
Can confirm that @vpaharia's latest fix works. Steps to follow on Python 3.5.5, TF-gpu 1.8 (the first line is the Python 3 fix for the vocabulary loader; see the sketch below):
reverse_vocab = [eval(line.split()[0]).decode() for line in reverse_vocab]
python3 im2txt/run_inference.py --checkpoint_path=models/model.ckpt-2000000 --vocab_file=models/word_counts.txt --input_files=images/image1.jpg
In my case I created a models directory where I extracted @psycharo's pre-trained model, and I also put the above-mentioned rename script in this directory to fix the checkpoint (replacing the paths with ./model.ckpt-2000000). I hope this helps others, so that they don't have to look through all the posts :)
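For context, that reverse_vocab line replaces the plain line.split()[0] lookup in im2txt's vocabulary loader; the eval(...).decode() is needed when word_counts.txt was written under Python 3, so each line starts with a bytes literal such as b'a' 969108. A minimal standalone sketch of the patched loading, assuming that file format and a placeholder path:

# Sketch: load a word_counts.txt whose lines look like:  b'a' 969108
# "word_counts.txt" is a placeholder path.
with open("word_counts.txt") as f:
  reverse_vocab = [eval(line.split()[0]).decode() for line in f]
vocab = dict(zip(reverse_vocab, range(len(reverse_vocab))))
print(reverse_vocab[:5])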
@cshallue Thank you so much for your help!
Here is a 5000000 step model using TF 1.9:
https://github.com/Gharibim/Tensorflow_im2txt_5M_Step
Hey, thank you for the checkpoint files! I was wondering if anyone has managed to use one of them to fine-tune the model with new data? What would the word counts file need to look like? Does the newly created word counts file from the new dataset need to be merged with the one from MSCOCO?
I am currently running the fine-tuning with a merged word counts file (merged along the lines of the sketch below) but am encountering two problems:
1.) The captions after 20,000 steps just consist of the same word repeated over and over, despite a very small loss of 0.2:
1) day day day day day day day day day day day day . <S> . <S> <S> . (p=0.011221)
2.) I let the model fine-tune overnight and somehow only the last 5 checkpoints got saved. Does anyone know how to prevent the overwriting of checkpoints and keep all of them?
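For reference, this is roughly how I merged the two word_counts.txt files: summing the per-word counts and re-sorting. The file names are placeholders, and whether summing is actually the right approach is exactly part of my question:

# Sketch: merge two word_counts.txt files by summing per-word counts.
# Both input file names are placeholders.
from collections import Counter

counts = Counter()
for path in ("mscoco_word_counts.txt", "new_data_word_counts.txt"):
  with open(path) as f:
    for line in f:
      word, count = line.split()
      counts[word] += int(count)

with open("merged_word_counts.txt", "w") as f:
  for word, count in counts.most_common():
    f.write("%s %d\n" % (word, count))

On 2.), I suspect tf.train.Saver's default of max_to_keep=5 is why only the last 5 checkpoints survive; if I read configuration.py correctly, im2txt exposes this as max_checkpoints_to_keep in TrainingConfig.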
Thank you in advance!
With @vpaharia's rename script above I could load the checkpoint file too. Notice the original variable names/values in the checkpoint file model.ckpt-2000000:
tensor_name: lstm/BasicLSTMCell/Linear/Bias
[-0.89432126 -0.34625703 0.16128121 ... 0.48277333 -0.5986251
1.2891939 ]
tensor_name: lstm/BasicLSTMCell/Linear/Matrix
[[ 0.16781631 -0.04221911 0.24709763 ... 0.04963883 -0.08704979
0.03227773]
The captions after 20,000 steps just consist of the same word repeated over and over, despite a very small loss of 0.2
I've noticed that a small number of steps generates pretty bad results. I get one-word responses and I'm around 366906 steps. I'm going to continue running and see how the results improve.
I additionally had to follow the comment here: https://github.com/tensorflow/models/issues/7204#issuecomment-513319623
Hey! Can you please tell us if the results improved, and how many steps it took? I would be really grateful if you could share your log file as well! Thank you in advance!