Hello all,
I am a newbie in TensorFlow. I am using TensorFlow 1.2 and want to train on my own data, using the pre-trained model ssd_mobilenet_v1_coco. At training time I get this error:
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /usr/local/lib/python2.7/dist-packages/tensorflow/models/model/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I read some related Stack Overflow posts, which suggested adding these lines:
saver = tf.train.Saver(write_version=saver_pb2.SaverDef.V1)
saver.save(sess, "./model.ckpt", global_step=step)
But I am not sure where to add these lines in saver.py, or whether I need to remove any lines from that file.
Any help solving this problem would be appreciated.
Thanks in advance.
Can you fill out the new issue template? In particular, are you running your own code or code from the repo?
Hi skye,
I followed Dat Tran's object detection blog and downloaded the image dataset and .xml files from his repo. I followed the exact instructions given in the blog, except that I tried to train the model locally. That is where I got the error I mentioned.
Thank You.
I got the same error and am also waiting for some advice.
Actually I am reading that too.
Just specify model.ckpt in the .config file.
model.ckpt is just a renaming of the model.ckpt.data-00000-of-00001 file.
But the problem is that model.ckpt.data-00000-of-00001 is in the old format, and TensorFlow 1.2 object detection requires the new format.
Unfortunately, a new-format model.ckpt.data-00000-of-00001 is not available yet.
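One clarification that keeps coming up in this thread: "model.ckpt" is not a separate file at all, it is the common prefix shared by the .data-NNNNN-of-NNNNN, .index, and .meta files, and the prefix is what restore operations (and fine_tune_checkpoint) expect. A minimal pure-Python sketch of that mapping (the helper name is mine, not a TensorFlow API):

```python
import re

def checkpoint_prefix(filename):
    """Strip the checkpoint suffix (.data-NNNNN-of-NNNNN, .index, or
    .meta) from a TensorFlow checkpoint file name, returning the shared
    prefix that restore operations expect. Hypothetical helper for
    illustration only."""
    return re.sub(r"\.(?:data-\d{5}-of-\d{5}|index|meta)$", "", filename)

print(checkpoint_prefix("model.ckpt.data-00000-of-00001"))  # model.ckpt
print(checkpoint_prefix("model.ckpt-4513245.index"))        # model.ckpt-4513245
```

So passing the .data-00000-of-00001 file directly makes the restore code try to parse a single shard as a whole checkpoint, which is exactly the "not an sstable (bad magic number)" failure above.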
I am not very familiar with the object detection model or blog. Can you post the exact instructions for reproducing this problem? (This is also requested in the issue template.)
Hi,
I am running into the same problem.
I am following this link https://github.com/tensorflow/models/blob/master/object_detection/g3doc/running_locally.md
to retrain the object detection model. I downloaded "rfcn_resnet101_coco" from here, which gave me these files:
- frozen_inference_graph.pb
- graph.pbtxt
- model.ckpt.data-00000-of-00001
- model.ckpt.meta
- model.ckpt.index
As suggested in the documentation, fine_tune_checkpoint should be a path like "/usr/home/username/checkpoint/model.ckpt-#####". But if I set fine_tune_checkpoint=file_path/model.ckpt.data-00000-of-00001, it throws this:
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file ~/Documents/trained_models/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I am using TensorFlow 1.3.0 on Ubuntu 16.04 with cuDNN 5.1.
Any help appreciated.
Thanks
Just giving the path fine_tune_checkpoint=file_path/model.ckpt worked for me.
Hi skulhare,
Which tensorflow version and OS are you using?
I already set fine_tune_checkpoint = ${path_to_model.ckpt}, but it did not work for me :(
I am using TensorFlow 1.3.0 on Ubuntu 16.04 with cuDNN 5.1.
Where did you get the model ckpt file from?
@drpngx I have the same issue. I downloaded it from the model zoo (ssd_mobilenet_v1_coco).
@suharshs reassigning to you since it looks like you might have touched the model last?
I haven't touched the SSD models, so I am unfamiliar with this issue :( @iamtodor what is the exact command you are running so we can repro? Thanks
@suharshs Thank you for the attention. Here is my command:
python models/research/object_detection/train.py --logtostderr --pipeline_config_path=ssd_mobilenet/ssd.config --train_dir=data
My ssd.config.
Dir data contains following files:
$ ls data
class.pbtxt pipeline.config test.record train.record
Is there something else I could provide to you?
My files look like this:
$ ls $TRAIN_DIR
model.ckpt-4513245.data-00000-of-00001
model.ckpt-4513245.index
model.ckpt-4513245.meta
What works for me is to just set CKPT_FILE=${TRAIN_DIR}/model.ckpt-4513245
Then
python eval_image_classifier.py --alsologtostderr --checkpoint_path=${CKPT_FILE} --dataset_dir=${DATA_DIR} --dataset_name=mnist --dataset_split_name=test --model_name=lenet
works
Tensorflow-gpu 1.4, Ubuntu 16.04
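The step above (setting CKPT_FILE to the prefix with the highest step number) can be sketched as a small helper. This is a hypothetical illustration of the naming convention only; in real code, tf.train.latest_checkpoint does this job by reading the 'checkpoint' index file in the training directory:

```python
import os
import re

def latest_checkpoint(train_dir):
    """Return the checkpoint prefix with the highest global step in
    train_dir, e.g. '<train_dir>/model.ckpt-4513245', or None if no
    checkpoint files are found. Hypothetical helper for illustration;
    prefer tf.train.latest_checkpoint in practice."""
    steps = set()
    pattern = re.compile(r"model\.ckpt-(\d+)\.(?:data-\d{5}-of-\d{5}|index|meta)$")
    for name in os.listdir(train_dir):
        m = pattern.match(name)
        if m:
            # Collect the global step encoded in each file name.
            steps.add(int(m.group(1)))
    if not steps:
        return None
    return os.path.join(train_dir, "model.ckpt-%d" % max(steps))
```

Note that the returned value is again a prefix, not an existing file name, which is exactly what --checkpoint_path expects.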
@rajvi3105 Please consider using the latest checkpoint files from the detection model zoo and the steps in the blog post we published recently.
Closing this issue; feel free to reopen if you encounter any issues with checkpoint files.
ssd_mobilenet_v1_coco_2018_01_28 doesn't include the checkpoints?
> model.ckpt is just a renaming of the model.ckpt.data-00000-of-00001 file.
> But the problem is that model.ckpt.data-00000-of-00001 is in the old format, and TensorFlow 1.2 object detection requires the new format.
> Unfortunately, a new-format model.ckpt.data-00000-of-00001 is not available yet.
What is the situation now? Is this issue resolved? - Thanks in advance
As skulhare mentioned, you just need to reference the pre-trained model file as "model.ckpt", not "model.ckpt.data-00000-of-00001" (even though the actual file name on disk is "model.ckpt.data-00000-of-00001"). So, for instance, if you're following the object detection tutorial, in the ssd_inception_v2_coco.config file:
fine_tune_checkpoint: "../training_demo/pre-trained-model/ssd_inception_v2_coco_2018_01_28/model.ckpt"
Hi, I am using checkpoint files in this path:
models/conv3d_sep2/
|---> conv3d_sep2-00000005.data-00000-of-00001
|---> conv3d_sep2-00000005.index
|---> conv3d_sep2-00000005.meta
But I am getting this error again.
Why am I getting this error even though the checkpoint files are there? Please help me with any suggestions.
I tried training in Google Colab:
python train.py --train_dir=training/ --pipeline_config_path=ssd_mobilenet_v2_quantized_300x300_coco.config
but I am getting the following error:
Failed to get matching files on /content/models-master/research/object_detection/ssd_inception_v2_coco_2018_01_28/model.ckpt: Not found: /content/models-master/research/object_detection/ssd_inception_v2_coco_2018_01_28; No such file or directory
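This "Failed to get matching files" case is a different failure from the "bad magic number" one: here the directory itself is missing. One way to catch it before launching training is to check that the prefix actually points at checkpoint files on disk. A small sketch (the helper is mine; the .index file is a convenient witness, since every V2 checkpoint has one):

```python
import os

def checkpoint_exists(prefix):
    """fine_tune_checkpoint is a *prefix*, not a file: the files on disk
    carry suffixes like .index and .data-00000-of-00001. Checking for
    the .index file catches both a wrong directory and a wrong prefix.
    Hypothetical helper for illustration only."""
    return os.path.isfile(prefix + ".index")
```

If this returns False for the /content/... path in your config, the download or extraction step in Colab did not put the files where the config expects them.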