Models: `train_dir` assertion fail while training oxford pet on gcloud

Created on 27 Jun 2017 · 21 Comments · Source: tensorflow/models

Describe the problem

I ran into a missing 'train_dir' error while trying to train on the Oxford pet dataset on gcloud, following the tutorial. I did provide the command-line argument for 'train_dir', and the same job runs locally just fine.
Attached is the error log from gcloud. Any help would be greatly appreciated!

Source code / logs

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 198, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 141, in main
    assert FLAGS.train_dir, 'train_dir is missing.'
AssertionError: train_dir is missing.

awaiting response

TomPyonsuke

Most helpful comment

I had the same problem because I had copied and pasted the command from a GitHub page. You need to retype the command letter by letter, and then it works!

elektronika-ba on 29 Dec 2019

😄1 👍1

All 21 comments

Hi @TomPyonsuke - can you copy your command line in?

jch1 on 27 Jun 2017

Hi! "zinc-guru-3900" is my bucket. Thanks!
gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` --job-dir=gs://zinc-guru-3900/train --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz --module-name object_detection.train --region us-central1 --config object_detection/samples/cloud/cloud.yml -- \ --train_dir=gs://zinc-guru-3900/train --pipeline_config_path=gs://zinc-guru-3900/data/faster_rcnn_resnet101_pets.config

TomPyonsuke on 28 Jun 2017

[Screenshot from 2017-06-27 21-16-33]
The problem seems to be with replica2

TomPyonsuke on 28 Jun 2017

I solved this issue by providing train directory path as the default for train_dir. It's still unclear to me why replicas are not taking command line arguments. Looks like a bug to me?
Thank you!

TomPyonsuke on 28 Jun 2017
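
For readers asking how such a default can be set: the assertion only passes when FLAGS.train_dir is non-empty, so the workaround described above amounts to giving the flag a non-empty default value. train.py declares its flags through tf.app.flags; the sketch below uses argparse as a stand-in so that it runs without TensorFlow. DEFAULT_TRAIN_DIR, build_parser and resolve_train_dir are hypothetical names, and the bucket path is a placeholder rather than a value from this thread.

```python
import argparse

# Hypothetical placeholder; substitute your own bucket or local directory.
DEFAULT_TRAIN_DIR = "gs://your-bucket/train"


def build_parser(default_train_dir=DEFAULT_TRAIN_DIR):
    """Build a parser whose --train_dir flag falls back to a non-empty default."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--train_dir",
        default=default_train_dir,
        help="Directory to save checkpoints and training summaries.",
    )
    return parser


def resolve_train_dir(argv):
    """Return the training directory, mirroring the assertion in train.py."""
    args, _unknown = build_parser().parse_known_args(argv)
    assert args.train_dir, "train_dir is missing."
    return args.train_dir


if __name__ == "__main__":
    # No flag given: the default is used instead of failing the assertion.
    print(resolve_train_dir([]))
    # Flag given: the command-line value wins over the default.
    print(resolve_train_dir(["--train_dir=gs://other-bucket/train"]))
```

A hard-coded default hides whatever is dropping the argument, so this is a workaround rather than a fix for the underlying cause.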

@derekjchow PTAL

tombstone on 28 Jun 2017

@TomPyonsuke thanks for reporting this. Could you help us reproduce it by providing some information?

Could you provide us with your gcloud version (gcloud --version)?
Could you modify the train.py script to print sys.argv?

derekjchow on 28 Jun 2017
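
A minimal sketch of the diagnostic suggested above. Printing each element of sys.argv with repr() makes stray leading spaces and empty values visible, which plain printing would hide. The helper name describe_argv is hypothetical and not part of train.py.

```python
import sys


def describe_argv(argv):
    """Return one line per argument, using repr() so whitespace is visible."""
    return ["argv[%d] = %r" % (index, arg) for index, arg in enumerate(argv)]


if __name__ == "__main__":
    # Run this at the top of the script under test, before flags are parsed.
    for line in describe_argv(sys.argv):
        print(line)
```

An argument that shows up as `' --train_dir=...'` (with a leading space) or as `'--train_dir='` (with nothing after the equals sign) would not set the flag to a usable value.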

I have the same problem. Have you solved it yet?

cindysillence on 15 Jul 2017

I have the same problem. How do you set the default path? Could you paste the command you used to train?
Thanks!

GulatiAditya on 18 Jul 2017

I have the same issue. With the new way of training from the command line, you can't add pipeline_config or train_dir. Can someone elaborate on how to do it? Thank you.

rexhinvorpsi on 11 Sep 2017

Same here. It seems I can't pass the pipeline_config path from the command line. What can I do?
(I am using the Anaconda prompt on Windows 10.) Please help.

ActionMichael on 24 Sep 2017

Same here. I am in an anaconda2 env on Ubuntu 16.04. Can anyone help?

yhmybzc on 25 Sep 2017

@TomPyonsuke how did you provide the train directory path as the default for train_dir? I tried adding the path "object_detection/models/model/train" in the train.py file, but then I got the following error:

Traceback (most recent call last):
  File "object_detection/train.py", line 197, in <module>
    tf.app.run()
  File "/home/yhmybzc/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/train.py", line 144, in main
    model_config, train_config, input_config = get_configs_from_multiple_files()
  File "object_detection/train.py", line 126, in get_configs_from_multiple_files
    text_format.Merge(f.read(), train_config)
  File "/home/yhmybzc/anaconda2/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 125, in read
    pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
  File "/home/yhmybzc/anaconda2/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/yhmybzc/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: .

Any suggestions would be appreciated.

yhmybzc on 27 Sep 2017

@TomPyonsuke "...-- \ --train_dir=gs://..."
This part seems wrong to me. Also, do not leave a space after the =, because = xxx (instead of =xxx) will fail with the same error. (And do not copy-paste commands with spaces after the final \ like _ _ _; I searched a while for this one.)

rudevel on 17 Apr 2018

👍1
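
One way to see why a stray backslash matters: in a POSIX shell, a backslash followed by a space escapes that space, so the next flag reaches the program with a leading space and is no longer recognized as a flag. The sketch below reproduces this with Python's shlex and argparse. argparse stands in for tf.app.flags, parse_tail is a hypothetical helper, and the bucket name is a placeholder. Whether this is what happened in the original report is not confirmed in the thread.

```python
import argparse
import shlex


def parse_tail(command_tail):
    """Split a command tail with POSIX shell rules, then parse --train_dir.

    Returns the parsed train_dir value and the list of unrecognized tokens.
    """
    tokens = shlex.split(command_tail)
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_dir", default="")
    known, unknown = parser.parse_known_args(tokens)
    return known.train_dir, unknown


if __name__ == "__main__":
    # Backslash followed by a space, as in "-- \ --train_dir=...".
    print(parse_tail(r"\ --train_dir=gs://example-bucket/train"))
    # The same flag without the stray backslash.
    print(parse_tail("--train_dir=gs://example-bucket/train"))
```

In the first call the flag ends up among the unrecognized tokens with a leading space and train_dir stays empty; in the second call the value is parsed as expected.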

amrita@amrita-VirtualBox:~/Downloads/models/research/object_detection$ python3 train.py
WARNING:tensorflow:From /home/amrita/.local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Traceback (most recent call last):
  File "train.py", line 147, in <module>
    tf.app.run()
  File "/home/amrita/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 68, in main
    assert FLAGS.train_dir, 'train_dir is missing.'
AssertionError: train_dir is missing.
I am getting this error. Thanks.

Amri95 on 20 Apr 2018

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

Harshini-Gadige on 28 Sep 2018

👎3

> I solved this issue by providing train directory path as the default for train_dir. It's still unclear to me why replicas are not taking command line arguments. Looks like a bug to me?
> Thank you!

@TomPyonsuke Bro, how did you make the train directory path the default for train_dir?

Iqbal0007 on 18 Jun 2019

(tensorflow) D:\my-work\WiS - alert - 2\models\research\object_detection>python train.py --logtostderr --train_dir= D:/my-work/WiS - alert - 2 /models/research/object_detection/training/ --pipeline_config_path= D:/my-work/WiS - alert - 2 /models/research/object_detection/training/ssd_mobilenet_v1_coco.config
D:\installation\anaconda\envs\tensorflow\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "train.py", line 164, in <module>
    tf.app.run()
  File "D:\installation\anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "train.py", line 88, in main
    assert FLAGS.train_dir, 'train_dir is missing.'
AssertionError: train_dir is missing.
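
The command above has a space after each = and an unquoted path that contains spaces, so --train_dir= likely receives an empty value, which would trigger the same assertion. A minimal sketch of that effect follows. It uses POSIX-style splitting from shlex as an approximation of how the command line is broken into arguments (cmd.exe has its own quoting rules, but it also splits on unquoted spaces). flag_value is a hypothetical helper, and the quoted path in `good` is an assumed correction, not a command confirmed in this thread.

```python
import shlex


def flag_value(command_tail, name):
    """Return the value attached to --name=... after splitting, or None."""
    prefix = "--%s=" % name
    for token in shlex.split(command_tail):
        if token.startswith(prefix):
            return token[len(prefix):]
    return None


# As typed in the report: space after "=", unquoted path with spaces.
bad = "--train_dir= D:/my-work/WiS - alert - 2 /models/research/object_detection/training/"
# Assumed correction: no space after "=", and the whole path quoted.
good = '--train_dir="D:/my-work/WiS - alert - 2/models/research/object_detection/training/"'

if __name__ == "__main__":
    print(repr(flag_value(bad, "train_dir")))
    print(repr(flag_value(good, "train_dir")))
```

With the space after the equals sign, the flag's value is the empty string and the path fragments become separate arguments; with the quoted form, the full path stays attached to the flag.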