Models: `train_dir` assertion fail while training oxford pet on gcloud

Created on 27 Jun 2017 · 21 Comments · Source: tensorflow/models

Describe the problem

I ran into a missing 'train_dir' issue while trying to train the Oxford pet dataset on gcloud according to the tutorial. I have clearly provided the 'train_dir' command-line argument, and I am able to run it locally just fine.
Attached is the error log from gcloud. Any help would be greatly appreciated!

Source code / logs

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 198, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 141, in main
    assert FLAGS.train_dir, 'train_dir is missing.'
AssertionError: train_dir is missing.

All 21 comments

Hi @TomPyonsuke - can you copy your command line in?

Hi! "zinc-guru-3900" is my bucket. Thanks!
gcloud ml-engine jobs submit training whoami_object_detection_date +%s --job-dir=gs://zinc-guru-3900/train --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz --module-name object_detection.train --region us-central1 --config object_detection/samples/cloud/cloud.yml -- \ --train_dir=gs://zinc-guru-3900/train --pipeline_config_path=gs://zinc-guru-3900/data/faster_rcnn_resnet101_pets.config

[screenshot from 2017-06-27 21-16-33]
The problem seems to be with replica2

I solved this issue by providing train directory path as the default for train_dir. It's still unclear to me why replicas are not taking command line arguments. Looks like a bug to me?
Thank you!
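
For readers asking what the default-value workaround might look like, here is a minimal sketch. It uses the standard-library argparse as a stand-in for the flag definitions in train.py, and it assumes the script's own default for train_dir is an empty string (which is what makes the assertion fire). The `parse_train_dir` helper and the `gs://your-bucket/train` path are hypothetical placeholders, not part of the actual script.

```python
import argparse

# Hypothetical fallback path; replace with your own bucket or directory.
DEFAULT_TRAIN_DIR = "gs://your-bucket/train"


def parse_train_dir(argv, default=""):
    """Return the --train_dir value parsed from argv, or `default` if absent."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_dir", default=default)
    # parse_known_args ignores flags that this sketch does not define.
    args, _unknown = parser.parse_known_args(argv)
    return args.train_dir


# With an empty default, a process started without the flag gets '' and
# an `assert train_dir` style check fails.
print(repr(parse_train_dir([])))
# With a non-empty default, the same process falls back to the hard-coded path.
print(repr(parse_train_dir([], default=DEFAULT_TRAIN_DIR)))
# An explicit flag still overrides the default.
print(repr(parse_train_dir(["--train_dir=/tmp/train"], default=DEFAULT_TRAIN_DIR)))
```

Note that a hard-coded default only hides the symptom; it does not explain why a replica would start without the argument.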

@derekjchow PTAL

@TomPyonsuke thanks for reporting this. Could you help us reproduce it by providing some information?

  1. Could you provide us with your gcloud version (gcloud --version)?
  2. Could you modify the train.py script to print sys.argv?
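
One possible way to do the second step is sketched below. The `describe_argv` helper is a hypothetical name, and where it should be placed inside train.py is left open. It prints each argument in escaped form, so a trailing space or a non-ASCII character shows up in the job logs. `ascii()` is Python 3; on Python 2.7, `repr()` gives similarly escaped output.

```python
import sys


def describe_argv(argv):
    """Return one line per argument, escaped with ascii() so that stray
    whitespace and non-ASCII characters are visible."""
    return ["argv[%d] = %s" % (i, ascii(arg)) for i, arg in enumerate(argv)]


if __name__ == "__main__":
    # Print to stderr so the lines appear alongside the error log.
    for line in describe_argv(sys.argv):
        print(line, file=sys.stderr)
```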

I have the same problem. Have you solved it yet?

I have the same problem. How do you set the default path? Could you copy and paste the command you used to train?
Thanks!

I have the same issue. With the new way of training from the command line, you can't add pipeline_config or train_dir. Can someone elaborate on how to do it? Thank you.

Same for me. It seems I can't add the pipeline_config path from the command line (in Anaconda Prompt on Windows 10). What can I do? Please help.

Same here, in an anaconda2 env on Ubuntu 16.04. Can anyone help?

@TomPyonsuke how did you provide the train directory path as the default for train_dir? I tried adding the path "object_detection/models/model/train" in the train.py file, but then got the following error:

Traceback (most recent call last):
  File "object_detection/train.py", line 197, in <module>
    tf.app.run()
  File "/home/yhmybzc/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/train.py", line 144, in main
    model_config, train_config, input_config = get_configs_from_multiple_files()
  File "object_detection/train.py", line 126, in get_configs_from_multiple_files
    text_format.Merge(f.read(), train_config)
  File "/home/yhmybzc/anaconda2/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 125, in read
    pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
  File "/home/yhmybzc/anaconda2/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/yhmybzc/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: .

Any suggestions would be appreciated.

@TomPyonsuke "...-- \ --train_dir=gs://..."
This part seems wrong to me. Also, do not leave a space after the `=`: `=xxx` works, but `= xxx` will fail with the same error. (And do not copy-paste commands that have spaces after the trailing `\`; it took me a while to track this one down.)
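
The space-after-`=` case can be illustrated with a small sketch. It uses the standard-library `shlex.split`, which follows POSIX shell word-splitting rules; the `train_dir_value` helper and the `/tmp/train` path are made up for illustration. The Windows prompt has different quoting rules, but the assumption here is that it likewise splits arguments on unquoted spaces.

```python
import shlex


def train_dir_value(command):
    """Return the value carried by the --train_dir=... token after
    shell-style word splitting, or None if there is no such token."""
    for token in shlex.split(command):
        if token.startswith("--train_dir="):
            return token[len("--train_dir="):]
    return None


good = "python train.py --train_dir=/tmp/train"
bad = "python train.py --train_dir= /tmp/train"

print(repr(train_dir_value(good)))  # '/tmp/train'
print(repr(train_dir_value(bad)))   # '' because the path became a separate token
```

In the second command the flag is present but empty, so a check such as `assert FLAGS.train_dir` would fail even though a path appears on the command line.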

amrita@amrita-VirtualBox:~/Downloads/models/research/object_detection$ python3 train.py
WARNING:tensorflow:From /home/amrita/.local/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Traceback (most recent call last):
  File "train.py", line 147, in <module>
    tf.app.run()
  File "/home/amrita/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 68, in main
    assert FLAGS.train_dir, 'train_dir is missing.'
AssertionError: train_dir is missing.
I am getting this error. Thanks.

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@TomPyonsuke Bro how did you make the train directory path as the default for train_dir?

(tensorflow) D:\my-work\WiS - alert - 2\models\research\object_detection>python train.py --logtostderr --train_dir= D:/my-work/WiS - alert - 2 /models/research/object_detection/training/ --pipeline_config_path= D:/my-work/WiS - alert - 2 /models/research/object_detection/training/ssd_mobilenet_v1_coco.config
D:\installation\anaconda\envs\tensorflow\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "train.py", line 164, in <module>
    tf.app.run()
  File "D:\installation\anaconda\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "train.py", line 88, in main
    assert FLAGS.train_dir, 'train_dir is missing.'
AssertionError: train_dir is missing.

Need Solution

I had the same problem because I was using copy-paste from some github page. You need to re-type the command letter by letter, and then it works!

worked for me as well
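
A guess at why retyping helps: text copied from a web page can carry characters that look like a hyphen or a space but are not, and retyping replaces them with plain ASCII. The sketch below is one way to check a pasted command before running it. The `find_suspicious_chars` helper and the sample command are hypothetical, and this is only one plausible cause, not a confirmed diagnosis for this thread.

```python
import unicodedata


def find_suspicious_chars(command):
    """Return (index, char, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 127
    ]


# Hypothetical pasted command: a no-break space follows the script name and
# an en dash replaces the first hyphen of the flag.
pasted = "python train.py\u00a0\u2013-train_dir=/tmp/train"
for index, char, name in find_suspicious_chars(pasted):
    print(index, ascii(char), name)
```

An empty result means the command contains only ASCII characters, which rules out this particular cause.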

Did you solve this issue? I have the same problem. Can anyone help me, please?

I went inside the train.py file and saw the following:

[Screen Shot 2020-05-12 at 9.34.33 AM]

Make sure that you type the command exactly like that. In my case, I was not passing in the "train_dir" argument. Hope this helps.
