Models: Argument parsing error from cifar10_train.py

Created on 29 Sep 2017 · 9 Comments · Source: tensorflow/models

After you implemented argument parsing with ArgumentParser in https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py,
I can no longer set some arguments from the command line. The help output lists only the following:

ubuntu@ip-172-31-31-109:~/models$ python3.6 tutorials/image/cifar10/cifar10_train.py -h
usage: cifar10_train.py [-h] [--batch_size BATCH_SIZE] [--data_dir DATA_DIR]
                        [--use_fp16 USE_FP16]

optional arguments:
  -h, --help            show this help message and exit
  --batch_size BATCH_SIZE
                        Number of images to process in a batch.
  --data_dir DATA_DIR   Path to the CIFAR-10 data directory.
  --use_fp16 USE_FP16   Train the model using fp16.

As you can see, you can't set --train_dir or --num_gpus.

Changes from git diff c96ef83658fffd25f961cfd7fd5e444f59868efa..HEAD ./tutorials/image/cifar10/

diff --git a/tutorials/image/cifar10/cifar10.py b/tutorials/image/cifar10/cifar10.py
index 7909b77..9d84b1b 100644
--- a/tutorials/image/cifar10/cifar10.py
+++ b/tutorials/image/cifar10/cifar10.py
@@ -35,6 +35,7 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function

+import argparse
 import os
 import re
 import sys
@@ -45,15 +46,19 @@ import tensorflow as tf

 import cifar10_input

-FLAGS = tf.app.flags.FLAGS
+parser = argparse.ArgumentParser()

 # Basic model parameters.
-tf.app.flags.DEFINE_integer('batch_size', 128,
-                            """Number of images to process in a batch.""")
-tf.app.flags.DEFINE_string('data_dir', '/tmp/cifar10_data',
-                           """Path to the CIFAR-10 data directory.""")
-tf.app.flags.DEFINE_boolean('use_fp16', False,
-                            """Train the model using fp16.""")
+parser.add_argument('--batch_size', type=int, default=128,
+                    help='Number of images to process in a batch.')
+
+parser.add_argument('--data_dir', type=str, default='/tmp/cifar10_data',
+                    help='Path to the CIFAR-10 data directory.')
+
+parser.add_argument('--use_fp16', type=bool, default=False,
+                    help='Train the model using fp16.')
+
+FLAGS = parser.parse_args()

But if you revert this change, it works as usual.

Most helpful comment

Sure. Tried to come up with a solution but didn't really have the time, so here is what I see.

As the scripts are currently written, there is a call to

FLAGS = parser.parse_args()

both in cifar10.py and in all of the other files that import it. For example, in cifar10_train.py, which imports cifar10.py, the call to parse the args happens under the if __name__ == "__main__": condition, but in cifar10.py it happens at module level, at the top of the script. I couldn't figure out a good solution, because I'm not sure how the actual main() function ends up being called in cifar10_train.py.

Anywho, as currently written, when the other scripts import cifar10.py, the call to parse_args in cifar10.py runs first, at import time, so FLAGS only recognizes the arguments --batch_size, --data_dir and --use_fp16. If these are given on the command line they work, e.g. if I want to change the batch size when training a cifar10 model I can do

(py36tf) dash@kluge:~/repos/models/tutorials/image/cifar10$ python cifar10_train.py --batch_size=64
>> Downloading cifar-10-binary.tar.gz 0.5%
...

However, if I want to change one of the command line args added in cifar10_train.py, like --max_steps=100, this won't work, because the call to parse_args in cifar10_train.py is never reached (the first call to parse_args, in cifar10.py, exits on the unrecognized argument):

(py36tf) dash@kluge:~/repos/models/tutorials/image/cifar10$ python cifar10_train.py --max_steps=100
usage: cifar10_train.py [-h] [--batch_size BATCH_SIZE] [--data_dir DATA_DIR]
                        [--use_fp16 USE_FP16]
cifar10_train.py: error: unrecognized arguments: --max_steps=100

The error comes from the first call to parse_args(), in cifar10.py, at which point the --max_steps argument has not yet been added by cifar10_train.py.
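
The failure can be reproduced without TensorFlow. The following is a minimal sketch, assuming a stand-in parser that registers only --batch_size; the parse_like_import helper is hypothetical and not part of the tutorial code. The argv lists are passed explicitly so the snippet does not depend on how it is launched.

```python
import argparse

# Stand-in for the module-level parser in cifar10.py (only one flag registered).
parser = argparse.ArgumentParser(prog="cifar10_train.py")
parser.add_argument("--batch_size", type=int, default=128,
                    help="Number of images to process in a batch.")


def parse_like_import(argv):
    """Hypothetical helper: parse argv the way a module-level parse_args() would.

    Returns the parsed namespace, or the exit code if argparse bails out.
    """
    try:
        return parser.parse_args(argv)
    except SystemExit as exc:
        # On an unrecognized flag argparse prints the usage message
        # and raises SystemExit, so nothing after this call would run.
        return exc.code


known = parse_like_import(["--batch_size=64"])    # flag already registered
early_exit = parse_like_import(["--max_steps=100"])  # flag not registered yet

print(known.batch_size)
print(early_exit)
```

Because the process exits inside the first parse_args() call, any add_argument calls that a script makes after the import never get a chance to run.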

The call to parse_args() probably should not happen in the cifar10.py script, as other scripts want/need to add more command line arguments. However, the solution is not as simple as removing the call to parse_args() from cifar10.py, because when you do that you get:

(py36tf) dash@kluge:~/repos/models/tutorials/image/cifar10$ python cifar10_train.py
Traceback (most recent call last):
  File "cifar10_train.py", line 130, in <module>
    tf.app.run()
  File "/home/dash/anaconda3/envs/py36tf/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "cifar10_train.py", line 121, in main
    cifar10.maybe_download_and_extract()
  File "/home/dash/repos/models/tutorials/image/cifar10/cifar10.py", line 388, in maybe_download_and_extract
    dest_directory = FLAGS.data_dir
NameError: name 'FLAGS' is not defined

So here maybe_download_and_extract() expects the global FLAGS to be defined, but I removed the assignment, so it has not been defined yet. I guess that FLAGS really should not be a global, and/or the logic has to be changed so that these functions in cifar10.py are not called until after the scripts that import it have had a chance to add the flags they need.
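
One way to arrange this is to define the parser at import time but defer the parse_args() call to the entry point. The sketch below is a single-file illustration under stated assumptions, not the fix that was merged: get_data_dir is a hypothetical stand-in for a library function such as maybe_download_and_extract(), and the --max_steps default is made up for the example.

```python
import argparse

# --- would live in the shared module (the role cifar10.py plays) ---
parser = argparse.ArgumentParser()
parser.add_argument("--data_dir", type=str, default="/tmp/cifar10_data",
                    help="Path to the CIFAR-10 data directory.")

FLAGS = None  # deliberately NOT parsed at import time


def get_data_dir():
    """Hypothetical library function that reads FLAGS only when called."""
    return FLAGS.data_dir


# --- would live in the entry-point script (the role cifar10_train.py plays) ---
# The default below is an assumption for illustration only.
parser.add_argument("--max_steps", type=int, default=1000,
                    help="Number of batches to run.")


def main(argv):
    global FLAGS
    # Parse once, after every module has registered its arguments.
    FLAGS = parser.parse_args(argv)
    return get_data_dir(), FLAGS.max_steps


result = main(["--max_steps=100"])
print(result)
```

In a real two-module layout the entry-point script would have to assign the parsed namespace back onto the shared module (for example cifar10.FLAGS = ...), since each module has its own globals.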

All 9 comments

I can confirm. I just tried to run the cifar10 multi-GPU tutorial, and the --num_gpus command line argument is not accepted:

(py27tf) dharter@cheetah:~/repos/models/tutorials/image/cifar10$ CUDA_VISIBLE_DEVICES='0,1,2,3,4' python cifar10_multi_gpu_train.py --num_gpus=5
usage: cifar10_multi_gpu_train.py [-h] [--batch_size BATCH_SIZE]
                                  [--data_dir DATA_DIR] [--use_fp16 USE_FP16]
cifar10_multi_gpu_train.py: error: unrecognized arguments: --num_gpus=5

As a workaround, if you edit the parser.add_argument call and change the default for the command line argument you want, it will use your edited default. So these flags are being used, but evidently the calls that add them to the parser don't run until after the command line arguments have already been checked.

The problem, I believe, is that parser.parse_args() is called in cifar10.py, which is imported by the other scripts, like cifar10_train.py. So the command line arguments are parsed first, and then the other scripts try to add their extra arguments after parsing. We need to move the parser.parse_args() call out of cifar10.py and do it in each of the other scripts, after all command line arguments have been added.

@itsmeolivia can you please comment on where argument parsing should be happening?

I have faced the same issue. I have created pull request #2567, which uses the global parser explicitly and allows me to pass both --train_dir and --data_dir.

@itsmeolivia, could you please take a look at this issue?

@aselle @itsmeolivia I just want to bump this issue. Do we have a solution to this yet? Any help would be greatly appreciated :)

I bumped into this issue with python cifar10_multi_gpu_train.py. A workaround that works for me is to modify this line in cifar10.py:

FLAGS = parser.parse_args()

as

FLAGS, _ = parser.parse_known_args()
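
To illustrate why this workaround helps: parse_known_args() returns the parsed namespace together with a list of the arguments it did not recognize, instead of exiting. A minimal sketch, assuming a stand-in parser and an explicit argv list rather than the real tutorial scripts:

```python
import argparse

# Stand-in for the parser defined in cifar10.py.
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=128)

argv = ["--batch_size=64", "--num_gpus=5"]

# First pass (the import-time call): unknown flags are set aside, not fatal.
flags, unknown = parser.parse_known_args(argv)
print(flags.batch_size, unknown)

# Later, the importing script registers its own flag and parses everything.
parser.add_argument("--num_gpus", type=int, default=1)
full = parser.parse_args(argv)
print(full.num_gpus)
```

The trade-off is that a mistyped flag is silently ignored by the first pass; it is only reported if a later strict parse_args() call sees it.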

Closing as this is resolved
