After training my model, I would like to evaluate it; however, I run into an argument parsing error, shown below. I am using the commands from here, slightly modified: a patience of 3, no-epoch-checkpoints, fp16 removed, and a distributed world size of 1 during training. I also changed the paths to reflect my own directory structure. These are the only changes I have made from the link, and I am sure they are properly formatted. Any help is appreciated. :)
Traceback (most recent call last):
File "/home/e/miniconda3/envs/eshaan/bin/fairseq-eval-lm", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-eval-lm')()
File "/srv/home/e/eshaan/fairseq/fairseq_cli/eval_lm.py", line 251, in cli_main
add_distributed_training_args(parser)
File "/srv/home/e/eshaan/fairseq/fairseq/options.py", line 356, in add_distributed_training_args
help='total number of GPUs across all nodes (default: all visible GPUs)')
File "/home/e/miniconda3/envs/eshaan/lib/python3.6/argparse.py", line 1352, in add_argument
return self._add_action(action)
File "/home/e/miniconda3/envs/eshaan/lib/python3.6/argparse.py", line 1556, in _add_action
action = super(_ArgumentGroup, self)._add_action(action)
File "/home/e/miniconda3/envs/eshaan/lib/python3.6/argparse.py", line 1366, in _add_action
self._check_conflict(action)
File "/home/e/miniconda3/envs/eshaan/lib/python3.6/argparse.py", line 1505, in _check_conflict
conflict_handler(action, confl_optionals)
File "/home/e/miniconda3/envs/eshaan/lib/python3.6/argparse.py", line 1514, in _handle_conflict_error
raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --distributed-world-size: conflicting option string: --distributed-world-size
I have tried retraining my model in case it was an issue with how my checkpoints were stored, even though the training output always said my distributed world size is 1. I have also looked at this similar error to make sure that no other Python processes are running.
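For reference, the conflict in the traceback is plain argparse behavior, independent of fairseq: registering the same option string twice on one parser raises ArgumentError. A minimal sketch (the option name is taken from the traceback; the double registration stands in for eval_lm adding the distributed-training arguments to a parser that already has them):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--distributed-world-size", type=int, default=1)

try:
    # Registering the same option string a second time is rejected
    # by argparse's default conflict handler.
    parser.add_argument("--distributed-world-size", type=int, default=1)
except argparse.ArgumentError as e:
    print(e)
    # argument --distributed-world-size: conflicting option string:
    # --distributed-world-size
```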
I encountered this bug as well. Commenting out line 251 (add_distributed_training_args(parser)) in fairseq_cli/eval_lm.py seems to fix it.
Fixed by b2ee110c853c5effdd8d21f50a8437485bafb285
Hi Myle!
I think there might still be an issue here. When I run eval_lm with the argument "--distributed-world-size 1" it fails:
File "eval_lm.py", line 11, in <module>
cli_main()
File "fairseq_cli/eval_lm.py", line 252, in cli_main
distributed_utils.call_main(args, main)
File "fairseq/distributed_utils.py", line 173, in call_main
main(args, kwargs)
TypeError: main() takes 1 positional argument but 2 were given
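That TypeError is an ordinary positional-argument mismatch; a minimal sketch with hypothetical stand-ins (not fairseq's actual code) reproduces it: the wrapper passes its kwargs dict as a second positional argument to a main() that only accepts one.

```python
def main(args):
    # Stand-in for eval_lm's main, which takes only the parsed args.
    return args

def call_main(args, main, **kwargs):
    # Stand-in for the wrapper: passing the kwargs dict positionally
    # gives main() two positional arguments instead of one.
    return main(args, kwargs)

try:
    call_main(object(), main)
except TypeError as e:
    print(e)  # main() takes 1 positional argument but 2 were given
```

The fix on the wrapper side is to forward keyword arguments with `main(args, **kwargs)` rather than passing the dict itself.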
This should actually be fixed now :)