RUN_SLOW=1 USE_CUDA=1 pytest examples/seq2seq/test_finetune_trainer.py
=========================================================== test session starts ===========================================================
platform linux -- Python 3.7.4, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /home/shleifer/transformers_fork, inifile: pytest.ini
plugins: forked-1.1.3, hydra-core-1.0.0, xdist-1.31.0, requests-mock-1.8.0
collected 2 items
examples/seq2seq/test_finetune_trainer.py /home/shleifer/transformers_fork/src/transformers/training_args.py:339: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
FutureWarning,
F/home/shleifer/transformers_fork/src/transformers/training_args.py:339: FutureWarning: The `evaluate_during_training` argument is deprecated in favor of `evaluation_strategy` (which has more options)
FutureWarning,
F
================================================================ FAILURES =================================================================
__________________________________________________________ test_finetune_trainer __________________________________________________________
def test_finetune_trainer():
> output_dir = run_trainer(1, "12", MBART_TINY, 1)
examples/seq2seq/test_finetune_trainer.py:19:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
examples/seq2seq/test_finetune_trainer.py:105: in run_trainer
main()
examples/seq2seq/finetune_trainer.py:294: in main
model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
src/transformers/trainer.py:583: in train
train_dataloader = self.get_train_dataloader()
src/transformers/trainer.py:386: in get_train_dataloader
train_sampler = self._get_train_sampler()
examples/seq2seq/seq2seq_trainer.py:108: in _get_train_sampler
self.args.per_device_train_batch_size, distributed=self.args.n_gpu > 1
examples/seq2seq/utils.py:156: in make_sortish_sampler
return DistributedSortishSampler(self, batch_size, shuffle=shuffle, **kwargs)
examples/seq2seq/utils.py:368: in __init__
num_replicas = dist.get_world_size()
../miniconda3/envs/nb/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py:582: in get_world_size
return _get_group_size(group)
../miniconda3/envs/nb/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py:196: in _get_group_size
_check_default_pg()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def _check_default_pg():
"""
Helper that checks if the default ProcessGroup has been initialized, with
assertion
"""
assert _default_pg is not None, \
> "Default process group is not initialized"
E AssertionError: Default process group is not initialized
../miniconda3/envs/nb/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py:187: AssertionError
_______________________________________________________ test_finetune_trainer_slow ________________________________________________________
@slow
def test_finetune_trainer_slow():
# TODO(SS): This will fail on devices with more than 1 GPU.
# There is a missing call to __init__process_group somewhere
> output_dir = run_trainer(eval_steps=2, max_len="128", model_name=MARIAN_MODEL, num_train_epochs=3)
examples/seq2seq/test_finetune_trainer.py:30:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
examples/seq2seq/test_finetune_trainer.py:105: in run_trainer
main()
examples/seq2seq/finetune_trainer.py:294: in main
model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
src/transformers/trainer.py:583: in train
train_dataloader = self.get_train_dataloader()
src/transformers/trainer.py:386: in get_train_dataloader
train_sampler = self._get_train_sampler()
examples/seq2seq/seq2seq_trainer.py:108: in _get_train_sampler
self.args.per_device_train_batch_size, distributed=self.args.n_gpu > 1
examples/seq2seq/utils.py:156: in make_sortish_sampler
return DistributedSortishSampler(self, batch_size, shuffle=shuffle, **kwargs)
examples/seq2seq/utils.py:368: in __init__
num_replicas = dist.get_world_size()
../miniconda3/envs/nb/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py:582: in get_world_size
return _get_group_size(group)
../miniconda3/envs/nb/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py:196: in _get_group_size
_check_default_pg()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def _check_default_pg():
"""
Helper that checks if the default ProcessGroup has been initialized, with
assertion
"""
assert _default_pg is not None, \
> "Default process group is not initialized"
E AssertionError: Default process group is not initialized
../miniconda3/envs/nb/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py:187: AssertionError
========================================================= short test summary info =========================================================
FAILED examples/seq2seq/test_finetune_trainer.py::test_finetune_trainer - AssertionError: Default process group is not initialized
FAILED examples/seq2seq/test_finetune_trainer.py::test_finetune_trainer_slow - AssertionError: Default process group is not initialized
=========================================================== 2 failed in 11.51s ============================================================
@stas00 would you be interested in taking a look at this, possibly reusing the fix in https://github.com/huggingface/transformers/pull/7281?
If that doesn't work, we can hack it like tests/test_trainer.py, line 245.
cc @patil-suraj
Yes, I will work on it today, Sam.
The other temporary fix option is to use @require_non_multigpu.
This is not an issue with the test but with the script itself; the following invocation fails with the same error:
python examples/seq2seq/finetune_trainer.py \
  --model_name_or_path sshleifer/tiny-mbart \
  --data_dir examples/seq2seq/test_data/wmt_en_ro \
  --output_dir /tmp/test_outputsarhj9od --overwrite_output_dir \
  --n_train 8 --n_val 8 \
  --max_source_length 12 --max_target_length 12 --val_max_target_length 12 \
  --do_train --do_eval --do_predict \
  --num_train_epochs 1 \
  --per_device_train_batch_size 4 --per_device_eval_batch_size 4 \
  --learning_rate 3e-4 --warmup_steps 8 \
  --evaluate_during_training --predict_with_generate \
  --logging_steps 0 --save_steps 1 --eval_steps 1 \
  --sortish_sampler --label_smoothing 0.1 --adafactor \
  --task translation --tgt_lang ro_RO --src_lang en_XX
I just dumped the args the test invokes the script with.
The error AssertionError: Default process group is not initialized means that the distributed setup was never performed, i.e. torch.distributed.init_process_group was never called.
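For illustration, a minimal sketch of the kind of guard that would avoid this crash. Only DistributedSortishSampler appears in the traceback above, so the plain SortishSampler fallback for the single-process case is my assumption about what examples/seq2seq/utils.py does:

import torch.distributed as dist

def make_sortish_sampler(dataset, batch_size, distributed=False, **kwargs):
    # n_gpu > 1 by itself does not guarantee that init_process_group()
    # was ever called, so check for a live process group before picking
    # the distributed sampler.
    if distributed and dist.is_available() and dist.is_initialized():
        return DistributedSortishSampler(dataset, batch_size, **kwargs)
    return SortishSampler(dataset, batch_size, **kwargs)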
I will look more into it tomorrow morning.
On the other hand, if we sort this out, perhaps we could do the same for distributed eval? It would be much better to delegate all that forking, etc. to PL.
> If that doesn't work we can hack it like tests/test_trainer.py: line 245
Can you please clarify how you think it could help? The line of code you quoted does nothing; it's just used for testing, and it will result in n_gpu=2 anyway. Perhaps you meant somewhere else in that file?
You need to launch with
python -m torch.distributed.launch --nproc_per_node=2 finetune_trainer.py
That caught me up as well.
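For context, torch.distributed.launch spawns one process per GPU and exports RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT for each of them; every spawned process then has to initialize the default process group before any dist.* call can succeed. A minimal sketch of that handshake, independent of the Trainer:

import torch.distributed as dist

# The default env:// init method reads the RANK/WORLD_SIZE/
# MASTER_ADDR/MASTER_PORT variables set by torch.distributed.launch.
dist.init_process_group(backend="nccl")
print(f"initialized rank {dist.get_rank()} of {dist.get_world_size()}")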
In which case, yes, this would be 100% the same as https://github.com/huggingface/transformers/pull/7281; let's finish that first, then refactor all that new code and use it here.
Until then you can use @require_non_multigpu so that the failures don't interfere.
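For anyone following along, that decorator boils down to a GPU-count skip. A rough illustrative version, not the actual implementation from the transformers test utilities:

import unittest
import torch

def require_non_multigpu(test_case):
    # Skip whenever more than one CUDA device is visible, since the
    # code under test would then take the broken distributed path.
    if torch.cuda.device_count() > 1:
        return unittest.skip("test requires 0 or 1 GPU")(test_case)
    return test_case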
I thought PL had a way of handling distributed internally without the user needing to call -m torch.distributed.launch. Is it not working, or did I misread it?
These tests don't use PL.