I am running the various architectures mentioned here via the --arch option to benchmark them, and I am using a wordpiece tokenizer externally before the preprocessing step.
I was able to train the following Transformer-based architectures with the command below, and to run inference as well:
transformer, transformer_iwslt_de_en, transformer_wmt_en_de, transformer_vaswani_wmt_en_de_big, transformer_vaswani_wmt_en_fr_big, transformer_wmt_en_de_big, transformer_wmt_en_de_big_t2t
Command I use ->
CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train /home/translation_task/mr2en_token_data --arch transformer_wmt_en_de --share-decoder-input-output-embed --optimizer adam --adam-betas '(0.9,0.98)' --clip-norm 0.0 --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 10000 --dropout 0.3 --weight-decay 0.0001 --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 4096 --update-freq 2 --max-source-positions 512 --max-target-positions 512 --skip-invalid-size-inputs-valid-test
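For reference, inference on the resulting checkpoints can be run with fairseq-generate; the checkpoint path and beam settings below are illustrative assumptions, not taken from the original post:

```shell
# Hypothetical checkpoint path and decoding settings; adjust to your setup.
CUDA_VISIBLE_DEVICES=0 fairseq-generate /home/translation_task/mr2en_token_data \
    --path checkpoints/checkpoint_best.pt \
    --batch-size 64 --beam 5 --remove-bpe
```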
Now I am trying to run the following and am not able to. Can someone advise?
Traceback (most recent call last):
File "/home/dh/anaconda3/bin/fairseq-train", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 354, in cli_main
nprocs=args.distributed_world_size,
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 321, in distributed_main
main(args, init_distributed=True)
File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 51, in main
model = task.build_model(args)
File "/home/dh/swapnil/fairseq/fairseq/tasks/fairseq_task.py", line 185, in build_model
return models.build_model(args, self)
File "/home/dh/swapnil/fairseq/fairseq/models/__init__.py", line 48, in build_model
return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
File "/home/dh/swapnil/fairseq/fairseq/models/masked_lm.py", line 116, in build_model
args.max_positions = args.tokens_per_sample
AttributeError: 'Namespace' object has no attribute 'tokens_per_sample'
Traceback (most recent call last):
File "/home/dh/anaconda3/bin/fairseq-train", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 354, in cli_main
nprocs=args.distributed_world_size,
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 321, in distributed_main
main(args, init_distributed=True)
File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 89, in main
train(args, trainer, task, epoch_itr)
File "/home/dh/swapnil/fairseq/fairseq_cli/train.py", line 152, in train
log_output = trainer.train_step(samples)
File "/home/dh/swapnil/fairseq/fairseq/trainer.py", line 327, in train_step
sample, self.model, self.criterion, self.optimizer, ignore_grad
File "/home/dh/swapnil/fairseq/fairseq/tasks/fairseq_task.py", line 251, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/dh/swapnil/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 57, in forward
net_output = model(**sample['net_input'])
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/dh/swapnil/fairseq/fairseq/models/fairseq_model.py", line 385, in forward
return self.decoder(src_tokens, **kwargs)
File "/home/dh/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
TypeError: forward() got multiple values for argument 'prev_output_tokens'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dh/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/home/dh/anaconda3/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
You have preprocessed data with source and target to run machine translation architectures; however, RoBERTa requires different data preprocessing. You can check the examples/ folder for tutorials on how to preprocess data for RoBERTa.
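For what it's worth, RoBERTa-style pretraining data is binarized with a single (source-only) dictionary, roughly as in examples/roberta/README.pretraining.md; the WikiText-103 file names below are placeholders from that tutorial, not your data:

```shell
# Sketch of RoBERTa preprocessing (paths/files are placeholders).
# 1) GPT-2 BPE-encode the raw text:
python -m examples.roberta.multiprocessing_bpe_encoder \
    --encoder-json gpt2_bpe/encoder.json \
    --vocab-bpe gpt2_bpe/vocab.bpe \
    --inputs wikitext-103-raw/wiki.train.raw \
    --outputs wikitext-103-raw/wiki.train.bpe \
    --keep-empty --workers 60
# 2) Binarize with a single source-only dictionary:
fairseq-preprocess --only-source \
    --srcdict gpt2_bpe/dict.txt \
    --trainpref wikitext-103-raw/wiki.train.bpe \
    --validpref wikitext-103-raw/wiki.valid.bpe \
    --destdir data-bin/wikitext-103 --workers 60
```

Note that masked LM training then uses `--task masked_lm --tokens-per-sample 512`, which is also why the `AttributeError: 'Namespace' object has no attribute 'tokens_per_sample'` above appears when a roberta_* architecture is combined with the translation task's arguments.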
I want to run translation only, and I am wondering how to use RoBERTa or BART for that.
So once the preprocessing is done, would there be changes to the above command for RoBERTa?
Are you referring to this -> https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md
Also, what would the preprocessing steps be for BART?
Couldn't find anything here -> https://github.com/pytorch/fairseq/tree/master/examples/bart
I want to run translation only, and I am wondering how to use RoBERTa or BART for that.
I'm interested in this too. Are there tutorials for using RoBERTa (as embeddings?) or BART pre-trained models along with train, test, and eval files to train and evaluate a model to do seq2seq translations?
I am running into the same issues trying to use pre-trained XLM-R for translation. I think the main problem is that RoBERTa and XLM-R are encoder-only architectures, so the solution is to use XLM-R or RoBERTa as an encoder for feature extraction together with a newly initialized decoder.
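That idea can be sketched in plain PyTorch. Everything below is a toy stand-in: the randomly initialized `FrozenEncoder` plays the role of a pre-trained encoder-only model (in practice you would load real RoBERTa/XLM-R weights, e.g. via fairseq, and freeze them), while the decoder is trained from scratch:

```python
import torch
import torch.nn as nn

class FrozenEncoder(nn.Module):
    """Stand-in for a pre-trained encoder-only model (RoBERTa/XLM-R)."""
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.parameters():
            p.requires_grad = False  # use as a fixed feature extractor

    def forward(self, src_tokens):
        return self.encoder(self.embed(src_tokens))

class EncoderDecoder(nn.Module):
    """Frozen encoder features feeding a newly initialized decoder."""
    def __init__(self, encoder, vocab_size=1000, d_model=64):
        super().__init__()
        self.encoder = encoder
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)  # trained from scratch
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_tokens, prev_output_tokens):
        memory = self.encoder(src_tokens)
        x = self.decoder(self.embed(prev_output_tokens), memory)
        return self.out(x)

model = EncoderDecoder(FrozenEncoder())
src = torch.randint(0, 1000, (2, 7))   # batch of 2 source sentences, length 7
tgt = torch.randint(0, 1000, (2, 5))   # shifted target tokens, length 5
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Only the decoder (and output projection) receives gradients here, so training optimizes the new decoder on top of fixed encoder features.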
According to its paper, BART is built with translation in mind. For RoBERTa, there is only a slim chance of using it for translation directly, since it is not a seq2seq model.
@huihuifan I was able to run RoBERTa on its own, but not for translation. What would you suggest for BART?
What about using RoBERTa to generate embeddings for words to train seq2seq models?
What was your GPU configuration for BART?
https://github.com/pytorch/fairseq/blob/master/examples/bart/README.cnn.md
Did you guys follow this configuration?
I tried that demo, but step 2) tries to process train.source, val.source, train.target, and val.target. No files with those names exist in the CNN-DailyMail files from step 1). However, in the CNN-DailyMail subdirectory finished_files there are train.bin and val.bin. It looks like the demo is missing a step?
Same problem here. The example of using BART is quite unclear; it would be much better if we could see some toy examples of using BART with a simple input/output format for seq2seq tasks.
To do the BART preprocessing, you have to look here https://github.com/pytorch/fairseq/issues/1391, specifically zhaoguangxiang's comment
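Roughly, the missing step is producing one-article-per-line .source/.target text files from the raw CNN-DailyMail stories, then BPE-encoding and binarizing them. The sketch below follows the pattern in examples/bart/README.cnn.md; the encoder.json, vocab.bpe, dict.txt, and cnn_dm/ paths are placeholders you would point at your own downloads:

```shell
# Sketch of BART fine-tuning preprocessing (paths are placeholders).
# Assumes train/val .source and .target files already extracted,
# one article/summary per line.
for SPLIT in train val; do
  for LANG in source target; do
    python -m examples.roberta.multiprocessing_bpe_encoder \
      --encoder-json encoder.json \
      --vocab-bpe vocab.bpe \
      --inputs "cnn_dm/$SPLIT.$LANG" \
      --outputs "cnn_dm/$SPLIT.bpe.$LANG" \
      --workers 60 --keep-empty
  done
done
fairseq-preprocess --source-lang source --target-lang target \
  --trainpref cnn_dm/train.bpe --validpref cnn_dm/val.bpe \
  --destdir cnn_dm-bin \
  --srcdict dict.txt --tgtdict dict.txt \
  --workers 60
```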