Fairseq: --share-all-embeddings requires a joined dictionary

Created on 5 Sep 2020 · 3 comments · Source: pytorch/fairseq

@edunov @myleott @ngoyal2707 I am trying to train a seq2seq model for translation but am running into a problem when training on the GPU. This is the command used for training:

CUDA_VISIBLE_DEVICES=0 fairseq-train "/content/drive/My Drive/HashPro/New/" --fp16 --max-sentences 8 --lr 0.02 --clip-norm 0.1  \
  --optimizer sgd --dropout 0.2  \
  --arch bart_large --save-dir "/content/drive/My Drive/HashPro/Checkpoints"

And this is the error:

2020-09-05 14:11:00 | INFO | fairseq_cli.train | Namespace(activation_fn='gelu', adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.0, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_suffix='', clip_norm=0.1, cpu=False, criterion='cross_entropy', cross_self_attention=False, curriculum=0, data='/content/drive/My Drive/HashPro/New/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', dropout=0.2, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, localsgd_frequency=3, log_format=None, log_interval=100, lr=[0.02], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=0, max_sentences=8, max_sentences_valid=8, max_source_positions=1024, max_target_positions=1024, max_tokens=None, max_tokens_valid=None, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, model_parallel_size=1, momentum=0.0, no_cross_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_seed_provided=True, no_token_positional_embeddings=False, nprocs_per_node=1, num_batch_buckets=0, num_workers=1, optimizer='sgd', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, relu_dropout=0.0, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='/content/drive/My Drive/HashPro/Checkpoints', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, stop_time_hours=0, target_lang=None, task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, update_freq=[1], 
upsample_primary=1, use_bmuf=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, weight_decay=0.0, zero_sharding='none')
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | [input] dictionary: 21936 types
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | [output] dictionary: 9216 types
2020-09-05 14:11:00 | INFO | fairseq.data.data_utils | loaded 1 examples from: /content/drive/My Drive/HashPro/New/valid.input-output.input
2020-09-05 14:11:00 | INFO | fairseq.data.data_utils | loaded 1 examples from: /content/drive/My Drive/HashPro/New/valid.input-output.output
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | /content/drive/My Drive/HashPro/New/ valid input-output 1 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/content/fairseq/fairseq_cli/train.py", line 343, in cli_main
    distributed_utils.call_main(args, main)
  File "/content/fairseq/fairseq/distributed_utils.py", line 187, in call_main
    main(args, **kwargs)
  File "/content/fairseq/fairseq_cli/train.py", line 68, in main
    model = task.build_model(args)
  File "/content/fairseq/fairseq/tasks/translation.py", line 279, in build_model
    model = super().build_model(args)
  File "/content/fairseq/fairseq/tasks/fairseq_task.py", line 248, in build_model
    model = models.build_model(args, self)
  File "/content/fairseq/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/content/fairseq/fairseq/models/transformer.py", line 198, in build_model
    raise ValueError("--share-all-embeddings requires a joined dictionary")
ValueError: --share-all-embeddings requires a joined dictionary

From the docs, I can only glean that the "target_dictionary" and the "source_dictionary" are not the same. Apart from that, I could find no help on the internet. Since the error seems to be related to joined dictionaries, maybe there is a preprocessing step I missed. However, I have checked all the arguments and they seem to be correct. Even so, here is the preprocessing command for reference:


%%bash
fairseq-preprocess --source-lang input --target-lang output \
  --trainpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe --bpe characters --validpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe \
  --destdir /content/drive/'My Drive'/HashPro/New/

Does anybody have any idea on how to fix this?

All 3 comments

You'll need to use the --joined-dictionary option when running fairseq-preprocess.
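For reference, the original preprocessing command with that flag added would look roughly like this (a sketch assuming the same paths as above; note that the training log reports 21936 [input] types vs 9216 [output] types, i.e., two separate dictionaries were built). You may need to point --destdir at an empty directory first, since fairseq-preprocess will not overwrite existing dictionary files:

%%bash
fairseq-preprocess --source-lang input --target-lang output --joined-dictionary \
  --trainpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe --bpe characters \
  --validpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe \
  --destdir /content/drive/'My Drive'/HashPro/New/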

@lematt1991 mind explaining what --joined-dictionary does?

Instead of creating separate source and target dictionaries, it creates a single dictionary that is used for both source and target.
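Concretely, after re-running fairseq-preprocess with --joined-dictionary, the dict.input.txt and dict.output.txt files in --destdir should be identical, and the training log should report the same number of types for [input] and [output]. A quick sanity check (paths mirror the ones above):

%%bash
# Expect no output if the two dictionaries are the same
diff /content/drive/'My Drive'/HashPro/New/dict.input.txt \
     /content/drive/'My Drive'/HashPro/New/dict.output.txt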
