Fairseq: --share-all-embeddings requires a joined dictionary

Created on 5 Sep 2020 · 3 comments · Source: pytorch/fairseq

@edunov @myleott @ngoyal2707 I am trying to train a seq2seq model for translation but am running into a problem when training on the GPU. This is the command used for training:

CUDA_VISIBLE_DEVICES=0 fairseq-train "/content/drive/My Drive/HashPro/New/" --fp16 --max-sentences 8 --lr 0.02 --clip-norm 0.1  \
  --optimizer sgd --dropout 0.2  \
  --arch bart_large --save-dir "/content/drive/My Drive/HashPro/Checkpoints"

And this is the error:

2020-09-05 14:11:00 | INFO | fairseq_cli.train | Namespace(activation_fn='gelu', adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.0, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_suffix='', clip_norm=0.1, cpu=False, criterion='cross_entropy', cross_self_attention=False, curriculum=0, data='/content/drive/My Drive/HashPro/New/', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', dropout=0.2, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=False, finetune_from_model=None, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=True, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, localsgd_frequency=3, log_format=None, log_interval=100, lr=[0.02], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=0, max_sentences=8, max_sentences_valid=8, max_source_positions=1024, max_target_positions=1024, max_tokens=None, max_tokens_valid=None, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, model_parallel_size=1, momentum=0.0, no_cross_attention=False, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_seed_provided=True, no_token_positional_embeddings=False, nprocs_per_node=1, num_batch_buckets=0, num_workers=1, optimizer='sgd', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, profile=False, quant_noise_pq=0, quant_noise_pq_block_size=8, quant_noise_scalar=0, quantization_config_path=None, relu_dropout=0.0, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='/content/drive/My Drive/HashPro/Checkpoints', save_interval=1, save_interval_updates=0, scoring='bleu', seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, source_lang=None, stop_time_hours=0, target_lang=None, task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, tpu=False, train_subset='train', truncate_source=False, update_freq=[1], 
upsample_primary=1, use_bmuf=False, user_dir=None, valid_subset='valid', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, warmup_updates=0, weight_decay=0.0, zero_sharding='none')
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | [input] dictionary: 21936 types
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | [output] dictionary: 9216 types
2020-09-05 14:11:00 | INFO | fairseq.data.data_utils | loaded 1 examples from: /content/drive/My Drive/HashPro/New/valid.input-output.input
2020-09-05 14:11:00 | INFO | fairseq.data.data_utils | loaded 1 examples from: /content/drive/My Drive/HashPro/New/valid.input-output.output
2020-09-05 14:11:00 | INFO | fairseq.tasks.translation | /content/drive/My Drive/HashPro/New/ valid input-output 1 examples
Traceback (most recent call last):
  File "/usr/local/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/content/fairseq/fairseq_cli/train.py", line 343, in cli_main
    distributed_utils.call_main(args, main)
  File "/content/fairseq/fairseq/distributed_utils.py", line 187, in call_main
    main(args, **kwargs)
  File "/content/fairseq/fairseq_cli/train.py", line 68, in main
    model = task.build_model(args)
  File "/content/fairseq/fairseq/tasks/translation.py", line 279, in build_model
    model = super().build_model(args)
  File "/content/fairseq/fairseq/tasks/fairseq_task.py", line 248, in build_model
    model = models.build_model(args, self)
  File "/content/fairseq/fairseq/models/__init__.py", line 48, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/content/fairseq/fairseq/models/transformer.py", line 198, in build_model
    raise ValueError("--share-all-embeddings requires a joined dictionary")
ValueError: --share-all-embeddings requires a joined dictionary

From the docs, I can only glean that the "target_dictionary" and the "source_dictionary" are not the same. Apart from that, I could find no help on the internet. Since the error seems to be related to joined dictionaries, maybe there is a preprocessing step I missed. However, I have checked all the arguments and they seem to be correct. Even so, here is the preprocessing command for reference:


%%bash
fairseq-preprocess --source-lang input --target-lang output \
  --trainpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe --bpe characters --validpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe \
  --destdir /content/drive/'My Drive'/HashPro/New/

Does anybody have any idea on how to fix this?

All 3 comments

You'll need to use the --joined-dictionary option when running fairseq-preprocess.
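For reference, the original preprocessing command with that flag added would look roughly like this (a sketch assuming the same paths as above; note that the training log reports 21936 [input] types vs 9216 [output] types, i.e., two separate dictionaries were built). You may need to point --destdir at an empty directory first, since fairseq-preprocess will not overwrite existing dictionary files:

%%bash
fairseq-preprocess --source-lang input --target-lang output --joined-dictionary \
  --trainpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe --bpe characters \
  --validpref /content/drive/'My Drive'/HashPro/tokenized/hashpro_hashes.bpe \
  --destdir /content/drive/'My Drive'/HashPro/New/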

@lematt1991 mind explaining what --joined-dictionary does?

Instead of creating separate source and target dictionaries, it creates a single dictionary that is used for both source and target.
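Concretely, after re-running fairseq-preprocess with --joined-dictionary, the dict.input.txt and dict.output.txt files in --destdir should be identical, and the training log should report the same number of types for [input] and [output]. A quick sanity check (paths mirror the ones above):

%%bash
# Expect no output if the two dictionaries are the same
diff /content/drive/'My Drive'/HashPro/New/dict.input.txt \
     /content/drive/'My Drive'/HashPro/New/dict.output.txt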
