Fairseq: Cannot reduce “tokens-per-sample” in inference for the transformer language model

Created on 31 Aug 2020  ·  4 comments  ·  Source: pytorch/fairseq

:bug: Bug with tokens-per-sample in evaluation.

To Reproduce

fairseq-train --task language_modeling data-bin/wikitext-103 \
--save-dir path_to_model_dir --arch transformer_lm_wiki103 --max-update 286000 --max-lr 1.0 \
--t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 --warmup-updates 16000 \
--warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 --criterion adaptive_loss \
--max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
--sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d --fp16 

fairseq-eval-lm data-bin/wikitext-103 \
    --path path_to_model \
    --max-sentences 1 \
    --tokens-per-sample 512 \
    --context-window 0

During language model evaluation, I believe we should be able to use a smaller --tokens-per-sample than the training value; here the model was trained with 3072 but evaluated with 512. However, I found that evaluation still uses 3072-token segments:

In [2]: sample
Out[2]:
{'id': tensor([34], device='cuda:0'),
 'nsentences': 1,
 'ntokens': 3072,
 'net_input': {'src_tokens': tensor([[   10,     4, 15793,  ...,     2,     2,    12]], device='cuda:0'),
  'src_lengths': tensor([3072], device='cuda:0')},
 'target': tensor([[    4, 15793,     5,  ...,     2,    12,    12]], device='cuda:0')}

If I instead force the batch size with --max-tokens:

fairseq-eval-lm data-bin/wikitext-103 \
    --path path_to_model \
    --max-tokens 512 \
    --tokens-per-sample 512 \
    --context-window 0

I get the following error:

  File "fairseq/data/data_utils_fast.pyx", line 50, in fairseq.data.data_utils_fast.batch_by_size_fast
    assert max_tokens <= 0 or sample_len <= max_tokens, (
AssertionError: sentence at index 79 of size 2881 exceeds max_tokens limit of 512!

So I suspect that --sample-break-mode none is not being applied during evaluation: with that mode, segments should be cut at a fixed length regardless of sentence boundaries, so none should exceed 512 tokens. A sketch of the expected chunking follows.
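
For reference, here is a minimal sketch of what --sample-break-mode none is expected to do (plain Python for illustration, not fairseq's actual TokenBlockDataset code): concatenate the corpus into a single token stream and cut consecutive fixed-length blocks, so no segment can exceed tokens-per-sample:

def chunk_stream(token_stream, tokens_per_sample):
    # Cut the flat token stream into consecutive fixed-size blocks,
    # ignoring sentence boundaries entirely.
    for start in range(0, len(token_stream), tokens_per_sample):
        yield token_stream[start:start + tokens_per_sample]

# With tokens_per_sample=512 no block can exceed 512 tokens, so the
# "size 2881 exceeds max_tokens limit of 512" assertion above suggests
# the dataset was still segmented with a different value.
blocks = list(chunk_stream(list(range(10000)), 512))
assert all(len(b) <= 512 for b in blocks)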

Environment

  • fairseq Version (e.g., 1.0 or master): 0.9.0
  • PyTorch Version (e.g., 1.0): 1.5.1+cu101
  • OS (e.g., Linux): Ubuntu 18.04.3 LTS
  • How you installed fairseq (pip, source): Source
  • Build command you used (if compiling from source):
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
  • Python version: 3.7.6

All 4 comments

Actually, I realized that --tokens-per-sample is not updated during evaluation. To override it, we can instead pass --model-overrides:

fairseq-eval-lm data-bin/wikitext-103 \
    --path path_to_model \
    --max-sentences 1 \
    --model-overrides '{"tokens_per_sample": 512}' \
    --context-window 0

This appears to work.
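
For anyone scripting this rather than using the CLI, a hedged sketch of the programmatic equivalent: fairseq's checkpoint_utils.load_model_ensemble accepts an arg_overrides dict that patches the training args stored in the checkpoint before the task and model are rebuilt, which is what --model-overrides does on the command line (path_to_model is a placeholder checkpoint path):

from fairseq import checkpoint_utils

# arg_overrides patches the args saved in the checkpoint, so the
# rebuilt model sees tokens_per_sample=512 instead of the training value.
models, model_args = checkpoint_utils.load_model_ensemble(
    ["path_to_model"],
    arg_overrides={"tokens_per_sample": 512},
)
print(model_args.tokens_per_sample)  # 512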

Glad you solved it.

In theory, tokens-per-sample is a property of the model. While specifying a smaller value should work, specifying a bigger value probably would not, as positional embeddings etc. would not have been seen during training. So it's probably good that you can't override it from the command line easily.
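
To illustrate that point, a minimal PyTorch sketch under the assumption of learned positional embeddings (the 3072 positions and 512 dimensions are just example values): positions beyond the trained maximum have no embedding row at all, so longer inputs fail outright, and even architectures that can extrapolate, such as sinusoidal embeddings, would be operating on positions never seen in training:

import torch
import torch.nn as nn

max_positions = 3072             # tokens-per-sample used at training time
pos_emb = nn.Embedding(max_positions, 512)

pos_emb(torch.arange(3072))      # positions 0..3071: fine
try:
    pos_emb(torch.arange(4096))  # positions 3072..4095 were never trained
except IndexError as err:
    print("no embedding for unseen positions:", err)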

This all makes sense! Thank you for your help.
