Fairseq: Cannot reduce “tokens-per-sample” in inference for the transformer language model

Created on 31 Aug 2020  ·  4 comments  ·  Source: pytorch/fairseq

:bug: Bug with tokens-per-sample in evaluation.

To Reproduce

fairseq-train --task language_modeling data-bin/wikitext-103 \
--save-dir path_to_model_dir --arch transformer_lm_wiki103 --max-update 286000 --max-lr 1.0 \
--t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 --warmup-updates 16000 \
--warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 --criterion adaptive_loss \
--max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
--sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d --fp16 

fairseq-eval-lm data-bin/wikitext-103 \
    --path path_to_model \
    --max-sentences 1 \
    --tokens-per-sample 512 \
    --context-window 0

During language model evaluation, I believe we should be able to use a smaller --tokens-per-sample than the training value; here the model was trained with 3072 but evaluated with 512. However, I found that evaluation still uses 3072-token segments:

In [2]: sample
Out[2]:
{'id': tensor([34], device='cuda:0'),
 'nsentences': 1,
 'ntokens': 3072,
 'net_input': {'src_tokens': tensor([[   10,     4, 15793,  ...,     2,     2,    12]], device='cuda:0'),
  'src_lengths': tensor([3072], device='cuda:0')},
 'target': tensor([[    4, 15793,     5,  ...,     2,    12,    12]], device='cuda:0')}

If I instead force the batch size with --max-tokens:

fairseq-eval-lm data-bin/wikitext-103 \
    --path path_to_model \
    --max-tokens 512 \
    --tokens-per-sample 512 \
    --context-window 0

I get the following error:

  File "fairseq/data/data_utils_fast.pyx", line 50, in fairseq.data.data_utils_fast.batch_by_size_fast
    assert max_tokens <= 0 or sample_len <= max_tokens, (
AssertionError: sentence at index 79 of size 2881 exceeds max_tokens limit of 512!

So I suspect that --sample-break-mode none is not being applied during evaluation: with that mode, segments should be cut at a fixed length regardless of sentence boundaries, so none should exceed 512 tokens. A sketch of the expected chunking follows.
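
For reference, here is a minimal sketch of what --sample-break-mode none is expected to do (plain Python for illustration, not fairseq's actual TokenBlockDataset code): concatenate the corpus into a single token stream and cut consecutive fixed-length blocks, so no segment can exceed tokens-per-sample:

def chunk_stream(token_stream, tokens_per_sample):
    # Cut the flat token stream into consecutive fixed-size blocks,
    # ignoring sentence boundaries entirely.
    for start in range(0, len(token_stream), tokens_per_sample):
        yield token_stream[start:start + tokens_per_sample]

# With tokens_per_sample=512 no block can exceed 512 tokens, so the
# "size 2881 exceeds max_tokens limit of 512" assertion above suggests
# the dataset was still segmented with a different value.
blocks = list(chunk_stream(list(range(10000)), 512))
assert all(len(b) <= 512 for b in blocks)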

Environment

  • fairseq Version (e.g., 1.0 or master): 0.9.0
  • PyTorch Version (e.g., 1.0): 1.5.1+cu101
  • OS (e.g., Linux): Ubuntu 18.04.3 LTS
  • How you installed fairseq (pip, source): Source
  • Build command you used (if compiling from source):
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
  • Python version: 3.7.6

All 4 comments

Actually, I realized that --tokens-per-sample is not updated during evaluation. To override it, we can instead pass --model-overrides:

fairseq-eval-lm data-bin/wikitext-103 \
    --path path_to_model \
    --max-sentences 1 \
    --model-overrides '{"tokens_per_sample": 512}' \
    --context-window 0

This appears to work.
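
For anyone scripting this rather than using the CLI, a hedged sketch of the programmatic equivalent: fairseq's checkpoint_utils.load_model_ensemble accepts an arg_overrides dict that patches the training args stored in the checkpoint before the task and model are rebuilt, which is what --model-overrides does on the command line (path_to_model is a placeholder checkpoint path):

from fairseq import checkpoint_utils

# arg_overrides patches the args saved in the checkpoint, so the
# rebuilt model sees tokens_per_sample=512 instead of the training value.
models, model_args = checkpoint_utils.load_model_ensemble(
    ["path_to_model"],
    arg_overrides={"tokens_per_sample": 512},
)
print(model_args.tokens_per_sample)  # 512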

Glad you solved it.

In theory, tokens-per-sample is a property of the model. While specifying a smaller value should work, specifying a bigger value probably would not, as positional embeddings etc. would not have been seen during training. So it's probably good that you can't override it from the command line easily.
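
To illustrate that point, a minimal PyTorch sketch under the assumption of learned positional embeddings (the 3072 positions and 512 dimensions are just example values): positions beyond the trained maximum have no embedding row at all, so longer inputs fail outright, and even architectures that can extrapolate, such as sinusoidal embeddings, would be operating on positions never seen in training:

import torch
import torch.nn as nn

max_positions = 3072             # tokens-per-sample used at training time
pos_emb = nn.Embedding(max_positions, 512)

pos_emb(torch.arange(3072))      # positions 0..3071: fine
try:
    pos_emb(torch.arange(4096))  # positions 3072..4095 were never trained
except IndexError as err:
    print("no embedding for unseen positions:", err)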

This all makes sense! Thank you for your help.
