tokens-per-sample in evaluation

fairseq-train --task language_modeling data-bin/wikitext-103 \
--save-dir path_to_model_dir --arch transformer_lm_wiki103 --max-update 286000 --max-lr 1.0 \
--t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 --warmup-updates 16000 \
--warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 --criterion adaptive_loss \
--max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
--sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d --fp16
fairseq-eval-lm data-bin/wikitext-103 \
--path path_to_model \
--max-sentences 1 \
--tokens-per-sample 512 \
--context-window 0
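For reference, with --sample-break-mode none the corpus is treated as one long token stream and cut into fixed-length segments of --tokens-per-sample, ignoring sentence boundaries. A toy sketch of that segmentation (not fairseq's actual implementation):

```python
def segment_stream(tokens, tokens_per_sample):
    """Cut a flat token stream into fixed-length segments,
    ignoring sentence boundaries (sample-break-mode none)."""
    return [
        tokens[i:i + tokens_per_sample]
        for i in range(0, len(tokens), tokens_per_sample)
    ]

print(segment_stream(list(range(10)), 4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```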
In language model evaluation, I believe we can use a smaller --tokens-per-sample than in training: here the model was trained with 3072 but evaluated with 512. However, I found that evaluation still uses 3072-token segments:
In [2]: sample
Out[2]:
{'id': tensor([34], device='cuda:0'),
'nsentences': 1,
'ntokens': 3072,
'net_input': {'src_tokens': tensor([[ 10, 4, 15793, ..., 2, 2, 12]], device='cuda:0'),
'src_lengths': tensor([3072], device='cuda:0')},
'target': tensor([[ 4, 15793, 5, ..., 2, 12, 12]], device='cuda:0')}
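The dump above also shows the standard next-token-prediction layout: the target is the input shifted one position to the left (src starts with 10, 4, 15793, ... and the target with 4, 15793, 5, ...). A toy sketch of that pairing (illustrative, not fairseq's code):

```python
def make_lm_pair(segment):
    """Next-token prediction: predict segment[i + 1] from segment[i],
    so the target is the input shifted left by one position."""
    return segment[:-1], segment[1:]

src, tgt = make_lm_pair([10, 4, 15793, 5, 2])
print(src, tgt)  # [10, 4, 15793, 5] [4, 15793, 5, 2]
```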
If I force --max-tokens by:
fairseq-eval-lm data-bin/wikitext-103 \
--path path_to_model \
--max-tokens 512 \
--tokens-per-sample 512 \
--context-window 0
I get the following error:
File "fairseq/data/data_utils_fast.pyx", line 50, in fairseq.data.data_utils_fast.batch_by_size_fast
assert max_tokens <= 0 or sample_len <= max_tokens, (
AssertionError: sentence at index 79 of size 2881 exceeds max_tokens limit of 512!
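The assertion is a per-sample sanity check during batching: every segment must fit within --max-tokens (a non-positive value disables the check). Since the dataset was already built with the checkpoint's larger tokens-per-sample, lowering --max-tokens alone trips it. A simplified Python rendering of the check (the real one lives in data_utils_fast.pyx):

```python
def check_batch_sizes(sample_lens, max_tokens):
    """Simplified version of the size check behind the error above:
    each sample must fit in max_tokens; max_tokens <= 0 disables it."""
    for idx, sample_len in enumerate(sample_lens):
        assert max_tokens <= 0 or sample_len <= max_tokens, (
            f"sentence at index {idx} of size {sample_len} "
            f"exceeds max_tokens limit of {max_tokens}!"
        )

check_batch_sizes([512, 480], max_tokens=512)   # passes
try:
    check_batch_sizes([2881], max_tokens=512)   # reproduces the failure
except AssertionError as e:
    print(e)
```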
So I suspect that --sample-break-mode none is not being applied properly during evaluation: with that mode, segments should be cut to tokens-per-sample regardless of sentence boundaries, so no sample should exceed 512 tokens.
How I installed fairseq (pip, source): source

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
Actually, I realized that --tokens-per-sample is not updated during evaluation; it is taken from the checkpoint. To override it, we can do this instead:
fairseq-eval-lm data-bin/wikitext-103 \
--path path_to_model \
--max-sentences 1 \
--model-overrides '{"tokens_per_sample": 512}' \
--context-window 0
This seems to work.
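For illustration, --model-overrides takes a Python-dict string that patches the arguments stored in the checkpoint before the model is built. A minimal sketch of that mechanism (not fairseq's actual code; the helper name is made up):

```python
import ast
from types import SimpleNamespace

def apply_model_overrides(ckpt_args, overrides_str):
    """Hypothetical helper: parse a --model-overrides style dict
    string and patch the checkpoint args in place."""
    overrides = ast.literal_eval(overrides_str)
    for key, value in overrides.items():
        setattr(ckpt_args, key, value)
    return ckpt_args

args = SimpleNamespace(tokens_per_sample=3072, arch="transformer_lm_wiki103")
apply_model_overrides(args, '{"tokens_per_sample": 512}')
print(args.tokens_per_sample)  # 512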
Glad you solved it.
In theory, tokens-per-sample is a property of the model. While specifying a smaller value should work, specifying a bigger value probably would not, as positional embeddings etc. would not have been seen during training. So it's probably good that you can't easily override it from the command line.
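To see why a bigger value fails, consider that a learned positional-embedding table has exactly one row per position up to the training-time tokens-per-sample; positions beyond that were never trained and simply have no entry. A pure-Python sketch of the idea (illustrative, not fairseq's embedding code):

```python
# One embedding vector per position, sized by the training-time
# tokens-per-sample; positions beyond it have no learned entry.
MAX_POSITIONS = 512
pos_table = {pos: [0.0] * 8 for pos in range(MAX_POSITIONS)}

def embed_positions(seq_len):
    """Look up a positional vector for every position in a sequence.
    Fails for any position the model never saw during training."""
    return [pos_table[pos] for pos in range(seq_len)]

embed_positions(256)        # shorter than training length: fine
try:
    embed_positions(1024)   # longer: no entry for position 512
except KeyError as e:
    print("no embedding for position", e)
```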
This all makes sense! Thank you for your help.