Tensor2tensor: What is 'extra_length' used for in tensor2tensor/utils/decoding.py?

Created on 10 Dec 2018 · 2 Comments · Source: tensorflow/tensor2tensor

Hi all,

I'm running t2t-decode on CPU, using the function decoding.decode_interactively, for an NMT task.
When I set extra_length=0, t2t-decode runs faster than with extra_length=100, but the results look the same.

```python
import tensorflow as tf  # TensorFlow 1.x: tf.contrib does not exist in 2.x


def decode_hparams(overrides=""):
  """Hyperparameters for decoding."""
  hp = tf.contrib.training.HParams(
      save_images=False,
      log_results=True,
      extra_length=100,  # max output length = input length + extra_length
      batch_size=0,
      beam_size=4,
      alpha=0.6,  # length-penalty exponent used by beam search
      eos_penalty=0.0,
      block_size=0,
      guess_and_check_top_k=0,
      guess_and_check_epsilon=-1,
      return_beams=False,
      write_beam_scores=False,
      max_input_size=-1,
      identity_output=False,
      num_samples=-1,
      delimiter="\n",
      decode_to_file=None,
      decode_in_memory=False,
      shards=1,
      shard_id=0,
      num_decodes=1,
      force_decode_length=False,
      display_decoded_images=False,
      # Used for video decoding.
      frames_per_second=10,
      skip_eos_postprocess=False)
  hp.parse(overrides)  # e.g. overrides="extra_length=0,beam_size=4"
  return hp
```

Can anyone tell me whether I can set extra_length=0, and what extra_length is used for? Thanks so much!


Transformer's decoder is based on self-attention, so, unlike RNN-based decoders, it needs to know the maximum output-sequence length in advance. This maximum is computed as the input-sequence length plus extra_length. If you set extra_length=0, you can never get an output sequence longer than the input. For some language pairs (assuming you are doing MT), and especially for some subword vocabulary sizes (e.g. separate source and target vocabularies, with the target vocabulary much bigger than the source one), this assumption may be reasonable, but in general it is not.
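A minimal, framework-free sketch of the length cap described above. It assumes the cap is simply `len(input) + extra_length` and that decoding stops at EOS (id 1, as in T2T's `text_encoder`) or at the cap, whichever comes first. `greedy_decode` and the `next_token` callback are hypothetical stand-ins for the model, not T2T functions; real T2T decoding uses beam search, which is assumed to be bounded by the same cap.

```python
def max_decode_length(input_length: int, extra_length: int) -> int:
    """Upper bound on output tokens: input length plus extra_length (assumed rule)."""
    return input_length + extra_length


def greedy_decode(next_token, input_ids, extra_length, eos_id=1):
    """Toy greedy loop that stops at EOS or when the length cap is reached.

    `next_token` is a hypothetical model stand-in: it maps the prefix
    decoded so far to the next token id.
    """
    limit = max_decode_length(len(input_ids), extra_length)
    output = []
    while len(output) < limit:
        token = next_token(output)
        if token == eos_id:
            break
        output.append(token)
    return output


def wants_seven(prefix):
    """Fake model that would like to emit 7 tokens, then EOS."""
    return 1 if len(prefix) >= 7 else 42


src = [5, 6, 7, 8, 9]  # a 5-token input
print(len(greedy_decode(wants_seven, src, extra_length=100)))  # 7: finishes naturally
print(len(greedy_decode(wants_seven, src, extra_length=0)))    # 5: cut off at the input length
```

The fake model "wants" a 7-token output for a 5-token input; with extra_length=0 the loop silently truncates it to 5 tokens, which is exactly the failure mode to watch for.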

You can try setting alpha higher than 0.6 (e.g. 2.0, which is far too high for reasonable translations but serves for this demonstration) and check again what effect extra_length=0 has (it will make the translations shorter).
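To see why a higher alpha makes the cap visible, here is a sketch of length-normalized beam scoring. It assumes the GNMT-style penalty `((5 + length) / 6) ** alpha`, which T2T's beam search is believed to use; the two candidate hypotheses and their log-probabilities are made up for illustration.

```python
def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style length penalty (assumed to match T2T's beam search)."""
    return ((5.0 + length) / 6.0) ** alpha


def normalized_score(log_prob: float, length: int, alpha: float) -> float:
    """Log-probability divided by the length penalty; higher is better."""
    return log_prob / length_penalty(length, alpha)


def best(hypotheses, alpha):
    """Name of the hypothesis with the highest length-normalized score."""
    return max(hypotheses, key=lambda h: normalized_score(h[1], h[2], alpha))[0]


# Hypothetical beam candidates: (name, total log-probability, length in tokens).
candidates = [("short", -4.0, 5), ("long", -5.0, 9)]

for alpha in (0.6, 2.0):
    print(alpha, best(candidates, alpha))
```

With alpha=0.6 the shorter hypothesis wins (about -2.94 vs -3.01); with alpha=2.0 the longer one wins (about -0.92 vs -1.44). A larger alpha rewards length more, so the search more often wants outputs longer than the input, and only then does extra_length=0 start cutting translations short.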

That said, for speed-critical decoding, it may be a good idea to set extra_length to a value between 0 and 100 (after making sure it does not shorten any sentences in a reasonably large and representative dev set).
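One way to follow the dev-set advice above, sketched under assumptions: decode a representative dev set once with a generous cap (e.g. the default extra_length=100), record `(input_tokens, output_tokens)` per sentence in subword tokens (the units the cap is assumed to count, not words or characters), then pick the smallest extra_length that would not have truncated any of them, plus a safety margin. The helper names and the default margin of 5 are hypothetical choices, not T2T settings.

```python
from typing import Iterable, List, Tuple


def suggest_extra_length(lengths: Iterable[Tuple[int, int]], margin: int = 5) -> int:
    """Smallest extra_length that fits every dev output, plus a safety margin.

    `lengths` holds (input_tokens, output_tokens) pairs measured by decoding
    the dev set with a generous extra_length. `margin` is a hypothetical default.
    """
    overshoot = max((out - inp for inp, out in lengths), default=0)
    return max(overshoot, 0) + margin


def truncated(lengths: List[Tuple[int, int]], extra_length: int) -> int:
    """Count dev sentences whose output would exceed input_tokens + extra_length."""
    return sum(1 for inp, out in lengths if out > inp + extra_length)


# Made-up dev-set measurements: (input subword tokens, output subword tokens).
dev = [(12, 15), (20, 18), (7, 11)]
print(truncated(dev, 0))           # sentences that extra_length=0 would cut short
print(suggest_extra_length(dev))   # largest overshoot (4) plus margin (5)
```

The resulting value can then be passed as an override string such as `extra_length=9`, e.g. through t2t-decode's `--decode_hparams` flag, and re-checked with `truncated` on a fresh sample.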

Thank you very much for your response, @martinpopel!
I'm working on an English-Chinese MT task, and decoding speed (on CPU) matters a lot to me. Could you suggest a value for extra_length? And why is extra_length=100 the default, given that it seems to affect decoding speed noticeably?

As you suggested, I tried extra_length=0 with alpha=2.0, and the decoding results did not change. Only when I set extra_length=0 and alpha=8.0 did the results become incorrect and longer. I will experiment more. Thanks again!
