Transformers: T5 Model: What is the maximum sequence length that can be used with a pretrained T5 (3b model) checkpoint?

Created on 23 Jun 2020 · 4 comments · Source: huggingface/transformers

As the paper describes, T5 uses a relative attention mechanism, and the answer to this issue says T5 can use any sequence length, where the only constraint is memory.

According to this, can I use T5 to summarize inputs that have more than 512 tokens in a sequence?

All 4 comments

Yes, you can, but you should be aware that memory requirements quadruple when you double the input sequence length for "normal" self-attention (as in T5).

So you will quickly run out of memory.
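
To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch of the attention score matrix alone (the head count, batch size, and fp32 dtype below are illustrative assumptions, not T5-3b's actual configuration):

def attention_scores_bytes(seq_len, num_heads=12, batch_size=1, bytes_per_value=4):
    # One (seq_len x seq_len) score matrix per head, stored in fp32
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value

for n in (512, 1024, 2048):
    print(n, attention_scores_bytes(n) / 2**20, "MiB")
# 512 -> 12 MiB, 1024 -> 48 MiB, 2048 -> 192 MiB: doubling the length quadruples this term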

Here is a snippet that shows that you can run input ids longer than config.max_position_embeddings.

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.config.max_position_embeddings  # 512
input_ids = torch.tensor([600 * [0]])  # shape (1, 600)
model(input_ids, decoder_input_ids=input_ids)  # => no error

For more memory-efficient models, you should take a look at Reformer and Longformer.

I hope we will soon have these models ready for summarization
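
As a rough illustration of why those models scale better, here is a sketch of running a long input through Longformer (the checkpoint name and call pattern are my assumptions based on the public Longformer release; this is an encoder-only model, not a summarization recipe):

import torch
from transformers import LongformerModel, LongformerTokenizer

# Longformer uses windowed (sparse) attention, so memory grows roughly
# linearly with sequence length instead of quadratically.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("some very long document ...", return_tensors="pt")
outputs = model(**inputs)  # encoder-only: hidden states, not a summary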

Thanks for the quick help.

So basically, the T5 model in Hugging Face can handle arbitrary sequence lengths, right?
And the second line (model.config.max_position_embeddings) basically shows the default maximum input sequence length, right?

What do you think of the following code? (Here I simply change the tokenizer's max_length):

from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# some_preprocess_text holds the document to summarize; T5 expects a task prefix
t5_prepared_Text = "summarize: " + some_preprocess_text
tokenized_text = tokenizer.encode(t5_prepared_Text, max_length=1024, truncation=True, return_tensors="pt")

summary_ids = model.generate(tokenized_text,
                             num_beams=4,
                             no_repeat_ngram_size=2,
                             min_length=30,
                             max_length=100,
                             early_stopping=True)
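
One detail the snippet leaves out is decoding the generated ids back into text; a minimal follow-up, assuming the variables from the snippet above:

# Convert the generated token ids back into a readable summary string
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)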


Hi, I compared the summaries T5 generates with 1024 and 512 sequence lengths, and I do not see any difference between them. Any idea why this happens?
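
One thing I could check (just a guess at a diagnostic, not a confirmed explanation): whether the text is actually longer than 512 tokens after tokenization, since otherwise both settings produce the same input ids and therefore the same summary. A quick check, assuming the variables from the snippet above:

# If the two shapes match, the model saw identical inputs in both runs
ids_512 = tokenizer.encode(t5_prepared_Text, max_length=512, truncation=True, return_tensors="pt")
ids_1024 = tokenizer.encode(t5_prepared_Text, max_length=1024, truncation=True, return_tensors="pt")
print(ids_512.shape, ids_1024.shape)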
