Transformers: T5 Model: What is the maximum sequence length that can be used with a pretrained T5 (3b model) checkpoint?

Created on 23 Jun 2020 · 4 comments · Source: huggingface/transformers

As the paper describes, T5 uses a relative attention mechanism, and the answer to this issue says T5 can use any sequence length, where the only constraint is memory.

According to this, can I use T5 to summarize inputs that have more than 512 tokens in a sequence?

All 4 comments

Yes, you can, but you should be aware that memory requirements quadruple when you double the input sequence length for "normal" self-attention (as in T5).

So you will quickly run out of memory.
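
To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch of the attention score matrix alone (the head count, batch size, and fp32 dtype below are illustrative assumptions, not T5-3b's actual configuration):

def attention_scores_bytes(seq_len, num_heads=12, batch_size=1, bytes_per_value=4):
    # One (seq_len x seq_len) score matrix per head, stored in fp32
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value

for n in (512, 1024, 2048):
    print(n, attention_scores_bytes(n) / 2**20, "MiB")
# 512 -> 12 MiB, 1024 -> 48 MiB, 2048 -> 192 MiB: doubling the length quadruples this term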

Here is a snippet that shows that you can run input ids longer than config.max_position_embeddings.

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.config.max_position_embeddings  # 512
input_ids = torch.tensor([600 * [0]])  # shape (1, 600)
model(input_ids, decoder_input_ids=input_ids)  # => no error

For more memory-efficient models, you should take a look at Reformer and Longformer.

I hope we will soon have these models ready for summarization
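
As a rough illustration of why those models scale better, here is a sketch of running a long input through Longformer (the checkpoint name and call pattern are my assumptions based on the public Longformer release; this is an encoder-only model, not a summarization recipe):

import torch
from transformers import LongformerModel, LongformerTokenizer

# Longformer uses windowed (sparse) attention, so memory grows roughly
# linearly with sequence length instead of quadratically.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("some very long document ...", return_tensors="pt")
outputs = model(**inputs)  # encoder-only: hidden states, not a summary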

Thanks for the quick help.

So basically, the T5 model in Hugging Face can handle arbitrary sequence lengths, right?
And the second line (model.config.max_position_embeddings) basically shows the default maximum input sequence length, right?

What do you think of the following code? (Here I simply change the tokenizer's max_length):

from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# some_preprocess_text holds the document to summarize; T5 expects a task prefix
t5_prepared_Text = "summarize: " + some_preprocess_text
tokenized_text = tokenizer.encode(t5_prepared_Text, max_length=1024, truncation=True, return_tensors="pt")

summary_ids = model.generate(tokenized_text,
                             num_beams=4,
                             no_repeat_ngram_size=2,
                             min_length=30,
                             max_length=100,
                             early_stopping=True)
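
One detail the snippet leaves out is decoding the generated ids back into text; a minimal follow-up, assuming the variables from the snippet above:

# Convert the generated token ids back into a readable summary string
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)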


Hi, I compared the summaries T5 generates with 1024 and 512 sequence lengths, and I do not see any difference between them. Any idea why this happens?
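
One thing I could check (just a guess at a diagnostic, not a confirmed explanation): whether the text is actually longer than 512 tokens after tokenization, since otherwise both settings produce the same input ids and therefore the same summary. A quick check, assuming the variables from the snippet above:

# If the two shapes match, the model saw identical inputs in both runs
ids_512 = tokenizer.encode(t5_prepared_Text, max_length=512, truncation=True, return_tensors="pt")
ids_1024 = tokenizer.encode(t5_prepared_Text, max_length=1024, truncation=True, return_tensors="pt")
print(ids_512.shape, ids_1024.shape)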
