Yes, you can, but be aware that for "normal" full self-attention (as in T5), memory requirements quadruple when you double the input sequence length, so you will quickly run out of memory.
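To see where the quadratic blow-up comes from: full self-attention materializes a seq_len x seq_len score matrix per head and per layer. Here is a rough back-of-the-envelope sketch; the 12-layer / 12-head / fp32 numbers are just illustrative assumptions, roughly the size of a t5-base encoder:
def attention_score_bytes(seq_len, num_layers=12, num_heads=12, bytes_per_float=4):
    # memory for the attention score matrices alone, ignoring weights and other activations
    return num_layers * num_heads * seq_len * seq_len * bytes_per_float

for seq_len in (512, 1024, 2048):
    print(seq_len, attention_score_bytes(seq_len) / 2**20, "MiB")
# 512 -> 144 MiB, 1024 -> 576 MiB, 2048 -> 2304 MiB: every doubling multiplies the cost by 4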
Here is a snippet that shows you can run input ids longer than config.max_position_embeddings:
import torch
from transformers import T5ForConditionalGeneration
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.config.max_position_embeddings # 512
input_ids = torch.tensor([600 * [0]]) # shape (1, 600)
model(input_ids, decoder_input_ids=input_ids) # => no error
For more memory-efficient models, you should take a look at Reformer and Longformer.
I hope we will soon have these models ready for summarization.
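For reference, a minimal loading sketch for both (the checkpoint names are the ones I believe are published on the model hub, and this is plain encoding, not a summarization pipeline):
from transformers import LongformerModel, LongformerTokenizer, ReformerModel, ReformerTokenizer

# Longformer: sliding-window (sparse) attention, pretrained with 4096 positions
longformer_tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
longformer = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# Reformer: LSH attention and reversible layers for long sequences
reformer_tokenizer = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")
reformer = ReformerModel.from_pretrained("google/reformer-crime-and-punishment")

input_ids = longformer_tokenizer.encode("a long document ...", return_tensors="pt")
outputs = longformer(input_ids)  # encoder hidden states only, no summarization head yet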
Thanks for the quick help.
So basically, the T5 model in Hugging Face can handle arbitrary sequence lengths, right?
So the line model.config.max_position_embeddings basically shows the default maximum input sequence length, right?
What do you think of the following code (here I simply modify the tokenizer's max_length):
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# prepend the task prefix and allow up to 1024 input tokens instead of the default 512
t5_prepared_Text = "summarize: " + some_preprocess_text
tokenized_text = tokenizer.encode(t5_prepared_Text, max_length=1024, return_tensors="pt")

summary_ids = model.generate(tokenized_text,
                             num_beams=4,
                             no_repeat_ngram_size=2,
                             min_length=30,
                             max_length=100,
                             early_stopping=True)
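One thing worth checking with this setup (a small diagnostic using the tokenizer and variables from the snippet above) is whether the input actually exceeds 512 tokens, since max_length only truncates and has no effect on shorter texts:
# token count before truncation vs. after truncation at max_length=1024;
# if the untruncated count is already below 512, raising max_length changes nothing
full_ids = tokenizer.encode(t5_prepared_Text, return_tensors="pt")
print("tokens without truncation:", full_ids.shape[-1])
print("tokens after truncation:  ", tokenized_text.shape[-1])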
Hi, I compared the summaries T5 generates with input sequence lengths of 1024 and 512, and I do not see any difference between them. Any idea why this happens?