transformers version: 9c0afdaf7b091c341072b432ad6ee17ba7a5016b
mT5: @patrickvonplaten
Generating from mT5-small gives (nearly) empty output:
from transformers import MT5ForConditionalGeneration, T5Tokenizer
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
article = "translate to french: The capital of France is Paris."
batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], return_tensors="pt")
output_ids = model.generate(input_ids=batch.input_ids, num_return_sequences=1, num_beams=8, length_penalty=0.1)
tokenizer.decode(output_ids[0])
>>> <pad> <extra_id_0></s>
Using the same input for T5 gives reasonable output:
from transformers import T5ForConditionalGeneration, T5Tokenizer
model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")
article = "translate to french: The capital of France is Paris."
batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], return_tensors="pt")
output_ids = model.generate(input_ids=batch.input_ids, num_return_sequences=1, num_beams=8, length_penalty=0.1)
tokenizer.decode(output_ids[0])
>>> <pad> La capitale de la France est Paris.</s>
My understanding is that mT5 is trained in the same way as T5, so shouldn't it work in a very similar way?
mT5 is not pretrained on downstream tasks like T5 was - see: https://huggingface.co/transformers/master/model_summary.html#mt5
So it's not surprising that mT5 won't work well out of the box without fine-tuning.
Ah, I hadn't realised that. But in that case, wouldn't the expected output be a reconstruction of the input?
Hard to say, if the input does not include any sentinel tokens (<extra_id_1>) and one uses generate() instead of just the forward pass. It would be interesting to play around with the two pre-trained model variants though and see what differences they show...
I agree that I would only get reconstruction if the decoding setup matched training :) Can you point me at any documentation that describes what special tokens are expected? I dug around in your implementation and the official repo but couldn't see anything. The output of tokenizer.prepare_seq2seq_batch() is the same for src and tgt as well (presumably because it uses the T5 tokenizer - does it not need its own?)
Edit: Looking again, it seems like the sentinel tokens are just the equivalent of [MASK]? In which case the model should be able to reconstruct the input if it has access to the full (un-noised) sequence.
Maybe these pointers help:
mT5 is pretrained exactly like T5, only without the supervised downstream training mixed in. I think the T5 paper explains in detail how this is done.
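As a quick sanity check, one could probe the pretrained span-corruption objective directly by placing a sentinel token in the input and letting the model fill it in (a small sketch; the prompt is just an illustration and the quality of the completion is not guaranteed):
from transformers import MT5ForConditionalGeneration, T5Tokenizer
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
# mask a span with a sentinel token, mirroring the pretraining objective
input_ids = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt").input_ids
output_ids = model.generate(input_ids=input_ids, num_beams=4, max_length=20)
print(tokenizer.decode(output_ids[0]))  # the prediction for <extra_id_0> follows the sentinel token in the output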
Does anybody have some more pointers on how to use (train) the mT5 model that has been added to master for text generation? Anything explaining how the finetuning is done in practice using Huggingface Transformers would be greatly appreciated!
Hey @Rijgersberg, what exactly do you mean by text generation? GPT-2-like open-ended text generation?
Well, not open-ended text generation in the sense of "writing", but using text-to-text generation to perform all types of different NLP tasks with little to no training. Basically what the GPT-3 paper calls "few-shot learning".
Specifically, I would be interested in replicating the "WT5?! Training Text-to-Text Models to Explain their Predictions" results in languages other than English. But I'm having some trouble understanding what the differences between the T5 and mT5 models in Transformers mean for accomplishing that task.
Hey @tomhosking, how did you use MT5ForConditionalGeneration and T5Tokenizer?
I used
pip install transformers
But it is showing
ImportError: cannot import name 'MT5ForConditionalGeneration'
How can we install it?
@parthplc You can specify the version of the package you would like to install. For me it was the experimental transformers==4.0.0rc1 and it works fine.
For training an mT5 model to generate summaries, you can check out this post. It worked for me.
[edit]
I forgot to mention: the only modification you have to make is to replace T5ForConditionalGeneration with MT5ForConditionalGeneration.
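For reference, a minimal check that the pinned version exposes the class (the version pin follows the comment above and may need updating for newer releases):
# pip install transformers==4.0.0rc1
from transformers import MT5ForConditionalGeneration, T5Tokenizer
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
print(model.config.model_type)  # "mt5"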
> Well, not open-ended text generation in the sense of "writing", but using text-to-text generation to perform all types of different NLP tasks with little to no training. Basically what the GPT-3 paper calls "few-shot learning".
> Specifically, I would be interested in replicating the "WT5?! Training Text-to-Text Models to Explain their Predictions" results in languages other than English. But I'm having some trouble understanding what the differences between the T5 and mT5 models in Transformers mean for accomplishing that task.
In this case, I would just fine-tune mT5 with the normal causal language modeling objective, meaning:
from transformers import MT5ForConditionalGeneration, T5Tokenizer
mt5 = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")
mt5_tok = T5Tokenizer.from_pretrained("google/mt5-base")
input_ids = mt5_tok("explain sentiment: I went to see this movie with my husband, and we both thought the acting was terrible!", return_tensors="pt").input_ids # in the language of your choice
labels = mt5_tok("negative explanation: the acting was terrible.", return_tensors="pt").input_ids # in the language of your choice
loss = mt5(input_ids=input_ids, labels=labels).loss
I took one of the examples shown in the paper you mentioned.
In short, there is no difference in how mT5 and T5 should be fine-tuned.
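Building on the snippet above, here is a minimal sketch of a single fine-tuning step (the optimizer and learning rate are placeholders; in real batches, any padding tokens in labels should be replaced with -100 so they are ignored by the loss):
import torch
from transformers import MT5ForConditionalGeneration, T5Tokenizer

mt5 = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")
mt5_tok = T5Tokenizer.from_pretrained("google/mt5-base")
optimizer = torch.optim.AdamW(mt5.parameters(), lr=1e-4)  # placeholder hyperparameters

mt5.train()
input_ids = mt5_tok("explain sentiment: I went to see this movie with my husband, and we both thought the acting was terrible!", return_tensors="pt").input_ids
labels = mt5_tok("negative explanation: the acting was terrible.", return_tensors="pt").input_ids  # no padding here, so no -100 masking needed
loss = mt5(input_ids=input_ids, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()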
Also, @mrm8488 already successfully fine-tuned an mT5 model: https://twitter.com/mrm8488/status/1329478063768350723
Sorry to ping you here @mrm8488, but maybe you have some tips/tricks for mT5 fine-tuning?
Also pinging our T5 fine-tuning expert @patil-suraj
> Well, not open-ended text generation in the sense of "writing", but using text-to-text generation to perform all types of different NLP tasks with little to no training. Basically what the GPT-3 paper calls "few-shot learning".
I'm not sure if you can use mT5 with no training (fine-tuning), since it was not pre-trained with any supervised objective like T5.
One experiment to try is to fine-tune mT5 on the English data and see if it works for your language without any language-specific fine-tuning (in my experiments, T5 trained on English SQuAD for question generation gave interesting results for French and German without any language-specific fine-tuning).
But for better results you should fine-tune mT5 on the language-specific dataset.
And, as Patrick said, you can fine-tune mT5 and T5 the same way.
The major differences between mT5 and T5 are:
- mT5 is based on T5 1.1
- pre-trained on 101 languages
- no supervised pre-training
Hi, I slightly modified the script provided by @patil-suraj to fine-tune [T5 on SQuAD](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) and after many epochs (I think I am missing something/doing something wrong) I got 'decent' results fine-tuning mT5-small on TyDi QA for multilingual QA: https://huggingface.co/mrm8488/mT5-small-finetuned-tydiqa-for-xqa. The PR with the model card for more details is not approved yet.
> Hi, I slightly modified the script provided by @patil-suraj to fine-tune [T5 on SQuAD](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) and after many epochs (I think I am missing something/doing something wrong) I got 'decent' results fine-tuning mT5-small on TyDi QA for multilingual QA: https://huggingface.co/mrm8488/mT5-small-finetuned-tydiqa-for-xqa. The PR with the model card for more details is not approved yet.
just merged it :-) BTW, you can now directly create the model cards online - no need for PRs anymore ;-)
Hey @patil-suraj @mrm8488, how can we fine-tune mT5 for other languages? Suppose we have a translation problem for a language other than English: if we fine-tune using the T5 tokenizer, every word gets replaced with <unk> tokens. How should the fine-tuning be done? E.g.:
print(tokenizer.decode(data['source_ids']))
print(tokenizer.decode(data['target_ids']))
English to Hindi: Tell me the name of the ninth month.</s> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>
<unk> <unk> <unk> <unk> <unk> <unk> </s> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>
@parthplc - I don't really understand your question. Since mT5 was trained on 101 languages, its tokenizer can obviously handle all those languages, e.g.:
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("google/mt5-small")
tok.decode(tok("Der Satz wird auch definiert als sprachliche Einheit, die aus Subjekt und Pr盲dikat besteht. Dies soll auf Aristoteles zur眉ckgehen. Entsprechend definiert die traditionelle Grammatik den Satz als bestehend aus: Satzaussage (Pr盲dikat), Satzerg盲nzung (Objekt) und Satzgegenstand (Subjekt).").input_ids) # gives no <unk> symbols
Hopefully, this makes more sense now
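And to address the Hindi example above directly, the same check with a Hindi sentence ("Tell me the name of the ninth month."; the exact sentence is only an illustration) should decode back without <unk> tokens when the mT5 tokenizer is used:
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("google/mt5-small")
# Hindi input; with the mT5 sentencepiece vocabulary this should round-trip without <unk>
print(tok.decode(tok("नौवें महीने का नाम बताओ।").input_ids))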