Transformers: Is there a helper script to preprocess data for T5 for masked language modeling?

Created on 17 Jun 2020 · 4 comments · Source: huggingface/transformers

Hi Team

Thanks for the wonderful HuggingFace library!

I am now working with T5 on my own dataset. Is there a helper script that can automatically take text, mask a random set of token spans, and generate the expected output sequence for the unsupervised pretraining (masked language modeling) task?
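For context, T5's unsupervised objective is span corruption: contiguous spans of the input are replaced with sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...) and the target spells out each dropped span prefixed by its sentinel. Below is a minimal word-level sketch of that preprocessing; the helper name and parameters are illustrative rather than a transformers API, and the real pipeline corrupts SentencePiece tokens instead of whitespace-split words.

```python
import random

def t5_span_corrupt(words, corrupt_prob=0.15, mean_span_len=3, seed=None):
    """Replace random contiguous spans of `words` with T5 sentinel tokens.

    Returns (input_text, target_text), e.g.
        input:  "Thank you <extra_id_0> to your party <extra_id_1> week"
        target: "<extra_id_0> for inviting me <extra_id_1> last <extra_id_2>"
    """
    rng = random.Random(seed)
    num_to_mask = max(1, round(len(words) * corrupt_prob))

    # Greedily sample non-overlapping spans until enough tokens are masked.
    masked = set()
    while len(masked) < num_to_mask:
        span_len = max(1, int(rng.expovariate(1.0 / mean_span_len)))
        start = rng.randrange(len(words))
        span = range(start, min(start + span_len, len(words)))
        if any(i in masked for i in span):
            continue
        masked.update(span)

    # Each run of masked positions becomes one sentinel in the input;
    # the target lists each sentinel followed by the words it replaced.
    input_toks, target_toks, sentinel, i = [], [], 0, 0
    while i < len(words):
        if i in masked:
            input_toks.append(f"<extra_id_{sentinel}>")
            target_toks.append(f"<extra_id_{sentinel}>")
            while i < len(words) and i in masked:
                target_toks.append(words[i])
                i += 1
            sentinel += 1
        else:
            input_toks.append(words[i])
            i += 1
    target_toks.append(f"<extra_id_{sentinel}>")  # final sentinel closes the target
    return " ".join(input_toks), " ".join(target_toks)

text = "Thank you for inviting me to your party last week"
inp, tgt = t5_span_corrupt(text.split(), seed=0)  # outputs vary with the seed
```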

Label: wontfix

All 4 comments

Not yet, sadly - it's on my to-do list. I hope to be able to work on it soon.

I am working on a script for T5 based on the current run_language_modeling.py; maybe I can share it once I am done and someone can confirm that it works as expected?
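Pending such a script, here is a rough sketch of how span corruption like the helper above could be wired into a collate function for Trainer. The `collate_batch` helper is hypothetical, it assumes a `T5Tokenizer`, and it relies on recent transformers behavior where label positions set to `-100` are ignored by the loss.

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

def collate_batch(texts):
    # Corrupt each raw text, then tokenize inputs and targets separately.
    pairs = [t5_span_corrupt(t.split()) for t in texts]
    inputs = tokenizer([p[0] for p in pairs], padding=True, return_tensors="pt")
    targets = tokenizer([p[1] for p in pairs], padding=True, return_tensors="pt")
    labels = targets.input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # mask padding out of the loss
    return {
        "input_ids": inputs.input_ids,
        "attention_mask": inputs.attention_mask,
        "labels": labels,
    }
```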

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hi, I'm working on the same task. Here you can see my code if it helps!
