Transformers: Is there a helper script to preprocess data for T5 for masked language modeling?

Created on 17 Jun 2020 · 4 comments · Source: huggingface/transformers

Hi Team

Thanks for the wonderful HuggingFace library!

I am now working with T5 on my own dataset. Is there a helper script that can automatically take text, mask a random set of token spans, and generate the expected output sequence for the unsupervised pretraining (masked language modeling) task?
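For context, T5's unsupervised objective is span corruption: contiguous spans of the input are replaced with sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...) and the target spells out each dropped span prefixed by its sentinel. Below is a minimal word-level sketch of that preprocessing; the helper name and parameters are illustrative rather than a transformers API, and the real pipeline corrupts SentencePiece tokens instead of whitespace-split words.

```python
import random

def t5_span_corrupt(words, corrupt_prob=0.15, mean_span_len=3, seed=None):
    """Replace random contiguous spans of `words` with T5 sentinel tokens.

    Returns (input_text, target_text), e.g.
        input:  "Thank you <extra_id_0> to your party <extra_id_1> week"
        target: "<extra_id_0> for inviting me <extra_id_1> last <extra_id_2>"
    """
    rng = random.Random(seed)
    num_to_mask = max(1, round(len(words) * corrupt_prob))

    # Greedily sample non-overlapping spans until enough tokens are masked.
    masked = set()
    while len(masked) < num_to_mask:
        span_len = max(1, int(rng.expovariate(1.0 / mean_span_len)))
        start = rng.randrange(len(words))
        span = range(start, min(start + span_len, len(words)))
        if any(i in masked for i in span):
            continue
        masked.update(span)

    # Each run of masked positions becomes one sentinel in the input;
    # the target lists each sentinel followed by the words it replaced.
    input_toks, target_toks, sentinel, i = [], [], 0, 0
    while i < len(words):
        if i in masked:
            input_toks.append(f"<extra_id_{sentinel}>")
            target_toks.append(f"<extra_id_{sentinel}>")
            while i < len(words) and i in masked:
                target_toks.append(words[i])
                i += 1
            sentinel += 1
        else:
            input_toks.append(words[i])
            i += 1
    target_toks.append(f"<extra_id_{sentinel}>")  # final sentinel closes the target
    return " ".join(input_toks), " ".join(target_toks)

text = "Thank you for inviting me to your party last week"
inp, tgt = t5_span_corrupt(text.split(), seed=0)  # outputs vary with the seed
```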

Label: wontfix

All 4 comments

Not yet, sadly - it's on my to-do list. I hope to be able to work on it soon.

I am working on a script for T5 based on the current run_language_modeling.py; maybe I can share it once I am done and someone can confirm that it works as expected?
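Pending such a script, here is a rough sketch of how span corruption like the helper above could be wired into a collate function for Trainer. The `collate_batch` helper is hypothetical, it assumes a `T5Tokenizer`, and it relies on recent transformers behavior where label positions set to `-100` are ignored by the loss.

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")

def collate_batch(texts):
    # Corrupt each raw text, then tokenize inputs and targets separately.
    pairs = [t5_span_corrupt(t.split()) for t in texts]
    inputs = tokenizer([p[0] for p in pairs], padding=True, return_tensors="pt")
    targets = tokenizer([p[1] for p in pairs], padding=True, return_tensors="pt")
    labels = targets.input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # mask padding out of the loss
    return {
        "input_ids": inputs.input_ids,
        "attention_mask": inputs.attention_mask,
        "labels": labels,
    }
```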

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hi, I'm working on the same task. Here you can see my code if it helps!
