I trained a Chinese RoBERTa model. In the model card, the widget uses the tokenizer defined in `config.json` (`RobertaTokenizer`), but my model uses `BertTokenizer`. Can I customize the tokenizer used by the model card widget, the same way I can choose any combination of model and tokenizer in a pipeline?
I tried to use `BertModel` instead of `RobertaModel` (copying the weights from the RoBERTa model to the BERT model), but the position embeddings are different, so the outputs differ. So I have to use this combination of `RobertaModel` and `BertTokenizer`. Does that mean I can't use the inference widget?
Yes, this is possible. See https://github.com/huggingface/transformers/commit/ed71c21d6afcbfa2d8e5bb03acbb88ae0e0ea56a: you should add a `tokenizer_class` attribute to your `config.json` with the tokenizer class you want to use.
cc @sgugger @LysandreJik I have no idea if this is currently documented or just in the code 🤭
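For illustration, the relevant part of `config.json` might look like the sketch below (the other keys shown are just placeholders for whatever your model's config already contains; only the `tokenizer_class` line is the addition being discussed):

```json
{
  "architectures": ["RobertaModel"],
  "model_type": "roberta",
  "tokenizer_class": "BertTokenizer"
}
```

With this in place, `AutoTokenizer.from_pretrained(...)` (and therefore the inference widget) should load `BertTokenizer` instead of the default tokenizer for the `roberta` model type.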
Thank you! It works. I think you are right: I did not find this option in the documentation at https://huggingface.co/transformers/main_classes/configuration.html
Looks like that guy who made the PR did not document the new argument he added :-p
argh, who does that guy think he is? 😂