I trained a Chinese RoBERTa model. In the model card, the widget uses the tokenizer defined in `config.json` (`RobertaTokenizer`), but my model uses `BertTokenizer`. Can I customize the tokenizer used by the model card widget, the same way I can choose any combination of model and tokenizer in a pipeline?
I tried to use `BertModel` instead of `RobertaModel` (copying the weights from the RoBERTa model to the BERT model), but the position embeddings are different, so the outputs differ. So I have to use this combination of `RobertaModel` and `BertTokenizer`. Does that mean I can't use the inference widget?
Yes, this is possible. See https://github.com/huggingface/transformers/commit/ed71c21d6afcbfa2d8e5bb03acbb88ae0e0ea56a: you should add a `tokenizer_class` attribute to your `config.json` with the tokenizer class you want to use.
cc @sgugger @LysandreJik I have no idea if this is currently documented or just in the code 🤭
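For illustration, the relevant part of `config.json` might look like the sketch below (the other keys shown are just placeholders for whatever your model's config already contains; only the `tokenizer_class` line is the addition being discussed):

```json
{
  "architectures": ["RobertaModel"],
  "model_type": "roberta",
  "tokenizer_class": "BertTokenizer"
}
```

With this in place, `AutoTokenizer.from_pretrained(...)` (and therefore the inference widget) should load `BertTokenizer` instead of the default tokenizer for the `roberta` model type.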
Thank you! It works. I think you are right: I did not find this option in the documentation at https://huggingface.co/transformers/main_classes/configuration.html
Looks like that guy who made the PR did not document the new argument he added :-p
argh, who does that guy think he is? 😂