Would it be possible to use a sentencepiece tokenizer in preprocessing the data?
Hey @nitinnairk, it's definitely possible if you are pretraining from scratch.
The released pretrained model uses the GPT-2 BPE dictionary, so unfortunately you can't use a sentencepiece tokenizer with the released model.
Is there an example of using the sentencepiece tokenizer in preprocessing the data?