I would like to train a new BERT model.
Is there a way to train the BERT tokenizer (a.k.a. WordPiece tokenizer)?
Follow the sentencepiece GitHub or the BERT TensorFlow GitHub repo. You will get some feedback there.
If you want to see an example of a custom tokenizer implementation in the Transformers library, you can look at how the Japanese tokenizer was implemented.
In general, you can read more about adding a new model to Transformers here.
Check out the tokenizers repo.
There's an example of how to train a WordPiece tokenizer: https://github.com/huggingface/tokenizers/blob/master/bindings/python/examples/train_bert_wordpiece.py