I was wondering how to generate the vocab file that is passed to create_pretraining_data.py via --vocab_file.
I noticed that the released BERT models include a vocab file. How did you generate it from, for instance, an English Wikipedia dump? I am going to do pre-training from scratch. I'd appreciate your help!
Thanks,
Allen Zhang
We couldn't include that code; see this section of the README for alternatives.
@allenzhang010 have you worked out a solution for this?
You can use this to create your vocabulary:
https://github.com/kwonmha/bert-vocab-builder
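If it helps to see the general shape of the problem, here is a minimal pure-Python sketch of building a subword vocabulary in the vocab.txt layout (one token per line, `##` marking word-continuation pieces). To be clear about assumptions: this is a simplified BPE-style merge loop, not the WordPiece training code used for the released models and not the code from the repository linked above. The function names `build_vocab` and `write_vocab`, the `num_merges` parameter, and the lowercase whitespace pre-tokenization are all made up for illustration.

```python
from collections import Counter

# Special tokens that also appear in the released BERT vocab files.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(lines, num_merges=100):
    """Hypothetical helper: learn a small subword vocab by greedy pair merging."""
    # Count lowercased, whitespace-separated words (a simplification of
    # BERT's real pre-tokenization, which also splits punctuation).
    word_counts = Counter(w for line in lines for w in line.lower().split())

    # Split each word into characters; non-initial pieces get a "##" prefix.
    splits = {w: [w[0]] + ["##" + c for c in w[1:]] for w in word_counts}
    vocab = {piece for pieces in splits.values() for piece in pieces}

    for _ in range(num_merges):
        # Count adjacent piece pairs, weighted by word frequency.
        pair_counts = Counter()
        for word, pieces in splits.items():
            for a, b in zip(pieces, pieces[1:]):
                pair_counts[(a, b)] += word_counts[word]
        if not pair_counts:
            break  # every word is already a single piece

        # Pick the most frequent pair; ties are broken deterministically.
        (a, b), _count = max(pair_counts.items(), key=lambda kv: (kv[1], kv[0]))
        merged = a + b[2:]  # b is never word-initial, so it starts with "##"
        vocab.add(merged)

        # Apply the merge to every word.
        for word, pieces in splits.items():
            out, i = [], 0
            while i < len(pieces):
                if i + 1 < len(pieces) and pieces[i] == a and pieces[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(pieces[i])
                    i += 1
            splits[word] = out

    return SPECIAL_TOKENS + sorted(vocab)


def write_vocab(tokens, path):
    """Write one token per line, the layout expected by --vocab_file."""
    with open(path, "w", encoding="utf-8") as f:
        for token in tokens:
            f.write(token + "\n")
```

You would call something like `write_vocab(build_vocab(open("corpus.txt", encoding="utf-8")), "vocab.txt")`, where `corpus.txt` is a placeholder for your own text. For a real Wikipedia-scale run you will want one of the tools mentioned in this thread or in the README rather than this loop, which rescans every word on each merge.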