Bert: How to generate the vocab file that the BERT model was trained on?

Created on 6 Nov 2018 · 3 Comments · Source: google-research/bert

I was wondering how to generate the vocab file that is passed via --vocab_file to create_pretraining_data.py.

I noticed that the released BERT models include a vocab file. How did you generate it from, for instance, an English Wikipedia dump? I am going to do the pre-training from scratch. I'd appreciate your help!

Thanks,
Allen Zhang

All 3 comments

We couldn't include that code; see this section of the README for alternatives.

@allenzhang010 have you worked out a solution for this?

You can use this to create your vocabulary:
https://github.com/kwonmha/bert-vocab-builder
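
For anyone who only wants to see what the file looks like before picking a tool, here is a minimal sketch of a vocab builder. It is not the code used for the released models (that was not published, as noted above), and it assumes a simple BPE-style "merge the most frequent pair" rule instead of the real WordPiece criterion. The function names `build_toy_vocab` and `write_vocab` are made up for this example. What it does get right is the file layout BERT's tokenizer reads: one token per line, the line number is the token id, word-internal pieces are prefixed with `##`, and the special tokens `[PAD]`, `[UNK]`, `[CLS]`, `[SEP]`, `[MASK]` are present.

```python
from collections import Counter

# Special tokens BERT's scripts look up by name. [PAD] goes first so it gets id 0.
# Note: this order is an assumption for the sketch, not the released vocab's order.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_toy_vocab(lines, num_merges=1000, lowercase=True):
    """Build a small WordPiece-style token list using BPE-like merges (toy rule)."""
    word_counts = Counter()
    for line in lines:
        if lowercase:
            line = line.lower()
        word_counts.update(line.split())  # whitespace tokenization only

    # Each word starts as single characters; non-initial pieces carry "##".
    words = {
        tuple([w[0]] + ["##" + ch for ch in w[1:]]): count
        for w, count in word_counts.items()
    }
    vocab = {piece for pieces in words for piece in pieces}

    for _ in range(num_merges):
        # Count adjacent piece pairs, weighted by word frequency.
        pair_counts = Counter()
        for pieces, count in words.items():
            for left, right in zip(pieces, pieces[1:]):
                pair_counts[(left, right)] += count
        if not pair_counts:
            break  # every word is already a single piece

        # Most frequent pair; ties broken lexicographically for determinism.
        left, right = min(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        merged = left + right[2:]  # drop the "##" of the right-hand piece
        vocab.add(merged)

        # Rewrite every word with the chosen pair merged.
        new_words = {}
        for pieces, count in words.items():
            out, i = [], 0
            while i < len(pieces):
                if i + 1 < len(pieces) and pieces[i] == left and pieces[i + 1] == right:
                    out.append(merged)
                    i += 2
                else:
                    out.append(pieces[i])
                    i += 1
            new_words[tuple(out)] = count
        words = new_words

    learned = [tok for tok in sorted(vocab) if tok not in SPECIAL_TOKENS]
    return SPECIAL_TOKENS + learned


def write_vocab(tokens, path):
    """Write one token per line; the line index becomes the token id."""
    with open(path, "w", encoding="utf-8") as f:
        for tok in tokens:
            f.write(tok + "\n")
```

To try it, pass any iterable of text lines (for example an open file) to `build_toy_vocab`, then call `write_vocab(tokens, "vocab.txt")` and point --vocab_file at the result. For real pre-training you would want a proper subword trainer such as the repository linked above, since this sketch ignores punctuation splitting, Unicode normalization, and the actual WordPiece scoring.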
