BERT: how to decide the value of dupe_factor when running create_pretraining_data.py

Created on 6 Dec 2018 · 6 comments · Source: google-research/bert

I tried to run create_pretraining_data.py to generate an input file. I noticed that dupe_factor is 5 in the README example, but its default value in create_pretraining_data.py is 10. How should I set its value, and how does it influence the model and the results?

Thank you.

All 6 comments

Same question.

Me too.

@iShaka, have you solved the problem? @eric-haibin-lin @nlp4whp

Same question.

Sorry for my belated reply.

@jingcheng-du
I feel the default value is enough. It just decides how many times your documents are duplicated (each copy gets a different random masking) before all the resulting training instances are shuffled.
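To make this concrete, here is a minimal, simplified Python sketch of what the duplication does. It is loosely modeled on the loop in the script's `create_training_instances`, which (as I understand it) builds instances from every document once per `dupe_factor` pass and shuffles them all at the end. The names `mask_tokens` and `create_instances` are hypothetical, and the masking is simplified to plain `[MASK]` replacement: no 80/10/10 mask/random/keep split, no next-sentence pairs, no `max_predictions_per_seq` cap. The seed is arbitrary.

```python
import random


def mask_tokens(tokens, rng, masked_lm_prob=0.15, mask_token="[MASK]"):
    """Hypothetical helper: replace ~masked_lm_prob of the positions with [MASK].

    Simplified compared with the real script, which also sometimes keeps
    the original token or substitutes a random one.
    """
    num_to_mask = max(1, int(round(len(tokens) * masked_lm_prob)))
    positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
    masked = list(tokens)
    for pos in positions:
        masked[pos] = mask_token
    return masked, positions


def create_instances(documents, dupe_factor, seed=12345):
    """Hypothetical, simplified version of the dupe_factor loop.

    Every document is processed dupe_factor times; because the same rng
    keeps advancing, each copy gets a different random masking.
    """
    rng = random.Random(seed)
    instances = []
    for _ in range(dupe_factor):
        for doc in documents:
            masked, positions = mask_tokens(doc, rng)
            instances.append({"tokens": masked, "masked_positions": positions})
    # Shuffle once at the end, after all duplicates have been created.
    rng.shuffle(instances)
    return instances


if __name__ == "__main__":
    docs = [["the", "quick", "brown", "fox", "jumps",
             "over", "the", "lazy", "dog", "today"]]
    for dupe_factor in (5, 10):
        instances = create_instances(docs, dupe_factor)
        distinct = {tuple(inst["masked_positions"]) for inst in instances}
        print(f"dupe_factor={dupe_factor}: {len(instances)} instances, "
              f"{len(distinct)} distinct maskings")
```

In other words, dupe_factor trades disk space and preprocessing time for masking diversity. The number of instances, and roughly the size of the output file, grows linearly with it, so the default of 10 produces about twice as much output as the README's 5.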

Ok, thank you for your reply! @nlp4whp
