I tried to run create_pretraining_data.py to generate the input file. In the README example dupe_factor is 5, but its default value in create_pretraining_data.py is 10. How should I set it, and how does it affect the model and the results?
Thank you.
Same question..
me too...
@iShaka Have you solved the problem? @eric-haibin-lin @nlp4whp
Sorry for the belated reply,
@jingcheng-du
I think the default value is enough. It just decides how many times your docs are copied (each copy gets its own random masking) before everything is shuffled.
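To make that concrete, here is a toy sketch of the idea, not the actual script. The function name `make_instances`, the sample docs, and the simplified masking (the real script also sometimes keeps or randomly replaces the chosen token) are all my own assumptions for illustration. As far as I understand it, the corpus is walked `dupe_factor` times, each pass masks different positions, and the combined result is shuffled. Since the README example sets it to 5, you should be able to pass `--dupe_factor=5` on the command line to override the default.

```python
import random


def make_instances(docs, dupe_factor, masked_lm_prob=0.15, seed=12345):
    """Toy sketch: go over the corpus `dupe_factor` times, masking differently each time."""
    rng = random.Random(seed)
    instances = []
    for _ in range(dupe_factor):
        for tokens in docs:
            # Mask roughly masked_lm_prob of the tokens, but at least one.
            num_to_mask = max(1, int(round(len(tokens) * masked_lm_prob)))
            positions = sorted(rng.sample(range(len(tokens)), num_to_mask))
            masked = ["[MASK]" if i in positions else tok
                      for i, tok in enumerate(tokens)]
            instances.append((masked, positions))
    # Shuffle so copies of the same document are not adjacent.
    rng.shuffle(instances)
    return instances


# Hypothetical two-document corpus, already tokenized.
docs = [
    "the quick brown fox jumps over the lazy dog".split(),
    "bert is pretrained with a masked language model objective".split(),
]

instances = make_instances(docs, dupe_factor=5)
print(len(instances))  # 10: 2 docs x dupe_factor 5
```

So a larger dupe_factor mostly means a bigger output file and more mask variety per sentence, while a smaller one means faster preprocessing. I would not expect it to change anything else about the model.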
Ok, thank you for your reply! @nlp4whp