I ran create_pretraining_data.py on a large corpus built from the entire text of an enwiki dump, inside a Docker container limited to 70 GB of memory.
However, the process appears to run out of memory: it was killed by the OOM killer at the 70 GB limit.
(There was no error message; the process was simply killed.)
How can I handle this?
P.S.
I confirmed that the "*** Writing to output files ***" message was printed, so the problem may be in the write_instance_to_example_files function.
You should shard the input data (text.txt_00000, text.txt_00001, ...), run the script once for each shard (producing tf_examples.tfrecord_00000, tf_examples.tfrecord_00001, ...), and then pass a file glob (e.g., tf_examples.tfrecord*) to run_pretraining.py.
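A minimal sketch of the sharding step in Python. It assumes the input follows the format create_pretraining_data.py expects (one sentence per line, with a blank line between documents), so shards are cut only at document boundaries and no document is split across two files. The helper names (iter_documents, shard_corpus, pretraining_commands) and the docs_per_shard value are hypothetical; only the --input_file, --output_file and --vocab_file flags are shown, so add the other flags from your original run.

```python
from typing import Iterator, List


def iter_documents(path: str) -> Iterator[List[str]]:
    """Yield documents (lists of lines) from a file where blank lines separate documents."""
    doc: List[str] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                doc.append(line if line.endswith("\n") else line + "\n")
            elif doc:  # a blank line closes the current document
                yield doc
                doc = []
    if doc:  # last document may not be followed by a blank line
        yield doc


def shard_corpus(input_path: str, output_prefix: str, docs_per_shard: int) -> List[str]:
    """Write docs_per_shard documents to each of <prefix>_00000, <prefix>_00001, ..."""
    shard_paths: List[str] = []
    out = None
    try:
        for i, doc in enumerate(iter_documents(input_path)):
            if i % docs_per_shard == 0:  # start a new shard
                if out is not None:
                    out.close()
                path = f"{output_prefix}_{len(shard_paths):05d}"
                shard_paths.append(path)
                out = open(path, "w", encoding="utf-8")
            out.writelines(doc)
            out.write("\n")  # keep the blank-line document separator
    finally:
        if out is not None:
            out.close()
    return shard_paths


def pretraining_commands(shard_paths: List[str], vocab_file: str,
                         output_prefix: str = "tf_examples.tfrecord") -> List[List[str]]:
    """Build one create_pretraining_data.py command per input shard (other flags omitted)."""
    return [
        [
            "python", "create_pretraining_data.py",
            f"--input_file={shard}",
            f"--output_file={output_prefix}_{idx:05d}",
            f"--vocab_file={vocab_file}",
        ]
        for idx, shard in enumerate(shard_paths)
    ]
```

The commands can be run one after another (for example with subprocess.run(cmd, check=True)), so each run only holds one shard's training instances in memory and peak usage scales with the shard size rather than the whole dump. Afterwards, pass --input_file="tf_examples.tfrecord*" to run_pretraining.py.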