Hello! Regarding https://github.com/huggingface/transformers/tree/master/examples#gpt-2gpt-and-causal-language-modeling, would you mind sharing which hyperparameters you used to get this result? How many epochs, what batch size, etc.?
@xihui-wu To get the hyperparameters specific to the model (in this case gpt2), you can check the config file of gpt2 with the code below:
from transformers import GPT2Config
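# Printing the config shows the model-level hyperparameters (vocab_size, n_layer, n_head, n_embd, etc.)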
print(GPT2Config())
Some higher-level training hyperparameters (e.g. the number of epochs) are not included there. These can be set explicitly as arguments when running run_language_modeling.py from the CLI; otherwise, the default values are used.
You can find the hyperparameters and their default values at the beginning of the main function in run_language_modeling.py. For example, the default epoch count (num_train_epochs) is 1.
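If you'd rather not read the source, the script will also print every argument and its default from the command line (standard argparse behavior):
python run_language_modeling.py --help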
Hope this helps!
Thanks a lot @enzoampil! Do you know which hyperparameters produce this result: "This takes about half an hour to train on a single K80 GPU and about one minute for the evaluation to run. It reaches a score of ~20 perplexity once fine-tuned on the dataset."?
@xihui-wu This result comes from running the default training script with no explicitly specified hyperparameters; therefore, the default hyperparameters will apply.
You can find the hyperparameters and their default values at the beginning of the main function in run_language_modeling.py. For example, the default epoch count (num_train_epochs) is 1.
For reference, this is the code snippet that fine-tunes gpt2 with the default hyperparameters.
export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw
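# Fine-tune gpt2 on the raw WikiText-2 files above and report eval perplexity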
python run_language_modeling.py \
--output_dir=output \
--model_type=gpt2 \
--model_name_or_path=gpt2 \
--do_train \
--train_data_file=$TRAIN_FILE \
--do_eval \
--eval_data_file=$TEST_FILE
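If you want to override a default, pass it explicitly as a flag. For example, to train for 3 epochs instead of the default 1 (same TRAIN_FILE/TEST_FILE exports as above; the 3 is just an illustration):
python run_language_modeling.py \
--output_dir=output \
--model_type=gpt2 \
--model_name_or_path=gpt2 \
--do_train \
--train_data_file=$TRAIN_FILE \
--do_eval \
--eval_data_file=$TEST_FILE \
--num_train_epochs=3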
I got a GPU out-of-memory error on a K80 with this. What is the default batch size, and how can I configure it?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
You can use --per_device_train_batch_size=1; that worked for me on a K80.
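If a per-device batch of 1 makes the updates too noisy, you can keep a larger effective batch size with gradient accumulation. A minimal sketch, assuming the same exports as above and that your script revision accepts --gradient_accumulation_steps (a standard argument of the example scripts); the factor 8 is just an illustration:
# effective batch size per optimizer step = 1 (per device) x 8 (accumulation) = 8
python run_language_modeling.py \
--output_dir=output \
--model_type=gpt2 \
--model_name_or_path=gpt2 \
--do_train \
--train_data_file=$TRAIN_FILE \
--do_eval \
--eval_data_file=$TEST_FILE \
--per_device_train_batch_size=1 \
--gradient_accumulation_steps=8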