Transformers: Share more details on fine-tuning GPT-2 on WikiText-2 ?

Created on 17 Apr 2020 · 6 comments · Source: huggingface/transformers

Hello! Regarding https://github.com/huggingface/transformers/tree/master/examples#gpt-2gpt-and-causal-language-modeling, would you mind sharing which hyperparameters you used to get this result? How many epochs, what batch size, etc.?

wontfix

Most helpful comment

@xihui-wu This result comes from running the default training script with no explicitly specified hyperparameters; therefore, the default hyperparameters will apply.

You can find the hyperparameters and their default values at the beginning of the main function in run_language_modeling.py. For example, the default epoch count (num_train_epochs) is 1.

For reference, this is the code snippet that fine-tunes gpt2 with the default hyperparameters.

export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw

python run_language_modeling.py \
    --output_dir=output \
    --model_type=gpt2 \
    --model_name_or_path=gpt2 \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE

All 6 comments

@xihui-wu To get the hyperparameters specific to the model (in this case gpt2), you can check the config file of gpt2 with the code below:

from transformers import GPT2Config

# Prints the default gpt2 model configuration (vocab size, number of layers, hidden size, etc.)
print(GPT2Config())

Some higher-level training hyperparameters (e.g. the number of epochs) are still not included here. These can be set explicitly as arguments when running run_language_modeling.py from the CLI; otherwise, the default values are used.

You can find the hyperparameters and their default values at the beginning of the main function in run_language_modeling.py. For example, the default epoch count (num_train_epochs) is 1.
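
For example, a sketch of overriding a couple of these defaults on the command line (the values here are illustrative, not the ones behind the reported result):

python run_language_modeling.py \
    --output_dir=output \
    --model_type=gpt2 \
    --model_name_or_path=gpt2 \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --num_train_epochs 3 \
    --learning_rate 5e-5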

Hope this helps!


Thanks a lot @enzoampil! Do you know which hyperparameters produce the result: "This takes about half an hour to train on a single K80 GPU and about one minute for the evaluation to run. It reaches a score of ~20 perplexity once fine-tuned on the dataset."?

@xihui-wu This result comes from running the default training script with no explicitly specified hyperparameters; therefore, the default hyperparameters will apply.

You can find the hyperparameters and their default values at the beginning of the main function in run_language_modeling.py. For example, the default epoch count (num_train_epochs) is 1.

For reference, this is the code snippet that fine-tunes gpt2 with the default hyperparameters.

export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw

python run_language_modeling.py \
    --output_dir=output \
    --model_type=gpt2 \
    --model_name_or_path=gpt2 \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE
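
For context, the script reports perplexity as the exponential of the mean evaluation loss; a minimal sketch of that relationship (the eval_loss value below is hypothetical):

import math

eval_loss = 3.0  # hypothetical mean cross-entropy loss from the evaluation step
perplexity = math.exp(eval_loss)
print(perplexity)  # ~20.1, i.e. the "~20 perplexity" figure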


I got a GPU out-of-memory error on a K80 with this. What is the batch size, and how can I configure it?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


You can use per_device_train_batch_size=1; that worked for me on a K80.
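
For reference, a minimal sketch of the full invocation with the smaller batch size (the gradient accumulation flag is an assumption, added to keep the effective batch size reasonable; check python run_language_modeling.py --help for the exact flag names in your version):

export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw

python run_language_modeling.py \
    --output_dir=output \
    --model_type=gpt2 \
    --model_name_or_path=gpt2 \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4  # assumption: offsets the tiny per-device batch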
