I'm trying to evaluate a GPT-2 model during fine-tuning. I can calculate the loss at each epoch, but I don't know how to calculate accuracy or otherwise give the model a score. Any suggestions would be appreciated.
Link to the original question on Stack Overflow: https://stackoverflow.com/questions/60483956/how-to-perform-accuracy-testing-on-text-generation-task
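A common way to evaluate language models is to measure their perplexity, which is the exponential of the average per-token cross-entropy loss, so you can derive it from the loss you already compute.
Say you want to fine-tune GPT-2 on your dataset D.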
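Define train, val and test datasets (maybe something around 75%, 10%, 15%).
Measure the perplexity on train and val after each epoch. Compare the train and val curves to check for overfitting: if train perplexity keeps falling while val perplexity starts to rise, the model is overfitting.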
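The steps above could be sketched roughly as follows. This is a minimal illustration in plain Python, not a definitive implementation: `split_dataset` and `perplexity` are hypothetical helper names, the 75/10/15 split is just the rough suggestion from above, and the code assumes each batch loss is a mean cross-entropy in nats (natural log), which is what most training loops report.

```python
import math
import random


def split_dataset(examples, train_frac=0.75, val_frac=0.10, seed=0):
    """Shuffle and split into train/val/test; whatever remains goes to test."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


def perplexity(batch_losses, batch_token_counts):
    """Perplexity from per-batch mean cross-entropy losses (assumed in nats).

    Each batch loss is weighted by the number of tokens it was averaged
    over, so batches of different sizes contribute proportionally.
    """
    total_tokens = sum(batch_token_counts)
    total_nll = sum(loss * n for loss, n in zip(batch_losses, batch_token_counts))
    return math.exp(total_nll / total_tokens)


train, val, test = split_dataset(range(100))

# Hypothetical per-batch losses and token counts, for illustration only.
train_ppl = perplexity([2.9, 3.0, 2.8], [512, 512, 256])
val_ppl = perplexity([3.2, 3.0, 3.1], [512, 512, 256])
print(f"train perplexity: {train_ppl:.2f}, val perplexity: {val_ppl:.2f}")
```

In a real fine-tuning loop you would collect the evaluation loss and token count for every batch of the val set at the end of each epoch, pass them to `perplexity`, and plot the result next to the train value.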
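There are plenty of other evaluation measures that might suit your task better (for example BLEU or ROUGE if you have reference texts to compare against), so it is worth searching for what is commonly used for your task :-)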