I ran the fine-tuning as instructed in the "LM Fine-tuning" example:
python run_lm_finetuning.py \
--bert_model bert-base-uncased \
--output_dir models \
...
As a result, the fine-tuned model is now in models/pytorch_model.bin.
But how do I use it to classify? The example doesn't mention that, and I can't find any parameter for passing in the fine-tuned model.
I can run the classifier with only the pretrained model like this:
export GLUE_DIR=~/git/GLUE/glue_data/
python run_classifier.py \
--task_name SST-2 \
--do_train \
--do_eval \
--do_lower_case \
--data_dir ~/git/x/data/input-sst2/ \
--bert_model bert-base-uncased \
--max_seq_length 128 \
--train_batch_size 16 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir out/
The google-bert README says:
"Once you have trained your classifier you can use it in inference mode by using the --do_predict=true command."
If I try that here (the flag is from Google's TensorFlow implementation), it gives:
"run_classifier.py: error: unrecognized arguments: --do_predict=true"
Maybe you can take a look at the "cache_dir" argument. In run_classifier.py it is defined at line 498.
I tried --cache_dir, passing the fine-tuning output directory as cache_dir.
I added these 2 files to the directory: bert_config.json and vocab.txt from the original bert-base-uncased.
(The fine-tune output folder contains the fine-tuned pytorch_model.bin file, which I am not sure is used at all.)
It gave exactly the same accuracy (to 16 digits) as running run_classifier.py directly with bert-base-uncased. It seems that with --cache_dir, it merely saves the downloaded bert-base-uncased weights to that given directory; I am not sure there is any other difference.
(I am running a 3-label classifier, for which I used SST-2 from GLUE as a basis: I saved my data in the same format and added a 3rd label to the processor code in run_classifier.py.)
python run_classifier.py \
--task_name SST-2 \
--do_train \
--do_eval \
--do_lower_case \
--data_dir ~/git/x/data/input-sst2/ \
--bert_model bert-base-uncased \
--cache_dir out_finetune_140/ \
--max_seq_length 140 \
--train_batch_size 16 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir out_testcache/
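If I understand it correctly, --cache_dir never feeds custom weights into the model; it only sets where the downloaded archive is stored. A rough stdlib sketch of that resolution logic (my own approximation, not the library's actual code):

```python
import os

def resolve_bert_model(bert_model, cache_dir=None):
    """Approximation of how from_pretrained treats --bert_model:
    a path to a local directory is loaded directly, while a known
    model name like "bert-base-uncased" is downloaded, with
    --cache_dir only controlling where that download is stored."""
    if os.path.isdir(bert_model):
        # a local folder: the weights inside it actually get used
        return ("local", bert_model)
    # a model name: original weights are fetched from the hub;
    # cache_dir is just the cache location (default path assumed here)
    return ("download", cache_dir or "~/.pytorch_pretrained_bert")
```

This would explain the bit-identical accuracy: with --bert_model bert-base-uncased the fine-tuned checkpoint in the cache directory is never touched.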
I can get the classifier running on the fine-tuned model by replacing bert-base-uncased with the output folder of the fine-tuning step: --bert_model out_finetune_140/ \ (and adding bert_config.json and vocab.txt to that folder).
But as a result, eval_accuracy went down from 0.918 to 0.916.
(I wonder whether it is correct to use vocab.txt and bert_config.json from the original bert-base-uncased, or whether the fine-tuned model would need updated ones?)
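To avoid forgetting one of the files, I use a small helper (my own, not part of the repo) that checks a local model directory has everything from_pretrained expects before I point --bert_model at it:

```python
import os

# the three files a local --bert_model directory appears to need
REQUIRED_FILES = ["pytorch_model.bin", "bert_config.json", "vocab.txt"]

def missing_model_files(model_dir):
    """Return the names of required files absent from model_dir,
    so an empty list means the folder looks ready to load."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]
```

For example, right after fine-tuning, the output folder is typically missing bert_config.json and vocab.txt, which is why I copy them over from the original model.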
step1:
python run_lm_finetuning.py \
--bert_model bert-base-uncased \
--do_lower_case \
--do_train \
--train_file ~/git/xdata/lm-file.txt \
--output_dir out_finetune_140/ \
--num_train_epochs 2.0 \
--learning_rate 3e-5 \
--train_batch_size 16 \
--max_seq_length 140
step2:
python run_classifier.py \
--task_name SST-2 \
--do_train \
--do_eval \
--do_lower_case \
--data_dir ~/git/x/data/input-sst2/ \
--bert_model out_finetune_140/ \
--max_seq_length 140 \
--train_batch_size 16 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir out_finetune+class_140/
Hi, I think I found a workaround for this issue. First load the original model, then insert this line into your Python file (for example, after lines 607 and 610 in run_classifier.py):
model.load_state_dict(torch.load("output_dir/pytorch_model.bin"))
The model will then be your customized fine-tuned model, and there is no need to change anything else (for example, the config file or the vocab file).
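The ordering matters: load_state_dict must run after the model is constructed, because the checkpoint values overwrite whatever from_pretrained initialized. A toy stdlib illustration, with plain dicts standing in for state dicts (not the real torch API):

```python
def apply_checkpoint(model_state, checkpoint_state):
    """Toy stand-in for model.load_state_dict(torch.load(...)):
    every parameter present in the checkpoint replaces the value
    the model was constructed with."""
    updated = dict(model_state)
    updated.update(checkpoint_state)
    return updated
```

So any weights the pretrained load put in place are simply replaced by the fine-tuned ones, which is why no config or vocab change is needed.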
I would suggest adding separate logic to load your fine-tuned model and perform prediction. The code will be very similar to the eval loop, but you won't need (and actually won't have access to) labels during prediction, so there is no need for the accuracy computation from eval. Simply collect your predictions in a list and write them to a file called "pred_results.txt".
I added two new flags ("do_pred" and "model_path"), modified the eval logic a little to ignore labels, and wrote the outputs to a file. Things are working for me.
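The label-free part of that prediction logic could look roughly like this (a sketch with names of my own choosing, not code from the repo; the logits would come from the model's forward pass):

```python
def write_predictions(logits_batches, label_list, out_path="pred_results.txt"):
    """Sketch of a --do_pred loop: no labels, no accuracy.
    Argmax each row of logits and write the predicted label
    names to out_path, one per line."""
    preds = []
    for batch in logits_batches:
        for row in batch:
            # index of the highest logit -> predicted label name
            preds.append(label_list[max(range(len(row)), key=row.__getitem__)])
    with open(out_path, "w") as f:
        for p in preds:
            f.write(p + "\n")
    return preds
```

The eval loop's metric bookkeeping is simply dropped, since at prediction time there are no gold labels to compare against.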
Hi LeenaShekhar,
Would you mind sharing the code you wrote to perform predictions with a trained model?