Allennlp: Making gradients flow after loading archive(trained model) in evaluation phase

Created on 1 Jul 2019 · 5 Comments · Source: allenai/allennlp

Hi, I'm currently working on a BERT language model. I trained a model based on BERT and saved it as an archive. Then, in the evaluation (prediction) phase, I loaded the trained model with "allennlp.models.archival.load_archive" and ran prediction on the test set. What I want, though, is for gradients to flow from the evaluation dataset so that the weights are fine-tuned accordingly (as in Mikolov's "Recurrent Neural Network Based Language Model"). This question might be silly, but is there a way to make gradients flow through the archived (trained) model during the prediction phase, rather than the training phase?

JJumSSu

All 5 comments

@JJumSSu I'm a bit confused about exactly what you are asking. I understand you're writing a model that uses BERT, but could you restate your question and explain why you want to do this?

schmmd on 2 Jul 2019

@schmmd
Sorry for the confusing question. I am trying to train a language model. I have trained a model, and it is currently stored in archive form (tar.gz). Normally, we use a predictor to evaluate the model on the test dataset. What I want is for my model to be "retrained" on the test set as well, in addition to being evaluated on it. (This method was first proposed by Mikolov in his paper "Recurrent Neural Network based Language Model" and is known as dynamic evaluation.) So once the perplexity for an instance is measured, learning occurs on that instance before moving on to the next one. (For example, for "I like you", the perplexity for the target word "like" is computed, the gradient flows from the word "like", and then we move on to the next word, "you".)

So basically, my question is: can I retrain my model dynamically while it is being evaluated?

JJumSSu on 2 Jul 2019

Thanks--I understand your question but I don't know the answer. I'll try to find someone on the team who does.

It's certainly possible but I don't think we have anything specifically built to support that use case.

schmmd on 3 Jul 2019

This is a reasonable but atypical thing to do. E.g., if you're trying to compare to prior work that doesn't do this, it may not be a fair comparison, and it makes your evaluation dependent on the order in which you see your test instances, which is a little weird.

This isn't something that we're likely to support in the library, but you can do it yourself without too much trouble. You'll probably need to call self.train() inside of model.forward(), because we typically call model.eval() before doing evaluation. You'll still want to keep track of whether you're in evaluation mode, though, so that during evaluation you continue training with just the perplexity loss. You'll also need to modify the evaluation code to compute gradients and update parameters.
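
For illustration, here is a framework-free sketch of the score-then-update loop that dynamic evaluation implies. It is not AllenNLP code. The toy unigram softmax model, the function names (softmax, dynamic_evaluation), the learning rate, and the token ids are all assumptions made up for this example. The point is only the ordering: each token is scored with the current parameters first, and the gradient step on that token happens afterwards. With a PyTorch model, the hand-written update would presumably be replaced by loss.backward() and an optimizer step.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_evaluation(tokens, vocab_size, lr=0.5, update=True):
    """Score a token stream with a toy unigram softmax model.

    Hypothetical helper for illustration only. Returns the perplexity
    and the final logits. With update=False this is ordinary (static)
    evaluation; with update=True the model takes one SGD step on each
    token right after that token has been scored.
    """
    logits = [0.0] * vocab_size  # stand-in for trained parameters
    total_nll = 0.0
    for target in tokens:
        probs = softmax(logits)
        # 1) Score the token with the parameters as they are now.
        total_nll += -math.log(probs[target])
        if update:
            # 2) Only then learn from it. The gradient of
            #    -log softmax(logits)[target] w.r.t. the logits
            #    is probs - onehot(target).
            for i in range(vocab_size):
                grad = probs[i] - (1.0 if i == target else 0.0)
                logits[i] -= lr * grad
    return math.exp(total_nll / len(tokens)), logits


# Made-up "test set" of token ids, dominated by token 0.
test_tokens = [0, 0, 0, 0, 1, 0, 0, 0]
static_ppl, _ = dynamic_evaluation(test_tokens, vocab_size=3, update=False)
dynamic_ppl, _ = dynamic_evaluation(test_tokens, vocab_size=3, update=True)
print("static perplexity: ", static_ppl)
print("dynamic perplexity:", dynamic_ppl)
```

Because the parameters change after every token, the reported perplexity depends on the order of the test instances, which is the caveat raised above.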