Thank you for the great implementation.
I have observed that when I try to fine-tune a model loaded from one of the saved checkpoints, the training error goes back to its initial levels.
I tried this by changing the 'weights' argument. Could you please point out what I am doing wrong?
Can you share the commands you're running? And what do you mean the training error goes back to initial levels? Do you mean the loss values?
Yes, the loss values return to their initial levels.
I only changed the 'weights' argument from 'imagenet' to the path of the checkpoint.
I've done that a few times as well, and I don't see the behaviour you're describing. I do see an increase in the loss when I restart training, but that is because initializing the weights like this doesn't restore the training state. Training therefore starts with an uninitialised training state, which is probably suboptimal for the phase the training process is in, so it takes illogical updates during backpropagation.
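To make the point about the missing training state concrete, here is a toy sketch of a scalar Adam update in plain Python. It is not keras-retinanet code: the learning rate, the alternating gradient history, and the helper names (`new_state`, `adam_step`) are all assumptions chosen for illustration. It compares the step taken with the accumulated moment estimates against the step taken from a freshly initialised state, and also shows how the fresh step scales with the learning rate.

```python
import copy
import math

LR = 1e-3  # illustrative learning rate, not the value used by the repository


def new_state():
    """Optimizer state as it looks when only the weights were loaded."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def adam_step(grad, state, lr=LR, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to `state` and return the signed weight change."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


# Stand-in for an earlier training run: 100 noisy gradients of magnitude 1.
trained_state = new_state()
for i in range(100):
    adam_step(1.0 if i % 2 == 0 else -1.0, trained_state)

late_grad = 0.01  # a small gradient, as one might see late in training

resumed_step = adam_step(late_grad, copy.deepcopy(trained_state))
fresh_step = adam_step(late_grad, new_state())
fresh_small_lr = adam_step(late_grad, new_state(), lr=LR / 10)

print("step with restored state:", abs(resumed_step))
print("step with fresh state:   ", abs(fresh_step))
print("fresh state, lr / 10:    ", abs(fresh_small_lr))
```

In this toy setting the first step from a fresh state has a magnitude of roughly the learning rate regardless of how small the gradient is, whereas the restored state damps it. Lowering the learning rate shrinks that first step proportionally, but it does not bring back the accumulated statistics.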
So, would the change in weights caused by the illogical updates be smaller if I used a smaller learning rate?
The issue I am facing is quite similar to the one described here, https://github.com/keras-team/keras/issues/2378.
Thank you for your time.
That's one option. Another would be to use load_model instead. The saved model should already contain the optimizer's state, so load_model would restore that state as well.
Thank you.
@hgaiser agree with @chaitons - thank you for your implementation!
With regards to transfer learning: looking at the code, we start from the ImageNet pre-trained weights by default, correct? If so, which layers, if any, are frozen during the train.py execution? I see in the model summary that there are 106,240 non-trainable parameters (when running the resnet50 backbone), but which part of the RetinaNet model do these correspond to? Could you share some suggestions, or point to the code, on how you would recommend doing transfer learning with the resnet50 backbone? Thank you in advance, much appreciated!
The non-trainable parameters are probably from BatchNormalization. They are the only parameters that are frozen during training. Currently there is no option to disable training on the backbone, but you could easily copy the code from resnet and add something like:
# Freeze every backbone layer; do this before compiling the model.
for layer in resnet.layers:
    layer.trainable = False
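As a sanity check on the 106,240 figure, the arithmetic below counts the BatchNormalization channels of a standard ResNet50 and multiplies by four parameters per channel (gamma, beta, moving mean, moving variance). It assumes that every BatchNormalization layer in the backbone is fully frozen and that no other part of the model contributes non-trainable parameters; the constant and function names are made up for this sketch.

```python
# Channel widths of the BatchNormalization layers in a standard ResNet50.
STEM_CHANNELS = 64
# (bottleneck width, output width, number of blocks) for stages 2 to 5.
STAGES = [(64, 256, 3), (128, 512, 4), (256, 1024, 6), (512, 2048, 3)]
PARAMS_PER_CHANNEL = 4  # gamma, beta, moving mean, moving variance


def resnet50_bn_channels():
    """Sum the channel counts of all BatchNormalization layers."""
    total = STEM_CHANNELS
    for width, out_width, blocks in STAGES:
        total += blocks * (width + width + out_width)  # three BN layers per block
        total += out_width  # BN on the projection shortcut of the first block
    return total


bn_channels = resnet50_bn_channels()
frozen_bn_params = PARAMS_PER_CHANNEL * bn_channels
print(bn_channels, frozen_bn_params)  # prints: 26560 106240
```

Under those assumptions the count matches the number reported in the model summary, which supports the BatchNormalization explanation.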
I believe https://github.com/fizyr/keras-retinanet/pull/241 will add some functionality that you might find interesting. Using that --snapshot argument you can resume training (including resuming the optimizer state). An unfortunate requirement is that the snapshots be created after that PR, and that multi-gpu is disabled. I'm looking into a fix for that.
Also, presumably https://github.com/fizyr/keras-retinanet/pull/240 was causing some issues in your case as well. That should be fixed now :)
@hgaiser great, thanks a lot! #240 is really an important thing which I am going to try out right away - so far I could not make the network train well, despite having a quite strong GPU working on it for hours - this could have been the reason! I will let you guys know the results.
Also thanks for the transfer learning reply. One follow-up question: the current implementation uses Adam with a learning rate reduced from the default, plus clipnorm, so as not to overwhelm the ImageNet pretrained weights right away... correct? But say you decided to do transfer learning, preserving ImageNet's pretrained feature-detector capabilities, and froze a portion of the ResNet50 backbone for this. Depending on how much the new classes differ from ImageNet (similar, somewhat different, very different), what portion of the backbone would you freeze?
Also thanks for the transfer learning reply. One follow-up question: the current implementation uses Adam with a learning rate reduced from the default, plus clipnorm, so as not to overwhelm the ImageNet pretrained weights right away... correct?
Correct, empirically @yhenon found that these settings work quite well. Higher learning rates or higher clipnorm values caused unstable training.
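For readers unfamiliar with clipnorm, the sketch below shows the usual idea of clipping a gradient by its L2 norm: if the norm exceeds the threshold, the gradient is rescaled to that norm while keeping its direction. The function name `clip_by_norm` and the example values are assumptions for illustration, not the repository's settings or the framework's internal code.

```python
import math


def clip_by_norm(grad, clipnorm):
    """Rescale `grad` so its L2 norm is at most `clipnorm`; direction is unchanged."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clipnorm:
        return list(grad)  # already small enough, leave it untouched
    scale = clipnorm / norm
    return [g * scale for g in grad]


large = clip_by_norm([3.0, 4.0], clipnorm=0.5)      # norm 5.0 is scaled down to 0.5
small = clip_by_norm([0.003, 0.004], clipnorm=0.5)  # norm 0.005 passes through
print(large, small)
```

A lower threshold caps how far a single large gradient can move the pretrained weights, which fits the observation that higher clipnorm values made training unstable.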
But say you decided to do transfer learning, preserving ImageNet's pretrained feature-detector capabilities, and froze a portion of the ResNet50 backbone for this. Depending on how much the new classes differ from ImageNet (similar, somewhat different, very different), what portion of the backbone would you freeze?
I believe it's normal to freeze all layers of ResNet50. If your application significantly differs from ImageNet you could also try to freeze only the first two or three stages, so that the higher level features (is it a face, a dog, a car, etc.) can be re-learned. To be honest, I usually let it train those layers anyway.
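Freezing only the early stages could be sketched as below. The helper works on any objects that have `name` and `trainable` attributes, as Keras layers do. The name prefixes (`conv1`, `res2`, `bn3`, and so on) are an assumption about how the backbone layers are named and should be checked against the model summary. The stand-in objects exist only so the sketch runs without Keras.

```python
from types import SimpleNamespace

# Assumed name prefixes for the stem and the res2/res3 stages; verify against model.summary().
EARLY_STAGE_PREFIXES = ("conv1", "bn_conv1", "res2", "bn2", "res3", "bn3")


def freeze_by_prefix(layers, prefixes=EARLY_STAGE_PREFIXES):
    """Set trainable=False on layers whose name starts with a prefix; return their names."""
    frozen = []
    for layer in layers:
        if layer.name.startswith(tuple(prefixes)):
            layer.trainable = False
            frozen.append(layer.name)
    return frozen


# Stand-in layers with hypothetical names, used instead of a real backbone.
demo_layers = [
    SimpleNamespace(name=name, trainable=True)
    for name in ("conv1", "res2a_branch2a", "bn3d_branch2c", "res4a_branch1", "res5c_branch2c")
]
frozen_names = freeze_by_prefix(demo_layers)
print(frozen_names)
```

With a real Keras model one would pass the backbone's `layers` list and compile the model afterwards, since changes to `trainable` take effect at compile time.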
I'm assuming this issue is fixed with the two PRs. If it isn't, feel free to re-open it.
@hgaiser thank you!
@hgaiser, I immediately got a better result after the #240 fix, with some true positives even after the 1st epoch at --batch-size 8 and --step 1250! Good job!