Machinelearning: ImageClassifier with ExponentialLRDecay: metrics not updated/calculated during validation

Created on 7 Feb 2020 · 7 comments · Source: dotnet/machinelearning

System information

  • OS version: Windows 10 Pro 18363
  • .NET Version: Core 2.1
  • Platform: x64
  • ML.NET version: 0.15-preview

Issue

  • What did you do?
    I am trying to train an image classifier that makes use of ExponentialLRDecay.
    I would like to see the metrics for training and validation for each epoch.

    var options = new ImageClassificationTrainer.Options()
    {
        LearningRateScheduler = new ExponentialLRDecay(),
        ValidationSetFraction = 0.1f,
        MetricsCallback = (metrics) => Console.WriteLine(metrics + $"   CrossEntropy: {metrics.Train.CrossEntropy}, LearningRate: {metrics.Train.LearningRate}")
    };
    var model = mlContext.MulticlassClassification.Trainers.ImageClassification(options);
    

    (I appended + $" CrossEntropy: {metrics.Train.CrossEntropy}, LearningRate: {metrics.Train.LearningRate}" because the cross-entropy and learning rate are not printed by default for the validation set.)

  • What happened?

    • The learning rate is not updated for the validation set (seen on every even row in the image).
    • The cross-entropy is not calculated for the validation set (seen on every even row in the image).
    • The learning rate is not updated after the second epoch like the default value of 2 for numEpochsPerDecay in ExponentialLRDecay(), but after the first instead (seen on the third row in the image). After that, the learning rate is correctly updated every 2 epochs. I'm not sure if this is the expected behavior.

    (screenshot: per-epoch training and validation metrics output)

  • What did you expect?

    • I expected a decaying learning rate in the validation step, equal to the one in the training step.
    • I expected the cross-entropy to be calculated in the validation step. The model with the highest accuracy and lowest cross-entropy is the best, so if two models perform equally well in terms of accuracy, the one with the lower cross-entropy on the validation set should be picked.
    • Further, I expected the learning rate to start decaying after the 2nd epoch.
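To illustrate the tie-breaking argument above, here is a small Python sketch (illustrative only; the helper function and probabilities are made up, not part of ML.NET): two binary classifiers that label every sample correctly, and therefore have identical accuracy, still differ in cross-entropy because one is more confident in its correct predictions.

```python
import math

def cross_entropy(probs_for_true_class):
    """Mean negative log-likelihood of the true class."""
    return -sum(math.log(p) for p in probs_for_true_class) / len(probs_for_true_class)

# Probability each hypothetical model assigned to the correct class on
# three samples; all values are > 0.5, so in a binary task both models
# classify every sample correctly and have identical accuracy.
model_a = [0.9, 0.8, 0.95]   # confident
model_b = [0.6, 0.55, 0.7]   # barely correct

print(cross_entropy(model_a))  # lower: the better model by this metric
print(cross_entropy(model_b))  # higher, despite equal accuracy
```

With equal accuracy, the lower cross-entropy of the first model is the only signal that it generalizes with more confidence, which is exactly why validation-set cross-entropy is useful for model selection.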
Labels: P1, bug

All 7 comments

After looking into some blog posts, I found that my point about the learning rate decaying too soon might be invalid.
Typically, the learning rate does decay at epoch n-1 when epochs are counted from 0 and numEpochsPerDecay = n.
So with numEpochsPerDecay = 2, epoch 0 has the original learning rate and epoch 1 is already decayed; with numEpochsPerDecay = 10, epochs 0-8 have the original learning rate and epoch 9 is the first decayed one.
This is not really intuitive to me, but it might be correct!
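For context, here is a minimal Python sketch of staircase exponential decay, the general scheme that schedulers like ExponentialLRDecay follow (the helper name and parameters are illustrative, not the actual ML.NET API; whether the first decayed value appears at epoch n-1 or n depends on whether the step counter is read before or after it is incremented):

```python
def decayed_lr(initial_lr, decay_rate, epoch, num_epochs_per_decay):
    """Staircase exponential decay: the rate is multiplied by `decay_rate`
    once every `num_epochs_per_decay` epochs (epochs counted from 0)."""
    return initial_lr * decay_rate ** (epoch // num_epochs_per_decay)

# With num_epochs_per_decay=2, epochs 0 and 1 keep the initial rate under
# this formula; shifting the epoch counter by one reproduces the
# "decay at epoch n-1" behavior described above.
for epoch in range(5):
    print(f"epoch {epoch}: lr = {decayed_lr(0.01, 0.7, epoch, 2):.5f}")
```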

I created a minimal working example based on the DeepLearning_ImageClassification_Training example, that reproduces the bug here.
I updated to version 1.5.0, changed the ImageClassificationTrainer.Options, and added a custom MetricsCallback.

  • The initial learning rate is still returned for the validation set. (In version 1.4.0, it did return the correct learning rate.)
  • The cross-entropy is still not calculated for the validation set.
  • The learning rate might decay as intended. (In version 1.4.0, it did decay after n epochs, instead of n-1, but that might not have been as intended.)

Hi @gartangh ,

Thank you so much for bringing this issue to our attention, and providing a good repro. I successfully replicated your repro with our local ML.NET build and confirmed the issue.

I figured out that the problem with cross-entropy during validation is simply that it isn't being updated in the validation metrics; it is missing from the code snippet below:
https://github.com/dotnet/machinelearning/blob/bb13d629000c218136e741b643767cf45ae12fc4/src/Microsoft.ML.Vision/ImageClassificationTrainer.cs#L1043-L1048

The reason the learning rate is not decreasing during validation is that learning rate schedulers, of which ExponentialLRDecay is one, are not currently used in validation training. The null in the parameters below for TrainAndEvaluateClassificationLayerCore is where a learning rate scheduler would go:
https://github.com/dotnet/machinelearning/blob/bb13d629000c218136e741b643767cf45ae12fc4/src/Microsoft.ML.Vision/ImageClassificationTrainer.cs#L1031-L1041

I don't see a reason why we don't support learning rate schedulers during validation, and as a result I believe the learning rate during validation should also decrease due to the ExponentialLRDecay being used, rather than remain fixed (as it currently does, at 0.01). Is my intuition correct @antoniovs1029 @harishsk ?

I'll be making a PR to address this issue and add tests to verify the changes soon.

In addition, I see that you had to manually log the learning rate and cross-entropy to observe this bug; these metrics are not reported during validation by default:
https://github.com/dotnet/machinelearning/blob/bb13d629000c218136e741b643767cf45ae12fc4/src/Microsoft.ML.Vision/ImageClassificationTrainer.cs#L170-L179
I wonder if this gives a hint on whether or not learning rates should be updated during validation with ExponentialLRDecay. I'm in favor of printing these metrics during validation as well.

Hi @mstfbl ,

You're welcome. I'm happy that I could be of help.

My guess is that the main part of this issue could be solved by copying that part of the code from training to validation?

About the learning rate during validation: one option would be to remove the LearningRate field from ImageClassificationMetrics.Train during validation, as it is not actually used. It was just very confusing to me that for training, the decay was visible, while for validation, it remained fixed at the initial value.
The other option would be as you say, but then it must be ensured that the learning rate does not decay twice as fast due to the extra usage during validation.

About your addition: if the cross-entropy is correctly updated, it should certainly be reported as well. It would be really awesome if the output signature were exactly the same, i.e. also reporting the learning rate.
Then, the ToString method could look like this:

public override string ToString()
{
    return $"Phase: Training, Dataset used: {DatasetUsed.ToString(),10}, Batch Processed Count: {BatchProcessedCount,3}, Learning Rate: {LearningRate,10} " +
           $"Epoch: {Epoch,3}, Accuracy: {Accuracy,10}, Cross-Entropy: {CrossEntropy,10}";
}

Hi @gartangh,

Thank you once again for notifying us about this issue. My now merged PR #5255 added cross entropy metric support for validation training. I did not add learning decay support for validation, as during validation there is no real training being done, so learning rate decay is not applicable here.

Hi @mstfbl

I just verified your changes by building the master branch locally.
This was the output:
(screenshot: updated metrics output from the local master build)
This looks really good!
The cross-entropy and learning rate decay seem to work correctly.
Printing the learning rate during validation would probably be too confusing, so I understand why it is not used.

Thank you very much for your work.

Of course, anytime @gartangh ! Please feel free to make more issues/feature requests if you see any in the future, and I hope you continue to enjoy using ML.NET!
