The Padam updater was recently described here: https://arxiv.org/pdf/1806.06763.pdf
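It is an extension of Adam/AMSGrad that claims SGD-like generalization (accuracy) while still maintaining the fast convergence of Adam/AMSGrad. Mathematically, it's basically a blend of SGD with momentum and AMSGrad: the second-moment term in the denominator is raised to a tunable power p in (0, 1/2] instead of the fixed 1/2.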
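To make the "blending" concrete, here is a minimal sketch of a single Padam step as I read it from the paper: the usual AMSGrad moment estimates, with the denominator raised to a partial exponent p in (0, 1/2]. With p = 1/2 the step is AMSGrad, and as p approaches 0 it approaches SGD with momentum. The class name `PadamSketch`, the plain `double[]` state, and the `eps` term are assumptions for illustration only; this is not ND4J code and it omits any bias correction an actual updater might apply.

```java
/** Illustrative Padam step on plain arrays (hypothetical class, not part of ND4J). */
public final class PadamSketch {
    private final double[] m;     // first-moment estimate
    private final double[] v;     // second-moment estimate
    private final double[] vHat;  // running element-wise maximum of v (the AMSGrad part)
    private final double lr, beta1, beta2, p, eps;

    public PadamSketch(int n, double lr, double beta1, double beta2, double p, double eps) {
        this.m = new double[n];
        this.v = new double[n];
        this.vHat = new double[n];
        this.lr = lr;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.p = p;      // partial adaptive exponent, expected in (0, 0.5]
        this.eps = eps;  // assumption: small constant to avoid division by zero
    }

    /** Applies one in-place update to params for the given gradient. */
    public void step(double[] params, double[] grad) {
        for (int i = 0; i < params.length; i++) {
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i];
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] * grad[i];
            vHat[i] = Math.max(vHat[i], v[i]);
            // p = 0.5 gives the AMSGrad denominator sqrt(vHat)
            params[i] -= lr * m[i] / (Math.pow(vHat[i], p) + eps);
        }
    }
}
```

A real updater would operate on `INDArray` views of the state and gradient rather than on Java arrays.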
Implementing this isn't a high priority for the core team. If anyone wants to tackle this, there are configuration and implementation classes here (we'll need one of each for Padam):
https://github.com/deeplearning4j/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config
https://github.com/deeplearning4j/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning
For some extra reference: https://github.com/deeplearning4j/deeplearning4j/issues/5843#issuecomment-414965829 and the next few comments in that thread.
I'd like to give this a shot! As it's my first time contributing to DL4J, do you have any advice/suggestions for me?
@AlexDBlack
Are we sure about creating new classes for Padam?
Since it's just a slight modification of AMSGrad, wouldn't it be better to support Padam through AMSGrad by simply adding fields wherever required?
We can extend AMSGrad, of course. That's what OOP is for :)
Yeah, I'm ok with a separate class (extending AMSGrad if that makes sense).
Though we might end up with a little redundancy, I think I'd prefer a dedicated class for it for usability reasons - i.e., it'll be easier to find as a dedicated class rather than as an option in AMSGrad.
@AlexDBlack
Hi, I added the required classes. It is safe to merge:
Unified Commit
@saudet
I did not add a predicate for the parameter's range; instead, I log a warning. I'll need your help with predicates.
I haven't opened a pull request because I haven't tested the code yet. I couldn't build the project in IntelliJ on macOS (tried a lot of things). Can someone point me to a thorough README/guide for building it?
@achalagarwal Its name is "Preconditions", actually. Just do something like this:
Preconditions.checkArgument(bias != null, "LayerNorm: Use constructor without bias argument if bias is null / not available.");
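Applied to Padam, the same idea would reject an out-of-range partial parameter instead of only logging a warning. Below is a rough sketch in plain Java so it runs without ND4J on the classpath; the class `PadamArgs`, the method `checkPartial`, and the (0, 0.5] range taken from the paper are assumptions here, and in the actual config class the check would presumably be written with `Preconditions.checkArgument` as in the line above.

```java
/** Hypothetical argument check for Padam's partial parameter (not part of ND4J). */
public final class PadamArgs {
    private PadamArgs() {}

    /**
     * Returns p unchanged if it lies in (0, 0.5]; otherwise throws.
     * NaN fails both comparisons and is therefore rejected as well.
     */
    public static double checkPartial(double p) {
        if (!(p > 0.0 && p <= 0.5)) {
            throw new IllegalArgumentException(
                    "Padam: partial parameter must be in (0, 0.5], got " + p);
        }
        return p;
    }
}
```

Failing fast in the constructor or builder keeps a bad value from silently changing the behavior of training later on.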
Use Maven on the command line with mvn clean install -Dmaven.test.skip before trying it in an IDE.
The build was successful, but I had to skip a couple of projects due to network issues (HTTP requests failed).
On Ubuntu:
mvn clean install -Dmaven.test.skip -pl '!:deeplearning4j-dataimport-solrj,!:deeplearning4j-modelexport-solr'
@AlexDBlack
Now, how do you suggest I validate the correctness of Padam? Do you want me to build a model and replicate results from a publication? This will take a lot of time. Are there relevant tests for the linalg/learning modules? I could not find any.
cc: @saudet
@achalagarwal we have updater tests here, adding to that would be good:
https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/nn/updater/TestUpdaters.java
We'll carefully review the implementation too once you've opened a pull request. That should be good enough I think.
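One cheap sanity check that doesn't require replicating published results: with the partial exponent set to 0.5, Padam should follow the same trajectory as AMSGrad. The sketch below illustrates that idea on plain arrays with a toy quadratic loss. It is not the `TestUpdaters` code, all names in it are hypothetical, and both steps follow the paper's formulation without bias correction, so the reference computation would need adjusting to match what the existing AMSGrad updater actually does.

```java
/** Illustrative check that Padam with p = 0.5 matches AMSGrad (hypothetical, not DL4J test code). */
public final class PadamSanityCheck {

    /** One in-place step of the partially adaptive rule with exponent p. */
    static void padamStep(double[] theta, double[] g, double[] m, double[] v, double[] vHat,
                          double lr, double b1, double b2, double p) {
        for (int i = 0; i < theta.length; i++) {
            m[i] = b1 * m[i] + (1 - b1) * g[i];
            v[i] = b2 * v[i] + (1 - b2) * g[i] * g[i];
            vHat[i] = Math.max(vHat[i], v[i]);
            theta[i] -= lr * m[i] / Math.pow(vHat[i], p);
        }
    }

    /** Reference AMSGrad step (no bias correction), written with an explicit square root. */
    static void amsgradStep(double[] theta, double[] g, double[] m, double[] v, double[] vHat,
                            double lr, double b1, double b2) {
        for (int i = 0; i < theta.length; i++) {
            m[i] = b1 * m[i] + (1 - b1) * g[i];
            v[i] = b2 * v[i] + (1 - b2) * g[i] * g[i];
            vHat[i] = Math.max(vHat[i], v[i]);
            theta[i] -= lr * m[i] / Math.sqrt(vHat[i]);
        }
    }

    /** Largest absolute difference between the two trajectories after the given number of steps. */
    static double maxDeviation(int steps) {
        double[] a = {1.0, -2.0, 0.5};
        double[] b = a.clone();
        double[][] sa = new double[3][3]; // m, v, vHat for Padam
        double[][] sb = new double[3][3]; // m, v, vHat for AMSGrad
        for (int t = 0; t < steps; t++) {
            // The gradient of f(theta) = 0.5 * ||theta||^2 is theta itself
            padamStep(a, a.clone(), sa[0], sa[1], sa[2], 0.01, 0.9, 0.999, 0.5);
            amsgradStep(b, b.clone(), sb[0], sb[1], sb[2], 0.01, 0.9, 0.999);
        }
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("max deviation after 50 steps: " + maxDeviation(50));
    }
}
```

In a JUnit test, the same comparison would become an assertion that the two results agree within a small tolerance, alongside a hand-computed expected update for some p below 0.5.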